koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2025-09-10 09:04:36 +00:00

Author	SHA1	Message	Date
Concedo	32102c2064	Merge branch 'master' into concedo_experimental # Conflicts: # README.md	2023-07-07 14:15:39 +08:00
Howard Su	481f793acc	Fix opencl by wrap #if-else-endif with \n (#2086)	2023-07-07 05:34:18 +02:00
Concedo	69add28324	Merge branch 'master' into concedo_experimental # Conflicts: # .github/workflows/build.yml	2023-07-04 18:51:42 +08:00
Govlzkoy	14a2cc71f6	[ggml] fix index for ne03 value in ggml_cl_mul_f32 (#2088)	2023-07-04 07:50:00 +08:00
LostRuins	96a712ca1b	Porting the improved K-Quant CUDA kernels to OpenCL (#1966) * Added broken new q4k quant * xx + ib0 * Fix q2_k fast kernel * Use preprocessor for QK_K * Add q6_k fast matmul kernel * ported q3k speedup successfully * ported q2k and q5k speedups * remove old dot kernels and template * fixed global const struct types * fixing address spaces * fixed string too long CI issue --------- Co-authored-by: 0cc4m <picard12@live.de>	2023-06-29 05:56:43 +02:00
Concedo	6d718525c4	Merge branch 'optimize_quants_upstream' into concedo_experimental	2023-06-23 23:56:31 +08:00
Concedo	f7b096374d	fixed string too long CI issue	2023-06-23 23:56:22 +08:00
Concedo	da668e685f	fixing address spaces	2023-06-20 22:46:11 +08:00
Concedo	cce6e67f44	fixing address spaces	2023-06-20 22:45:16 +08:00
Concedo	1f1735f5ad	Merge branch 'optimize_quants_upstream' into concedo_experimental	2023-06-20 21:39:35 +08:00
Concedo	6b75fc48b9	fixed global const struct types	2023-06-20 21:38:48 +08:00
Concedo	c5ae3f50a7	Merge branch 'optimize_quants_upstream' into concedo_experimental	2023-06-20 18:41:13 +08:00
Concedo	a6e8b0216d	remove old dot kernels and template	2023-06-20 18:37:48 +08:00
Concedo	93247a11cd	ported q2k and q5k speedups	2023-06-20 18:37:41 +08:00
Concedo	029bed6446	ported q3k speedup successfully	2023-06-20 18:37:26 +08:00
Concedo	d754915269	Merge branch 'optimize_quants_upstream' into concedo_experimental	2023-06-20 17:26:39 +08:00
0cc4m	8d816d19d1	Add q6_k fast matmul kernel	2023-06-20 08:41:35 +02:00
0cc4m	34a4917984	Use preprocessor for QK_K	2023-06-20 08:04:16 +02:00
0cc4m	069cbe530d	Fix q2_k fast kernel	2023-06-20 08:01:40 +02:00
Concedo	c94a438328	xx + ib0	2023-06-19 23:01:49 +08:00
Concedo	266d436746	Added broken new q4k quant	2023-06-19 22:41:35 +08:00
Concedo	278427d9a4	Merge branch 'master' into concedo_experimental # Conflicts: # CMakeLists.txt # Makefile # README.md	2023-06-18 15:29:44 +08:00
Howard Su	3d59ec5935	ggml : fix warnings under MSVC (#1908)	2023-06-17 18:46:15 +03:00
0cc4m	d411968e99	opencl : support k-quants (#1836) * Porting q2_k kernel to OpenCL * Set global and local sizes for kernel calls for dequantizing k-quants * Added q6_k kernel * Fix q4_k opencl struct order * Replace uchar with uint8_t * Finish dequant kernels * Added OpenCL DMMV kernels * Fix q2_k, improve code * Fix q3_k * Shorten switch statements * Improve code formatting --------- Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>	2023-06-16 21:59:49 +03:00
Concedo	2b4a286e56	Merge remote-tracking branch 'occam/kquant-opencl' into concedo_experimental	2023-06-14 11:34:53 +08:00
0cc4m	0e3cc8e6f7	Improve code formatting	2023-06-13 16:10:25 +02:00
0cc4m	f1ac03ed37	Shorten switch statements	2023-06-13 15:21:44 +02:00
0cc4m	2a972f3649	Fix q3_k	2023-06-13 12:05:35 +02:00
0cc4m	fc8c823f34	Fix q2_k, improve code	2023-06-13 12:05:35 +02:00
Concedo	6e20827f93	Added OpenCL DMMV kernels	2023-06-13 12:05:35 +02:00
Concedo	f558e4c297	Finish dequant kernels	2023-06-13 12:05:35 +02:00
Concedo	56151bb875	Replace uchar with uint8_t	2023-06-13 12:05:35 +02:00
0cc4m	a4ee2b89d2	Fix q4_k opencl struct order	2023-06-13 12:05:35 +02:00
Concedo	1506affd0a	Added q6_k kernel	2023-06-13 12:05:35 +02:00
0cc4m	44422fd567	Set global and local sizes for kernel calls for dequantizing k-quants	2023-06-13 12:05:35 +02:00
Concedo	9b41865312	Porting q2_k kernel to OpenCL	2023-06-13 12:05:23 +02:00
Concedo	9830871d0f	pulled all Occam's fixes and the kquants are all working now	2023-06-13 16:15:13 +08:00
Concedo	215edf420b	Merge branch 'master' into concedo_experimental	2023-06-12 21:53:13 +08:00
Concedo	9c08017051	this patch is a work in progress implementation for the k-quants. the dequant kernels are working, but the DMMV ones are not.	2023-06-12 21:47:57 +08:00
Howard Su	58970a4c39	Leverage mmap for offloading tensors to GPU (#1597) * Rebase to latest * Show progress * Add assert to make sure we only allocate temp buffer for non-CPU backend tensor Co-authored-by: Johannes Gäßler <johannesg@5d6.de> --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2023-06-12 14:44:16 +02:00
Concedo	b9f74db89e	Merge branch 'master' into concedo_experimental # Conflicts: # Makefile	2023-06-10 21:07:20 +08:00
Robert Sung-wook Shin	98ed165574	OpenCL: Add release memory (#1741) * Add opencl release memory * Rename function name	2023-06-09 18:24:40 +02:00
Concedo	e78c675a6e	Merge branch 'master' into concedo_experimental # Conflicts: # README.md # flake.lock # flake.nix # ggml-opencl.cpp	2023-06-07 15:23:29 +08:00
Johannes Gäßler	17366df842	Multi GPU support, CUDA refactor, CUDA scratch buffer (#1703) * CUDA multi GPU + scratch ggml_cuda_compute_forward Tensor parallelism ggml_cuda_add ggml_cuda_rms_norm ggml_cuda_silu CUDA scratch buffer --main-gpu CLI option	2023-06-06 21:33:23 +02:00
LostRuins	d5b111f53d	Clblast fixes + enhancements to save VRAM and offload more layers (#1675) * Use events instead of clFinish, where possible * OpenCL: Don't load gpu layers into RAM, add mul_f32 kernel * Reduce queueing overhead for contiguous tensors by using single mul kernel call * Adapt to #1612 cl_mem malloc changes * Reduce code duplication between cuda and opencl branches * Improve implementation * Clblast fixes + enhancements to save VRAM: 1. Change all Clblast buffers to CL_MEM_READ_WRITE, as the pool malloc currently doesn't properly handle them. 2. When recycling buffers in pool malloc, always assign the SMALLEST available buffer that fits, instead of the FIRST available buffer 3. When failing to recycle a buffer in pool malloc (all too small), instead recycle the largest available free buffer by resizing it. * change max value size_t to use limits * removed flags from the CL pool malloc, apply code tidying suggestions.	2023-06-06 19:00:01 +02:00
Concedo	54dc75ce73	Merge branch 'concedo-opencl-dev' into concedo_experimental	2023-06-05 13:31:53 +08:00
Concedo	f6431ded5d	removed flags from the CL pool malloc, apply code tidying suggestions.	2023-06-05 13:31:37 +08:00
Concedo	1ddbb9acd9	Merge branch 'concedo-opencl-dev' into concedo_experimental # Conflicts: # ggml-opencl.cpp	2023-06-04 18:07:27 +08:00
Concedo	64e3e74556	change max value size_t to use limits	2023-06-04 18:04:52 +08:00
LostRuins	2b700749e5	Merge branch 'master' into concedo-opencl-dev	2023-06-04 18:00:06 +08:00


106 commits