koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2025-09-10 17:14:36 +00:00

Author	SHA1	Message	Date
Concedo	ed09a854f0	Merge branch 'master' into concedo_experimental # Conflicts: # .github/workflows/build.yml # .gitignore # CMakeLists.txt # Makefile # README.md # ci/run.sh # ggml-opencl.cpp # tests/CMakeLists.txt	2024-01-27 11:45:07 +08:00
0cc4m	a1d6df129b	Add OpenCL add kernel (#5151) * Add OpenCL add kernel * Put add kernel into different string to stay within MSVC string length limit, disable float16 support due to bad results	2024-01-26 23:07:32 +01:00
Concedo	2a4a7241e6	Merge branch 'vulkan_test' into concedo_experimental # Conflicts: # CMakeLists.txt # Makefile # llama.cpp	2024-01-25 23:01:44 +08:00
Concedo	7b3866f211	vulkan implementation from occam (early access, squashed)	2024-01-25 18:13:19 +08:00
Concedo	71e9a64171	Merge branch 'master' into concedo_experimental # Conflicts: # .github/workflows/nix-ci.yml # CMakeLists.txt # Makefile # ggml-cuda.cu # ggml-opencl.cpp # llama.cpp	2024-01-20 23:27:42 +08:00
slaren	e7e4df031b	llama : ggml-backend integration (#4766) * llama : ggml-backend integration * ggml-backend : add names to buffers * fix unmap after loading * batched-bench : add tensor_split param * llama : check for null tensor_split * ggml-backend : increase GGML_MAX_BACKENDS * improve graph splitting, partial fix for --no-kv-offload * cuda : add ggml-backend split buffer support * cuda : do not create buffer types for devices that don't exist (fixes usage without CUDA devices available) * ggml : fix null backend dereference (#4807) * ggml : fix null backend dereference * ggml : also check ggml_backend_is_cpu * test-backend-ops : check buffer allocation failures * llama : add cparam (split_mode) and command line argument (--split-mode, -sm) to configure the split mode (none, layer or row) * ggml : fix mul_mat_id work size * llama : rewrite session kv load/set without graphs * minor * llama : only initialize used backends, free backends on context free * llama : abort ctx if cuda backend init fails * llama : rewrite lora with ggml-backend and compute on CPU ggml-ci * llama : only map to a backend buffer the region of the file mapping containing the tensors used in the buffer * opencl : add ggml-backend buffer type * cuda : only use batched_cublas with batched mat muls (fixes fp16 tg perf) * llama : on Metal, by default offload the full model ggml-ci * metal : page align the data ptr (#4854) * Apply suggestions from code review Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * cuda : fix split buffer free * address review comments * llama-bench : add split-mode parameter * fix whitespace * opencl : fix double initialization * server : add --split-mode parameter * use async copy and compute to improve multi-gpu performance ggml-ci * use async memcpys to copy the graph outputs to the CPU * fix opencl * use a host buffer for the cpu compute buffer for faster copies to the gpu --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2024-01-12 20:07:38 +01:00
Concedo	4f40c226a0	Merge branch 'master' into concedo_experimental # Conflicts: # .devops/tools.sh # .gitignore # CMakeLists.txt # Makefile # README.md	2023-12-01 23:46:59 +08:00
Jared Van Bortel	15f5d96037	build : fix build info generation and cleanup Makefile (#3920) * cmake : fix joining of REAL_GIT_DIR * fix includes with help from include-what-you-use * make : remove unneeded deps and add test-rope target * fix C includes in C++ source files * Revert "fix includes with help from include-what-you-use" This reverts commit 635e9fadfd516d4604a0fecf4a854bfb25ad17ae.	2023-12-01 00:23:08 +02:00
Concedo	f344a99425	causallm is not working well on clblast, running out of mem wth blas. this helps a bit but doesnt fix the problem.	2023-10-26 23:36:35 +08:00
Concedo	5db89b90b7	Merge branch 'master' into concedo_experimental # Conflicts: # .gitignore # CMakeLists.txt # Makefile # README.md # build.zig # ggml-opencl.cpp # tests/CMakeLists.txt # tests/test-double-float.cpp # tests/test-sampling.cpp	2023-10-25 23:58:15 +08:00
shibe2	465219b914	CLBlast: Add outer loops over src0 for broadcasting in mulmat Reduce repeated dequantization of the same data.	2023-10-20 22:30:52 +04:00
Concedo	957e245285	Merge branch 'master' into concedo_experimental # Conflicts: # Makefile # README.md	2023-10-19 23:32:52 +08:00
shibe2	1117d06607	opencl : fix element-wise multiplication (#3656)	2023-10-18 15:09:22 +03:00
Concedo	700951dbd4	Merge branch 'master' into concedo_experimental # Conflicts: # README.md	2023-10-18 16:33:09 +08:00
shibe2	40e5ce054f	CLBlast: Fix temporary buffer size for f16 conversion (wsize) Fix buffer overflow. Reduce the size to fit just one 2D slice. Assert sufficient size.	2023-10-17 21:02:30 +04:00
Concedo	5cfabaee25	Merge branch 'master' into concedo_experimental # Conflicts: # CMakeLists.txt # Makefile # README.md # docs/BLIS.md	2023-10-15 15:50:20 +08:00
shibe2	1e0e873c37	CLBlast: Fix matrix-vector multiplication (#3544)	2023-10-12 21:59:47 +02:00
Concedo	b5cd935cdb	Merge branch 'master' into concedo_experimental # Conflicts: # .github/workflows/build.yml # ggml-opencl.cpp	2023-10-06 17:58:08 +08:00
shibe2	e2583cbc29	CLBlast: Fix handling of on-device tensor data Fix uploading tensor data to device, including 3D, 4D, and non-contiguous tensors. Use correct offsets into data that is already in VRAM. Correct handling of OpenCL events when multiple commands are queued.	2023-10-05 18:25:23 +04:00
Concedo	c249f7dbc5	Merge branch 'master' into concedo_experimental # Conflicts: # .dockerignore # .gitignore # CMakeLists.txt # Makefile # tests/CMakeLists.txt	2023-10-03 23:51:30 +08:00
shibe2	665018c749	CLBlast: Add broadcast support for matrix multiplication (#3402) Broadcast src0 into src1 across dimensions 2 and 3 when needed. This is required for models that use GQA.	2023-10-02 21:26:15 +02:00
Concedo	bd2500db36	Merge branch 'master' into concedo_experimental # Conflicts: # .github/workflows/build.yml # README.md # build.zig # flake.nix	2023-09-23 10:51:34 +08:00
shibe2	36b904e200	ggml-opencl.cpp: Make private functions static (#3300)	2023-09-21 14:10:26 -04:00
Ycros	f6ba36dff6	Reduce warnings. (#439)	2023-09-16 18:52:09 +08:00
Concedo	a0aa620718	Merge branch 'master' into concedo_experimental # Conflicts: # .github/workflows/build.yml # .gitignore # CMakeLists.txt # Makefile # README.md	2023-09-05 21:49:24 +08:00
slaren	bd33e5ab92	ggml-opencl : store GPU buffer in ggml_tensor::extra (#2994)	2023-09-04 14:59:52 +02:00
Wentai Zhang	6460f758db	opencl : fix a bug in ggml_cl_pool_malloc() for ggml_cl_mul_mat_f32() (#2955) Co-authored-by: Wentai Zhang <wentaizhang@tencent.com>	2023-09-03 11:46:44 +03:00
Concedo	48c27a9ce1	hotfix for 70b broadcast issues	2023-07-25 01:32:47 +08:00
Concedo	32102c2064	Merge branch 'master' into concedo_experimental # Conflicts: # README.md	2023-07-07 14:15:39 +08:00
Howard Su	481f793acc	Fix opencl by wrap #if-else-endif with \n (#2086)	2023-07-07 05:34:18 +02:00
Concedo	69add28324	Merge branch 'master' into concedo_experimental # Conflicts: # .github/workflows/build.yml	2023-07-04 18:51:42 +08:00
Govlzkoy	14a2cc71f6	[ggml] fix index for ne03 value in ggml_cl_mul_f32 (#2088)	2023-07-04 07:50:00 +08:00
LostRuins	96a712ca1b	Porting the improved K-Quant CUDA kernels to OpenCL (#1966) * Added broken new q4k quant * xx + ib0 * Fix q2_k fast kernel * Use preprocessor for QK_K * Add q6_k fast matmul kernel * ported q3k speedup successfully * ported q2k and q5k speedups * remove old dot kernels and template * fixed global const struct types * fixing address spaces * fixed string too long CI issue --------- Co-authored-by: 0cc4m <picard12@live.de>	2023-06-29 05:56:43 +02:00
Concedo	6d718525c4	Merge branch 'optimize_quants_upstream' into concedo_experimental	2023-06-23 23:56:31 +08:00
Concedo	f7b096374d	fixed string too long CI issue	2023-06-23 23:56:22 +08:00
Concedo	da668e685f	fixing address spaces	2023-06-20 22:46:11 +08:00
Concedo	cce6e67f44	fixing address spaces	2023-06-20 22:45:16 +08:00
Concedo	1f1735f5ad	Merge branch 'optimize_quants_upstream' into concedo_experimental	2023-06-20 21:39:35 +08:00
Concedo	6b75fc48b9	fixed global const struct types	2023-06-20 21:38:48 +08:00
Concedo	c5ae3f50a7	Merge branch 'optimize_quants_upstream' into concedo_experimental	2023-06-20 18:41:13 +08:00
Concedo	a6e8b0216d	remove old dot kernels and template	2023-06-20 18:37:48 +08:00
Concedo	93247a11cd	ported q2k and q5k speedups	2023-06-20 18:37:41 +08:00
Concedo	029bed6446	ported q3k speedup successfully	2023-06-20 18:37:26 +08:00
Concedo	d754915269	Merge branch 'optimize_quants_upstream' into concedo_experimental	2023-06-20 17:26:39 +08:00
0cc4m	8d816d19d1	Add q6_k fast matmul kernel	2023-06-20 08:41:35 +02:00
0cc4m	34a4917984	Use preprocessor for QK_K	2023-06-20 08:04:16 +02:00
0cc4m	069cbe530d	Fix q2_k fast kernel	2023-06-20 08:01:40 +02:00
Concedo	c94a438328	xx + ib0	2023-06-19 23:01:49 +08:00
Concedo	266d436746	Added broken new q4k quant	2023-06-19 22:41:35 +08:00
Concedo	278427d9a4	Merge branch 'master' into concedo_experimental # Conflicts: # CMakeLists.txt # Makefile # README.md	2023-06-18 15:29:44 +08:00

134 commits