koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-12 05:52:26 +00:00

Author	SHA1	Message	Date
Concedo	b3de1598e7	Fixed some GGUFv1 loading bugs, long overdue cleanup for compiling, integrated TTS tts is functional (+6 squashed commit) Squashed commit: [22396311] wip tts [3a883027] tts not yet working [0dcfab0e] fix silly bug [a378d9ef] some long overdue cleanup [fc5a6fb5] Wip tts [39f50497] wip TTS integration	2025-01-13 14:23:25 +08:00
Concedo	b154bd3671	Merge branch 'upstream' into concedo_experimental # Conflicts: # README.md # docs/build.md # docs/development/HOWTO-add-model.md # tests/test-backend-ops.cpp # tests/test-chat-template.cpp	2025-01-10 17:57:38 +08:00
0cc4m	c3f9d25706	Vulkan: Fix float16 use on devices without float16 support + fix subgroup_size_control validation error (#11161 ) * Vulkan: Remove float16 use in shaders * Fix validation error about subgroup_size_control extension	2025-01-10 06:39:33 +01:00
Molly Sophia	ee7136c6d1	llama: add support for QRWKV6 model architecture (#11001 ) llama: add support for QRWKV6 model architecture (#11001) * WIP: Add support for RWKV6Qwen2 Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * RWKV: Some graph simplification Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Add support for RWKV6Qwen2 with cpu and cuda GLA Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * RWKV6[QWEN2]: Concat lerp weights together to reduce cpu overhead Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Fix some typos Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * code format changes Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Fix wkv test & add gla test Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Fix cuda warning Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Update README.md Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Update ggml/src/ggml-cuda/gla.cu Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Fix fused lerp weights loading with RWKV6 Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * better sanity check skipping for QRWKV6 in llama-quant thanks @compilade Signed-off-by: Molly Sophia <mollysophia379@gmail.com> Co-authored-by: compilade <git@compilade.net> --------- Signed-off-by: Molly Sophia <mollysophia379@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: compilade <git@compilade.net>	2025-01-10 09:58:08 +08:00
Akarshan Biswas	c6860cc734	SYCL: Refactor ggml_sycl_compute_forward (#11121 ) * SYCL: refactor ggml_sycl_compute_forward * SYCL: add back GGML_USED(dst) to ggml_sycl_cpy * SYCL: add function name to noop debug * SYCL: Some device info print refactoring and add details of XMX availability	2025-01-10 08:13:03 +08:00
Concedo	d3d7dae82b	prevent crash if we ever want to build without coopmat	2025-01-09 17:31:38 +08:00
Concedo	1b49dc305f	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # .github/workflows/docker.yml # .github/workflows/editorconfig.yml # examples/run/run.cpp # examples/server/README.md # scripts/sync-ggml.last	2025-01-09 16:50:29 +08:00
Concedo	5cce8a5fbc	define coopmat or it will segfault	2025-01-09 16:38:21 +08:00
Concedo	e788b8289a	You'll never take us alive We swore that death will do us part They'll call our crimes a work of art	2025-01-09 11:27:06 +08:00
hydai	8d59d91171	fix: add missing msg in static_assert (#11143 ) Signed-off-by: hydai <z54981220@gmail.com>	2025-01-08 20:03:28 +00:00
Concedo	dcfa1eca4e	Merge commit '017cc5f446' into concedo_experimental # Conflicts: # .github/ISSUE_TEMPLATE/010-bug-compilation.yml # .github/ISSUE_TEMPLATE/019-bug-misc.yml # CODEOWNERS # examples/batched-bench/batched-bench.cpp # examples/batched/batched.cpp # examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.cpp # examples/gritlm/gritlm.cpp # examples/llama-bench/llama-bench.cpp # examples/passkey/passkey.cpp # examples/quantize-stats/quantize-stats.cpp # examples/run/run.cpp # examples/simple-chat/simple-chat.cpp # examples/simple/simple.cpp # examples/tokenize/tokenize.cpp # ggml/CMakeLists.txt # ggml/src/ggml-metal/CMakeLists.txt # ggml/src/ggml-vulkan/CMakeLists.txt # scripts/sync-ggml.last # src/llama.cpp # tests/test-autorelease.cpp # tests/test-model-load-cancel.cpp # tests/test-tokenizer-0.cpp # tests/test-tokenizer-1-bpe.cpp # tests/test-tokenizer-1-spm.cpp	2025-01-08 23:15:21 +08:00
Radoslav Gerganov	c792dcf488	ggml : allow loading backend with env variable (ggml/1059) ref: #1058	2025-01-08 13:40:18 +02:00
amritahs-ibm	8cef75c743	llamafile : ppc64le MMA INT8 implementation (#10912 ) This change upstreams llamafile's cpu matrix multiplication kernels for ppc64le using MMA builtins for quantised int8 datatype. This change results in 10% - 70% improvement in total speed(ie all tokens/total time), across various batch sizes. The patch is tested with Meta-Lllama-3-8B, Mistral-7B, Llama-2-7B-chat-hf models on a IBM POWER10 machine. Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>	2025-01-08 12:54:19 +02:00
Mathieu Baudier	02f0430141	Disable GL_KHR_cooperative_matrix Vulkan extension if not available. (#11117 ) * Disable GL_KHR_cooperative_matrix Vulkan extension if not available. * Perform Vulkan extensions checks in a more sensible order * Remove unnecessary #ifdef directive	2025-01-08 09:18:13 +01:00
ag2s20150909	bec2183f2c	fix: Vulkan shader gen binary path when Cross-compiling (#11096 ) * fix: Vulkan shader gen binary path when cross compiling	2025-01-08 09:17:29 +01:00
Johannes Gäßler	53ff6b9b9f	GGUF: C++ refactor, backend support, misc fixes (#11030 ) * GGUF: C++ refactor, backend support, misc fixes remove ggml_tensor.backend update CODEOWNERS [no ci] remove gguf_get_data from API revise GGUF API data types	2025-01-07 18:01:58 +01:00
Diego Devesa	017cc5f446	ggml-backend : only offload from host buffers (fix) (#11124 )	2025-01-07 16:11:57 +01:00
Diego Devesa	a3d50bc022	ggml-backend : only offload from host buffers (#11120 )	2025-01-07 12:38:05 +01:00
Radoslav Gerganov	a4dd490069	rpc : code cleanup (#11107 ) Remove duplicated macros, use GGML_LOG_ERROR for errors	2025-01-07 08:37:02 +02:00
Akarshan Biswas	c0d6f790d0	SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6 (#11087 ) * SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6 * Revert "SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6" This reverts commit f62dc45f318e48d375e7734b34cbddee81deed52. * Reland: Use get_multi_ptr instead of deprecated get_pointer in wkv6	2025-01-07 14:26:07 +08:00
Johannes Gäßler	46e3556e01	CUDA: add BF16 support (#11093 ) * CUDA: add BF16 support	2025-01-06 02:33:52 +01:00
0cc4m	b56f079e28	Vulkan: Add device-specific blacklist for coopmat for the AMD proprietary driver (#11074 ) * Vulkan: Add device-specific blacklist for coopmat for the AMD proprietary driver * Add (TM) to AMD name check	2025-01-04 21:09:59 +01:00
matt23654	f922a9c542	[GGML][RPC] Support for models with non-512-aligned tensors over RPC. (#11047 ) * Added init tensor calling code * Added get_alloc_size forwarding * Cleaned up and improved type/error handling. * fix: remove trailing whitespaces. * Cleanup and use GGML error logging functions. * Handle potentially dangerous edge cases. * Apply suggestions from code review Co-authored-by: Diego Devesa <slarengh@gmail.com> --------- Co-authored-by: Diego Devesa <slarengh@gmail.com>	2025-01-04 17:10:30 +01:00
Georgi Gerganov	5e3b08d606	ggml : do not install metal source when embed library (ggml/1054)	2025-01-04 16:09:53 +02:00
Daniel Bevenius	db68c93b57	ggml : improve inputs log sched_print_assignments (ggml/1053) This commit attempts to improve the log message for the inputs of the splits in the sched_print_assignments function. The motivation for this change is that currently even if there are no inputs a colon is displayed at the end of the line, which can make it a little confusing when reading the output as it could be interpreted as the line below are inputs when they are in fact nodes. With this change the colon will only be printed if there actually are inputs.	2025-01-04 16:09:53 +02:00
Gilad S.	c31fc8b966	fix: Vulkan shader gen binary path (#11037 )	2025-01-04 09:17:31 +01:00
Concedo	f9f1585a7f	broken merge - kcpp changes will be applied above this commit for better tracking.	2025-01-03 23:49:17 +08:00
Georgi Gerganov	e7da954ecc	metal : avoid uint (#11019 )	2025-01-03 11:26:14 +02:00
Concedo	911da8765f	Merge branch 'upstream' into concedo_experimental # Conflicts: # README.md # examples/llama.android/llama/src/main/cpp/llama-android.cpp # examples/run/run.cpp # examples/server/README.md # examples/server/bench/README.md # examples/server/tests/README.md # ggml/src/CMakeLists.txt # ggml/src/ggml-cpu/CMakeLists.txt # tests/test-backend-ops.cpp	2025-01-03 11:56:20 +08:00
Srihari-mcw	0827b2c1da	ggml : fixes for AVXVNNI instruction set with MSVC and Clang (#11027 ) * Fixes for clang AVX VNNI * enable AVX VNNI and alder lake build for MSVC * Apply suggestions from code review --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-12-31 15:23:33 +01:00
Peter	6e1531aca5	common, examples, ggml : fix MSYS2 GCC compiler errors and warnings when building with LLAMA_CURL=ON and GGML_OPENCL=ON (#11013 ) In common/common.cpp: * Convert usage of stat() function call to check if file exists to standard library function std::filesystem::exists (error unable to match to correct function signature) * Additional conditions to check if PATH_MAX is already defined in WIN32 environment (warning it is already defined in MSYS2) In examples/run/run.cpp: * Add io.h header inclusion (error cannot find function _get_osfhandle) * Change initialisers for OVERLAPPED to empty struct (warning about uninitialised members) * Add initialiser for hFile (warning it may be uninitialised) * Add cast for curl_off_t percentage value to long int in generate_progress_prefix function (warning that curl_off_t is long long int) In ggml/src/ggml-opencl/ggml-opencl.cpp: * Initialise certain declared cl_mem variables to nullptr for greater safety (warning about B_d variable possibly used unassigned)	2024-12-31 01:46:06 +01:00
Jeff Bolz	716bd6dec3	vulkan: optimize mul_mat for small values of N (#10991 ) Make the mul_mat_vec shaders support N>1 (as a spec constant, NUM_COLS) where the batch_strides are overloaded to hold the row strides. Put the loads from the B matrix in the innermost loop because it should cache better. Share some code for reducing the result values to memory in mul_mat_vec_base.	2024-12-30 18:27:11 +01:00
Jeff Bolz	a813badbbd	vulkan: im2col and matmul optimizations for stable diffusion (#10942 ) * tests: Add im2col perf tests * vulkan: optimize im2col, more elements per thread * vulkan: increase small tile size for NV_coopmat2 * vulkan: change im2col to 512 elements per workgroup	2024-12-29 10:16:34 +01:00
Jeff Bolz	fdd2188912	vulkan: Use push constant offset to handle misaligned descriptors (#10987 )	2024-12-29 09:35:11 +01:00
Concedo	7c671f289e	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/docker.yml # examples/cvector-generator/mean.hpp # examples/cvector-generator/pca.hpp # examples/export-lora/export-lora.cpp # examples/rpc/rpc-server.cpp # examples/run/README.md # examples/run/run.cpp # examples/server/CMakeLists.txt # examples/server/README.md # ggml/src/CMakeLists.txt # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-vulkan/ggml-vulkan.cpp # scripts/compare-llama-bench.py # scripts/hf.sh # tests/test-chat-template.cpp	2024-12-28 12:48:34 +08:00
Eve	d79d8f39b4	vulkan: multi-row k quants (#10846 ) * multi row k quant shaders! * better row selection * more row choices * readjust row selection * rm_kq=2 by default	2024-12-26 16:54:44 +01:00
Peter	d283d02bf2	examples, ggml : fix GCC compiler warnings (#10983 ) Warning types fixed (observed under MSYS2 GCC 14.2.0): * format '%ld' expects argument of type 'long int', but argument has type 'size_t' * llama.cpp/ggml/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp:81:46: warning: missing initializer for member '_STARTUPINFOA::lpDesktop' [-Wmissing-field-initializers] (emitted for all struct field except first)	2024-12-26 14:59:11 +01:00
Concedo	263d49d0d5	qkv warning	2024-12-25 15:15:38 +08:00
Djip007	2cd43f4900	ggml : more perfo with llamafile tinyblas on x86_64 (#10714 ) * more perfo with llamafile tinyblas on x86_64. - add bf16 suport - change dispache strategie (thanks: https://github.com/ikawrakow/ik_llama.cpp/pull/71 ) - reduce memory bandwidth simple tinyblas dispache and more cache freindly * tinyblas dynamic dispaching * sgemm: add M blocs. * - git 2.47 use short id of len 9. - show-progress is not part of GNU Wget2 * remove not stable test	2024-12-24 18:54:49 +01:00
Diego Devesa	60cfa728e2	ggml : use wstring for backend search paths (#10960 ) ggml-ci	2024-12-24 04:05:27 +01:00
Diego Devesa	3327bb0f8d	ggml : fix arm enabled features check (#10961 )	2024-12-24 04:05:17 +01:00
Diego Devesa	32d6ee6385	ggml : fix const usage in SSE path (#10962 )	2024-12-23 20:25:52 +01:00
yuri@FreeBSD	7024d59e6a	ggml : fix run-time on FreeBSD in get_executable_path() (#10948 )	2024-12-23 01:20:11 +01:00
Jeff Bolz	ebdee9478c	vulkan: build fixes for 32b (#10927 ) * vulkan: build fixes for 32b Should fix #10923 * vulkan: initialize some buffer/offset variables	2024-12-22 10:44:01 +01:00
Jeff Bolz	a91a41364b	vulkan: optimize coopmat2 dequant functions (#10855 ) Change the code to do 16b loads when possible and extract the appropriate component late, so the code is effectively decoding a pair of elements and then selecting one. This can allow more commoning to happen in the compiler when neighboring elements are loaded.	2024-12-21 08:04:45 +01:00
Concedo	4c56b7cada	Merge branch 'upstream' into concedo_experimental # Conflicts: # README.md # examples/gbnf-validator/gbnf-validator.cpp # examples/llava/clip.cpp # examples/run/README.md # examples/run/run.cpp # examples/server/README.md # ggml/src/ggml-cpu/CMakeLists.txt # src/llama.cpp # tests/test-grammar-integration.cpp # tests/test-llama-grammar.cpp	2024-12-21 09:41:49 +08:00
Adrien Gallouët	e34c5af43f	ggml-cpu: replace NEON asm with intrinsics in ggml_gemv_q4_0_4x8_q8_0() (#10874 ) * ggml-cpu: replace NEON asm with intrinsics in ggml_gemv_q4_0_4x8_q8_0() Signed-off-by: Adrien Gallouët <angt@huggingface.co> * ggml-cpu: format code Signed-off-by: Adrien Gallouët <angt@huggingface.co> --------- Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2024-12-21 00:33:37 +01:00
Akarshan Biswas	eb5c3dc64b	SYCL: Migrate away from deprecated ggml_tensor->backend (#10840 ) * Migrate to tensor->buffer for checking backend buffer type: 1 * SYCL: common.cpp try to migrate away from tensor->backend * SYCL: fix assertions and add proper comments * SYCL: remove extra space * SYCL: Add back static to ggml_backend_buffer_is_sycl_split function * SYCL: Add pragma directive to suppress warning spam * SYCL: Integrate debug logs with GGML_LOG and other fixes * Revert "SYCL: Integrate debug logs with GGML_LOG and other fixes" This reverts commit 2607b7de0f0d2f4f1f690226f86fa861aa39cb97. Let's keep the current SYCL specific logging mechanism for now * SYCL: Use GGML_SYCL_DEBUG after reverting * SYCL: reg_get_proc_address func, update to the current func signature * SYCL: Refactor SYCL buffer checks in ggml_sycl_cpy_tensor_2d	2024-12-20 23:31:28 +08:00
Diego Devesa	21ae3b9be8	ggml : add test for SVE and disable when it fails (#10906 )	2024-12-20 13:31:28 +01:00
Concedo	50648de0af	rephrase tensor moved warning, cleanup and prepare for ci	2024-12-19 22:57:43 +08:00

572 commits