koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2025-09-14 02:49:41 +00:00

Author	SHA1	Message	Date
Diego Devesa	e0e912f49b	llama : add option to override model tensor buffers (#11397) * llama : add option to override tensor buffers * ggml : fix possible underflow in ggml_nbytes	2025-04-02 14:52:01 +02:00
Georgi Gerganov	a10b36c91a	llama : refactor kv cache guard (#12695) * llama : refactor kv cache guard ggml-ci * cont : fix comment [no ci] * llama : fix kv_cache restore logic ggml-ci * context : simplify kv cache updates ggml-ci * cont : better name [no ci] * llama : fix llama_decode return code when could not find KV slot ggml-ci * context : change log err -> warn [no ci] * kv-cache : add comment + warning	2025-04-02 14:32:59 +03:00
Concedo	7f1003be44	warning for max tokens being too high	2025-04-02 18:58:38 +08:00
Sigbjørn Skjæret	83a88bd6af	vocab : BailingMoE : change possessive quantifiers to greedy (#12677)	2025-04-02 11:21:48 +02:00
Xuan-Son Nguyen	42eb248f46	common : remove json.hpp from common.cpp (#12697) * common : remove json.hpp from common.cpp * fix comment	2025-04-02 09:58:34 +02:00
Chenguang Li	9bacd6b374	[CANN] get_rows and dup optimization (#12671) * [CANN]get_rows and dup optimization. Co-authored-by: hipudding <huafengchun@gmail.com> Signed-off-by: noemotiovon <noemotiovon@gmail.com> * [CANN]GET_ROWS and CPY/DUP optimization Co-authored-by: hipudding <huafengchun@gmail.com> Signed-off-by: noemotiovon <noemotiovon@gmail.com> * [CANN]code style adjustment Signed-off-by: noemotiovon <noemotiovon@gmail.com> * [CANN]code style adjustment Signed-off-by: noemotiovon <noemotiovon@gmail.com> * [CANN]code style adjustment Signed-off-by: noemotiovon <noemotiovon@gmail.com> * [CANN]code style adjustment Signed-off-by: noemotiovon <noemotiovon@gmail.com> --------- Signed-off-by: noemotiovon <noemotiovon@gmail.com> Co-authored-by: noemotiovon <noemotiovon@gmail.com> Co-authored-by: hipudding <huafengchun@gmail.com>	2025-04-02 15:22:13 +08:00
Concedo	669311365c	fixed gemma system prompt	2025-04-02 13:58:51 +08:00
Xuan-Son Nguyen	267c1399f1	common : refactor downloading system, handle mmproj with -hf option (#12694) * (wip) refactor downloading system [no ci] * fix all examples * fix mmproj with -hf * gemma3: update readme * only handle mmproj in llava example * fix multi-shard download * windows: fix problem with std::min and std::max * fix 2	2025-04-01 23:44:05 +02:00
Junil Kim	f423981ac8	opencl : fix memory allocation size (#12649) issue: https://github.com/CodeLinaro/llama.cpp/pull/17#issuecomment-2760611283 This patch fixes the memory allocation size not exceeding the maximum size of the OpenCL device.	2025-04-01 09:54:34 -07:00
Concedo	fbf5c04c3c	silly me	2025-04-02 00:51:05 +08:00
Concedo	30e3d24ead	embd include name	2025-04-02 00:40:38 +08:00
Concedo	e37f27632f	clear cpu flag manually for templates, added truncation for embeddings	2025-04-02 00:18:30 +08:00
jklincn	e39e727e9a	llama : use LLM_KV_GENERAL_FILE_TYPE instead of gguf_find_key (#12672)	2025-04-01 14:54:28 +02:00
Sigbjørn Skjæret	5936a616e4	convert : BailingMoE : fix qkv split when head_dim is 0 (#12687) NOTE: Ling-lite-base is broken, see https://huggingface.co/inclusionAI/Ling-lite-base/discussions/2	2025-04-01 14:37:13 +02:00
Concedo	8a4a9b8c19	Merge branch 'upstream' into concedo_experimental	2025-04-01 20:16:16 +08:00
Concedo	9e182b3e78	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # README.md # docs/backend/SYCL.md # ggml/src/ggml-sycl/CMakeLists.txt # ggml/src/ggml-vulkan/CMakeLists.txt # ggml/src/ggml-vulkan/ggml-vulkan.cpp # scripts/sync-ggml.last # tests/test-chat-template.cpp	2025-04-01 20:16:07 +08:00
Georgi Gerganov	3fd072a540	metal : use F32 prec in FA kernels (#12688) * metal : use F32 prec in FA kernels ggml-ci * cont : fix FA vec kernel ggml-ci	2025-04-01 14:57:19 +03:00
Concedo	0fd94e19f3	made tool calls more robust and allowed tool call template customization	2025-04-01 19:16:45 +08:00
R0CKSTAR	a6f32f0b34	Fix clang warning in gguf_check_reserved_keys (#12686) * Fix clang warning in gguf_check_reserved_keys Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * Fix typo Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-04-01 13:12:53 +02:00
Wagner Bruna	2bb3597e42	vulkan: fix build when glslc doesn't support coopmat (#12683)	2025-04-01 11:38:07 +02:00
Romain Biessy	8293970542	SYCL: Rename oneMKL to oneMath (#12192) * Rename oneMKL Interface to oneMath * Use oneMath for Intel vendor * Rename occurences to mkl * clang-format * Silence verbose warnings * Set oneMath HIP_TARGETS * Fix silence warnings * Remove step to build oneMath from build instructions * Use fixed oneMath version * Remove INTEL_CPU * Fold CMake oneDNN conditions * Use Intel oneMKL for Intel devices * Improve CMake message * Link against MKL::MKL_SYCL::BLAS only * Move oneMath documentation to Nvidia and AMD sections	2025-04-01 16:24:29 +08:00
Akarshan Biswas	8bbf26083d	SYCL: switch to SYCL namespace (#12674)	2025-04-01 10:11:39 +02:00
henk717	4291e1575b	Fix tool spec, this spec is kinda.... (#1458)	2025-04-01 10:39:02 +08:00
Sigbjørn Skjæret	35782aeedb	convert : BailingMoE : avoid setting rope_dim to 0 (#12678)	2025-03-31 23:09:48 +02:00
Daniel Bevenius	c80a7759da	vocab : add special infill tokens for CodeLlama (#11850) * vocab : add special infill tokens for CodeLlama The commit adds the following special tokens for CodeLlama infill: - `▁<PRE>` - `▁<SUF>` - `▁<MID>` The motivation for this is that currently the infill example uses CodeLlama as a suggested model. But when using this model the following error is generated: ```console /llama.cpp-debug/examples/infill/infill.cpp:165: GGML_ASSERT(llama_vocab_fim_pre(vocab) >= 0) failed Could not attach to process. If your uid matches the uid of the target process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf ptrace: Operation not permitted. No stack. The program is not being run. 305251 Aborted (core dumped) ./build/bin/llama-infill -t 10 -ngl 0 -m models/codellama-13b.Q5_K_S.gguf \ -c 4096 --temp 0.7 --repeat_penalty 1.1 -n 20 \ --in-prefix "def helloworld():\n print(\"hell" \ --in-suffix "\n print(\"goodbye world\")\n " ``` * squash! vocab : add special infill tokens for CodeLlama Add _<EOT> as well.	2025-03-31 18:40:56 +02:00
Concedo	c0adaabfa4	Revert "try fix owui" This reverts commit `12e5b8abdb`.	2025-04-01 00:27:31 +08:00
Concedo	12e5b8abdb	try fix owui	2025-04-01 00:23:45 +08:00
a3sh	250d7953e8	ggml : faster ssm scan (#10558) * faster ssm_scan * delete unused commnet * clang format * add space * modify unnecessary calculations * faster ssm conv implementatioin * modify file name with dash	2025-03-31 18:05:13 +02:00
Concedo	0ed95fcccc	fixed l3 template, add index	2025-03-31 23:59:06 +08:00
Sigbjørn Skjæret	403fbacbbc	convert : Qwerky : use lora_rank_tokenshift and lora_rank_decay if present (#12667)	2025-03-31 16:36:25 +02:00
0cc4m	a8a1f33567	Vulkan: Add DP4A MMQ and Q8_1 quantization shader (#12135) * Vulkan: Add DP4A MMQ and Q8_1 quantization shader * Add q4_0 x q8_1 matrix matrix multiplication support * Vulkan: Add int8 coopmat MMQ support * Vulkan: Add q4_1, q5_0 and q5_1 quants, improve integer dot code * Add GL_EXT_integer_dot_product check * Remove ggml changes, fix mmq pipeline picker * Remove ggml changes, restore Intel coopmat behaviour * Fix glsl compile attempt when integer vec dot is not supported * Remove redundant code, use non-saturating integer dot, enable all matmul sizes for mmq * Remove redundant comment * Fix integer dot check * Fix compile issue with unsupported int dot glslc * Update Windows build Vulkan SDK version	2025-03-31 14:37:01 +02:00
Georgi Gerganov	1790e73157	cmake : fix whitespace (#0)	2025-03-31 15:07:32 +03:00
Georgi Gerganov	0114a32da0	sync : ggml ggml-ci	2025-03-31 15:07:32 +03:00
Sandro Hanea	a7724480fd	cmake: improve Vulkan cooperative matrix support checks (whisper/2966) Co-authored-by: Sandro Hanea <me@sandro.rocks>	2025-03-31 15:07:32 +03:00
Sigbjørn Skjæret	1a85949067	llava : proper description fix (#12668)	2025-03-31 11:28:30 +02:00
Akarshan Biswas	6c02a032fa	SYCL: Remove misleading ggml_sycl_op_flatten function (#12387) * SYCL: Remove misleading ggml_sycl_op_flatten function * remove trailing whitespace * Fix L2 norm from rebase * remove try catch block from element_wise.cpp * remove comment from common.hp * ggml-sycl.cpp: Add try catch sycl::exception block in compute_forward * norm.cpp: remove try catch exception block	2025-03-31 11:25:24 +02:00
Sigbjørn Skjæret	f52d59d771	llava : fix clip loading GGUFs with missing description (#12660)	2025-03-31 11:07:07 +02:00
Concedo	1ebadc515e	add streaming support for oai tools (+2 squashed commit) Squashed commit: [4d080b37] qwen2.5vl surgery script [4bebe7e5] add streaming support for oai tools	2025-03-31 16:49:15 +08:00
marcoStocchi	52de2e5949	tts : remove printfs (#12640) * tts.cpp : llama tokens console output is done using LOG_INF instead of printf(). Therefore the options '--log-disable' and '--log-file' have now uniform impact on all output.	2025-03-31 11:20:30 +03:00
henk717	091eb367fc	More robust tool calling prompt (#1455) * More robust tool checking prompt * Inform UI we want a tool	2025-03-31 14:43:03 +08:00
Sigbjørn Skjæret	2c3f8b850a	llama : support BailingMoE (Ling) (#12634)	2025-03-30 22:21:03 +02:00
Georgi Gerganov	4663bd353c	metal : use constexpr in FA kernels + fix typedef (#12659) * metal : use constexpr in FA kernels ggml-ci * cont ggml-ci * cont : fix typedef ggml-ci	2025-03-30 22:04:04 +03:00
Juyoung Suk	b3de7cac73	llama : add Trillion 7B model support (#12556) * Support Trillion 7B * Update llama.h * Update llama.h * Update llama-vocab.cpp for Trillion * Update llama-vocab.cpp	2025-03-30 20:38:33 +02:00
Sergei Vorobyov	7242dd9675	llama-chat : Add Yandex instruct model template support (#12621) * add yandex template * update yandex chat template * fix tests * adjust chat template * fix style * fix tool macro in template * add clarify comment --------- Co-authored-by: Sergei Vorobev <serv01@yandex-team.ru> Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2025-03-30 20:12:03 +02:00
Concedo	77bd24a0b1	updated lite	2025-03-30 21:35:40 +08:00
Concedo	621aa7c825	fixed clblast. but this part might not actually be helpful speed wise	2025-03-30 21:27:52 +08:00
Concedo	e1d3c19673	clblast not working correctly	2025-03-30 21:02:30 +08:00
Concedo	e6337ff957	Merge commit '`e408d4351a`' into concedo_experimental # Conflicts: # ggml/CMakeLists.txt	2025-03-30 18:26:02 +08:00
Concedo	ce05aa722d	Merge commit '`0bb2919335`' into concedo_experimental # Conflicts: # ggml/src/CMakeLists.txt # src/llama-model.cpp	2025-03-30 18:18:20 +08:00
Concedo	61a73347c6	fixed mrope for multiple images in qwen2vl (+1 squashed commits) Squashed commits: [63e4d91c] fixed mrope for multiple images in qwen2vl (+1 squashed commits) Squashed commits: [bb78db1e] wip fixing mrope	2025-03-30 17:23:58 +08:00

8128 commits