Georgi Gerganov
228f724d9c
kv-cache : fix seq_rm with seq_id == -1 ( #15226 )
...
* kv-cache : fix seq_rm with seq_id == -1
ggml-ci
* cont : iterate over streams
ggml-ci
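In the llama.cpp KV-cache API, `seq_id == -1` conventionally means "match every sequence". A minimal sketch of the combined fix under that assumption, using simplified stand-in types rather than the real cache internals:

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Simplified stand-in for one KV cell: its position and the set of
// sequence ids that reference it (hypothetical, not the real struct).
struct kv_cell {
    int32_t           pos = -1;
    std::set<int32_t> seq;
};

// Sketch of seq_rm: remove positions [p0, p1) for one sequence, or for
// all sequences when seq_id == -1, iterating over every stream of cells.
static void seq_rm(std::vector<std::vector<kv_cell>> & streams,
                   int32_t seq_id, int32_t p0, int32_t p1) {
    for (auto & cells : streams) {                  // cont : iterate over streams
        for (auto & cell : cells) {
            if (cell.pos < p0 || cell.pos >= p1) {
                continue;
            }
            if (seq_id < 0) {
                cell.seq.clear();                   // seq_id == -1: match all sequences
            } else {
                cell.seq.erase(seq_id);
            }
            if (cell.seq.empty()) {
                cell.pos = -1;                      // cell becomes free
            }
        }
    }
}
```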
2025-08-11 13:58:24 +03:00
Daniel Bevenius
cd3069dfcb
kv-cache : log (debug) all streams in find_slot ( #15176 )
...
This commit updates `llama_kv_cache_unified::find_slot` to log
information for all streams when debug is enabled.
The motivation for this change is that if a non-unified
kv-cache is used, only one stream is logged, because the
code currently uses `seq_to_stream[1]`.
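A rough sketch of the logging change, with a hypothetical `kv_stream` type and field names standing in for the real cache members: the point is simply to loop over every stream instead of the one stream mapped from a hard-coded sequence id.

```cpp
#include <cstdio>
#include <vector>

struct kv_stream { int used = 0; int size = 0; };   // simplified stand-in

// Hypothetical sketch: find_slot's debug path reports every KV stream,
// not just the stream of one hard-coded sequence id.
static void log_all_streams(const std::vector<kv_stream> & streams) {
    for (size_t s = 0; s < streams.size(); ++s) {
        std::fprintf(stderr, "find_slot: stream %zu: used = %d / %d\n",
                     s, streams[s].used, streams[s].size);
    }
}
```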
2025-08-11 11:21:19 +02:00
Sigbjørn Skjæret
50e81bdf5d
convert : fix merge conflicts ( #15229 )
2025-08-11 11:15:44 +02:00
Daniel Bevenius
1ebbaddff2
perplexity : update comments/error msg to use decode [no ci] ( #15227 )
...
This commit updates comments and error messages to use "decode" instead
of "eval" in perplexity.cpp.
The motivation for this is that `llama_eval` was renamed to
`llama_decode` a while ago, but the comments and error messages
still referred to "eval". This change ensures consistency and clarity.
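For reference, a sketch of the current call shape (a hypothetical helper, not the perplexity.cpp code itself; batch construction details vary across llama.cpp versions):

```cpp
// Sketch only: the old llama_eval call is now a llama_decode over a
// batch; helper names and signatures vary across versions.
#include "llama.h"
#include <cstdio>

static bool decode_chunk(llama_context * ctx, llama_token * tokens, int32_t n_tokens) {
    llama_batch batch = llama_batch_get_one(tokens, n_tokens);
    if (llama_decode(ctx, batch) != 0) {
        std::fprintf(stderr, "%s : failed to decode\n", __func__);  // say "decode", not "eval"
        return false;
    }
    return true;
}
```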
2025-08-11 11:21:24 +03:00
Julien Denize
a3a7874272
convert : improve Mistral models integration ( #14737 )
...
* Improve Mistral models integration with llama.cpp
* Revert changes and fix gguf
* Revert change
* refactor convert_mistral_to_gguf.py into convert_hf_to_gguf.py
* Revert collateral
* Rename model name
* refactor
* revert
* remove duplicate
* Remove duplicated code
* Fixes
* Fix flake issues
* Apply comments
* Apply comments
* Apply comments
* Fix remote
* add default chat template
* Revert
* nit
2025-08-11 10:07:49 +02:00
Charles Xu
002cb1bb33
kleidiai: fix unsigned overflow bug ( #15150 )
...
* kleidiai: fix unsigned overflow bug
* address review comments
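The commit body does not show the bug itself, but the class of bug is the usual unsigned wraparound: a subtraction that would go negative instead wraps to a huge value. A generic illustration, not the kleidiai code:

```cpp
#include <cstddef>
#include <cstdio>

int main() {
    size_t n = 3, block = 8;
    // BUG pattern: if block > n, the unsigned subtraction wraps around
    // to a huge value instead of going negative.
    size_t wrapped = n - block;                    // 18446744073709551611 on 64-bit
    // Fix pattern: guard the subtraction (or compute in a signed type).
    size_t safe = n > block ? n - block : 0;
    std::printf("wrapped = %zu, safe = %zu\n", wrapped, safe);
    return 0;
}
```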
2025-08-11 09:59:26 +02:00
Concedo
30e2f25c05
alias tensorsplit, fixed python error
2025-08-10 22:38:14 +08:00
Concedo
300e20be6c
allow termux to launch existing downloaded models
2025-08-10 21:29:51 +08:00
Concedo
8e6d27f629
handle case where assistant_message_gen is set and assistant_message_gen != assistant_message_start; replace final output tag with the unspaced (gen) version if it exists
2025-08-10 16:51:34 +08:00
kallewoof
204739e7f1
Adapter fixes ( #1659 )
...
* test adapters
* add assistant_gen adapter key
* add support for chat templates stored as .jinja files
* removed mistakenly committed gated-tokenizers link
* autoguess: Harmony: add missing newline prefixes to system_end
2025-08-10 16:19:50 +08:00
Concedo
57db0ce9cd
allow uploading tagged pinned versions for rocm
2025-08-10 11:04:49 +08:00
Concedo
1515d67c2c
oldpc build is now fixed (+2 squashed commits)
...
Squashed commit:
[d11ac6cef] temp test
[cfbc008b1] test no f16 as well
2025-08-10 10:52:45 +08:00
David Zhao
79c1160b07
cuda: refactored ssm_scan and use CUB ( #13291 )
...
* cuda: refactored ssm_scan to use CUB
* fixed compilation error when not using CUB
* assign L to constant and use size_t instead of int
* deduplicated functions
* change min blocks per mp to 1
* Use cub load and store warp transpose
* suppress clang warning
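The last bullet refers to CUB's block-level load/store with the warp-transpose policy for coalesced global memory access. A generic sketch of that pattern (not the actual ssm_scan kernel):

```cuda
#include <cub/block/block_load.cuh>
#include <cub/block/block_store.cuh>

// Generic CUB sketch: block-wide guarded loads/stores using the
// warp-transpose policy, with shared temp storage reused between phases.
template <int BLOCK_THREADS, int ITEMS_PER_THREAD>
__global__ void scale_kernel(const float * in, float * out, int n, float scale) {
    using BlockLoad  = cub::BlockLoad <float, BLOCK_THREADS, ITEMS_PER_THREAD, cub::BLOCK_LOAD_WARP_TRANSPOSE>;
    using BlockStore = cub::BlockStore<float, BLOCK_THREADS, ITEMS_PER_THREAD, cub::BLOCK_STORE_WARP_TRANSPOSE>;

    __shared__ union {
        typename BlockLoad::TempStorage  load;
        typename BlockStore::TempStorage store;
    } smem;

    const int tile   = BLOCK_THREADS * ITEMS_PER_THREAD;
    const int offset = blockIdx.x * tile;
    const int valid  = n - offset;

    float items[ITEMS_PER_THREAD];
    BlockLoad(smem.load).Load(in + offset, items, valid);   // guarded load
    __syncthreads();                                        // smem is reused

    for (int i = 0; i < ITEMS_PER_THREAD; ++i) {
        items[i] *= scale;
    }
    BlockStore(smem.store).Store(out + offset, items, valid);
}
```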
2025-08-09 20:29:43 +02:00
Concedo
89266ac6b8
make autoguess adapter case-insensitive
2025-08-10 00:58:47 +08:00
Concedo
487d509b44
try fix oldpc cuda broken without flash attn since upstream pr14361 between 1.94 and 1.95 (+1 squashed commits)
...
Squashed commits:
[940f0c639] try fix oldpc cuda broken without flash attn since upstream pr14361 between 1.94 and 1.95
2025-08-10 00:10:37 +08:00
Concedo
4c1faf61b2
increment version (+1 squashed commits)
...
Squashed commits:
[6e5080ad2] increment version
2025-08-09 20:53:26 +08:00
Concedo
0fb25bb165
Merge branch 'upstream' into concedo_experimental
2025-08-09 20:31:36 +08:00
Concedo
5f95fc1122
update lite
2025-08-09 20:31:15 +08:00
Aman Gupta
34c9d765bf
CUDA: add attention sinks for tile and wmma ( #15178 )
...
* CUDA: add attention sinks for tile and wmma
* Review: formatting changes + remove syncthreads from tile + remove warp_reduce_max from wmma
2025-08-09 20:00:24 +08:00
Concedo
ced98823a1
kai api tool calling
2025-08-09 10:51:10 +08:00
Concedo
4c7b82e982
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# scripts/server-bench.py
2025-08-09 10:34:24 +08:00
Concedo
fc551470d4
updated lite
2025-08-09 10:33:59 +08:00
compilade
e54d41befc
gguf-py : add Numpy MXFP4 de/quantization support ( #15111 )
...
* gguf-py : add MXFP4 de/quantization support
* ggml-quants : handle zero amax for MXFP4
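The zero-amax guard is the usual one in ggml-style block quantization: when a block is all zeros, the inverse scale must be forced to zero rather than dividing by zero. A generic sketch of the pattern, not the actual MXFP4 code:

```cpp
#include <cmath>
#include <cstdint>

// Generic sketch of ggml-style block quantization with the zero-amax
// guard (hypothetical helper, not the MXFP4 implementation).
static void quantize_block(const float * x, int n, float qmax,
                           float * d_out, int8_t * q_out) {
    float amax = 0.0f;
    for (int i = 0; i < n; ++i) {
        amax = std::fmax(amax, std::fabs(x[i]));
    }
    const float d  = amax / qmax;
    const float id = d != 0.0f ? 1.0f / d : 0.0f;   // the zero-amax guard
    *d_out = d;
    for (int i = 0; i < n; ++i) {
        q_out[i] = (int8_t) std::lround(x[i] * id);
    }
}
```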
2025-08-08 17:48:26 -04:00
Johannes Gäßler
4850b52aed
server-bench: external OAI servers, sqlite ( #15179 )
...
* server-bench: external OAI servers, sqlite
* Update scripts/server-bench.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update scripts/server-bench.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update scripts/server-bench.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* raise_for_status
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-08-08 23:04:36 +02:00
Concedo
9e7a940ce4
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/softmax_4_f16.cl
# ggml/src/ggml-opencl/kernels/softmax_4_f32.cl
# ggml/src/ggml-opencl/kernels/softmax_f16.cl
# ggml/src/ggml-opencl/kernels/softmax_f32.cl
# ggml/src/ggml-rpc/ggml-rpc.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
2025-08-09 01:24:52 +08:00
Concedo
7087aeb4bc
anti bsod only for nvidia
2025-08-09 01:23:38 +08:00
Concedo
67e0072245
fixed clblast repacking
2025-08-09 01:08:02 +08:00
Concedo
3468c2834d
fixed adv mode
2025-08-08 22:26:36 +08:00
kallewoof
866cc346ab
tweak OpenAI Harmony autoguess developer prefix and assistant end token ( #1673 )
...
* tweak OpenAI Harmony autoguess developer prefix
* use <|end|> for adapter end
2025-08-08 21:15:11 +08:00
AN Long
cd6983d56d
ggml : fix field name when creating a new ggml_backend ( #14944 )
2025-08-08 14:37:22 +02:00
Olivier Chafik
6c7e9a5440
vendor: sync minja ( #15161 )
...
* vendor: sync minja
* Update minja.hpp
* Apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-08-08 10:45:18 +01:00
Johannes Gäßler
1425f587a8
CUDA: attention sinks for mma FlashAttention ( #15157 )
2025-08-08 08:19:58 +02:00
lhez
aaa3d07ae7
opencl: support sink in soft_max (attn sinks) ( #15152 )
2025-08-07 21:47:03 -07:00
Concedo
d5b5e79035
should fix vulkan bsod
2025-08-08 10:57:50 +08:00
Wagner Bruna
eed5577aaa
fix unintended sd model quantization ( #1672 )
...
The recent ggml update added another quant type, GGML_TYPE_MXFP4,
which got the same value as SD_TYPE_COUNT. That made the embedded
sd.cpp quantize to GGML_TYPE_MXFP4 by default.
Photomaker in particular ends up crashing due to
"Missing CPY op for types: f32 mxfp4".
2025-08-08 10:19:58 +08:00
Xuan-Son Nguyen
50aa938901
convert : support non-mxfp4 HF model ( #15153 )
...
* convert : support non-mxfp4 HF model
* rm redundant check
* disable debug check
2025-08-07 23:26:03 +02:00
Jeff Bolz
c4f53563df
vulkan: support fattn sinks ( #15126 )
2025-08-07 22:44:20 +02:00
Jeff Bolz
a0552c8bee
vulkan: Add env var to disable host visible vidmem ( #15109 )
2025-08-07 22:07:11 +02:00
RunningLeon
99acbc9921
llama : Support intern-s1 ( #14875 )
...
* support internvl
* support interns1
* resolve comments
* put interns1 in tensor mapping
* resolve comment
* move tokenizer changes to sub class
2025-08-07 18:20:40 +02:00
Concedo
8f15461bea
updated lite
2025-08-08 00:04:47 +08:00
uvos
7ad67ba9fe
HIP: add cmake option to enable compiler output of kernel resource usage metrics ( #15103 )
2025-08-07 16:44:14 +02:00
Concedo
8a71eb03c0
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# ggml/cmake/ggml-config.cmake.in
# ggml/src/ggml-cann/CMakeLists.txt
# ggml/src/ggml-cann/common.h
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cuda/fattn.cu
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# requirements/requirements-convert_hf_to_gguf.txt
# scripts/compare-llama-bench.py
# tests/test-chat-template.cpp
# tests/test-chat.cpp
# tools/llama-bench/llama-bench.cpp
2025-08-07 21:23:09 +08:00
Concedo
338b1fe97e
readjusted mistral and oai template, fixed compile issue on termux, updated lite, show generated token ids in debug mode
2025-08-07 21:14:48 +08:00
Christian Kastner
9a96389544
ggml: Skip backend library linking code when GGML_BACKEND_DL=ON ( #15094 )
...
Any available libraries are found and loaded dynamically at runtime.
2025-08-07 13:45:41 +02:00
Johannes Gäßler
1d72c84188
CUDA: GEMM for FP32/FP16/BF16 and ne11 <= 16 ( #15131 )
...
* CUDA: GEMM for FP32/FP16/BF16 and ne11 <= 16
2025-08-07 10:53:21 +02:00
Johannes Gäßler
20638e4f16
scripts: fix crash when --tool is not set ( #15133 )
2025-08-07 08:50:30 +02:00
Daniel Bevenius
36d3f00e14
requirements : fix PyTorch uint64 compatibility ( #15134 )
...
This commit addresses an issue with the convert_hf_to_gguf script
which is currently failing with:
```console
AttributeError: module 'torch' has no attribute 'uint64'
```
This occurred because safetensors expects torch.uint64 to be available
in the public API, but PyTorch 2.2.x seems to provide only limited
support for unsigned types beyond uint8. The torch.uint64 dtype exists
but is not exposed in the standard torch namespace
(see pytorch/pytorch#58734 ).
PyTorch 2.4.0 properly exposes torch.uint64 in the public API, resolving
the compatibility issue with safetensors. This also required torchvision
to be updated to =0.19.0 for compatibility.
Refs: https://huggingface.co/spaces/ggml-org/gguf-my-repo/discussions/186#68938de803e47d990aa087fb
Refs: https://github.com/pytorch/pytorch/issues/58734
2025-08-07 05:31:48 +02:00
Reese Levine
5fd160bbd9
ggml: Add basic SET_ROWS support in WebGPU ( #15137 )
...
* Begin work on set_rows
* Work on set rows
* Add error buffers for reporting unsupported SET_ROWS indices
* Remove extra comments
2025-08-06 15:14:40 -07:00
rmatif
756cfea826
fix profiling crash ( #15072 )
2025-08-06 14:17:51 -07:00
lhez
e725a1a982
opencl: add swiglu_oai and add_id ( #15121 )
...
* opencl: add `swiglu-oai`
* opencl: add `add_id`
* opencl: add missing `add_id.cl`
2025-08-06 12:12:17 -07:00