Commit graph

11868 commits

Concedo
05834eecb3 Merge commit '1ca3d1de15' into concedo_experimental
# Conflicts:
#	tools/server/README.md
2026-02-26 19:55:06 +08:00
Concedo
adebf63877 ace converter 2026-02-26 19:53:02 +08:00
Georgi Gerganov
1ca3d1de15
gguf : avoid too many file size calls (#19919) 2026-02-26 12:46:32 +02:00
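The idea behind this fix can be sketched as: query the file size once at open time and reuse the cached value for every subsequent bounds check, instead of re-stat-ing the file per field. This is an illustrative Python sketch (the class name and `check_bounds` helper are hypothetical, not the actual gguf code):

```python
import os

class GGUFReaderSketch:
    """Hypothetical reader that caches the file size instead of re-querying it."""

    def __init__(self, path: str):
        self.f = open(path, "rb")
        # fetched exactly once; reused by every bounds check below
        self.size = os.fstat(self.f.fileno()).st_size

    def check_bounds(self, offset: int, n: int) -> bool:
        # a read of n bytes at offset must fit inside the cached file size
        return 0 <= offset and offset + n <= self.size
```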
yggdrasil75
bd72300591
server : fix typo in server README.md (#19900)
fix typo
2026-02-26 11:26:16 +01:00
Concedo
ac8f12f259 still a bit wonky 2026-02-26 17:50:49 +08:00
Concedo
81fb4d773c swap resampling function 2026-02-26 17:37:53 +08:00
Concedo
749a606374 whisper broke 2026-02-26 16:45:04 +08:00
Concedo
44182ebefe Merge commit '8c2c0108dd' into concedo_experimental
# Conflicts:
#	examples/model-conversion/Makefile
#	examples/model-conversion/scripts/utils/inspect-org-model.py
#	ggml/src/ggml-hexagon/ggml-hexagon.cpp
#	ggml/src/ggml-hexagon/htp/act-ops.c
#	ggml/src/ggml-hexagon/htp/get-rows-ops.c
#	ggml/src/ggml-hexagon/htp/hex-dma.h
#	ggml/src/ggml-hexagon/htp/htp-ops.h
#	ggml/src/ggml-hexagon/htp/matmul-ops.c
#	ggml/src/ggml-hexagon/htp/rope-ops.c
#	ggml/src/ggml-hexagon/htp/set-rows-ops.c
#	ggml/src/ggml-hexagon/htp/softmax-ops.c
#	ggml/src/ggml-hexagon/htp/unary-ops.c
#	scripts/snapdragon/adb/run-cli.sh
#	scripts/snapdragon/adb/run-completion.sh
#	scripts/snapdragon/adb/run-mtmd.sh
#	scripts/snapdragon/windows/run-cli.ps1
#	scripts/sync_vendor.py
#	tests/test-backend-sampler.cpp
2026-02-26 16:30:37 +08:00
Concedo
7e53bfd28d Merge commit '2b6dfe824d' into concedo_experimental
# Conflicts:
#	.github/workflows/release.yml
#	examples/save-load-state/save-load-state.cpp
#	src/llama-context.cpp
#	tools/cli/cli.cpp
2026-02-26 15:07:23 +08:00
Wagner Bruna
d400b37215
config file saving enhancements (#1994)
* process --exportconfig and --exporttemplate after --config

This allows using `--config oldfile.kcpps --exportconfig newfile.kcpps`
to update old config items, copy a config file with changed parameters,
download and save a remote config, etc.

* filter out command flags from the saved config files

Also indent files saved from the command line.
2026-02-26 14:55:01 +08:00
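The export ordering and flag filtering described above can be sketched as follows. This is a hypothetical illustration, assuming kcpps config files are JSON; the `COMMAND_FLAGS` set and function names are made up for the example, not the actual koboldcpp code:

```python
import json

# illustrative set of one-shot command flags that should not persist in a saved config
COMMAND_FLAGS = {"config", "exportconfig", "exporttemplate"}

def export_config(settings: dict, path: str) -> None:
    # filter out command flags, and write indented JSON
    filtered = {k: v for k, v in settings.items() if k not in COMMAND_FLAGS}
    with open(path, "w") as f:
        json.dump(filtered, f, indent=2)

def load_and_export(old_path: str, new_path: str, overrides: dict) -> dict:
    with open(old_path) as f:
        settings = json.load(f)
    settings.update(overrides)           # CLI parameters override the old file
    export_config(settings, new_path)    # export runs AFTER --config is applied
    return settings
```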
Concedo
fb3f7d92bc reenable cfg 2026-02-26 14:51:15 +08:00
Concedo
b7d2fe68e7 adjust 2026-02-26 14:46:41 +08:00
Concedo
edbc4fe592 music lm finally working 2026-02-26 14:00:58 +08:00
Concedo
cf042af701 Revert "still not working"
This reverts commit a1305ffff9.
2026-02-26 10:55:55 +08:00
Concedo
a1305ffff9 still not working 2026-02-26 10:48:21 +08:00
Neo Zhang
2943210c1e
support permuted, remove check s0/s10 (#19889)
Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
2026-02-26 10:27:20 +08:00
Jeff Bolz
3769fe6eb7
vulkan: check for memory overlap before doing fusion (#19768)
* vulkan: check for memory overlap before doing fusion

* Update ggml/src/ggml-vulkan/ggml-vulkan.cpp

* address feedback
2026-02-25 18:25:38 +01:00
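The core of an overlap check like the one above is a half-open interval test over buffer ranges. A minimal sketch, assuming buffers are described by (offset, size) pairs — the `can_fuse` rule shown (identical or fully disjoint ranges) is illustrative, not the actual Vulkan backend logic:

```python
def ranges_overlap(a_off: int, a_size: int, b_off: int, b_size: int) -> bool:
    # half-open intervals [off, off+size) overlap iff each starts before the other ends
    return a_off < b_off + b_size and b_off < a_off + a_size

def can_fuse(src: tuple, dst: tuple) -> bool:
    # illustrative rule: fusing is safe when the read and write ranges are
    # either the same buffer range or do not touch at all
    if src == dst:
        return True
    return not ranges_overlap(src[0], src[1], dst[0], dst[1])
```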
Concedo
5c5fe55f7d bump kv overrides max (+1 squashed commits)
Squashed commits:

[9bc8212a0] bump kv overrides max
2026-02-26 00:24:53 +08:00
Concedo
d8746a851f still bugged 2026-02-26 00:07:04 +08:00
Concedo
8a3ccfcba5 some fixes but some issues 2026-02-25 23:41:32 +08:00
ddh0
832aa94762
common : add more aliases for sampler CLI params (#19797)
* common : add more aliases for sampler CLI params
2026-02-25 16:34:25 +01:00
Slobodan Josic
3af34b9ff5
ci : update the ROCm/HIP toolchain versions [no ci] (#19891)
* [HIP] Update ROCm build container to rocm/dev-ubuntu-22.04:7.2 and HIP_SDK to 26.Q1

* revert container version

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-02-25 15:54:49 +01:00
Georgi Gerganov
f20469d919
server : enable multi-modal prompt caching (#19877) 2026-02-25 15:15:42 +02:00
Georgi Gerganov
d7d826b3c1
server : support multi-modal context checkpoints (#19849)
* Modify llama-memory-hybrid-iswa.cpp

* Modify llama-memory-recurrent.cpp

* Modify server-common.cpp

* Modify server-common.h

* Modify server-context.cpp

* Modify server-task.h

* Added comment to llama-memory-hybrid-iswa.cpp

* Remove comment from server-context.cpp

* Stylistic fix server-context.cpp

* Fix an issue when seqrm isn't called in server-context.cpp

* cont : alternative impl

* cont : cleanup

* cont : n_tokens -> int64_t

---------

Co-authored-by: timkhronos <timkhronos@gmail.com>
2026-02-25 15:14:27 +02:00
Xuan-Son Nguyen
c747294b2d
scripts: update corpus of compare-logprobs (#19326)
* scripts: update corpus of compare-logprobs

* fix
2026-02-25 12:57:34 +01:00
Mario Limonciello
8fdf269dad
ci : update Windows ROCm build to 26.Q1 [no ci] (#19810)
* Update build command to build llama-* tools not just ggml-hip
* Update rocWMMA headers to 7.2
* Add GFX1150 target
* Correct library paths for AMD libraries in 26.Q1
2026-02-25 12:30:19 +01:00
Aldehir Rojas
a96a1120b4
gguf : fix ftell/fseek for Windows (#19870) 2026-02-25 06:58:11 +02:00
Georgi Gerganov
244641955f
models : fix graph splits (#19866) 2026-02-25 00:01:13 +02:00
Pascal
47eb12b953
server: fix query params lost when proxying requests in multi-model router mode (#19854)
* server: fix query params lost when proxying requests in multi-model router mode

* server: re-encode query params using httplib::encode_query_component in proxy
2026-02-24 21:46:06 +01:00
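The fix above — forwarding the query string and re-encoding each component instead of proxying only the path — can be sketched in Python with `urllib.parse` (the server itself uses cpp-httplib; this helper is a hypothetical illustration):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

def build_proxied_url(upstream_base: str, incoming_url: str) -> str:
    """Forward the request path AND its query string to the upstream server.

    The bug was forwarding only the path, silently dropping '?key=value'
    parameters; round-tripping them through parse/encode keeps escapes valid.
    """
    parts = urlsplit(incoming_url)
    url = upstream_base.rstrip("/") + parts.path
    if parts.query:
        # re-encode each component so percent-escapes survive the proxy hop
        url += "?" + urlencode(parse_qsl(parts.query, keep_blank_values=True))
    return url
```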
Georgi Gerganov
418dea39ce
ggml/gguf : prevent integer overflows (#19856)
* gguf : prevent integer overflow for ggml_context mem size

* ggml : fix int overflows in ggml_new_object()

* gguf : prevent string exhaustion

* gguf : prevent array elements exhaustion

* ggml : fix negative tensor type oob

* py : assert that alignment is non-zero power of 2

* ggml : check int overflow in ggml_new_tensor_impl and ggml_new_object

* gguf-py : error on duplicate keys when reading

* py : restore tensor_fields

* enforce proper alignment in add_custom_alignment

* gguf : better name

* gguf : fix ctx size for no_alloc == true

* gguf : minor print fix

* ggml : print values when overflow

* ggml : remove deprecated ggml_type_sizef()

* ggml : relax ggml_type asserts to debug-only

* gguf : add mem_size overflow test

* gguf : add file size check for arrays

* ggml : relax asserts for ggml_get_type_traits()

* flake8 fix

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-02-24 20:17:11 +02:00
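The pattern behind several of the fixes above is a checked size multiplication: reject a product that would wrap a 64-bit `size_t` instead of letting it silently overflow during allocation. Python integers never wrap, so this sketch makes the C-side check explicit; the function names are illustrative, not the actual ggml API:

```python
SIZE_MAX = 2**64 - 1  # size_t limit on a 64-bit build

def checked_mul(a: int, b: int) -> int:
    # a * b > SIZE_MAX  <=>  b > SIZE_MAX // a  (for a > 0)
    if a != 0 and b > SIZE_MAX // a:
        raise OverflowError(f"size overflow: {a} * {b}")
    return a * b

def tensor_bytes(n_elements: int, type_size: int, block_size: int = 1) -> int:
    # bytes = n_elements / block_size * type_size, with the multiply checked
    if n_elements % block_size != 0:
        raise ValueError("n_elements is not a multiple of the block size")
    return checked_mul(n_elements // block_size, type_size)
```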
Concedo
0eafc3cf2d ace step lowvram mode done, improved 2026-02-24 23:12:26 +08:00
Concedo
11a85d62fc lowvram for music lm 2026-02-24 22:21:17 +08:00
Concedo
aa58d1ed3b all working, but needs to optimize vram 2026-02-24 21:55:57 +08:00
Tarek Dakhran
da426cb250
model : update label for LFM2-24B-A2B (#19848)
* model : Update label for LFM2-24B-A2B

```
❯ build/bin/llama-bench -m /data/playground/checkpoints/LFM2-24B-A2B-Preview-Q4_0.gguf,/data/playground/checkpoints/LFM2-8B-A1B-Q4_0.gguf -p 1 -n 0
| model                          |       size |     params | backend    | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| lfm2moe 24B.A2B Q4_0           |  12.54 GiB |    23.84 B | CPU        |      10 |             pp1 |         30.35 ± 2.49 |
| lfm2moe 8B.A1B Q4_0            |   4.41 GiB |     8.34 B | CPU        |      10 |             pp1 |         49.24 ± 1.93 |
```

* Remove extra line
2026-02-24 14:27:42 +01:00
Concedo
488c431331 not yet working 2026-02-24 17:47:50 +08:00
Radoslav Gerganov
c830f99cfa
server : support max_completion_tokens request property (#19831)
"max_tokens" is deprectated in favor of "max_completion_tokens" which
sets the upper bound for reasoning+output token.

Closes: #13700
2026-02-24 10:30:00 +02:00
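The precedence between the old and new field can be sketched as below. This is an illustrative resolver, assuming the newer `max_completion_tokens` wins when both appear in a request body; it is not the exact server code:

```python
def resolve_max_tokens(request: dict, default: int = -1) -> int:
    """Pick the completion-token cap from an OpenAI-compatible request body.

    'max_completion_tokens' supersedes the deprecated 'max_tokens';
    fall back to the old field, then to the server default.
    """
    if "max_completion_tokens" in request:
        return request["max_completion_tokens"]
    return request.get("max_tokens", default)
```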
Ruben Ortlam
aa6f918c1c
Vulkan Scalar Flash Attention Refactor (#19625)
* vulkan: allow using fp16 in scalar flash attention shader

* split rows inside of subgroups for faster synchronization

* use row_split when Br >= 4, change reductions to use shared memory if row_split == 1

* use f32 scalar FA if f16 is not supported by device

* fix amd workgroup size issue

* optimize masksh use

* add medium rows FA shader Br size

* fixes

* add padding to mask shmem buffer

* cache q values into registers for KQ

* fuse lf accumulation, pf and v accumulation into a loop

* stage K loads through shmem

* stage V loads through shmem

* only stage through shmem on Nvidia

* default to Bc 32

* also stage V through shmem when this is done for K

* dynamic subgroups for intel

* use vectorized stores

* use float_type for dequantize4 functions

* use smaller scalar rows size for smaller rows count

* relax flash attention split_k condition to allow non-gqa use

* use minimal subgroup size on Intel

* fix shmem support function

* fix rebase issues

* fixes

* Bc 4 for scalar FA is not a valid configuration

* Use wave32 on AMD RDNA for scalar FA

* add Intel shader core count lookup-table

* fix regressions

* device tuning

* tmpsh size fix

* fix editorconfig

* refactor fa tuning logic into a single place

* fix gqa opt logic

* fix block_rows with small n_rows

* amd tuning

* fix hsk=72/80 issue

* tuning

* allow condition skipping for column check

* use float16 for Of if available

* address feedback

* fix bad RDNA performance on head size <= 128 by limiting occupancy

* allow printing pipeline stats

* cleanup and fixes

* limit occupancy for GCN for small batch FA with large HSK

* disable f16 FA for GCN AMD GPUs on the proprietary driver
2026-02-24 08:35:48 +01:00
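The row-split and shared-memory staging details above are shader-specific, but the streaming computation flash attention builds on — online softmax with a running max, running sum, and rescaled output accumulator — can be sketched for a single query row. This is the textbook algorithm the shader tiles over workgroups, not the shader itself:

```python
import math

def flash_attn_row(q, K, V, scale=1.0):
    """One query row of attention in a single streaming pass over K/V.

    Track running max m and running sum l of exp(score - m); whenever a new
    key raises the max, rescale the previous partial sums by exp(m_old - m_new).
    """
    dim_v = len(V[0])
    m = -math.inf          # running max of scores
    l = 0.0                # running sum of exp(score - m)
    o = [0.0] * dim_v      # unnormalized output accumulator
    for k, v in zip(K, V):
        s = scale * sum(qi * ki for qi, ki in zip(q, k))
        m_new = max(m, s)
        corr = math.exp(m - m_new)      # rescale factor for previous partials
        p = math.exp(s - m_new)
        l = l * corr + p
        o = [oi * corr + p * vi for oi, vi in zip(o, v)]
        m = m_new
    return [oi / l for oi in o]
```

The result matches a naive two-pass softmax-weighted sum, while only ever streaming each K/V row once.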
Concedo
0fd7d2c0e5 ace step diffusion loading 2026-02-24 15:24:15 +08:00
Jeff Bolz
8c2c0108dd
vulkan: fix coopmat1 without bf16 support (#19793) 2026-02-24 07:48:32 +01:00
Jeff Bolz
3ea5360c00
vulkan: fix data race in mul_mat_id shader (#19790) 2026-02-24 07:43:12 +01:00
Max Krasnyansky
39fb81f875
hexagon refactor all Ops to use local context struct (#19819)
* hexagon: refactor set/get/sum-rows ops to use local context

* hexagon: refactor ROPE and Softmax Ops to use local context

Improves performance a bit by precomputing values and caching them in the context.

* hexagon: refactor activation ops to use local context struct

* hexagon: refactor unary ops to use local context struct and DMA/VTCM

* hexagon: use aligned hvx_scale function

* hexagon: remove unused fields from op_context

* hexagon: rewrite ROPE to use DMA and VTCM scratchpad

* hex-rope: keep N rows in scratchpad (instead of just two)

* hex-rope: introduce rowidx cache

* hex-rope: remove unused fields

* hex-rope: rewrite dma prefetch logic to allow for multi-row fetch/compute

also removes the need for fastdiv.

* hex-rope: minor formatting

* hex-rope: use indices and unroll the loops

* hex-rope: more updates to cleanup rope-block handling

* hexagon: cleanup supported type/dims checks

* hexagon: all reduce funcs replicated across lanes

There is no need to explicitly replicate the first value.

* snapdragon: update adb and windows scripts to use ubatch-size 256

Updated Ops support handles larger ubatches.
2026-02-23 16:32:14 -08:00
Aleksander Grygier
5eb0ea32f0
feat: Add code blocks full height setting to parameter sync service (#19835) 2026-02-23 22:30:13 +01:00
Adrien Gallouët
b68a83e641
vendor : update cpp-httplib to 0.34.0 (#19830)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-02-23 21:05:48 +01:00
Concedo
749536f464 fixed wav header wrong size 2026-02-24 01:13:44 +08:00
Daniel Bevenius
d8aeb65cee
tests : fix typos in comments in test-backend-sampler [no ci] (#19824)
* tests : fix typos in comments in test-backend-sampler [no ci]
2026-02-23 17:12:02 +01:00
askmyteapot
062e361968
Update ace-qwen3.cpp to build on MSVC (#1992)
Need to include <sstream>, otherwise the build fails with many errors like the following:

```
C:\koboldcpp\otherarch\acestep\ace-qwen3.cpp(1278,9): error C2297: '<<': not valid as right operand has type 'const char [26]' [C:\koboldcpp\build\music_adapter.vcxproj]
  (compiling source file '../otherarch/acestep/music_adapter.cpp')

C:\koboldcpp\otherarch\acestep\ace-qwen3.cpp(1278,9): error C2679: binary '<<': no operator found which takes a right-hand operand of type 'std::string' (or there is no acceptable conversion) [C:\koboldcpp\build\music_adapter.vcxproj]
  (compiling source file '../otherarch/acestep/music_adapter.cpp')
      C:\Program Files (x86)\Microsoft Visual Studio\18\BuildTools\VC\Tools\MSVC\14.50.35717\include\__msvc_int128.hpp(753,46):
      could be 'std::_Unsigned128 std::operator <<(const std::_Unsigned128 &,const std::_Base128 &) noexcept' [found using argument-dependent lookup]
          C:\koboldcpp\otherarch\acestep\ace-qwen3.cpp(1278,9):
          'std::_Unsigned128 std::operator <<(const std::_Unsigned128 &,const std::_Base128 &) noexcept': cannot convert argument 2 from 'std::string' to 'const std::_Base128 &'
              C:\koboldcpp\otherarch\acestep\ace-qwen3.cpp(1278,57):
              Reason: cannot convert from 'std::string' to 'const std::_Base128'
              C:\koboldcpp\otherarch\acestep\ace-qwen3.cpp(1278,57):
              No user-defined-conversion operator available that can perform this conversion, or the operator cannot be called
```
2026-02-23 23:03:07 +08:00
Concedo
5311997581 updated ace step cpp 2026-02-23 23:01:10 +08:00
Concedo
2e713cfff5 fixed compile issue, trying out 8bit pcm 2026-02-23 21:19:03 +08:00
Aleksander Grygier
9051663d5d
webui: Add setting to have full height Code Blocks in Chat Messages (#19829) 2026-02-23 14:16:50 +01:00
Daniel Bevenius
72b44c0d21
model-conversion : merge inspect-org-model.py with tensor-info.py (#19823)
This commit replaces the inspect-org-model.py script, merging in the
contents of the tensor-info.py script. The merged script has also been
updated to print tensor sizes, which was the only thing tensor-info.py did
not do before.

The motivation for this is that tensor-info.py does not load the tensor
weights, which can be time-consuming for larger models. And now that both
scripts do almost the same thing, it makes sense to maintain one script
instead of two.
2026-02-23 14:15:16 +01:00