koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-17 12:39:09 +00:00

Author	SHA1	Message	Date
Concedo	a395af65db	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build-riscv.yml # .github/workflows/build.yml # ggml/src/ggml-hexagon/htp/argsort-ops.c # ggml/src/ggml-sycl/fattn-tile.hpp # tools/mtmd/CMakeLists.txt	2026-04-06 20:56:02 +08:00
Concedo	82cc19e055	calculate some fields before autofit for more accurate estimate	2026-04-06 20:44:37 +08:00
Concedo	f6e712d919	universal gemma4 fix, add memory check	2026-04-06 19:20:44 +08:00
Georgi Gerganov	400ac8e194	convert : set "add bos" == True for Gemma 4 (#21500 ) * convert : set "add bos" == True for Gemma 4 * cont : handle old GGUFs	2026-04-06 13:52:07 +03:00
Concedo	a309086735	Revert "increase debug mode truncation limit" This reverts commit `59f863746d`.	2026-04-06 18:51:12 +08:00
henk717	4e30294cb1	Henk's Gemma4 31B Magic (#2096 )	2026-04-06 18:49:19 +08:00
Neo Zhang	f51fd36d79	sycl : handle other FA case (#21377 )	2026-04-06 13:28:00 +03:00
Concedo	59f863746d	increase debug mode truncation limit	2026-04-06 17:57:44 +08:00
Yarden Tal	25eec6f327	hexagon: slight optimization for argosrt output init (#21463 )	2026-04-05 18:30:25 -07:00
anchortense	58190cc84d	llama : correct platform-independent loading of BOOL metadata (#21428 ) * model-loader : fix GGUF bool array conversion * model-loader : fix remaining GGUF bool pointer uses	2026-04-06 01:40:38 +02:00
Richard Davison	af76639f72	model : add HunyuanOCR support (#21395 ) * HunyuanOCR: add support for text and vision models - Add HunyuanOCR vision projector (perceiver-based) with Conv2d merge - Add separate HUNYUAN_OCR chat template (content-before-role format) - Handle HunyuanOCR's invalid pad_token_id=-1 in converter - Fix EOS/EOT token IDs from generation_config.json - Support xdrope RoPE scaling type - Add tensor mappings for perceiver projector (mm.before_rms, mm.after_rms, etc.) - Register HunYuanVLForConditionalGeneration for both text and mmproj conversion * fix proper mapping * Update gguf-py/gguf/tensor_mapping.py Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> * Update tools/mtmd/clip.cpp Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> * address comments * update * Fix typecheck * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-04-05 23:32:14 +02:00
Ludovic Henry	761797ffdf	ci : use default RISE RISC-V Runners (#21263 )	2026-04-05 20:29:48 +02:00
Concedo	63ca37e62a	fix assistant prefill logic (+1 squashed commits) Squashed commits: [f4963baf5] fix prefills	2026-04-05 23:25:44 +08:00
ddh0	5d3a4a7da5	server : fix logging of build + system info (#21460 ) This PR changes the logging that occurs at startup of llama-server. Currently, it is redundant (including CPU information twice) and it is missing the build + commit info.	2026-04-05 16:14:02 +02:00
Concedo	53b3bf46e4	fixed a typo	2026-04-05 18:46:30 +08:00
Concedo	9b1f1bbf35	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build-vulkan.yml # .github/workflows/docker.yml # embd_res/templates/google-gemma-4-31B-it-interleaved.jinja # embd_res/templates/google-gemma-4-31B-it.jinja # tests/test-chat.cpp	2026-04-05 18:46:23 +08:00
Concedo	e555b16549	updated lite better gemma handling	2026-04-05 18:34:22 +08:00
Concedo	49941b6268	handle think streaming for gemma4	2026-04-05 13:48:07 +08:00
Concedo	dc2e6ca2e3	fix header path	2026-04-05 11:02:08 +08:00
Concedo	13e932b241	more fixes for gemma4	2026-04-05 10:34:40 +08:00
M1DNYT3	c08d28d088	ci: lower cuda12 floor to 12.8.1 for broader host compatibility (#21438 ) Co-authored-by: M1DNYT3 <m1dnyt3@MacBookPro.lan>	2026-04-05 09:04:00 +08:00
Nicholas Sparks	661e9acb36	ci: fix vulkan workflow referencing non-existent action (#21442 )	2026-04-05 08:59:51 +08:00
Aldehir Rojas	b8635075ff	common : add gemma 4 specialized parser (#21418 ) * common : add gemma4 dedicated parser * cont : add '<|tool_response>' as eog * cont : emit JSON from Gemma4 tool call AST * cont : more fixes * cont : refactor convert function * cont : refine rules and mapping * cont : add more tests * cont : clean up * cont : remove autoparser gemma4 implementation * cont : more cleanup * cont : rename gemma4.jinja to match the others * cont : add custom template to support interleaved thinking * cont : preserve reasoning in model turns * cont : fix initializer error * cont : fix unused vars * cont : fix accidental static * cont : fix specialized_template signature * fix extra semicolon * remove debug line and extra space [no ci]	2026-04-04 20:39:00 +02:00
Eso	11bc83229a	fix: Autoswap with override configs (#2091 ) * fix: Autoswap with overrides * fix: Autoswap with overrides	2026-04-05 00:43:19 +08:00
Concedo	376aaf258c	Merge branch 'upstream' into concedo_experimental	2026-04-04 23:56:02 +08:00
Concedo	6c937c05d9	improve ncmoe / moecpu regex	2026-04-04 23:53:13 +08:00
Concedo	f7c9029668	change env var KOBOLDCPP_PASSWORD to KCPP_PASSWORD names for consistency, same for KOBOLDCPP_ADMINPASSWORD to KCPP_ADMINPASSWORD	2026-04-04 23:36:30 +08:00
Concedo	db8bc40731	add some warnings if shifting fails	2026-04-04 23:16:26 +08:00
Concedo	d3d50a7b3c	fixed reasoning content response in fakestreaming tools	2026-04-04 23:03:33 +08:00
Concedo	ac92ac22d7	tool call fix	2026-04-04 22:35:03 +08:00
Concedo	eb3422996a	BOS fix for gemma4	2026-04-04 22:15:01 +08:00
Dan Hoffman	9c699074c9	server: Fix undefined timing measurement errors in server context (#21201 ) Co-authored-by: Dan Hoffman <dhoffman@cyket.net>	2026-04-04 22:11:19 +08:00
Adrien Gallouët	d01f6274c0	common : respect specified tag, only fallback when tag is empty (#21413 ) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-04-04 15:08:03 +02:00
SamareshSingh	650bf14eb9	llama-model: read final_logit_softcapping for Gemma 4 (#21390 )	2026-04-04 13:05:10 +02:00
Aman Gupta	b7ad48ebda	llama: add custom newline split for Gemma 4 (#21406 )	2026-04-04 15:06:34 +08:00
Concedo	2e4f94822e	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build-self-hosted.yml # .github/workflows/docker.yml # ci/run.sh # docs/build.md # ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp # ggml/src/ggml-webgpu/ggml-webgpu.cpp # src/llama-vocab.cpp # tests/test-chat.cpp # tests/test-jinja.cpp # tools/cli/README.md # tools/completion/README.md # tools/server/README.md	2026-04-04 14:27:23 +08:00
Concedo	235ec9a1b9	updated lite	2026-04-04 14:24:05 +08:00
Concedo	a33eda3842	more template fixes for the gemma4 31b	2026-04-04 14:23:16 +08:00
Concedo	1c834fcbd3	try to match template more closely (+2 squashed commit) Squashed commit: [466808010] try to match template more closely [9f805e753] try to match template more closely	2026-04-04 13:50:04 +08:00
Reese Levine	d006858316	ggml-webgpu: move from parameter buffer pool to single buffer with offsets (#21278 ) * Work towards removing bitcast * Move rest of existing types over * Add timeout back to wait and remove synchronous set_tensor/memset_tensor * move to unpackf16 for wider compatibility * cleanup * Remove deadlock condition in free_bufs * Start work on removing parameter buffer pools * Simplify and optimize further * simplify profile futures * Fix stride * Try using a single command buffer per batch * formatting	2026-04-03 11:40:14 -07:00
Masato Nakasaka	e439700992	ci: Add Windows Vulkan backend testing on Intel (#21292 ) * experimenting CI * Experimenting CI fix for MinGW * experimenting CI on Windows * modified script for integration with VisualStudio * added proxy handling * adding python version for Windows execution * fix iterator::end() dereference * fixed proxy handling * Fix errors occurring on Windows * fixed ci script * Reverted to master * Stripping test items to simplify Windows test * adjusting script for windows testing * Changed shell * Fixed shell * Fixed shell * Fix CI setting * Fix CI setting * Fix CI setting * Experimenting ci fix * Experimenting ci fix * Experimenting ci fix * Experimenting ci fix * experimenting fix for unit test error * Changed to use BUILD_LOW_PERF to skip python tests * Fix CI * Added option to specify Ninja generator * Reverted proxy related changes	2026-04-03 20:16:44 +03:00
Yes You Can Have Your Own	50e0ad08fb	server: save and clear idle slots on new task (`--clear-idle`) (#20993 ) * server: clear idle slots KV from VRAM (LLAMA_KV_KEEP_ONLY_ACTIVE) * server: move idle slot KV clearing to slot release The save "cost" is now paid by the finishing request. * server: add --kv-clear-idle flag, enable by default * server: skip clearing last idle slot, clear on launch * server: test --no-kv-clear-idle flag * server: simplify on-release clearing loop * server: remove on-release KV clearing, keep launch-only * cont : clean-up * tests: update log strings after --clear-idle rename * tests: use debug tags instead of log message matching * test: fix Windows CI by dropping temp log file unlink --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2026-04-03 19:02:27 +02:00
Piotr Wilkin (ilintar)	f1f793ad06	common/parser: fix call ID detection (Mistral parser mostly) + atomicity for tag-json parsers (#21230 ) * Fix call ID detection (Mistral parser mostly) + atomicity for tag-json parsers * Rename * Update common/chat-auto-parser-generator.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-04-03 17:51:52 +02:00
Samanvya Tripathi	af5c13841f	common : fix tool call type detection for nullable and enum schemas (#21327 ) * common : fix tool call type detection for nullable and enum schemas * common, tests : fix grammar delegation for nullable/enum schemas and add tests Fix enum type inference to scan all enum values (not just index 0) so schemas like {"enum": [0, "celsius"]} correctly detect string type. Fix schema_delegates in peg-parser to handle nullable type arrays (["string", "null"]) and typeless enum schemas in raw mode, allowing the tagged parser to use raw text instead of JSON-formatted strings. Add test cases for Qwen3-Coder (TAG_WITH_TAGGED format): - nullable string ["string", "null"] - nullable string with null first ["null", "string"] - nullable integer ["integer", "null"] - enum without explicit type key	2026-04-03 17:51:23 +02:00
M1DNYT3	277ff5fff7	docker : bump cuda12 to 12.9.1 (#20920 ) Co-authored-by: M1DNYT3 <m1dnyt3@MacBookPro.lan> Co-authored-by: CISC <CISC@users.noreply.github.com>	2026-04-03 15:06:45 +02:00
jeromew	384c0076bc	docs: Update build.md: HSA_OVERRIDE_GFX_VERSION clarification (#21331 ) The `HSA_OVERRIDE_GFX_VERSION` variable can be used in ROCm to override an unsupported target architecture with a similar but supported target architecture. This does not and has never worked on Windows. I think the clarification could avoid driving Windows people towards this solution that does not work.	2026-04-03 21:05:14 +08:00
Sigbjørn Skjæret	1f34806c44	jinja: coerce input for string-specific filters (#21370 )	2026-04-03 15:03:33 +02:00
Aaron Teo	887535c33f	ci: add more binary checks (#21349 )	2026-04-03 20:50:00 +08:00
Piotr Wilkin (ilintar)	d3416a4aa9	fix: remove stale assert (#21369 )	2026-04-03 13:40:41 +02:00
Concedo	784e193fbb	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/nix/package.nix # .github/workflows/build.yml # .github/workflows/hip-quality-check.yml # docs/backend/ZenDNN.md # docs/ops.md # docs/ops/ZenDNN.csv # ggml/src/ggml-zendnn/CMakeLists.txt # ggml/src/ggml-zendnn/ggml-zendnn.cpp	2026-04-03 19:04:57 +08:00

12630 commits