Concedo
a395af65db
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build-riscv.yml
# .github/workflows/build.yml
# ggml/src/ggml-hexagon/htp/argsort-ops.c
# ggml/src/ggml-sycl/fattn-tile.hpp
# tools/mtmd/CMakeLists.txt
2026-04-06 20:56:02 +08:00
Concedo
82cc19e055
calculate some fields before autofit for more accurate estimate
2026-04-06 20:44:37 +08:00
Concedo
f6e712d919
universal gemma4 fix, add memory check
2026-04-06 19:20:44 +08:00
Georgi Gerganov
400ac8e194
convert : set "add bos" == True for Gemma 4 ( #21500 )
...
* convert : set "add bos" == True for Gemma 4
* cont : handle old GGUFs
2026-04-06 13:52:07 +03:00
Concedo
a309086735
Revert "increase debug mode truncation limit"
...
This reverts commit 59f863746d.
2026-04-06 18:51:12 +08:00
henk717
4e30294cb1
Henk's Gemma4 31B Magic ( #2096 )
2026-04-06 18:49:19 +08:00
Neo Zhang
f51fd36d79
sycl : handle other FA case ( #21377 )
2026-04-06 13:28:00 +03:00
Concedo
59f863746d
increase debug mode truncation limit
2026-04-06 17:57:44 +08:00
Yarden Tal
25eec6f327
hexagon: slight optimization for argsort output init ( #21463 )
2026-04-05 18:30:25 -07:00
anchortense
58190cc84d
llama : correct platform-independent loading of BOOL metadata ( #21428 )
...
* model-loader : fix GGUF bool array conversion
* model-loader : fix remaining GGUF bool pointer uses
2026-04-06 01:40:38 +02:00
Richard Davison
af76639f72
model : add HunyuanOCR support ( #21395 )
...
* HunyuanOCR: add support for text and vision models
- Add HunyuanOCR vision projector (perceiver-based) with Conv2d merge
- Add separate HUNYUAN_OCR chat template (content-before-role format)
- Handle HunyuanOCR's invalid pad_token_id=-1 in converter
- Fix EOS/EOT token IDs from generation_config.json
- Support xdrope RoPE scaling type
- Add tensor mappings for perceiver projector (mm.before_rms, mm.after_rms, etc.)
- Register HunYuanVLForConditionalGeneration for both text and mmproj conversion
* fix proper mapping
* Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* Update tools/mtmd/clip.cpp
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* address comments
* update
* Fix typecheck
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-04-05 23:32:14 +02:00
Ludovic Henry
761797ffdf
ci : use default RISE RISC-V Runners ( #21263 )
2026-04-05 20:29:48 +02:00
Concedo
63ca37e62a
fix assistant prefill logic (+1 squashed commits)
...
Squashed commits:
[f4963baf5] fix prefills
2026-04-05 23:25:44 +08:00
ddh0
5d3a4a7da5
server : fix logging of build + system info ( #21460 )
...
This PR changes the logging that occurs at startup of llama-server.
Currently, it is redundant (including CPU information twice) and it is
missing the build + commit info.
2026-04-05 16:14:02 +02:00
Concedo
53b3bf46e4
fixed a typo
2026-04-05 18:46:30 +08:00
Concedo
9b1f1bbf35
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build-vulkan.yml
# .github/workflows/docker.yml
# embd_res/templates/google-gemma-4-31B-it-interleaved.jinja
# embd_res/templates/google-gemma-4-31B-it.jinja
# tests/test-chat.cpp
2026-04-05 18:46:23 +08:00
Concedo
e555b16549
updated lite better gemma handling
2026-04-05 18:34:22 +08:00
Concedo
49941b6268
handle think streaming for gemma4
2026-04-05 13:48:07 +08:00
Concedo
dc2e6ca2e3
fix header path
2026-04-05 11:02:08 +08:00
Concedo
13e932b241
more fixes for gemma4
2026-04-05 10:34:40 +08:00
M1DNYT3
c08d28d088
ci: lower cuda12 floor to 12.8.1 for broader host compatibility ( #21438 )
...
Co-authored-by: M1DNYT3 <m1dnyt3@MacBookPro.lan>
2026-04-05 09:04:00 +08:00
Nicholas Sparks
661e9acb36
ci: fix vulkan workflow referencing non-existent action ( #21442 )
2026-04-05 08:59:51 +08:00
Aldehir Rojas
b8635075ff
common : add gemma 4 specialized parser ( #21418 )
...
* common : add gemma4 dedicated parser
* cont : add '<|tool_response>' as eog
* cont : emit JSON from Gemma4 tool call AST
* cont : more fixes
* cont : refactor convert function
* cont : refine rules and mapping
* cont : add more tests
* cont : clean up
* cont : remove autoparser gemma4 implementation
* cont : more cleanup
* cont : rename gemma4.jinja to match the others
* cont : add custom template to support interleaved thinking
* cont : preserve reasoning in model turns
* cont : fix initializer error
* cont : fix unused vars
* cont : fix accidental static
* cont : fix specialized_template signature
* fix extra semicolon
* remove debug line and extra space [no ci]
2026-04-04 20:39:00 +02:00
Eso
11bc83229a
fix: Autoswap with override configs ( #2091 )
...
* fix: Autoswap with overrides
* fix: Autoswap with overrides
2026-04-05 00:43:19 +08:00
Concedo
376aaf258c
Merge branch 'upstream' into concedo_experimental
2026-04-04 23:56:02 +08:00
Concedo
6c937c05d9
improve ncmoe / moecpu regex
2026-04-04 23:53:13 +08:00
Concedo
f7c9029668
rename env var KOBOLDCPP_PASSWORD to KCPP_PASSWORD for consistency, and likewise KOBOLDCPP_ADMINPASSWORD to KCPP_ADMINPASSWORD
2026-04-04 23:36:30 +08:00
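For anyone with launch scripts that still export the old variable names, the rename in f7c9029668 could be handled on the caller's side roughly as follows. This is only a sketch: `RENAMED` and `migrate_env` are hypothetical helper names, not part of KoboldCpp, and nothing here says whether the program itself still accepts the old names.

```python
# Old -> new environment variable names, as given in the commit title.
RENAMED = {
    "KOBOLDCPP_PASSWORD": "KCPP_PASSWORD",
    "KOBOLDCPP_ADMINPASSWORD": "KCPP_ADMINPASSWORD",
}


def migrate_env(env):
    """Return a copy of env with the old variable names moved to the new ones.

    Hypothetical caller-side helper; not KoboldCpp code.
    """
    out = dict(env)
    for old, new in RENAMED.items():
        if old in out:
            value = out.pop(old)
            # If the new name is already set explicitly, keep that value.
            out.setdefault(new, value)
    return out
```

A wrapper script could pass `migrate_env(os.environ)` as the `env` argument of `subprocess.run` when starting the server.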
Concedo
db8bc40731
add some warnings if shifting fails
2026-04-04 23:16:26 +08:00
Concedo
d3d50a7b3c
fixed reasoning content response in fakestreaming tools
2026-04-04 23:03:33 +08:00
Concedo
ac92ac22d7
tool call fix
2026-04-04 22:35:03 +08:00
Concedo
eb3422996a
BOS fix for gemma4
2026-04-04 22:15:01 +08:00
Dan Hoffman
9c699074c9
server: Fix undefined timing measurement errors in server context ( #21201 )
...
Co-authored-by: Dan Hoffman <dhoffman@cyket.net>
2026-04-04 22:11:19 +08:00
Adrien Gallouët
d01f6274c0
common : respect specified tag, only fallback when tag is empty ( #21413 )
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-04-04 15:08:03 +02:00
SamareshSingh
650bf14eb9
llama-model: read final_logit_softcapping for Gemma 4 ( #21390 )
2026-04-04 13:05:10 +02:00
Aman Gupta
b7ad48ebda
llama: add custom newline split for Gemma 4 ( #21406 )
2026-04-04 15:06:34 +08:00
Concedo
2e4f94822e
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build-self-hosted.yml
# .github/workflows/docker.yml
# ci/run.sh
# docs/build.md
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# src/llama-vocab.cpp
# tests/test-chat.cpp
# tests/test-jinja.cpp
# tools/cli/README.md
# tools/completion/README.md
# tools/server/README.md
2026-04-04 14:27:23 +08:00
Concedo
235ec9a1b9
updated lite
2026-04-04 14:24:05 +08:00
Concedo
a33eda3842
more template fixes for the gemma4 31b
2026-04-04 14:23:16 +08:00
Concedo
1c834fcbd3
try to match template more closely (+2 squashed commit)
...
Squashed commit:
[466808010] try to match template more closely
[9f805e753] try to match template more closely
2026-04-04 13:50:04 +08:00
Reese Levine
d006858316
ggml-webgpu: move from parameter buffer pool to single buffer with offsets ( #21278 )
...
* Work towards removing bitcast
* Move rest of existing types over
* Add timeout back to wait and remove synchronous set_tensor/memset_tensor
* move to unpackf16 for wider compatibility
* cleanup
* Remove deadlock condition in free_bufs
* Start work on removing parameter buffer pools
* Simplify and optimize further
* simplify profile futures
* Fix stride
* Try using a single command buffer per batch
* formatting
2026-04-03 11:40:14 -07:00
Masato Nakasaka
e439700992
ci: Add Windows Vulkan backend testing on Intel ( #21292 )
...
* experimenting CI
* Experimenting CI fix for MinGW
* experimenting CI on Windows
* modified script for integration with VisualStudio
* added proxy handling
* adding python version for Windows execution
* fix iterator::end() dereference
* fixed proxy handling
* Fix errors occurring on Windows
* fixed ci script
* Reverted to master
* Stripping test items to simplify Windows test
* adjusting script for windows testing
* Changed shell
* Fixed shell
* Fixed shell
* Fix CI setting
* Fix CI setting
* Fix CI setting
* Experimenting ci fix
* Experimenting ci fix
* Experimenting ci fix
* Experimenting ci fix
* experimenting fix for unit test error
* Changed to use BUILD_LOW_PERF to skip python tests
* Fix CI
* Added option to specify Ninja generator
* Reverted proxy related changes
2026-04-03 20:16:44 +03:00
Yes You Can Have Your Own
50e0ad08fb
server: save and clear idle slots on new task (--clear-idle) ( #20993 )
...
* server: clear idle slots KV from VRAM (LLAMA_KV_KEEP_ONLY_ACTIVE)
* server: move idle slot KV clearing to slot release
The save "cost" is now paid by the finishing request.
* server: add --kv-clear-idle flag, enable by default
* server: skip clearing last idle slot, clear on launch
* server: test --no-kv-clear-idle flag
* server: simplify on-release clearing loop
* server: remove on-release KV clearing, keep launch-only
* cont : clean-up
* tests: update log strings after --clear-idle rename
* tests: use debug tags instead of log message matching
* test: fix Windows CI by dropping temp log file unlink
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-04-03 19:02:27 +02:00
Piotr Wilkin (ilintar)
f1f793ad06
common/parser: fix call ID detection (Mistral parser mostly) + atomicity for tag-json parsers ( #21230 )
...
* Fix call ID detection (Mistral parser mostly) + atomicity for tag-json parsers
* Rename
* Update common/chat-auto-parser-generator.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-04-03 17:51:52 +02:00
Samanvya Tripathi
af5c13841f
common : fix tool call type detection for nullable and enum schemas ( #21327 )
...
* common : fix tool call type detection for nullable and enum schemas
* common, tests : fix grammar delegation for nullable/enum schemas and add tests
Fix enum type inference to scan all enum values (not just index 0) so
schemas like {"enum": [0, "celsius"]} correctly detect string type.
Fix schema_delegates in peg-parser to handle nullable type arrays
(["string", "null"]) and typeless enum schemas in raw mode, allowing
the tagged parser to use raw text instead of JSON-formatted strings.
Add test cases for Qwen3-Coder (TAG_WITH_TAGGED format):
- nullable string ["string", "null"]
- nullable string with null first ["null", "string"]
- nullable integer ["integer", "null"]
- enum without explicit type key
2026-04-03 17:51:23 +02:00
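The two checks described in af5c13841f can be illustrated with a small sketch. It is not the llama.cpp implementation (which is C++); `infer_enum_type` and `accepts_string` are hypothetical names, and the rule "any string among the enum values makes the schema string-typed" is an assumption read off the commit message.

```python
def infer_enum_type(schema):
    """Guess a JSON type for a schema that has "enum" but no "type" key."""
    values = schema.get("enum", [])
    # Scan every value, not just values[0]: one string anywhere is enough.
    if any(isinstance(v, str) for v in values):
        return "string"
    # bool is a subclass of int in Python, so exclude it explicitly.
    if values and all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "integer"
    return None


def accepts_string(schema):
    """True for "string" or a nullable type array such as ["null", "string"]."""
    t = schema.get("type")
    if isinstance(t, list):
        # Drop "null" and see whether only "string" remains, in either order.
        return [x for x in t if x != "null"] == ["string"]
    return t == "string"
```

With these definitions, `infer_enum_type({"enum": [0, "celsius"]})` reports a string type even though the first value is an integer, and `accepts_string` treats `["string", "null"]` and `["null", "string"]` the same way.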
M1DNYT3
277ff5fff7
docker : bump cuda12 to 12.9.1 ( #20920 )
...
Co-authored-by: M1DNYT3 <m1dnyt3@MacBookPro.lan>
Co-authored-by: CISC <CISC@users.noreply.github.com>
2026-04-03 15:06:45 +02:00
jeromew
384c0076bc
docs: Update build.md: HSA_OVERRIDE_GFX_VERSION clarification ( #21331 )
...
The `HSA_OVERRIDE_GFX_VERSION` variable can be used in ROCm to override an unsupported target architecture with a similar but supported target architecture.
This does not work on Windows and never has. I think the clarification could avoid steering Windows users towards a solution that does not work.
2026-04-03 21:05:14 +08:00
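The point made in 384c0076bc could be reflected in a launcher script along these lines. This is a sketch under assumptions: `rocm_env` is a hypothetical helper, and the override value passed in (for example `"10.3.0"`) is only an illustration; the right value depends on the GPU.

```python
import sys


def rocm_env(base_env, gfx_override, platform=sys.platform):
    """Return a copy of base_env with the ROCm override applied where it can work.

    Hypothetical helper: on Windows the variable is left unset, since the
    override does not work there.
    """
    env = dict(base_env)
    if platform.startswith("win"):
        return env
    env["HSA_OVERRIDE_GFX_VERSION"] = gfx_override
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when launching a ROCm build.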
Sigbjørn Skjæret
1f34806c44
jinja: coerce input for string-specific filters ( #21370 )
2026-04-03 15:03:33 +02:00
Aaron Teo
887535c33f
ci: add more binary checks ( #21349 )
2026-04-03 20:50:00 +08:00
Piotr Wilkin (ilintar)
d3416a4aa9
fix: remove stale assert ( #21369 )
2026-04-03 13:40:41 +02:00
Concedo
784e193fbb
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/nix/package.nix
# .github/workflows/build.yml
# .github/workflows/hip-quality-check.yml
# docs/backend/ZenDNN.md
# docs/ops.md
# docs/ops/ZenDNN.csv
# ggml/src/ggml-zendnn/CMakeLists.txt
# ggml/src/ggml-zendnn/ggml-zendnn.cpp
2026-04-03 19:04:57 +08:00