Commit graph

186 commits

Author SHA1 Message Date
Concedo
7240da764a Merge commit '935a340292' into concedo_experimental
# Conflicts:
#	examples/diffusion/CMakeLists.txt
#	scripts/server-test-function-call.py
#	src/llama-model.cpp
#	src/models/gemma4.cpp
#	tests/test-chat.cpp
#	tests/test-reasoning-budget.cpp
#	tools/server/README.md
2026-05-06 21:02:25 +08:00
Piotr Wilkin (ilintar)
a4701c98f7
common/autoparser: fixes for newline handling / forced tool calls (#22654)
* chat/autoparser: the fixes

* Move optspace() to chat-peg-parser, comment out server tests invalidated due to content now allowed with forced tool calls.

* Trim whitespace on apply instead
2026-05-04 13:18:11 +02:00
Concedo
2905c6254f Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.pi/gg/SYSTEM.md
#	docs/speculative.md
#	ggml/src/ggml-virtgpu/virtgpu-shm.cpp
#	ggml/src/ggml-virtgpu/virtgpu.cpp
#	ggml/src/ggml-virtgpu/virtgpu.h
#	ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	ggml/src/ggml-webgpu/wgsl-shaders/row_norm.wgsl
#	tools/cli/README.md
#	tools/completion/README.md
#	tools/server/README.md
2026-05-04 15:36:13 +08:00
Aldehir Rojas
e48034dfc9
common : determine generation prompt using longest common prefix (#22657) 2026-05-04 00:18:23 +02:00
Concedo
340b22283e Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/intel.Dockerfile
#	.github/workflows/build-android.yml
#	.github/workflows/build.yml
#	.github/workflows/release.yml
#	.gitignore
#	docs/backend/SYCL.md
#	docs/backend/snapdragon/README.md
#	examples/model-conversion/scripts/causal/convert-model.sh
#	ggml/CMakeLists.txt
#	ggml/src/CMakeLists.txt
#	ggml/src/ggml-hexagon/ggml-hexagon.cpp
#	ggml/src/ggml-hexagon/htp/CMakeLists.txt
#	ggml/src/ggml-hexagon/htp/hex-utils.h
#	ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c
#	ggml/src/ggml-hexagon/htp/htp-ctx.h
#	ggml/src/ggml-hexagon/htp/htp-ops.h
#	ggml/src/ggml-hexagon/htp/htp_iface.idl
#	ggml/src/ggml-hexagon/htp/hvx-base.h
#	ggml/src/ggml-hexagon/htp/main.c
#	ggml/src/ggml-hexagon/htp/matmul-ops.c
#	ggml/src/ggml-hexagon/libggml-htp.inf
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	ggml/src/ggml-sycl/mmvq.cpp
#	ggml/src/ggml-sycl/mmvq.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	ggml/src/ggml-webgpu/wgsl-shaders/flash_attn.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/flash_attn_vec_blk.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/flash_attn_vec_split.wgsl
#	scripts/server-test-structured.py
#	scripts/snapdragon/adb/run-bench.sh
#	scripts/snapdragon/adb/run-cli.sh
#	scripts/snapdragon/adb/run-completion.sh
#	scripts/snapdragon/adb/run-mtmd.sh
#	scripts/snapdragon/adb/run-tool.sh
#	scripts/snapdragon/qdc/requirements.txt
#	scripts/snapdragon/windows/run-bench.ps1
#	scripts/snapdragon/windows/run-cli.ps1
#	scripts/snapdragon/windows/run-completion.ps1
#	scripts/snapdragon/windows/run-mtmd.ps1
#	scripts/snapdragon/windows/run-tool.ps1
#	tests/test-backend-ops.cpp
#	tools/cli/cli.cpp
#	ty.toml
2026-04-25 12:13:14 +08:00
Tarek Dakhran
550d684bd1
server: Enable transcriptions API for LFM2-Audio (#22000) 2026-04-23 10:47:26 +02:00
Concedo
0755f27372 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/openvino.Dockerfile
#	.github/workflows/build-self-hosted.yml
#	.github/workflows/build.yml
#	common/chat.cpp
#	docs/backend/OPENVINO.md
#	examples/speculative-simple/speculative-simple.cpp
#	ggml/src/ggml-hexagon/ggml-hexagon.cpp
#	ggml/src/ggml-hexagon/htp/CMakeLists.txt
#	ggml/src/ggml-hexagon/htp/htp-ctx.h
#	ggml/src/ggml-hexagon/htp/htp-ops.h
#	ggml/src/ggml-hexagon/htp/main.c
#	ggml/src/ggml-hexagon/libggml-htp.inf
#	ggml/src/ggml-openvino/ggml-decoder.cpp
#	ggml/src/ggml-openvino/ggml-openvino-extra.cpp
#	ggml/src/ggml-openvino/ggml-openvino.cpp
#	ggml/src/ggml-openvino/ggml-quants.cpp
#	ggml/src/ggml-openvino/openvino/op/rope.cpp
#	ggml/src/ggml-openvino/openvino/op_table.cpp
#	ggml/src/ggml-openvino/openvino/op_table.h
#	ggml/src/ggml-openvino/openvino/translate_session.cpp
#	ggml/src/ggml-openvino/openvino/utils.cpp
#	ggml/src/ggml-openvino/openvino/utils.h
#	ggml/src/ggml-openvino/utils.cpp
#	ggml/src/ggml-openvino/utils.h
#	ggml/src/ggml-sycl/common.hpp
#	ggml/src/ggml-sycl/convert.cpp
#	ggml/src/ggml-sycl/convert.hpp
#	ggml/src/ggml-sycl/gemm.hpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	ggml/src/ggml-sycl/set_rows.cpp
#	ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	scripts/sync_vendor.py
#	tests/CMakeLists.txt
#	tests/test-chat.cpp
#	tools/cli/cli.cpp
#	tools/mtmd/CMakeLists.txt
#	tools/server/CMakeLists.txt
2026-04-23 00:55:05 +08:00
Piotr Wilkin (ilintar)
134d6e54d4
common/chat, server: refactor, move all conversion functions to common, add tests (#20690)
* Refactor conversion functions
2026-04-22 10:28:45 +02:00
Concedo
cd6788007e Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build-cross.yml
#	.github/workflows/build-self-hosted.yml
#	.github/workflows/release.yml
#	examples/llama.android/lib/src/main/cpp/CMakeLists.txt
#	ggml/CMakeLists.txt
#	ggml/src/ggml-rpc/CMakeLists.txt
#	ggml/src/ggml-rpc/ggml-rpc.cpp
#	ggml/src/ggml-sycl/mmvq.cpp
#	ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	scripts/sync_vendor.py
#	tests/test-chat.cpp
#	tests/test-mtmd-c-api.c
#	tools/server/README.md
2026-04-20 20:19:11 +08:00
Sascha Rogmann
455d8e4be8
server : speculative checkpointing (#19493)
* server : speculative decoding using checkpoints

* server : fix draft check with checkpoints

* server : rename spec vars

* server : log levels

* server : refactored spec logic to speculative.cpp

* server : renamed spec checkpoints option

* server : fix spec checkpoints, logging

* speculative : checkpoints with draft model, logging

* server : n_tokens_cur and create_checkpoint in draft

* server : fix server_speculative_callback (slot.id)

* spec : fix ngram-map/begin idx_last_check

* spec : init ckpt (begin() wasn't called)

* chore: update webui build output

* server : restore sampler in spec checkpoint and clear mem

* cont : avoid --spec-use-checkpoints argument

* cont : remove server_prompt_checkpoint_with_size

* spec : rename (leave_draft_state)

* cont : clean-up

* cont : do not ignore partial drafts even if the are short

* cont : spec callback owned by session

* cont : simplify

* cont : avoid empty speculative session

* cont : simplify

* cont : simplify

* cont : enable mtmd speculative decoding

* cont : keep the spec sampler alive

* cont : simplify

* cont : fix nullptr deref + draft checkpoints

* cont : remove common_speculative_accept_response

* cont : remove callback

* cont : simplify

* cont : minor

* cont : simplify

* cont : fix accepted number

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-04-19 10:24:06 +03:00
Concedo
236ae27329 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/close-issue.yml
#	docs/multimodal.md
#	embd_res/templates/deepseek-ai-DeepSeek-V3.2.jinja
#	ggml/CMakeLists.txt
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_decls.tmpl
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_reg_tile.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_subgroup_matrix.wgsl
#	tests/peg-parser/test-gbnf-generation.cpp
#	tests/test-chat.cpp
2026-04-14 21:01:41 +08:00
Aldehir Rojas
e21cdc11a0
common/gemma4 : handle parsing edge cases (#21760) 2026-04-13 18:18:18 -05:00
Piotr Wilkin (ilintar)
1c0d9081fd
chat: dedicated DeepSeek v3.2 parser + "official" template (#21785) 2026-04-13 22:23:53 +02:00
Concedo
4c860ae4ae Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	common/download.cpp
#	docs/backend/OPENVINO.md
#	docs/backend/snapdragon/CMakeUserPresets.json
#	docs/backend/snapdragon/README.md
#	ggml/src/ggml-hexagon/ggml-hexagon.cpp
#	ggml/src/ggml-hexagon/htp/act-ops.c
#	ggml/src/ggml-hexagon/htp/argsort-ops.c
#	ggml/src/ggml-hexagon/htp/binary-ops.c
#	ggml/src/ggml-hexagon/htp/cpy-ops.c
#	ggml/src/ggml-hexagon/htp/cumsum-ops.c
#	ggml/src/ggml-hexagon/htp/flash-attn-ops.c
#	ggml/src/ggml-hexagon/htp/get-rows-ops.c
#	ggml/src/ggml-hexagon/htp/hex-utils.h
#	ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c
#	ggml/src/ggml-hexagon/htp/hmx-ops.h
#	ggml/src/ggml-hexagon/htp/htp-ctx.h
#	ggml/src/ggml-hexagon/htp/htp-ops.h
#	ggml/src/ggml-hexagon/htp/htp_iface.idl
#	ggml/src/ggml-hexagon/htp/main.c
#	ggml/src/ggml-hexagon/htp/matmul-ops.c
#	ggml/src/ggml-hexagon/htp/repeat-ops.c
#	ggml/src/ggml-hexagon/htp/rope-ops.c
#	ggml/src/ggml-hexagon/htp/set-rows-ops.c
#	ggml/src/ggml-hexagon/htp/softmax-ops.c
#	ggml/src/ggml-hexagon/htp/ssm-conv.c
#	ggml/src/ggml-hexagon/htp/sum-rows-ops.c
#	ggml/src/ggml-hexagon/htp/unary-ops.c
#	ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	ggml/src/ggml-webgpu/wgsl-shaders/common_decls.tmpl
#	ggml/src/ggml-webgpu/wgsl-shaders/flash_attn.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/get_rows.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_decls.tmpl
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_vec.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/unary.wgsl
#	models/templates/google-gemma-4-31B-it-interleaved.jinja
#	models/templates/google-gemma-4-31B-it.jinja
#	scripts/snapdragon/adb/run-bench.sh
#	scripts/snapdragon/adb/run-cli.sh
#	scripts/snapdragon/adb/run-completion.sh
#	scripts/snapdragon/adb/run-tool.sh
#	scripts/snapdragon/windows/run-bench.ps1
#	scripts/snapdragon/windows/run-cli.ps1
#	scripts/snapdragon/windows/run-mtmd.ps1
#	scripts/snapdragon/windows/run-tool.ps1
#	tests/test-backend-ops.cpp
#	tests/test-chat.cpp
#	tools/llama-bench/llama-bench.cpp
2026-04-11 11:19:32 +08:00
Concedo
8b90bfe094 Merge commit '4ef9301e4d' into concedo_experimental
# Conflicts:
#	.github/labeler.yml
#	docs/multimodal.md
#	embd_res/ggml-vocab-gemma-4.gguf
#	embd_res/ggml-vocab-gemma-4.gguf.inp
#	embd_res/ggml-vocab-gemma-4.gguf.out
#	ggml/src/ggml-sycl/fattn-tile.cpp
#	ggml/src/ggml-sycl/fattn-tile.hpp
#	ggml/src/ggml-sycl/fattn-vec.hpp
#	ggml/src/ggml-sycl/fattn.cpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	ggml/src/ggml-sycl/template-instances/fattn-vec-instance-f16-f16.cpp
#	ggml/src/ggml-sycl/template-instances/fattn-vec-instance-f16-q4_0.cpp
#	ggml/src/ggml-sycl/template-instances/fattn-vec-instance-f16-q4_1.cpp
#	ggml/src/ggml-sycl/template-instances/fattn-vec-instance-f16-q5_0.cpp
#	ggml/src/ggml-sycl/template-instances/fattn-vec-instance-f16-q5_1.cpp
#	ggml/src/ggml-sycl/template-instances/fattn-vec-instance-f16-q8_0.cpp
#	ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_0-f16.cpp
#	ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_0-q4_0.cpp
#	ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_0-q4_1.cpp
#	ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_0-q5_0.cpp
#	ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_0-q5_1.cpp
#	ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_0-q8_0.cpp
#	ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_1-f16.cpp
#	ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_1-q4_0.cpp
#	ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_1-q4_1.cpp
#	ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_1-q5_0.cpp
#	ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_1-q5_1.cpp
#	ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_1-q8_0.cpp
#	ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_0-f16.cpp
#	ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_0-q4_0.cpp
#	ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_0-q4_1.cpp
#	ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_0-q5_0.cpp
#	ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_0-q5_1.cpp
#	ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_0-q8_0.cpp
#	ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_1-f16.cpp
#	ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_1-q4_0.cpp
#	ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_1-q4_1.cpp
#	ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_1-q5_0.cpp
#	ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_1-q5_1.cpp
#	ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_1-q8_0.cpp
#	ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q8_0-f16.cpp
#	ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q8_0-q4_0.cpp
#	ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q8_0-q4_1.cpp
#	ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q8_0-q5_0.cpp
#	ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q8_0-q5_1.cpp
#	ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q8_0-q8_0.cpp
#	tests/CMakeLists.txt
#	tests/test-jinja.cpp
#	tools/mtmd/CMakeLists.txt
2026-04-11 09:38:50 +08:00
Galunid
b136b62cf9
fix: Fix broken structured output when using $refs in json_schema (#21699)
Some checks are pending
Check Pre-Tokenizer Hashes / pre-tokenizer-hashes (push) Waiting to run
Python check requirements.txt / check-requirements (push) Waiting to run
Python Type-Check / python type-check (push) Waiting to run
2026-04-10 18:26:36 -05:00
Aldehir Rojas
3fc65063d9
common : better align to the updated official gemma4 template (#21704) 2026-04-10 16:12:53 -05:00
Berk Idem
d7ff074c87
common : enable reasoning budget sampler for gemma4 (#21697)
* fix: enable reasoning budget sampler for gemma4

Add thinking_start_tag and thinking_end_tag to
common_chat_params_init_gemma4(). Without these, the reasoning
budget sampler never activates for gemma4.

Make the newline after "thought" optional in the PEG parser to
handle budget=0 (sampler forces end tag before the newline).

Add test case for empty thinking block.

Fixes #21487

* use p.space() instead of p.optional(p.literal("\n")) in gemma4 thought parser
2026-04-10 11:49:14 +02:00
Aldehir Rojas
ddf03c6d9a
common : fix ambiguous grammar rule in gemma4 (#21661)
* common : fix ambiguous grammar rule in gemma4

* cont : fix missing comma...
2026-04-09 12:25:07 +02:00
Concedo
c82c0b463a Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/labeler.yml
#	.github/workflows/release.yml
#	examples/debug/debug.cpp
#	ggml/src/ggml-cuda/common.cuh
#	ggml/src/ggml-cuda/mmq.cuh
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	src/llama-vocab.cpp
#	tests/test-backend-ops.cpp
#	tests/test-chat.cpp
#	tests/test-json-schema-to-grammar.cpp
#	tools/mtmd/CMakeLists.txt
2026-04-09 17:45:04 +08:00
Piotr Wilkin (ilintar)
85d482e6b6
parser: fix MiniMax handling (#21573) 2026-04-08 12:47:25 +02:00
Concedo
9b1f1bbf35 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build-vulkan.yml
#	.github/workflows/docker.yml
#	embd_res/templates/google-gemma-4-31B-it-interleaved.jinja
#	embd_res/templates/google-gemma-4-31B-it.jinja
#	tests/test-chat.cpp
2026-04-05 18:46:23 +08:00
Aldehir Rojas
b8635075ff
common : add gemma 4 specialized parser (#21418)
* common : add gemma4 dedicated parser

* cont : add '<|tool_response>' as eog

* cont : emit JSON from Gemma4 tool call AST

* cont : more fixes

* cont : refactor convert function

* cont : refine rules and mapping

* cont : add more tests

* cont : clean up

* cont : remove autoparser gemma4 implementation

* cont : more cleanup

* cont : rename gemma4.jinja to match the others

* cont : add custom template to support interleaved thinking

* cont : preserve reasoning in model turns

* cont : fix initializer error

* cont : fix unused vars

* cont : fix accidental static

* cont : fix specialized_template signature

* fix extra semicolon

* remove debug line and extra space [no ci]
2026-04-04 20:39:00 +02:00
Concedo
2e4f94822e Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build-self-hosted.yml
#	.github/workflows/docker.yml
#	ci/run.sh
#	docs/build.md
#	ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	src/llama-vocab.cpp
#	tests/test-chat.cpp
#	tests/test-jinja.cpp
#	tools/cli/README.md
#	tools/completion/README.md
#	tools/server/README.md
2026-04-04 14:27:23 +08:00
Piotr Wilkin (ilintar)
f1f793ad06
common/parser: fix call ID detection (Mistral parser mostly) + atomicity for tag-json parsers (#21230)
* Fix call ID detection (Mistral parser mostly) + atomicity for tag-json parsers

* Rename

* Update common/chat-auto-parser-generator.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-04-03 17:51:52 +02:00
Concedo
8fa87621d1 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/labeler.yml
#	common/chat.cpp
#	ggml/src/ggml-rpc/ggml-rpc.cpp
2026-04-03 16:36:41 +08:00
Georgi Gerganov
57ace0d612
chat : avoid including json in chat.h (#21306) 2026-04-03 09:07:59 +03:00
Concedo
34ad53e950 merged support for gemma4. the e2b, e4b and 26b work, the 31b does not 2026-04-03 11:07:46 +08:00
Piotr Wilkin (ilintar)
5208e2d5ba
fix: gemma 4 template (#21326) 2026-04-02 23:31:02 +02:00
Concedo
5dee1a1cbb Merge commit 'fbd441c379' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	AGENTS.md
#	ci/run.sh
#	docs/build.md
#	embd_res/templates/LFM2.5-Instruct.jinja
#	ggml/CMakeLists.txt
#	ggml/src/ggml-cuda/fattn.cu
#	ggml/src/ggml-hexagon/ggml-hexagon.cpp
#	ggml/src/ggml-hexagon/htp/CMakeLists.txt
#	ggml/src/ggml-hexagon/htp/htp-msg.h
#	ggml/src/ggml-hexagon/htp/htp-ops.h
#	ggml/src/ggml-hexagon/htp/hvx-div.h
#	ggml/src/ggml-hexagon/htp/main.c
#	ggml/src/ggml-hexagon/htp/unary-ops.c
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-sycl/common.hpp
#	ggml/src/ggml-sycl/convert.cpp
#	ggml/src/ggml-sycl/dequantize.hpp
#	ggml/src/ggml-sycl/mmvq.cpp
#	ggml/src/ggml-sycl/vecdotq.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	ggml/src/ggml-webgpu/wgsl-shaders/common_decls.tmpl
#	ggml/src/ggml-webgpu/wgsl-shaders/flash_attn.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_decls.tmpl
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_vec.wgsl
#	scripts/hip/gcn-cdna-vgpr-check.py
#	scripts/sync-ggml.last
#	tests/test-chat.cpp
2026-04-03 01:06:02 +08:00
Xuan-Son Nguyen
63f8fe0ef4
model, mtmd: fix gguf conversion for audio/vision mmproj (#21309)
* fix gguf conversion for audio/vision mmproj

* fix test
2026-04-02 17:10:32 +02:00
Aldehir Rojas
223373742b
common : add commentary rules for gpt-oss-20b (#21286) 2026-04-02 08:59:59 -05:00
Piotr Wilkin (ilintar)
e15efe007d
Relax prefill parser to allow space. (#21240)
* Relax prefill parser to allow space.

* Move changes from prefix() to parser generation

* Only allow spaces if we're not having a pure content parser next
2026-04-02 11:29:11 +02:00
Jonathan
1d6d4cf7a5
fix: tool call parsing for LFM2 and LFM2.5 models (#21242)
* fix: tool call parsing for LFM2 and LFM2.5 models'

* refactor: add test / break out lfm2 and lfm2.5 parsing logic
2026-04-01 16:22:44 +02:00
Concedo
31aa072da1 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/release.yml
#	.gitignore
#	examples/batched/batched.cpp
#	examples/debug/debug.cpp
#	examples/eval-callback/eval-callback.cpp
#	examples/idle/idle.cpp
#	examples/lookahead/lookahead.cpp
#	examples/lookup/lookup-create.cpp
#	examples/lookup/lookup-stats.cpp
#	examples/lookup/lookup.cpp
#	examples/parallel/parallel.cpp
#	examples/passkey/passkey.cpp
#	examples/retrieval/retrieval.cpp
#	examples/save-load-state/save-load-state.cpp
#	examples/speculative-simple/speculative-simple.cpp
#	examples/speculative/speculative.cpp
#	examples/training/finetune.cpp
#	ggml/CMakeLists.txt
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/common.h
#	ggml/src/ggml-cann/ggml-cann.cpp
#	ggml/src/ggml-sycl/fattn-tile.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	ggml/src/ggml-webgpu/wgsl-shaders/cpy.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/embed_wgsl.py
#	ggml/src/ggml-webgpu/wgsl-shaders/rope.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/soft_max.wgsl
#	scripts/sync-ggml.last
#	tests/export-graph-ops.cpp
#	tests/test-chat.cpp
#	tests/test-state-restore-fragmented.cpp
#	tests/test-thread-safety.cpp
#	tools/batched-bench/batched-bench.cpp
#	tools/cli/cli.cpp
#	tools/cvector-generator/cvector-generator.cpp
#	tools/export-lora/export-lora.cpp
#	tools/imatrix/imatrix.cpp
#	tools/perplexity/perplexity.cpp
#	tools/results/results.cpp
#	tools/server/CMakeLists.txt
2026-04-01 10:54:13 +08:00
Aldehir Rojas
624733d631
common : gpt-oss handle builtin and unsolicited tool calls (#21213) 2026-03-31 13:52:42 +02:00
lainon1
0b6ff47996
fix: correct misspellings in code comments (#21217)
- emdeddings → embeddings (gemma3.cpp, gemma3n-iswa.cpp,
gemma-embedding.cpp)
- imlpemented → implemented (llama-adapter.cpp)
- interere → interfere (llama-graph.cpp)
- overridde → overridden (chat.cpp)
- stastistics → statistics (ngram-map.h)
- layed → laid (llama-kv-cache.h)
- worster → worst (llama-context.cpp)
- sequantial → sequential (llama-batch.h)
2026-03-31 13:50:51 +02:00
Concedo
42ad89cd86 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/cann.Dockerfile
#	.devops/cpu.Dockerfile
#	.devops/llama-cli-cann.Dockerfile
#	.devops/nix/package.nix
#	.github/workflows/build-android.yml
#	.github/workflows/build-cann.yml
#	.github/workflows/build-msys.yml
#	.github/workflows/docker.yml
#	.github/workflows/editorconfig.yml
#	.github/workflows/gguf-publish.yml
#	.github/workflows/python-lint.yml
#	.github/workflows/release.yml
#	CMakeLists.txt
#	docs/backend/CANN.md
#	ggml/src/ggml-hexagon/ggml-hexagon.cpp
#	ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c
#	ggml/src/ggml-hexagon/htp/htp-ctx.h
#	ggml/src/ggml-hexagon/htp/main.c
#	ggml/src/ggml-hexagon/htp/matmul-ops.c
#	ggml/src/ggml-rpc/ggml-rpc.cpp
#	scripts/sync_vendor.py
#	tests/test-chat-auto-parser.cpp
#	tests/test-chat.cpp
#	tests/test-json-schema-to-grammar.cpp
#	tests/test-reasoning-budget.cpp
#	tools/cli/cli.cpp
#	tools/server/CMakeLists.txt
#	tools/server/README.md
2026-03-30 20:45:38 +08:00
Aldehir Rojas
e6f2ec01ff
common : add reasoning_format = none support to gpt-oss (#21094) 2026-03-28 09:33:39 -05:00
Concedo
c00fe0af5a Merge commit '9f102a1407' into concedo_experimental
# Conflicts:
#	.devops/intel.Dockerfile
#	.github/ISSUE_TEMPLATE/010-bug-compilation.yml
#	.github/ISSUE_TEMPLATE/011-bug-results.yml
#	.github/pull_request_template.md
#	CODEOWNERS
#	README.md
#	common/CMakeLists.txt
#	ggml/src/ggml-hexagon/ggml-hexagon.cpp
#	ggml/src/ggml-hexagon/htp/binary-ops.c
#	ggml/src/ggml-hexagon/htp/hex-dma.c
#	ggml/src/ggml-hexagon/htp/hex-dma.h
#	ggml/src/ggml-hexagon/htp/hex-dump.h
#	ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c
#	ggml/src/ggml-hexagon/htp/hvx-utils.h
#	ggml/src/ggml-hexagon/htp/main.c
#	ggml/src/ggml-hexagon/htp/ssm-conv.c
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-opencl/kernels/cvt.cl
#	ggml/src/ggml-rpc/ggml-rpc.cpp
#	scripts/snapdragon/adb/run-bench.sh
#	scripts/sync_vendor.py
#	tests/test-backend-ops.cpp
#	tools/llama-bench/llama-bench.cpp
2026-03-25 23:45:41 +08:00
Aldehir Rojas
312d870a89
common : replace wrap_for_generation with a prefix convenience function and fix gpt-oss (#20912) 2026-03-23 22:21:47 -05:00
Concedo
6054bacadd Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/ai-issues.yml
#	CONTRIBUTING.md
#	docs/autoparser.md
#	docs/ops.md
#	docs/ops/Metal.csv
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/ggml-cann.cpp
#	ggml/src/ggml-cpu/CMakeLists.txt
#	ggml/src/ggml-hexagon/ggml-hexagon.cpp
#	ggml/src/ggml-hexagon/htp/CMakeLists.txt
#	ggml/src/ggml-hexagon/htp/hex-dma.h
#	ggml/src/ggml-hexagon/htp/hex-utils.h
#	ggml/src/ggml-hexagon/htp/htp-ctx.h
#	ggml/src/ggml-hexagon/htp/htp-msg.h
#	ggml/src/ggml-hexagon/htp/htp_iface.idl
#	ggml/src/ggml-hexagon/htp/hvx-base.h
#	ggml/src/ggml-hexagon/htp/main.c
#	ggml/src/ggml-hip/CMakeLists.txt
#	models/templates/Apriel-1.6-15b-Thinker-fixed.jinja
#	models/templates/deepseek-ai-DeepSeek-R1-Distill-Qwen-32B.jinja
#	models/templates/deepseek-ai-DeepSeek-V3.1.jinja
#	models/templates/llama-cpp-deepseek-r1.jinja
#	models/templates/meetkai-functionary-medium-v3.1.jinja
#	scripts/fetch_server_test_models.py
#	scripts/snapdragon/adb/run-cli.sh
#	scripts/snapdragon/adb/run-completion.sh
#	scripts/snapdragon/adb/run-mtmd.sh
#	scripts/snapdragon/adb/run-tool.sh
#	tests/test-chat-auto-parser.cpp
#	tests/test-chat-peg-parser.cpp
#	tests/test-chat.cpp
#	tools/cli/cli.cpp
#	tools/server/README.md
2026-03-21 12:06:01 +08:00
Concedo
98f099aecc Merge commit 'c1258830b2' into concedo_experimental
# Conflicts:
#	docs/docker.md
#	docs/ops.md
#	docs/ops/WebGPU.csv
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/ggml-cann.cpp
#	ggml/src/ggml-cpu/CMakeLists.txt
#	ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	ggml/src/ggml-webgpu/wgsl-shaders/get_rows.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/row_norm.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/unary.wgsl
2026-03-21 12:00:52 +08:00
James O'Leary
c46583b86b
common/parser : fix out_of_range crash in throw path (#20424 regression) (#20777)
* chat : fix out_of_range crash in throw path (#20424 regression)

#20424 introduced effective_input = generation_prompt + input, but the
throw path uses input.substr(result.end) where result.end is a position
within effective_input. Every thinking model with a non-empty
generation_prompt crashes with std::out_of_range instead of the intended
error message.

Test crashes on unpatched master, passes with fix:

  cmake -B build -DLLAMA_BUILD_TESTS=ON -DLLAMA_BUILD_TOOLS=OFF
  cmake --build build --target test-chat
  ./build/bin/test-chat

* Update test-chat.cpp

* Update test-chat.cpp

* Update test-chat.cpp

---------

Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>
2026-03-20 02:37:22 +01:00
Piotr Wilkin (ilintar)
5e54d51b19
common/parser: add proper reasoning tag prefill reading (#20424)
* Implement proper prefill extraction

* Refactor cli parameters, update docs, move reasoning budget sampler part to common/reasoning-budget.cpp

* Update tools/server/server-task.cpp

* refactor: move grammars to variant, remove grammar_external, handle exception internally

* Make code less C++y

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-03-19 16:58:21 +01:00
Aldehir Rojas
1b9bbaa357
common : fix gpt-oss content removal (#20745) 2026-03-19 11:40:39 +01:00
Concedo
48f914e374 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	ci/run.sh
#	ggml/CMakeLists.txt
#	ggml/src/ggml-cpu/arch/riscv/repack.cpp
#	ggml/src/ggml-cpu/arch/x86/repack.cpp
#	ggml/src/ggml-cpu/repack.cpp
#	ggml/src/ggml-hexagon/ggml-hexagon.cpp
#	ggml/src/ggml-hexagon/htp/CMakeLists.txt
#	ggml/src/ggml-hexagon/htp/htp-msg.h
#	ggml/src/ggml-hexagon/htp/htp-ops.h
#	ggml/src/ggml-hexagon/htp/hvx-base.h
#	ggml/src/ggml-hexagon/htp/hvx-exp.h
#	ggml/src/ggml-hexagon/htp/hvx-sigmoid.h
#	ggml/src/ggml-hexagon/htp/main.c
#	ggml/src/ggml-hexagon/htp/softmax-ops.c
#	ggml/src/ggml-hexagon/htp/unary-ops.c
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	scripts/sync-ggml.last
#	tests/test-backend-sampler.cpp
#	tests/test-chat.cpp
#	tests/test-jinja.cpp
#	tools/cli/cli.cpp
2026-03-19 02:23:06 +08:00
Aldehir Rojas
5e8910a0db
common : rework gpt-oss parser (#20393)
* common : rework gpt-oss parser

* cont : fix gpt-oss tests

* cont : add structured output test

* cont : rename final to final_msg
2026-03-18 10:41:25 +01:00
Piotr Wilkin (ilintar)
d2ecd2d1cf
common/parser: add --skip-chat-parsing to force a pure content parser. (#20289)
* Add `--force-pure-content` to force a pure content parser.

* Update common/arg.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Change parameter name [no ci]

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-03-17 16:16:43 +01:00
Concedo
f31b040941 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/labeler.yml
#	.github/workflows/build-self-hosted.yml
#	benches/nemotron/nemotron-dgx-spark.md
#	docs/ops.md
#	docs/ops/SYCL.csv
#	ggml/src/ggml-cpu/kleidiai/kleidiai.cpp
#	ggml/src/ggml-sycl/backend.hpp
#	ggml/src/ggml-sycl/element_wise.cpp
#	ggml/src/ggml-sycl/element_wise.hpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	scripts/sync-ggml.last
#	tests/test-jinja.cpp
#	tests/test-llama-archs.cpp
2026-03-17 14:05:23 +08:00