Concedo
2905c6254f
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .pi/gg/SYSTEM.md
# docs/speculative.md
# ggml/src/ggml-virtgpu/virtgpu-shm.cpp
# ggml/src/ggml-virtgpu/virtgpu.cpp
# ggml/src/ggml-virtgpu/virtgpu.h
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-webgpu/wgsl-shaders/row_norm.wgsl
# tools/cli/README.md
# tools/completion/README.md
# tools/server/README.md
2026-05-04 15:36:13 +08:00
Aldehir Rojas
e48034dfc9
common : determine generation prompt using longest common prefix ( #22657 )
2026-05-04 00:18:23 +02:00
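The title of #22657 above names a longest-common-prefix approach but gives no detail. A minimal sketch of the general idea follows, assuming the generation prompt is whatever the template emits beyond the prefix it shares with a rendering that has no generation prompt. The helper names `common_prefix_len` and `generation_prompt_suffix` are hypothetical and are not taken from the commit.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Length of the longest common prefix of two strings (byte-wise).
std::size_t common_prefix_len(const std::string & a, const std::string & b) {
    const std::size_t n = std::min(a.size(), b.size());
    std::size_t i = 0;
    while (i < n && a[i] == b[i]) {
        ++i;
    }
    return i;
}

// Hypothetical helper: given the template rendered without and with the
// generation prompt, return the part of the second rendering that lies
// beyond the shared prefix.
std::string generation_prompt_suffix(const std::string & without_gen,
                                     const std::string & with_gen) {
    return with_gen.substr(common_prefix_len(without_gen, with_gen));
}
```

Comparing prefixes rather than assuming one rendering is a strict extension of the other still gives a usable answer when the two renderings diverge before the end of the shorter one.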
Concedo
340b22283e
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/intel.Dockerfile
# .github/workflows/build-android.yml
# .github/workflows/build.yml
# .github/workflows/release.yml
# .gitignore
# docs/backend/SYCL.md
# docs/backend/snapdragon/README.md
# examples/model-conversion/scripts/causal/convert-model.sh
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/CMakeLists.txt
# ggml/src/ggml-hexagon/htp/hex-utils.h
# ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c
# ggml/src/ggml-hexagon/htp/htp-ctx.h
# ggml/src/ggml-hexagon/htp/htp-ops.h
# ggml/src/ggml-hexagon/htp/htp_iface.idl
# ggml/src/ggml-hexagon/htp/hvx-base.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hexagon/htp/matmul-ops.c
# ggml/src/ggml-hexagon/libggml-htp.inf
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/mmvq.cpp
# ggml/src/ggml-sycl/mmvq.hpp
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-webgpu/wgsl-shaders/flash_attn.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/flash_attn_vec_blk.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/flash_attn_vec_split.wgsl
# scripts/server-test-structured.py
# scripts/snapdragon/adb/run-bench.sh
# scripts/snapdragon/adb/run-cli.sh
# scripts/snapdragon/adb/run-completion.sh
# scripts/snapdragon/adb/run-mtmd.sh
# scripts/snapdragon/adb/run-tool.sh
# scripts/snapdragon/qdc/requirements.txt
# scripts/snapdragon/windows/run-bench.ps1
# scripts/snapdragon/windows/run-cli.ps1
# scripts/snapdragon/windows/run-completion.ps1
# scripts/snapdragon/windows/run-mtmd.ps1
# scripts/snapdragon/windows/run-tool.ps1
# tests/test-backend-ops.cpp
# tools/cli/cli.cpp
# ty.toml
2026-04-25 12:13:14 +08:00
Tarek Dakhran
550d684bd1
server: Enable transcriptions API for LFM2-Audio ( #22000 )
2026-04-23 10:47:26 +02:00
Concedo
0755f27372
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/openvino.Dockerfile
# .github/workflows/build-self-hosted.yml
# .github/workflows/build.yml
# common/chat.cpp
# docs/backend/OPENVINO.md
# examples/speculative-simple/speculative-simple.cpp
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/CMakeLists.txt
# ggml/src/ggml-hexagon/htp/htp-ctx.h
# ggml/src/ggml-hexagon/htp/htp-ops.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hexagon/libggml-htp.inf
# ggml/src/ggml-openvino/ggml-decoder.cpp
# ggml/src/ggml-openvino/ggml-openvino-extra.cpp
# ggml/src/ggml-openvino/ggml-openvino.cpp
# ggml/src/ggml-openvino/ggml-quants.cpp
# ggml/src/ggml-openvino/openvino/op/rope.cpp
# ggml/src/ggml-openvino/openvino/op_table.cpp
# ggml/src/ggml-openvino/openvino/op_table.h
# ggml/src/ggml-openvino/openvino/translate_session.cpp
# ggml/src/ggml-openvino/openvino/utils.cpp
# ggml/src/ggml-openvino/openvino/utils.h
# ggml/src/ggml-openvino/utils.cpp
# ggml/src/ggml-openvino/utils.h
# ggml/src/ggml-sycl/common.hpp
# ggml/src/ggml-sycl/convert.cpp
# ggml/src/ggml-sycl/convert.hpp
# ggml/src/ggml-sycl/gemm.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/set_rows.cpp
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# scripts/sync_vendor.py
# tests/CMakeLists.txt
# tests/test-chat.cpp
# tools/cli/cli.cpp
# tools/mtmd/CMakeLists.txt
# tools/server/CMakeLists.txt
2026-04-23 00:55:05 +08:00
Piotr Wilkin (ilintar)
134d6e54d4
common/chat, server: refactor, move all conversion functions to common, add tests ( #20690 )
...
* Refactor conversion functions
2026-04-22 10:28:45 +02:00
Concedo
cd6788007e
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build-cross.yml
# .github/workflows/build-self-hosted.yml
# .github/workflows/release.yml
# examples/llama.android/lib/src/main/cpp/CMakeLists.txt
# ggml/CMakeLists.txt
# ggml/src/ggml-rpc/CMakeLists.txt
# ggml/src/ggml-rpc/ggml-rpc.cpp
# ggml/src/ggml-sycl/mmvq.cpp
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# scripts/sync_vendor.py
# tests/test-chat.cpp
# tests/test-mtmd-c-api.c
# tools/server/README.md
2026-04-20 20:19:11 +08:00
Sascha Rogmann
455d8e4be8
server : speculative checkpointing ( #19493 )
...
* server : speculative decoding using checkpoints
* server : fix draft check with checkpoints
* server : rename spec vars
* server : log levels
* server : refactored spec logic to speculative.cpp
* server : renamed spec checkpoints option
* server : fix spec checkpoints, logging
* speculative : checkpoints with draft model, logging
* server : n_tokens_cur and create_checkpoint in draft
* server : fix server_speculative_callback (slot.id)
* spec : fix ngram-map/begin idx_last_check
* spec : init ckpt (begin() wasn't called)
* chore: update webui build output
* server : restore sampler in spec checkpoint and clear mem
* cont : avoid --spec-use-checkpoints argument
* cont : remove server_prompt_checkpoint_with_size
* spec : rename (leave_draft_state)
* cont : clean-up
* cont : do not ignore partial drafts even if they are short
* cont : spec callback owned by session
* cont : simplify
* cont : avoid empty speculative session
* cont : simplify
* cont : simplify
* cont : enable mtmd speculative decoding
* cont : keep the spec sampler alive
* cont : simplify
* cont : fix nullptr deref + draft checkpoints
* cont : remove common_speculative_accept_response
* cont : remove callback
* cont : simplify
* cont : minor
* cont : simplify
* cont : fix accepted number
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-04-19 10:24:06 +03:00
Concedo
236ae27329
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/close-issue.yml
# docs/multimodal.md
# embd_res/templates/deepseek-ai-DeepSeek-V3.2.jinja
# ggml/CMakeLists.txt
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_decls.tmpl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_reg_tile.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_subgroup_matrix.wgsl
# tests/peg-parser/test-gbnf-generation.cpp
# tests/test-chat.cpp
2026-04-14 21:01:41 +08:00
Aldehir Rojas
e21cdc11a0
common/gemma4 : handle parsing edge cases ( #21760 )
2026-04-13 18:18:18 -05:00
Piotr Wilkin (ilintar)
1c0d9081fd
chat: dedicated DeepSeek v3.2 parser + "official" template ( #21785 )
2026-04-13 22:23:53 +02:00
Concedo
4c860ae4ae
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# common/download.cpp
# docs/backend/OPENVINO.md
# docs/backend/snapdragon/CMakeUserPresets.json
# docs/backend/snapdragon/README.md
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/act-ops.c
# ggml/src/ggml-hexagon/htp/argsort-ops.c
# ggml/src/ggml-hexagon/htp/binary-ops.c
# ggml/src/ggml-hexagon/htp/cpy-ops.c
# ggml/src/ggml-hexagon/htp/cumsum-ops.c
# ggml/src/ggml-hexagon/htp/flash-attn-ops.c
# ggml/src/ggml-hexagon/htp/get-rows-ops.c
# ggml/src/ggml-hexagon/htp/hex-utils.h
# ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c
# ggml/src/ggml-hexagon/htp/hmx-ops.h
# ggml/src/ggml-hexagon/htp/htp-ctx.h
# ggml/src/ggml-hexagon/htp/htp-ops.h
# ggml/src/ggml-hexagon/htp/htp_iface.idl
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hexagon/htp/matmul-ops.c
# ggml/src/ggml-hexagon/htp/repeat-ops.c
# ggml/src/ggml-hexagon/htp/rope-ops.c
# ggml/src/ggml-hexagon/htp/set-rows-ops.c
# ggml/src/ggml-hexagon/htp/softmax-ops.c
# ggml/src/ggml-hexagon/htp/ssm-conv.c
# ggml/src/ggml-hexagon/htp/sum-rows-ops.c
# ggml/src/ggml-hexagon/htp/unary-ops.c
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-webgpu/wgsl-shaders/common_decls.tmpl
# ggml/src/ggml-webgpu/wgsl-shaders/flash_attn.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/get_rows.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_decls.tmpl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_vec.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/unary.wgsl
# models/templates/google-gemma-4-31B-it-interleaved.jinja
# models/templates/google-gemma-4-31B-it.jinja
# scripts/snapdragon/adb/run-bench.sh
# scripts/snapdragon/adb/run-cli.sh
# scripts/snapdragon/adb/run-completion.sh
# scripts/snapdragon/adb/run-tool.sh
# scripts/snapdragon/windows/run-bench.ps1
# scripts/snapdragon/windows/run-cli.ps1
# scripts/snapdragon/windows/run-mtmd.ps1
# scripts/snapdragon/windows/run-tool.ps1
# tests/test-backend-ops.cpp
# tests/test-chat.cpp
# tools/llama-bench/llama-bench.cpp
2026-04-11 11:19:32 +08:00
Concedo
8b90bfe094
Merge commit '4ef9301e4d' into concedo_experimental
...
# Conflicts:
# .github/labeler.yml
# docs/multimodal.md
# embd_res/ggml-vocab-gemma-4.gguf
# embd_res/ggml-vocab-gemma-4.gguf.inp
# embd_res/ggml-vocab-gemma-4.gguf.out
# ggml/src/ggml-sycl/fattn-tile.cpp
# ggml/src/ggml-sycl/fattn-tile.hpp
# ggml/src/ggml-sycl/fattn-vec.hpp
# ggml/src/ggml-sycl/fattn.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-f16-f16.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-f16-q4_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-f16-q4_1.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-f16-q5_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-f16-q5_1.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-f16-q8_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_0-f16.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_0-q4_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_0-q4_1.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_0-q5_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_0-q5_1.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_0-q8_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_1-f16.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_1-q4_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_1-q4_1.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_1-q5_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_1-q5_1.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_1-q8_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_0-f16.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_0-q4_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_0-q4_1.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_0-q5_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_0-q5_1.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_0-q8_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_1-f16.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_1-q4_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_1-q4_1.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_1-q5_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_1-q5_1.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_1-q8_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q8_0-f16.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q8_0-q4_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q8_0-q4_1.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q8_0-q5_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q8_0-q5_1.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q8_0-q8_0.cpp
# tests/CMakeLists.txt
# tests/test-jinja.cpp
# tools/mtmd/CMakeLists.txt
2026-04-11 09:38:50 +08:00
Galunid
b136b62cf9
fix: Fix broken structured output when using $refs in json_schema ( #21699 )
2026-04-10 18:26:36 -05:00
Aldehir Rojas
3fc65063d9
common : better align to the updated official gemma4 template ( #21704 )
2026-04-10 16:12:53 -05:00
Berk Idem
d7ff074c87
common : enable reasoning budget sampler for gemma4 ( #21697 )
...
* fix: enable reasoning budget sampler for gemma4
Add thinking_start_tag and thinking_end_tag to
common_chat_params_init_gemma4(). Without these, the reasoning
budget sampler never activates for gemma4.
Make the newline after "thought" optional in the PEG parser to
handle budget=0 (sampler forces end tag before the newline).
Add test case for empty thinking block.
Fixes #21487
* use p.space() instead of p.optional(p.literal("\n")) in gemma4 thought parser
2026-04-10 11:49:14 +02:00
Aldehir Rojas
ddf03c6d9a
common : fix ambiguous grammar rule in gemma4 ( #21661 )
...
* common : fix ambiguous grammar rule in gemma4
* cont : fix missing comma...
2026-04-09 12:25:07 +02:00
Concedo
c82c0b463a
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/labeler.yml
# .github/workflows/release.yml
# examples/debug/debug.cpp
# ggml/src/ggml-cuda/common.cuh
# ggml/src/ggml-cuda/mmq.cuh
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# src/llama-vocab.cpp
# tests/test-backend-ops.cpp
# tests/test-chat.cpp
# tests/test-json-schema-to-grammar.cpp
# tools/mtmd/CMakeLists.txt
2026-04-09 17:45:04 +08:00
Piotr Wilkin (ilintar)
85d482e6b6
parser: fix MiniMax handling ( #21573 )
2026-04-08 12:47:25 +02:00
Concedo
9b1f1bbf35
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build-vulkan.yml
# .github/workflows/docker.yml
# embd_res/templates/google-gemma-4-31B-it-interleaved.jinja
# embd_res/templates/google-gemma-4-31B-it.jinja
# tests/test-chat.cpp
2026-04-05 18:46:23 +08:00
Aldehir Rojas
b8635075ff
common : add gemma 4 specialized parser ( #21418 )
...
* common : add gemma4 dedicated parser
* cont : add '<|tool_response>' as eog
* cont : emit JSON from Gemma4 tool call AST
* cont : more fixes
* cont : refactor convert function
* cont : refine rules and mapping
* cont : add more tests
* cont : clean up
* cont : remove autoparser gemma4 implementation
* cont : more cleanup
* cont : rename gemma4.jinja to match the others
* cont : add custom template to support interleaved thinking
* cont : preserve reasoning in model turns
* cont : fix initializer error
* cont : fix unused vars
* cont : fix accidental static
* cont : fix specialized_template signature
* fix extra semicolon
* remove debug line and extra space [no ci]
2026-04-04 20:39:00 +02:00
Concedo
2e4f94822e
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build-self-hosted.yml
# .github/workflows/docker.yml
# ci/run.sh
# docs/build.md
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# src/llama-vocab.cpp
# tests/test-chat.cpp
# tests/test-jinja.cpp
# tools/cli/README.md
# tools/completion/README.md
# tools/server/README.md
2026-04-04 14:27:23 +08:00
Piotr Wilkin (ilintar)
f1f793ad06
common/parser: fix call ID detection (Mistral parser mostly) + atomicity for tag-json parsers ( #21230 )
...
* Fix call ID detection (Mistral parser mostly) + atomicity for tag-json parsers
* Rename
* Update common/chat-auto-parser-generator.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-04-03 17:51:52 +02:00
Concedo
8fa87621d1
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/labeler.yml
# common/chat.cpp
# ggml/src/ggml-rpc/ggml-rpc.cpp
2026-04-03 16:36:41 +08:00
Georgi Gerganov
57ace0d612
chat : avoid including json in chat.h ( #21306 )
2026-04-03 09:07:59 +03:00
Concedo
34ad53e950
merged support for gemma4. the e2b, e4b and 26b work, the 31b does not
2026-04-03 11:07:46 +08:00
Piotr Wilkin (ilintar)
5208e2d5ba
fix: gemma 4 template ( #21326 )
2026-04-02 23:31:02 +02:00
Concedo
5dee1a1cbb
Merge commit 'fbd441c379' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# AGENTS.md
# ci/run.sh
# docs/build.md
# embd_res/templates/LFM2.5-Instruct.jinja
# ggml/CMakeLists.txt
# ggml/src/ggml-cuda/fattn.cu
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/CMakeLists.txt
# ggml/src/ggml-hexagon/htp/htp-msg.h
# ggml/src/ggml-hexagon/htp/htp-ops.h
# ggml/src/ggml-hexagon/htp/hvx-div.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hexagon/htp/unary-ops.c
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-sycl/common.hpp
# ggml/src/ggml-sycl/convert.cpp
# ggml/src/ggml-sycl/dequantize.hpp
# ggml/src/ggml-sycl/mmvq.cpp
# ggml/src/ggml-sycl/vecdotq.hpp
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-webgpu/wgsl-shaders/common_decls.tmpl
# ggml/src/ggml-webgpu/wgsl-shaders/flash_attn.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_decls.tmpl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_vec.wgsl
# scripts/hip/gcn-cdna-vgpr-check.py
# scripts/sync-ggml.last
# tests/test-chat.cpp
2026-04-03 01:06:02 +08:00
Xuan-Son Nguyen
63f8fe0ef4
model, mtmd: fix gguf conversion for audio/vision mmproj ( #21309 )
...
* fix gguf conversion for audio/vision mmproj
* fix test
2026-04-02 17:10:32 +02:00
Aldehir Rojas
223373742b
common : add commentary rules for gpt-oss-20b ( #21286 )
2026-04-02 08:59:59 -05:00
Piotr Wilkin (ilintar)
e15efe007d
Relax prefill parser to allow space. ( #21240 )
...
* Relax prefill parser to allow space.
* Move changes from prefix() to parser generation
* Only allow spaces if we're not having a pure content parser next
2026-04-02 11:29:11 +02:00
Jonathan
1d6d4cf7a5
fix: tool call parsing for LFM2 and LFM2.5 models ( #21242 )
...
* fix: tool call parsing for LFM2 and LFM2.5 models
* refactor: add test / break out lfm2 and lfm2.5 parsing logic
2026-04-01 16:22:44 +02:00
Concedo
31aa072da1
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/release.yml
# .gitignore
# examples/batched/batched.cpp
# examples/debug/debug.cpp
# examples/eval-callback/eval-callback.cpp
# examples/idle/idle.cpp
# examples/lookahead/lookahead.cpp
# examples/lookup/lookup-create.cpp
# examples/lookup/lookup-stats.cpp
# examples/lookup/lookup.cpp
# examples/parallel/parallel.cpp
# examples/passkey/passkey.cpp
# examples/retrieval/retrieval.cpp
# examples/save-load-state/save-load-state.cpp
# examples/speculative-simple/speculative-simple.cpp
# examples/speculative/speculative.cpp
# examples/training/finetune.cpp
# ggml/CMakeLists.txt
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/common.h
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-sycl/fattn-tile.hpp
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-webgpu/wgsl-shaders/cpy.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/embed_wgsl.py
# ggml/src/ggml-webgpu/wgsl-shaders/rope.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/soft_max.wgsl
# scripts/sync-ggml.last
# tests/export-graph-ops.cpp
# tests/test-chat.cpp
# tests/test-state-restore-fragmented.cpp
# tests/test-thread-safety.cpp
# tools/batched-bench/batched-bench.cpp
# tools/cli/cli.cpp
# tools/cvector-generator/cvector-generator.cpp
# tools/export-lora/export-lora.cpp
# tools/imatrix/imatrix.cpp
# tools/perplexity/perplexity.cpp
# tools/results/results.cpp
# tools/server/CMakeLists.txt
2026-04-01 10:54:13 +08:00
Aldehir Rojas
624733d631
common : gpt-oss handle builtin and unsolicited tool calls ( #21213 )
2026-03-31 13:52:42 +02:00
lainon1
0b6ff47996
fix: correct misspellings in code comments ( #21217 )
...
- emdeddings → embeddings (gemma3.cpp, gemma3n-iswa.cpp,
gemma-embedding.cpp)
- imlpemented → implemented (llama-adapter.cpp)
- interere → interfere (llama-graph.cpp)
- overridde → overridden (chat.cpp)
- stastistics → statistics (ngram-map.h)
- layed → laid (llama-kv-cache.h)
- worster → worst (llama-context.cpp)
- sequantial → sequential (llama-batch.h)
2026-03-31 13:50:51 +02:00
Concedo
42ad89cd86
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/cann.Dockerfile
# .devops/cpu.Dockerfile
# .devops/llama-cli-cann.Dockerfile
# .devops/nix/package.nix
# .github/workflows/build-android.yml
# .github/workflows/build-cann.yml
# .github/workflows/build-msys.yml
# .github/workflows/docker.yml
# .github/workflows/editorconfig.yml
# .github/workflows/gguf-publish.yml
# .github/workflows/python-lint.yml
# .github/workflows/release.yml
# CMakeLists.txt
# docs/backend/CANN.md
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c
# ggml/src/ggml-hexagon/htp/htp-ctx.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hexagon/htp/matmul-ops.c
# ggml/src/ggml-rpc/ggml-rpc.cpp
# scripts/sync_vendor.py
# tests/test-chat-auto-parser.cpp
# tests/test-chat.cpp
# tests/test-json-schema-to-grammar.cpp
# tests/test-reasoning-budget.cpp
# tools/cli/cli.cpp
# tools/server/CMakeLists.txt
# tools/server/README.md
2026-03-30 20:45:38 +08:00
Aldehir Rojas
e6f2ec01ff
common : add reasoning_format = none support to gpt-oss ( #21094 )
2026-03-28 09:33:39 -05:00
Concedo
c00fe0af5a
Merge commit '9f102a1407' into concedo_experimental
...
# Conflicts:
# .devops/intel.Dockerfile
# .github/ISSUE_TEMPLATE/010-bug-compilation.yml
# .github/ISSUE_TEMPLATE/011-bug-results.yml
# .github/pull_request_template.md
# CODEOWNERS
# README.md
# common/CMakeLists.txt
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/binary-ops.c
# ggml/src/ggml-hexagon/htp/hex-dma.c
# ggml/src/ggml-hexagon/htp/hex-dma.h
# ggml/src/ggml-hexagon/htp/hex-dump.h
# ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c
# ggml/src/ggml-hexagon/htp/hvx-utils.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hexagon/htp/ssm-conv.c
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/cvt.cl
# ggml/src/ggml-rpc/ggml-rpc.cpp
# scripts/snapdragon/adb/run-bench.sh
# scripts/sync_vendor.py
# tests/test-backend-ops.cpp
# tools/llama-bench/llama-bench.cpp
2026-03-25 23:45:41 +08:00
Aldehir Rojas
312d870a89
common : replace wrap_for_generation with a prefix convenience function and fix gpt-oss ( #20912 )
2026-03-23 22:21:47 -05:00
Concedo
6054bacadd
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/ai-issues.yml
# CONTRIBUTING.md
# docs/autoparser.md
# docs/ops.md
# docs/ops/Metal.csv
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/CMakeLists.txt
# ggml/src/ggml-hexagon/htp/hex-dma.h
# ggml/src/ggml-hexagon/htp/hex-utils.h
# ggml/src/ggml-hexagon/htp/htp-ctx.h
# ggml/src/ggml-hexagon/htp/htp-msg.h
# ggml/src/ggml-hexagon/htp/htp_iface.idl
# ggml/src/ggml-hexagon/htp/hvx-base.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hip/CMakeLists.txt
# models/templates/Apriel-1.6-15b-Thinker-fixed.jinja
# models/templates/deepseek-ai-DeepSeek-R1-Distill-Qwen-32B.jinja
# models/templates/deepseek-ai-DeepSeek-V3.1.jinja
# models/templates/llama-cpp-deepseek-r1.jinja
# models/templates/meetkai-functionary-medium-v3.1.jinja
# scripts/fetch_server_test_models.py
# scripts/snapdragon/adb/run-cli.sh
# scripts/snapdragon/adb/run-completion.sh
# scripts/snapdragon/adb/run-mtmd.sh
# scripts/snapdragon/adb/run-tool.sh
# tests/test-chat-auto-parser.cpp
# tests/test-chat-peg-parser.cpp
# tests/test-chat.cpp
# tools/cli/cli.cpp
# tools/server/README.md
2026-03-21 12:06:01 +08:00
Concedo
98f099aecc
Merge commit 'c1258830b2' into concedo_experimental
...
# Conflicts:
# docs/docker.md
# docs/ops.md
# docs/ops/WebGPU.csv
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-webgpu/wgsl-shaders/get_rows.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/row_norm.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/unary.wgsl
2026-03-21 12:00:52 +08:00
James O'Leary
c46583b86b
common/parser : fix out_of_range crash in throw path ( #20424 regression) ( #20777 )
...
* chat : fix out_of_range crash in throw path (#20424 regression)
#20424 introduced effective_input = generation_prompt + input, but the
throw path uses input.substr(result.end) where result.end is a position
within effective_input. Every thinking model with a non-empty
generation_prompt crashes with std::out_of_range instead of the intended
error message.
Test crashes on unpatched master, passes with fix:
cmake -B build -DLLAMA_BUILD_TESTS=ON -DLLAMA_BUILD_TOOLS=OFF
cmake --build build --target test-chat
./build/bin/test-chat
* Update test-chat.cpp
* Update test-chat.cpp
* Update test-chat.cpp
---------
Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>
2026-03-20 02:37:22 +01:00
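The failure described in #20777 above comes from slicing one string with an offset computed against another. A minimal sketch follows, assuming a simplified `parse_result` with only an `end` offset. The struct and both helper functions are hypothetical illustrations, and the "fixed" variant is one possible correction, not necessarily the patch that was merged.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the parser result: `end` is an offset into the
// string that was actually parsed (generation_prompt + input).
struct parse_result {
    std::size_t end;
};

// Buggy pattern: `input` is sliced with an offset that belongs to
// effective_input. std::string::substr throws std::out_of_range whenever
// r.end > input.size().
std::string remaining_buggy(const std::string & input, const parse_result & r) {
    return input.substr(r.end);
}

// One possible fix: slice the same string the offset was computed against.
std::string remaining_fixed(const std::string & generation_prompt,
                            const std::string & input,
                            const parse_result & r) {
    const std::string effective_input = generation_prompt + input;
    return effective_input.substr(r.end);
}
```

With a non-empty generation prompt, any offset that points past the end of the shorter `input` triggers the exception in the buggy variant, which matches the commit's note that thinking models with a non-empty generation prompt were affected.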
Piotr Wilkin (ilintar)
5e54d51b19
common/parser: add proper reasoning tag prefill reading ( #20424 )
...
* Implement proper prefill extraction
* Refactor cli parameters, update docs, move reasoning budget sampler part to common/reasoning-budget.cpp
* Update tools/server/server-task.cpp
* refactor: move grammars to variant, remove grammar_external, handle exception internally
* Make code less C++y
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-03-19 16:58:21 +01:00
Aldehir Rojas
1b9bbaa357
common : fix gpt-oss content removal ( #20745 )
2026-03-19 11:40:39 +01:00
Concedo
48f914e374
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# ci/run.sh
# ggml/CMakeLists.txt
# ggml/src/ggml-cpu/arch/riscv/repack.cpp
# ggml/src/ggml-cpu/arch/x86/repack.cpp
# ggml/src/ggml-cpu/repack.cpp
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/CMakeLists.txt
# ggml/src/ggml-hexagon/htp/htp-msg.h
# ggml/src/ggml-hexagon/htp/htp-ops.h
# ggml/src/ggml-hexagon/htp/hvx-base.h
# ggml/src/ggml-hexagon/htp/hvx-exp.h
# ggml/src/ggml-hexagon/htp/hvx-sigmoid.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hexagon/htp/softmax-ops.c
# ggml/src/ggml-hexagon/htp/unary-ops.c
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# scripts/sync-ggml.last
# tests/test-backend-sampler.cpp
# tests/test-chat.cpp
# tests/test-jinja.cpp
# tools/cli/cli.cpp
2026-03-19 02:23:06 +08:00
Aldehir Rojas
5e8910a0db
common : rework gpt-oss parser ( #20393 )
...
* common : rework gpt-oss parser
* cont : fix gpt-oss tests
* cont : add structured output test
* cont : rename final to final_msg
2026-03-18 10:41:25 +01:00
Piotr Wilkin (ilintar)
d2ecd2d1cf
common/parser: add --skip-chat-parsing to force a pure content parser. ( #20289 )
...
* Add `--force-pure-content` to force a pure content parser.
* Update common/arg.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Change parameter name [no ci]
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-03-17 16:16:43 +01:00
Concedo
f31b040941
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/labeler.yml
# .github/workflows/build-self-hosted.yml
# benches/nemotron/nemotron-dgx-spark.md
# docs/ops.md
# docs/ops/SYCL.csv
# ggml/src/ggml-cpu/kleidiai/kleidiai.cpp
# ggml/src/ggml-sycl/backend.hpp
# ggml/src/ggml-sycl/element_wise.cpp
# ggml/src/ggml-sycl/element_wise.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# scripts/sync-ggml.last
# tests/test-jinja.cpp
# tests/test-llama-archs.cpp
2026-03-17 14:05:23 +08:00
Aldehir Rojas
1bbec6a75d
jinja : add capability check for object args ( #20612 )
2026-03-16 17:43:14 +01:00
Concedo
b1c500ae2b
Merge commit ' 2948e6049a' into concedo_experimental

...
# Conflicts:
# .github/workflows/build.yml
# CONTRIBUTING.md
# docs/backend/VirtGPU/development.md
# docs/ops.md
# docs/ops/WebGPU.csv
# embd_res/templates/GigaChat3-10B-A1.8B.jinja
# embd_res/templates/GigaChat3.1-10B-A1.8B.jinja
# ggml/src/ggml-hip/CMakeLists.txt
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# scripts/sync_vendor.py
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-chat.cpp
# tests/test-grammar-integration.cpp
# tests/test-quantize-fns.cpp
2026-03-15 11:21:24 +08:00