Commit graph

1149 commits

Author SHA1 Message Date
Concedo
3ec6381123 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build-self-hosted.yml
#	.github/workflows/build.yml
#	.github/workflows/copilot-setup-steps.yml
#	.github/workflows/gguf-publish.yml
#	ci/run.sh
#	docs/backend/OPENVINO.md
#	examples/llama.android/lib/src/main/cpp/ai_chat.cpp
#	ggml/src/ggml-sycl/add-id.cpp
#	requirements/requirements-pydantic.txt
#	tests/test-gguf.cpp
#	tests/test-jinja.cpp
#	tests/test-llama-archs.cpp
#	tools/gguf-split/README.md
#	tools/llama-bench/llama-bench.cpp
2026-03-28 01:18:20 +08:00
Concedo
633222d2e3 fix tool builds 2026-03-26 15:15:58 +08:00
Concedo
c00fe0af5a Merge commit '9f102a1407' into concedo_experimental
# Conflicts:
#	.devops/intel.Dockerfile
#	.github/ISSUE_TEMPLATE/010-bug-compilation.yml
#	.github/ISSUE_TEMPLATE/011-bug-results.yml
#	.github/pull_request_template.md
#	CODEOWNERS
#	README.md
#	common/CMakeLists.txt
#	ggml/src/ggml-hexagon/ggml-hexagon.cpp
#	ggml/src/ggml-hexagon/htp/binary-ops.c
#	ggml/src/ggml-hexagon/htp/hex-dma.c
#	ggml/src/ggml-hexagon/htp/hex-dma.h
#	ggml/src/ggml-hexagon/htp/hex-dump.h
#	ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c
#	ggml/src/ggml-hexagon/htp/hvx-utils.h
#	ggml/src/ggml-hexagon/htp/main.c
#	ggml/src/ggml-hexagon/htp/ssm-conv.c
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-opencl/kernels/cvt.cl
#	ggml/src/ggml-rpc/ggml-rpc.cpp
#	scripts/snapdragon/adb/run-bench.sh
#	scripts/sync_vendor.py
#	tests/test-backend-ops.cpp
#	tools/llama-bench/llama-bench.cpp
2026-03-25 23:45:41 +08:00
Concedo
8a6c41dc5c Merge commit '841bc203e2' into concedo_experimental
# Conflicts:
#	.github/workflows/ai-issues.yml
#	embd_res/templates/HuggingFaceTB-SmolLM3-3B.jinja
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/aclnn_ops.h
#	ggml/src/ggml-cann/common.h
#	ggml/src/ggml-cann/ggml-cann.cpp
#	ggml/src/ggml-cuda/CMakeLists.txt
#	ggml/src/ggml-hip/CMakeLists.txt
#	ggml/src/ggml-musa/CMakeLists.txt
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-opencl/kernels/cvt.cl
#	ggml/src/ggml-openvino/ggml-openvino.cpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	tests/test-chat-auto-parser.cpp
#	tests/test-jinja.cpp
#	tools/cli/README.md
#	tools/completion/README.md
#	tools/server/README.md
2026-03-25 22:49:53 +08:00
Xuan-Son Nguyen
914eb5ff0c
jinja: fix macro with kwargs (#20960)
* jinja: fix macro with kwargs

* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* fix newline problem

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-03-25 12:22:48 +01:00
Adrien Gallouët
42ebce3beb
common : fix get_gguf_split_info (#20946)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-24 13:33:14 +01:00
Adrien Gallouët
2d2d9c2062
common : add a WARNING for HF cache migration (#20935)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-24 09:24:39 +01:00
Adrien Gallouët
8c7957ca33
common : add standard Hugging Face cache support (#20775)
* common : add standard Hugging Face cache support

- Use HF API to find all files
- Migrate all manifests to hugging face cache at startup

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* Check with the quant tag

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* Cleanup

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* Improve error handling and report API errors

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* Restore common_cached_model_info and align mmproj filtering

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* Prefer main when getting cached ref

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* Use cached files when HF API fails

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* Use final_path..

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* Check all inputs

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

---------

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-24 07:30:33 +01:00
Aldehir Rojas
312d870a89
common : replace wrap_for_generation with a prefix convenience function and fix gpt-oss (#20912) 2026-03-23 22:21:47 -05:00
Jhen-Jie Hong
7a0b6a635e
common/autoparser : detect reasoning markers when enable_thinking changes system prompt (#20859) 2026-03-23 08:35:27 +01:00
Sigbjørn Skjæret
23c9182ce8
jinja : refactor token advancement (#20864)
* refactor token advancement

* exercise sub-expressions
2026-03-22 17:45:10 +01:00
Concedo
ef854f002e Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/python-type-check.yml
#	AGENTS.md
#	CONTRIBUTING.md
#	examples/model-conversion/scripts/embedding/run-original-model.py
#	examples/model-conversion/scripts/utils/compare_tokens.py
#	examples/pydantic_models_to_grammar.py
#	ggml/src/ggml-rpc/ggml-rpc.cpp
#	pyrightconfig.json
#	scripts/compare-llama-bench.py
#	scripts/jinja/jinja-tester.py
#	scripts/server-bench.py
#	tests/test-grammar-integration.cpp
#	tests/test-grammar-parser.cpp
#	tests/test-llama-grammar.cpp
#	tests/test-tokenizer-random.py
#	tools/cli/README.md
#	tools/completion/README.md
#	tools/llama-bench/llama-bench.cpp
#	tools/server/README.md
2026-03-22 23:39:13 +08:00
ddh0
3306dbaef7
misc : prefer ggml-org models in docs and examples (#20827)
Some checks failed
Check Pre-Tokenizer Hashes / pre-tokenizer-hashes (push) Has been cancelled
Python check requirements.txt / check-requirements (push) Has been cancelled
Python Type-Check / python type-check (push) Has been cancelled
* misc : prefer ggml-org models in docs and examples

Prefer referring to known-good quantizations under ggml-org rather than
3rd-party uploaders.

* remove accidentally committed file
2026-03-21 22:00:26 +01:00
Concedo
6054bacadd Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/ai-issues.yml
#	CONTRIBUTING.md
#	docs/autoparser.md
#	docs/ops.md
#	docs/ops/Metal.csv
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/ggml-cann.cpp
#	ggml/src/ggml-cpu/CMakeLists.txt
#	ggml/src/ggml-hexagon/ggml-hexagon.cpp
#	ggml/src/ggml-hexagon/htp/CMakeLists.txt
#	ggml/src/ggml-hexagon/htp/hex-dma.h
#	ggml/src/ggml-hexagon/htp/hex-utils.h
#	ggml/src/ggml-hexagon/htp/htp-ctx.h
#	ggml/src/ggml-hexagon/htp/htp-msg.h
#	ggml/src/ggml-hexagon/htp/htp_iface.idl
#	ggml/src/ggml-hexagon/htp/hvx-base.h
#	ggml/src/ggml-hexagon/htp/main.c
#	ggml/src/ggml-hip/CMakeLists.txt
#	models/templates/Apriel-1.6-15b-Thinker-fixed.jinja
#	models/templates/deepseek-ai-DeepSeek-R1-Distill-Qwen-32B.jinja
#	models/templates/deepseek-ai-DeepSeek-V3.1.jinja
#	models/templates/llama-cpp-deepseek-r1.jinja
#	models/templates/meetkai-functionary-medium-v3.1.jinja
#	scripts/fetch_server_test_models.py
#	scripts/snapdragon/adb/run-cli.sh
#	scripts/snapdragon/adb/run-completion.sh
#	scripts/snapdragon/adb/run-mtmd.sh
#	scripts/snapdragon/adb/run-tool.sh
#	tests/test-chat-auto-parser.cpp
#	tests/test-chat-peg-parser.cpp
#	tests/test-chat.cpp
#	tools/cli/cli.cpp
#	tools/server/README.md
2026-03-21 12:06:01 +08:00
Concedo
98f099aecc Merge commit 'c1258830b2' into concedo_experimental
# Conflicts:
#	docs/docker.md
#	docs/ops.md
#	docs/ops/WebGPU.csv
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/ggml-cann.cpp
#	ggml/src/ggml-cpu/CMakeLists.txt
#	ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	ggml/src/ggml-webgpu/wgsl-shaders/get_rows.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/row_norm.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/unary.wgsl
2026-03-21 12:00:52 +08:00
Piotr Wilkin (ilintar)
b1c70e2e54
common/parser: fix nasty bug causing subtle corruption of generation prompt (#20825)
Some checks failed
Check Pre-Tokenizer Hashes / pre-tokenizer-hashes (push) Waiting to run
Python check requirements.txt / check-requirements (push) Waiting to run
Python Type-Check / pyright type-check (push) Waiting to run
Update Operations Documentation / update-ops-docs (push) Has been cancelled
2026-03-21 00:19:04 +01:00
James O'Leary
149b2493c0
common : fix typo in debug log ('extracft' -> 'extract') (#20807) 2026-03-20 18:23:18 +01:00
Ruikai Peng
21c8045214
jinja : fix heap OOB read in value equality comparison (#20782)
Address GHSA-q9j6-4hhc-rq9p and GHSA-2q4c-9gq5-5vfp.

The three-iterator overload of std::equal in value_array_t::equivalent()
and value_object_t::equivalent() reads past the end of the shorter
container when comparing arrays or objects of different lengths.

Use the four-iterator overload (C++14) which checks both range lengths.

Found-by: Pwno
2026-03-20 07:15:17 +01:00
James O'Leary
c46583b86b
common/parser : fix out_of_range crash in throw path (#20424 regression) (#20777)
* chat : fix out_of_range crash in throw path (#20424 regression)

#20424 introduced effective_input = generation_prompt + input, but the
throw path uses input.substr(result.end) where result.end is a position
within effective_input. Every thinking model with a non-empty
generation_prompt crashes with std::out_of_range instead of the intended
error message.

Test crashes on unpatched master, passes with fix:

  cmake -B build -DLLAMA_BUILD_TESTS=ON -DLLAMA_BUILD_TOOLS=OFF
  cmake --build build --target test-chat
  ./build/bin/test-chat

* Update test-chat.cpp

* Update test-chat.cpp

* Update test-chat.cpp

---------

Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>
2026-03-20 02:37:22 +01:00
James O'Leary
76f2dc70c3
chat : handle tool calls with no required args in TAG_WITH_TAGGED format (#20764)
* chat : handle tool calls with no required args in TAG_WITH_TAGGED format

* Update tests/test-chat.cpp [no ci]

Co-authored-by: Aldehir Rojas <hello@alde.dev>

---------

Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>
Co-authored-by: Aldehir Rojas <hello@alde.dev>
2026-03-19 17:53:11 +01:00
Piotr Wilkin (ilintar)
5e54d51b19
common/parser: add proper reasoning tag prefill reading (#20424)
* Implement proper prefill extraction

* Refactor cli parameters, update docs, move reasoning budget sampler part to common/reasoning-budget.cpp

* Update tools/server/server-task.cpp

* refactor: move grammars to variant, remove grammar_external, handle exception internally

* Make code less C++y

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-03-19 16:58:21 +01:00
ddh0
922b90e567
common : add LLAMA_ARG_SPEC_TYPE (#20744) 2026-03-19 16:16:55 +01:00
Aldehir Rojas
1b9bbaa357
common : fix gpt-oss content removal (#20745) 2026-03-19 11:40:39 +01:00
Concedo
48f914e374 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	ci/run.sh
#	ggml/CMakeLists.txt
#	ggml/src/ggml-cpu/arch/riscv/repack.cpp
#	ggml/src/ggml-cpu/arch/x86/repack.cpp
#	ggml/src/ggml-cpu/repack.cpp
#	ggml/src/ggml-hexagon/ggml-hexagon.cpp
#	ggml/src/ggml-hexagon/htp/CMakeLists.txt
#	ggml/src/ggml-hexagon/htp/htp-msg.h
#	ggml/src/ggml-hexagon/htp/htp-ops.h
#	ggml/src/ggml-hexagon/htp/hvx-base.h
#	ggml/src/ggml-hexagon/htp/hvx-exp.h
#	ggml/src/ggml-hexagon/htp/hvx-sigmoid.h
#	ggml/src/ggml-hexagon/htp/main.c
#	ggml/src/ggml-hexagon/htp/softmax-ops.c
#	ggml/src/ggml-hexagon/htp/unary-ops.c
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	scripts/sync-ggml.last
#	tests/test-backend-sampler.cpp
#	tests/test-chat.cpp
#	tests/test-jinja.cpp
#	tools/cli/cli.cpp
2026-03-19 02:23:06 +08:00
Pop Flamingo
312cf03328
llama : re-enable manual LoRA adapter free (#19983)
* Re-enable manual LoRA adapter free

* Remove stale "all adapters must be loaded before context creation" stale comments
2026-03-18 12:03:26 +02:00
Aldehir Rojas
5e8910a0db
common : rework gpt-oss parser (#20393)
* common : rework gpt-oss parser

* cont : fix gpt-oss tests

* cont : add structured output test

* cont : rename final to final_msg
2026-03-18 10:41:25 +01:00
Piotr Wilkin (ilintar)
d2ecd2d1cf
common/parser: add --skip-chat-parsing to force a pure content parser. (#20289)
* Add `--force-pure-content` to force a pure content parser.

* Update common/arg.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Change parameter name [no ci]

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-03-17 16:16:43 +01:00
Concedo
f31b040941 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/labeler.yml
#	.github/workflows/build-self-hosted.yml
#	benches/nemotron/nemotron-dgx-spark.md
#	docs/ops.md
#	docs/ops/SYCL.csv
#	ggml/src/ggml-cpu/kleidiai/kleidiai.cpp
#	ggml/src/ggml-sycl/backend.hpp
#	ggml/src/ggml-sycl/element_wise.cpp
#	ggml/src/ggml-sycl/element_wise.hpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	scripts/sync-ggml.last
#	tests/test-jinja.cpp
#	tests/test-llama-archs.cpp
2026-03-17 14:05:23 +08:00
Concedo
9084527b36 Merge commit '67a2209fab' into concedo_experimental
# Conflicts:
#	.github/workflows/build-cache.yml
#	.github/workflows/build-cross.yml
#	.github/workflows/build-self-hosted.yml
#	.github/workflows/build.yml
#	.github/workflows/python-lint.yml
#	.github/workflows/release.yml
#	.github/workflows/server-self-hosted.yml
#	.github/workflows/server-webui.yml
#	.github/workflows/server.yml
#	CODEOWNERS
#	ggml/src/ggml-sycl/gated_delta_net.cpp
#	scripts/sync_vendor.py
#	tools/cli/cli.cpp
2026-03-17 11:11:25 +08:00
Aldehir Rojas
1bbec6a75d
jinja : add capability check for object args (#20612) 2026-03-16 17:43:14 +01:00
Masato Nakasaka
d3936498a3
common : fix iterator::end() dereference (#20445) 2026-03-16 08:50:38 +02:00
Eric Hsieh
559646472d
fix: prevent nullptr dereference (#20552) 2026-03-15 16:51:49 +01:00
Concedo
f3d2f58fa8 note: smartcache is broken for rnn currently 2026-03-15 11:31:47 +08:00
Concedo
b1c500ae2b Merge commit '2948e6049a' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	CONTRIBUTING.md
#	docs/backend/VirtGPU/development.md
#	docs/ops.md
#	docs/ops/WebGPU.csv
#	embd_res/templates/GigaChat3-10B-A1.8B.jinja
#	embd_res/templates/GigaChat3.1-10B-A1.8B.jinja
#	ggml/src/ggml-hip/CMakeLists.txt
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	scripts/sync_vendor.py
#	tests/CMakeLists.txt
#	tests/test-backend-ops.cpp
#	tests/test-chat.cpp
#	tests/test-grammar-integration.cpp
#	tests/test-quantize-fns.cpp
2026-03-15 11:21:24 +08:00
Concedo
67c9798d0b Merge commit '3ca19b0e9f' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	common/CMakeLists.txt
#	common/chat-peg-parser.cpp
#	docs/backend/SYCL.md
#	docs/ops.md
#	docs/ops/SYCL.csv
#	ggml/src/ggml-sycl/common.hpp
#	ggml/src/ggml-sycl/convert.hpp
#	ggml/src/ggml-sycl/element_wise.cpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	ggml/src/ggml-sycl/norm.cpp
#	ggml/src/ggml-sycl/rope.cpp
#	ggml/src/ggml-sycl/rope.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_decls.tmpl
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_reg_tile.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_vec.wgsl
#	scripts/compare-llama-bench.py
#	scripts/sync_vendor.py
#	tests/CMakeLists.txt
#	tools/cli/cli.cpp
2026-03-15 11:11:31 +08:00
Concedo
1802b09e6f Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	docs/build.md
#	docs/ops.md
#	docs/ops/CPU.csv
#	ggml/src/ggml-cpu/kleidiai/kernels.cpp
#	ggml/src/ggml-cpu/kleidiai/kleidiai.cpp
#	ggml/src/ggml-cpu/repack.cpp
#	ggml/src/ggml-cpu/repack.h
#	src/llama-quant.cpp
#	tests/test-json-schema-to-grammar.cpp
2026-03-14 17:56:16 +08:00
Concedo
ff3f8533d3 Merge commit 'c96f608d98' into concedo_experimental
# Conflicts:
#	CONTRIBUTING.md
#	docs/ops.md
#	docs/ops/Vulkan.csv
#	models/templates/LFM2-8B-A1B.jinja
#	tests/peg-parser/test-python-dict-parser.cpp
#	tests/peg-parser/test-unicode.cpp
#	tests/test-chat-peg-parser.cpp
#	tests/test-chat.cpp
#	tools/llama-bench/llama-bench.cpp
2026-03-14 17:14:34 +08:00
Piotr Wilkin (ilintar)
1430c35948
common/parser: gracefully handle undetected tool parser, print error message. (#20286) 2026-03-13 20:56:10 +01:00
Concedo
04915d99ee Merge commit '451ef08432' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	README.md
#	docs/ops.md
#	docs/ops/Vulkan.csv
#	src/llama-model-loader.cpp
#	src/llama-model.cpp
#	src/llama.cpp
#	tests/CMakeLists.txt
#	tests/peg-parser/test-basic.cpp
#	tests/peg-parser/test-json-parser.cpp
#	tests/peg-parser/test-python-dict-parser.cpp
#	tests/peg-parser/test-unicode.cpp
#	tests/test-chat-auto-parser.cpp
#	tests/test-chat-peg-parser.cpp
#	tests/test-chat.cpp
#	tools/CMakeLists.txt
2026-03-13 23:33:37 +08:00
Concedo
d2c911884d Merge commit '213c4a0b81' into concedo_experimental
# Conflicts:
#	CODEOWNERS
#	common/CMakeLists.txt
#	common/chat-peg-parser.cpp
#	common/chat.cpp
#	docs/backend/SYCL.md
#	docs/development/parsing.md
#	docs/ops.md
#	docs/ops/SYCL.csv
#	embd_res/templates/Apriel-1.6-15b-Thinker-fixed.jinja
#	embd_res/templates/Bielik-11B-v3.0-Instruct.jinja
#	embd_res/templates/GLM-4.7-Flash.jinja
#	embd_res/templates/LFM2-8B-A1B.jinja
#	embd_res/templates/StepFun3.5-Flash.jinja
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-sycl/CMakeLists.txt
#	ggml/src/ggml-sycl/backend.hpp
#	ggml/src/ggml-sycl/common.hpp
#	ggml/src/ggml-sycl/convert.cpp
#	ggml/src/ggml-sycl/convert.hpp
#	ggml/src/ggml-sycl/count-equal.cpp
#	ggml/src/ggml-sycl/dpct/helper.hpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	ggml/src/ggml-sycl/presets.hpp
#	ggml/src/ggml-sycl/softmax.cpp
#	ggml/src/ggml-sycl/vecdotq.hpp
#	models/templates/Apertus-8B-Instruct.jinja
#	models/templates/CohereForAI-c4ai-command-r7b-12-2024-tool_use.jinja
#	models/templates/Qwen-QwQ-32B.jinja
#	models/templates/Qwen3-Coder.jinja
#	models/templates/deepseek-ai-DeepSeek-R1-Distill-Llama-8B.jinja
#	models/templates/deepseek-ai-DeepSeek-R1-Distill-Qwen-32B.jinja
#	models/templates/deepseek-ai-DeepSeek-V3.1.jinja
#	models/templates/fireworks-ai-llama-3-firefunction-v2.jinja
#	models/templates/moonshotai-Kimi-K2.jinja
#	models/templates/unsloth-Apriel-1.5.jinja
#	tests/CMakeLists.txt
#	tests/peg-parser/test-basic.cpp
#	tests/peg-parser/tests.h
#	tests/test-backend-ops.cpp
#	tests/test-chat-peg-parser.cpp
#	tests/test-chat-template.cpp
#	tests/test-chat.cpp
#	tests/test-json-schema-to-grammar.cpp
#	tests/test-peg-parser.cpp
#	tools/CMakeLists.txt
#	tools/cli/cli.cpp
2026-03-13 21:35:56 +08:00
Ruben Ortlam
128142fe7d
test-backend-ops: allow loading tests from file and parsing model operators into file (#19896)
* tests: allow loading test-backend-ops tests from json

* add error threshold based on op

* add error when file cannot be read

* add graph operator json extraction tool

* add nb parameter for non-contiguous input tensors

* fix view check

* only use view if non-contiguous/permuted, use C++ random instead of rand()

* replace internal API calls with public llama_graph_reserve call

* reduce test description length

* fix nb[0] not getting set for view

* add name to tests

* fix inplace error

* use text file instead of json

* move llama_graph_reserve function to new llama-ext header, move export-graph-ops to tests/

* fix missing declaration

* use pragma once

* fix indent

* fix Windows build
2026-03-12 13:26:00 +01:00
Daniel Bevenius
6de1bc631d
common : update completion executables list [no ci] (#19934)
This commit updates the bash completion executables list, adding missing
executables and removing some that non longer exist.
2026-03-12 12:12:01 +01:00
Mishusha
a8304b4d27
common/parser: add GigaChatV3/3.1 models support (#19931)
Co-authored-by: Mishusha <pmv26021975@gmail.com>
2026-03-12 01:22:25 +01:00
ddh0
4a748b8f15
common : fix --n-cpu-moe, --cpu-moe for models with fused gate + up (#20416) 2026-03-12 00:13:28 +01:00
Aldehir Rojas
b5fe4559ae
common/parser: use nlohmann::ordered_json to preserve parameter order (#20385) 2026-03-11 10:26:51 +01:00
Piotr Wilkin (ilintar)
acb7c79069
common/parser: handle reasoning budget (#20297)
* v1

* Finished!

* Handlie cli

* Reasoning sampler

* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Less explosive terminology :)

* Add utf-8 case and tests

* common : migrate reasoning budget sampler to common

* cont : clean up

* cont : expose state and allow passing as initial state

* cont : remove unused imports

* cont : update state machine doc string

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Alde Rojas <hello@alde.dev>
2026-03-11 10:26:12 +01:00
Piotr Wilkin (ilintar)
6c770d16ca
Reduce level of content parser warning message to avoid log spam on non-debug verbosity (#20347) 2026-03-10 15:21:51 +01:00
Concedo
6adcd0b5db Merge commit '34df42f7be' into concedo_experimental
# Conflicts:
#	README.md
#	ggml/src/ggml-hexagon/ggml-hexagon.cpp
#	ggml/src/ggml-hexagon/htp/CMakeLists.txt
#	ggml/src/ggml-hexagon/htp/act-ops.c
#	ggml/src/ggml-hexagon/htp/binary-ops.c
#	ggml/src/ggml-hexagon/htp/cpy-ops.c
#	ggml/src/ggml-hexagon/htp/get-rows-ops.c
#	ggml/src/ggml-hexagon/htp/htp-msg.h
#	ggml/src/ggml-hexagon/htp/htp-ops.h
#	ggml/src/ggml-hexagon/htp/hvx-arith.h
#	ggml/src/ggml-hexagon/htp/hvx-base.h
#	ggml/src/ggml-hexagon/htp/hvx-inverse.h
#	ggml/src/ggml-hexagon/htp/hvx-utils.h
#	ggml/src/ggml-hexagon/htp/main.c
#	ggml/src/ggml-hexagon/htp/rope-ops.c
#	ggml/src/ggml-hexagon/htp/set-rows-ops.c
#	ggml/src/ggml-hexagon/htp/softmax-ops.c
#	ggml/src/ggml-hexagon/htp/unary-ops.c
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	tests/test-backend-ops.cpp
#	tools/cli/cli.cpp
#	tools/server/webui/src/lib/components/app/chat/ChatScreen/ChatScreen.svelte
2026-03-10 22:20:04 +08:00
Concedo
746664fde6 Merge commit '2cd20b72ed' into concedo_experimental
# Conflicts:
#	CONTRIBUTING.md
#	docs/backend/CANN.md
#	docs/backend/SYCL.md
#	docs/backend/snapdragon/README.md
#	docs/backend/snapdragon/windows.md
#	docs/build.md
#	docs/multimodal/MobileVLM.md
#	docs/ops.md
#	docs/ops/WebGPU.csv
#	examples/debug/README.md
#	examples/llama.vim
#	examples/model-conversion/README.md
#	examples/sycl/README.md
#	ggml/src/ggml-cpu/amx/mmq.cpp
#	ggml/src/ggml-cpu/arch/x86/repack.cpp
#	ggml/src/ggml-hexagon/ggml-hexagon.cpp
#	ggml/src/ggml-hexagon/htp-drv.cpp
#	ggml/src/ggml-hexagon/htp/flash-attn-ops.c
#	ggml/src/ggml-hexagon/htp/hvx-base.h
#	ggml/src/ggml-hexagon/htp/hvx-copy.h
#	ggml/src/ggml-hexagon/htp/hvx-inverse.h
#	ggml/src/ggml-hexagon/htp/hvx-reduce.h
#	ggml/src/ggml-hexagon/htp/matmul-ops.c
#	ggml/src/ggml-hexagon/htp/rope-ops.c
#	ggml/src/ggml-hexagon/htp/worker-pool.c
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-opencl/kernels/cpy.cl
#	ggml/src/ggml-sycl/common.hpp
#	ggml/src/ggml-sycl/quants.hpp
#	ggml/src/ggml-sycl/softmax.cpp
#	ggml/src/ggml-vulkan/CMakeLists.txt
#	ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	scripts/pr2wt.sh
#	scripts/server-bench.py
#	scripts/snapdragon/windows/run-cli.ps1
#	tests/test-alloc.cpp
#	tests/test-backend-ops.cpp
#	tests/test-chat.cpp
#	tools/cli/cli.cpp
#	tools/completion/README.md
#	tools/cvector-generator/cvector-generator.cpp
#	tools/imatrix/README.md
#	tools/perplexity/README.md
#	tools/server/public_simplechat/readme.md
#	tools/server/tests/README.md
2026-03-10 22:11:08 +08:00
Sigbjørn Skjæret
ec947d2b16
common : fix incorrect uses of stoul (#20313) 2026-03-10 11:40:26 +01:00