Concedo
3ec6381123
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build-self-hosted.yml
# .github/workflows/build.yml
# .github/workflows/copilot-setup-steps.yml
# .github/workflows/gguf-publish.yml
# ci/run.sh
# docs/backend/OPENVINO.md
# examples/llama.android/lib/src/main/cpp/ai_chat.cpp
# ggml/src/ggml-sycl/add-id.cpp
# requirements/requirements-pydantic.txt
# tests/test-gguf.cpp
# tests/test-jinja.cpp
# tests/test-llama-archs.cpp
# tools/gguf-split/README.md
# tools/llama-bench/llama-bench.cpp
2026-03-28 01:18:20 +08:00
Concedo
633222d2e3
fix tool builds
2026-03-26 15:15:58 +08:00
Concedo
c00fe0af5a
Merge commit '9f102a1407' into concedo_experimental
...
# Conflicts:
# .devops/intel.Dockerfile
# .github/ISSUE_TEMPLATE/010-bug-compilation.yml
# .github/ISSUE_TEMPLATE/011-bug-results.yml
# .github/pull_request_template.md
# CODEOWNERS
# README.md
# common/CMakeLists.txt
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/binary-ops.c
# ggml/src/ggml-hexagon/htp/hex-dma.c
# ggml/src/ggml-hexagon/htp/hex-dma.h
# ggml/src/ggml-hexagon/htp/hex-dump.h
# ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c
# ggml/src/ggml-hexagon/htp/hvx-utils.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hexagon/htp/ssm-conv.c
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/cvt.cl
# ggml/src/ggml-rpc/ggml-rpc.cpp
# scripts/snapdragon/adb/run-bench.sh
# scripts/sync_vendor.py
# tests/test-backend-ops.cpp
# tools/llama-bench/llama-bench.cpp
2026-03-25 23:45:41 +08:00
Concedo
8a6c41dc5c
Merge commit '841bc203e2' into concedo_experimental
...
# Conflicts:
# .github/workflows/ai-issues.yml
# embd_res/templates/HuggingFaceTB-SmolLM3-3B.jinja
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/aclnn_ops.h
# ggml/src/ggml-cann/common.h
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cuda/CMakeLists.txt
# ggml/src/ggml-hip/CMakeLists.txt
# ggml/src/ggml-musa/CMakeLists.txt
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/cvt.cl
# ggml/src/ggml-openvino/ggml-openvino.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# tests/test-chat-auto-parser.cpp
# tests/test-jinja.cpp
# tools/cli/README.md
# tools/completion/README.md
# tools/server/README.md
2026-03-25 22:49:53 +08:00
Xuan-Son Nguyen
914eb5ff0c
jinja: fix macro with kwargs ( #20960 )
...
* jinja: fix macro with kwargs
* Apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* fix newline problem
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-03-25 12:22:48 +01:00
Adrien Gallouët
42ebce3beb
common : fix get_gguf_split_info ( #20946 )
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-24 13:33:14 +01:00
Adrien Gallouët
2d2d9c2062
common : add a WARNING for HF cache migration ( #20935 )
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-24 09:24:39 +01:00
Adrien Gallouët
8c7957ca33
common : add standard Hugging Face cache support ( #20775 )
...
* common : add standard Hugging Face cache support
- Use HF API to find all files
- Migrate all manifests to hugging face cache at startup
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Check with the quant tag
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Cleanup
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Improve error handling and report API errors
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Restore common_cached_model_info and align mmproj filtering
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Prefer main when getting cached ref
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Use cached files when HF API fails
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Use final_path..
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Check all inputs
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
---------
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-24 07:30:33 +01:00
Aldehir Rojas
312d870a89
common : replace wrap_for_generation with a prefix convenience function and fix gpt-oss ( #20912 )
2026-03-23 22:21:47 -05:00
Jhen-Jie Hong
7a0b6a635e
common/autoparser : detect reasoning markers when enable_thinking changes system prompt ( #20859 )
2026-03-23 08:35:27 +01:00
Sigbjørn Skjæret
23c9182ce8
jinja : refactor token advancement ( #20864 )
...
* refactor token advancement
* exercise sub-expressions
2026-03-22 17:45:10 +01:00
Concedo
ef854f002e
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/python-type-check.yml
# AGENTS.md
# CONTRIBUTING.md
# examples/model-conversion/scripts/embedding/run-original-model.py
# examples/model-conversion/scripts/utils/compare_tokens.py
# examples/pydantic_models_to_grammar.py
# ggml/src/ggml-rpc/ggml-rpc.cpp
# pyrightconfig.json
# scripts/compare-llama-bench.py
# scripts/jinja/jinja-tester.py
# scripts/server-bench.py
# tests/test-grammar-integration.cpp
# tests/test-grammar-parser.cpp
# tests/test-llama-grammar.cpp
# tests/test-tokenizer-random.py
# tools/cli/README.md
# tools/completion/README.md
# tools/llama-bench/llama-bench.cpp
# tools/server/README.md
2026-03-22 23:39:13 +08:00
ddh0
3306dbaef7
misc : prefer ggml-org models in docs and examples ( #20827 )
...
* misc : prefer ggml-org models in docs and examples
Prefer referring to known-good quantizations under ggml-org rather than
3rd-party uploaders.
* remove accidentally committed file
2026-03-21 22:00:26 +01:00
Concedo
6054bacadd
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/ai-issues.yml
# CONTRIBUTING.md
# docs/autoparser.md
# docs/ops.md
# docs/ops/Metal.csv
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/CMakeLists.txt
# ggml/src/ggml-hexagon/htp/hex-dma.h
# ggml/src/ggml-hexagon/htp/hex-utils.h
# ggml/src/ggml-hexagon/htp/htp-ctx.h
# ggml/src/ggml-hexagon/htp/htp-msg.h
# ggml/src/ggml-hexagon/htp/htp_iface.idl
# ggml/src/ggml-hexagon/htp/hvx-base.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hip/CMakeLists.txt
# models/templates/Apriel-1.6-15b-Thinker-fixed.jinja
# models/templates/deepseek-ai-DeepSeek-R1-Distill-Qwen-32B.jinja
# models/templates/deepseek-ai-DeepSeek-V3.1.jinja
# models/templates/llama-cpp-deepseek-r1.jinja
# models/templates/meetkai-functionary-medium-v3.1.jinja
# scripts/fetch_server_test_models.py
# scripts/snapdragon/adb/run-cli.sh
# scripts/snapdragon/adb/run-completion.sh
# scripts/snapdragon/adb/run-mtmd.sh
# scripts/snapdragon/adb/run-tool.sh
# tests/test-chat-auto-parser.cpp
# tests/test-chat-peg-parser.cpp
# tests/test-chat.cpp
# tools/cli/cli.cpp
# tools/server/README.md
2026-03-21 12:06:01 +08:00
Concedo
98f099aecc
Merge commit 'c1258830b2' into concedo_experimental
...
# Conflicts:
# docs/docker.md
# docs/ops.md
# docs/ops/WebGPU.csv
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-webgpu/wgsl-shaders/get_rows.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/row_norm.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/unary.wgsl
2026-03-21 12:00:52 +08:00
Piotr Wilkin (ilintar)
b1c70e2e54
common/parser: fix nasty bug causing subtle corruption of generation prompt ( #20825 )
2026-03-21 00:19:04 +01:00
James O'Leary
149b2493c0
common : fix typo in debug log ('extracft' -> 'extract') ( #20807 )
2026-03-20 18:23:18 +01:00
Ruikai Peng
21c8045214
jinja : fix heap OOB read in value equality comparison ( #20782 )
...
Address GHSA-q9j6-4hhc-rq9p and GHSA-2q4c-9gq5-5vfp.
The three-iterator overload of std::equal in value_array_t::equivalent()
and value_object_t::equivalent() reads past the end of the shorter
container when comparing arrays or objects of different lengths.
Use the four-iterator overload (C++14) which checks both range lengths.
Found-by: Pwno
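The difference between the two overloads can be sketched as follows. This is a minimal illustration, not the code from the patch: the helper names and the use of std::vector<int> are assumptions standing in for value_array_t and value_object_t.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for the patched comparison. The four-iterator
// overload of std::equal (C++14) returns false when the two ranges have
// different lengths and never reads past the end of either one.
static bool equivalent_checked(const std::vector<int> & a, const std::vector<int> & b) {
    return std::equal(a.begin(), a.end(), b.begin(), b.end());
}

// Hypothetical pre-C++14 alternative: the three-iterator overload assumes
// the second range is at least as long as the first, so the sizes must be
// compared before it is called.
static bool equivalent_guarded(const std::vector<int> & a, const std::vector<int> & b) {
    return a.size() == b.size() && std::equal(a.begin(), a.end(), b.begin());
}
```

Calling the three-iterator overload without the size check on a longer first range and a shorter second range is the out-of-bounds read described above; both helpers here avoid it.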
2026-03-20 07:15:17 +01:00
James O'Leary
c46583b86b
common/parser : fix out_of_range crash in throw path ( #20424 regression) ( #20777 )
...
* chat : fix out_of_range crash in throw path (#20424 regression)
#20424 introduced effective_input = generation_prompt + input, but the
throw path uses input.substr(result.end) where result.end is a position
within effective_input. Every thinking model with a non-empty
generation_prompt crashes with std::out_of_range instead of the intended
error message.
Test crashes on unpatched master, passes with fix:
cmake -B build -DLLAMA_BUILD_TESTS=ON -DLLAMA_BUILD_TOOLS=OFF
cmake --build build --target test-chat
./build/bin/test-chat
* Update test-chat.cpp
* Update test-chat.cpp
* Update test-chat.cpp
---------
Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>
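The failure mode described in this commit can be reproduced in isolation. The sketch below is an assumption-laden illustration rather than the actual patch: the function names are hypothetical, and taking the tail from the concatenated string is only one plausible way to keep the offset and the string consistent.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: `end` is an offset into generation_prompt + input,
// so the tail is taken from that same concatenated string.
static std::string unparsed_tail(const std::string & generation_prompt,
                                 const std::string & input,
                                 std::size_t end) {
    const std::string effective_input = generation_prompt + input;
    return effective_input.substr(end); // throws only if end > effective_input.size()
}

// Hypothetical reproduction of the bug: applying the same offset to `input`
// alone throws std::out_of_range whenever end exceeds input.size().
static bool tail_from_input_throws(const std::string & input, std::size_t end) {
    try {
        (void) input.substr(end);
        return false;
    } catch (const std::out_of_range &) {
        return true;
    }
}
```

With a generation prompt of "<think>\n" (8 characters), an input of "abc", and end = 9, unparsed_tail returns "bc", while the offset applied to "abc" alone is past the end of the string and throws.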
2026-03-20 02:37:22 +01:00
James O'Leary
76f2dc70c3
chat : handle tool calls with no required args in TAG_WITH_TAGGED format ( #20764 )
...
* chat : handle tool calls with no required args in TAG_WITH_TAGGED format
* Update tests/test-chat.cpp [no ci]
Co-authored-by: Aldehir Rojas <hello@alde.dev>
---------
Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>
Co-authored-by: Aldehir Rojas <hello@alde.dev>
2026-03-19 17:53:11 +01:00
Piotr Wilkin (ilintar)
5e54d51b19
common/parser: add proper reasoning tag prefill reading ( #20424 )
...
* Implement proper prefill extraction
* Refactor cli parameters, update docs, move reasoning budget sampler part to common/reasoning-budget.cpp
* Update tools/server/server-task.cpp
* refactor: move grammars to variant, remove grammar_external, handle exception internally
* Make code less C++y
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-03-19 16:58:21 +01:00
ddh0
922b90e567
common : add LLAMA_ARG_SPEC_TYPE ( #20744 )
2026-03-19 16:16:55 +01:00
Aldehir Rojas
1b9bbaa357
common : fix gpt-oss content removal ( #20745 )
2026-03-19 11:40:39 +01:00
Concedo
48f914e374
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# ci/run.sh
# ggml/CMakeLists.txt
# ggml/src/ggml-cpu/arch/riscv/repack.cpp
# ggml/src/ggml-cpu/arch/x86/repack.cpp
# ggml/src/ggml-cpu/repack.cpp
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/CMakeLists.txt
# ggml/src/ggml-hexagon/htp/htp-msg.h
# ggml/src/ggml-hexagon/htp/htp-ops.h
# ggml/src/ggml-hexagon/htp/hvx-base.h
# ggml/src/ggml-hexagon/htp/hvx-exp.h
# ggml/src/ggml-hexagon/htp/hvx-sigmoid.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hexagon/htp/softmax-ops.c
# ggml/src/ggml-hexagon/htp/unary-ops.c
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# scripts/sync-ggml.last
# tests/test-backend-sampler.cpp
# tests/test-chat.cpp
# tests/test-jinja.cpp
# tools/cli/cli.cpp
2026-03-19 02:23:06 +08:00
Pop Flamingo
312cf03328
llama : re-enable manual LoRA adapter free ( #19983 )
...
* Re-enable manual LoRA adapter free
* Remove stale "all adapters must be loaded before context creation" comments
2026-03-18 12:03:26 +02:00
Aldehir Rojas
5e8910a0db
common : rework gpt-oss parser ( #20393 )
...
* common : rework gpt-oss parser
* cont : fix gpt-oss tests
* cont : add structured output test
* cont : rename final to final_msg
2026-03-18 10:41:25 +01:00
Piotr Wilkin (ilintar)
d2ecd2d1cf
common/parser: add --skip-chat-parsing to force a pure content parser. ( #20289 )
...
* Add `--force-pure-content` to force a pure content parser.
* Update common/arg.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Change parameter name [no ci]
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-03-17 16:16:43 +01:00
Concedo
f31b040941
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/labeler.yml
# .github/workflows/build-self-hosted.yml
# benches/nemotron/nemotron-dgx-spark.md
# docs/ops.md
# docs/ops/SYCL.csv
# ggml/src/ggml-cpu/kleidiai/kleidiai.cpp
# ggml/src/ggml-sycl/backend.hpp
# ggml/src/ggml-sycl/element_wise.cpp
# ggml/src/ggml-sycl/element_wise.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# scripts/sync-ggml.last
# tests/test-jinja.cpp
# tests/test-llama-archs.cpp
2026-03-17 14:05:23 +08:00
Concedo
9084527b36
Merge commit '67a2209fab' into concedo_experimental
...
# Conflicts:
# .github/workflows/build-cache.yml
# .github/workflows/build-cross.yml
# .github/workflows/build-self-hosted.yml
# .github/workflows/build.yml
# .github/workflows/python-lint.yml
# .github/workflows/release.yml
# .github/workflows/server-self-hosted.yml
# .github/workflows/server-webui.yml
# .github/workflows/server.yml
# CODEOWNERS
# ggml/src/ggml-sycl/gated_delta_net.cpp
# scripts/sync_vendor.py
# tools/cli/cli.cpp
2026-03-17 11:11:25 +08:00
Aldehir Rojas
1bbec6a75d
jinja : add capability check for object args ( #20612 )
2026-03-16 17:43:14 +01:00
Masato Nakasaka
d3936498a3
common : fix iterator::end() dereference ( #20445 )
2026-03-16 08:50:38 +02:00
Eric Hsieh
559646472d
fix: prevent nullptr dereference ( #20552 )
2026-03-15 16:51:49 +01:00
Concedo
f3d2f58fa8
note: smartcache is broken for rnn currently
2026-03-15 11:31:47 +08:00
Concedo
b1c500ae2b
Merge commit '2948e6049a' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# CONTRIBUTING.md
# docs/backend/VirtGPU/development.md
# docs/ops.md
# docs/ops/WebGPU.csv
# embd_res/templates/GigaChat3-10B-A1.8B.jinja
# embd_res/templates/GigaChat3.1-10B-A1.8B.jinja
# ggml/src/ggml-hip/CMakeLists.txt
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# scripts/sync_vendor.py
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-chat.cpp
# tests/test-grammar-integration.cpp
# tests/test-quantize-fns.cpp
2026-03-15 11:21:24 +08:00
Concedo
67c9798d0b
Merge commit '3ca19b0e9f' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# common/CMakeLists.txt
# common/chat-peg-parser.cpp
# docs/backend/SYCL.md
# docs/ops.md
# docs/ops/SYCL.csv
# ggml/src/ggml-sycl/common.hpp
# ggml/src/ggml-sycl/convert.hpp
# ggml/src/ggml-sycl/element_wise.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/norm.cpp
# ggml/src/ggml-sycl/rope.cpp
# ggml/src/ggml-sycl/rope.hpp
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_decls.tmpl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_reg_tile.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_vec.wgsl
# scripts/compare-llama-bench.py
# scripts/sync_vendor.py
# tests/CMakeLists.txt
# tools/cli/cli.cpp
2026-03-15 11:11:31 +08:00
Concedo
1802b09e6f
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# docs/build.md
# docs/ops.md
# docs/ops/CPU.csv
# ggml/src/ggml-cpu/kleidiai/kernels.cpp
# ggml/src/ggml-cpu/kleidiai/kleidiai.cpp
# ggml/src/ggml-cpu/repack.cpp
# ggml/src/ggml-cpu/repack.h
# src/llama-quant.cpp
# tests/test-json-schema-to-grammar.cpp
2026-03-14 17:56:16 +08:00
Concedo
ff3f8533d3
Merge commit 'c96f608d98' into concedo_experimental
...
# Conflicts:
# CONTRIBUTING.md
# docs/ops.md
# docs/ops/Vulkan.csv
# models/templates/LFM2-8B-A1B.jinja
# tests/peg-parser/test-python-dict-parser.cpp
# tests/peg-parser/test-unicode.cpp
# tests/test-chat-peg-parser.cpp
# tests/test-chat.cpp
# tools/llama-bench/llama-bench.cpp
2026-03-14 17:14:34 +08:00
Piotr Wilkin (ilintar)
1430c35948
common/parser: gracefully handle undetected tool parser, print error message. ( #20286 )
2026-03-13 20:56:10 +01:00
Concedo
04915d99ee
Merge commit '451ef08432' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# README.md
# docs/ops.md
# docs/ops/Vulkan.csv
# src/llama-model-loader.cpp
# src/llama-model.cpp
# src/llama.cpp
# tests/CMakeLists.txt
# tests/peg-parser/test-basic.cpp
# tests/peg-parser/test-json-parser.cpp
# tests/peg-parser/test-python-dict-parser.cpp
# tests/peg-parser/test-unicode.cpp
# tests/test-chat-auto-parser.cpp
# tests/test-chat-peg-parser.cpp
# tests/test-chat.cpp
# tools/CMakeLists.txt
2026-03-13 23:33:37 +08:00
Concedo
d2c911884d
Merge commit '213c4a0b81' into concedo_experimental
...
# Conflicts:
# CODEOWNERS
# common/CMakeLists.txt
# common/chat-peg-parser.cpp
# common/chat.cpp
# docs/backend/SYCL.md
# docs/development/parsing.md
# docs/ops.md
# docs/ops/SYCL.csv
# embd_res/templates/Apriel-1.6-15b-Thinker-fixed.jinja
# embd_res/templates/Bielik-11B-v3.0-Instruct.jinja
# embd_res/templates/GLM-4.7-Flash.jinja
# embd_res/templates/LFM2-8B-A1B.jinja
# embd_res/templates/StepFun3.5-Flash.jinja
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-sycl/CMakeLists.txt
# ggml/src/ggml-sycl/backend.hpp
# ggml/src/ggml-sycl/common.hpp
# ggml/src/ggml-sycl/convert.cpp
# ggml/src/ggml-sycl/convert.hpp
# ggml/src/ggml-sycl/count-equal.cpp
# ggml/src/ggml-sycl/dpct/helper.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/presets.hpp
# ggml/src/ggml-sycl/softmax.cpp
# ggml/src/ggml-sycl/vecdotq.hpp
# models/templates/Apertus-8B-Instruct.jinja
# models/templates/CohereForAI-c4ai-command-r7b-12-2024-tool_use.jinja
# models/templates/Qwen-QwQ-32B.jinja
# models/templates/Qwen3-Coder.jinja
# models/templates/deepseek-ai-DeepSeek-R1-Distill-Llama-8B.jinja
# models/templates/deepseek-ai-DeepSeek-R1-Distill-Qwen-32B.jinja
# models/templates/deepseek-ai-DeepSeek-V3.1.jinja
# models/templates/fireworks-ai-llama-3-firefunction-v2.jinja
# models/templates/moonshotai-Kimi-K2.jinja
# models/templates/unsloth-Apriel-1.5.jinja
# tests/CMakeLists.txt
# tests/peg-parser/test-basic.cpp
# tests/peg-parser/tests.h
# tests/test-backend-ops.cpp
# tests/test-chat-peg-parser.cpp
# tests/test-chat-template.cpp
# tests/test-chat.cpp
# tests/test-json-schema-to-grammar.cpp
# tests/test-peg-parser.cpp
# tools/CMakeLists.txt
# tools/cli/cli.cpp
2026-03-13 21:35:56 +08:00
Ruben Ortlam
128142fe7d
test-backend-ops: allow loading tests from file and parsing model operators into file ( #19896 )
...
* tests: allow loading test-backend-ops tests from json
* add error threshold based on op
* add error when file cannot be read
* add graph operator json extraction tool
* add nb parameter for non-contiguous input tensors
* fix view check
* only use view if non-contiguous/permuted, use C++ random instead of rand()
* replace internal API calls with public llama_graph_reserve call
* reduce test description length
* fix nb[0] not getting set for view
* add name to tests
* fix inplace error
* use text file instead of json
* move llama_graph_reserve function to new llama-ext header, move export-graph-ops to tests/
* fix missing declaration
* use pragma once
* fix indent
* fix Windows build
2026-03-12 13:26:00 +01:00
Daniel Bevenius
6de1bc631d
common : update completion executables list [no ci] ( #19934 )
...
This commit updates the bash completion executables list, adding missing
executables and removing some that no longer exist.
2026-03-12 12:12:01 +01:00
Mishusha
a8304b4d27
common/parser: add GigaChatV3/3.1 models support ( #19931 )
...
Co-authored-by: Mishusha <pmv26021975@gmail.com>
2026-03-12 01:22:25 +01:00
ddh0
4a748b8f15
common : fix --n-cpu-moe, --cpu-moe for models with fused gate + up ( #20416 )
2026-03-12 00:13:28 +01:00
Aldehir Rojas
b5fe4559ae
common/parser: use nlohmann::ordered_json to preserve parameter order ( #20385 )
2026-03-11 10:26:51 +01:00
Piotr Wilkin (ilintar)
acb7c79069
common/parser: handle reasoning budget ( #20297 )
...
* v1
* Finished!
* Handle cli
* Reasoning sampler
* Apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Less explosive terminology :)
* Add utf-8 case and tests
* common : migrate reasoning budget sampler to common
* cont : clean up
* cont : expose state and allow passing as initial state
* cont : remove unused imports
* cont : update state machine doc string
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Alde Rojas <hello@alde.dev>
2026-03-11 10:26:12 +01:00
Piotr Wilkin (ilintar)
6c770d16ca
Reduce level of content parser warning message to avoid log spam on non-debug verbosity ( #20347 )
2026-03-10 15:21:51 +01:00
Concedo
6adcd0b5db
Merge commit '34df42f7be' into concedo_experimental
...
# Conflicts:
# README.md
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/CMakeLists.txt
# ggml/src/ggml-hexagon/htp/act-ops.c
# ggml/src/ggml-hexagon/htp/binary-ops.c
# ggml/src/ggml-hexagon/htp/cpy-ops.c
# ggml/src/ggml-hexagon/htp/get-rows-ops.c
# ggml/src/ggml-hexagon/htp/htp-msg.h
# ggml/src/ggml-hexagon/htp/htp-ops.h
# ggml/src/ggml-hexagon/htp/hvx-arith.h
# ggml/src/ggml-hexagon/htp/hvx-base.h
# ggml/src/ggml-hexagon/htp/hvx-inverse.h
# ggml/src/ggml-hexagon/htp/hvx-utils.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hexagon/htp/rope-ops.c
# ggml/src/ggml-hexagon/htp/set-rows-ops.c
# ggml/src/ggml-hexagon/htp/softmax-ops.c
# ggml/src/ggml-hexagon/htp/unary-ops.c
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# tests/test-backend-ops.cpp
# tools/cli/cli.cpp
# tools/server/webui/src/lib/components/app/chat/ChatScreen/ChatScreen.svelte
2026-03-10 22:20:04 +08:00
Concedo
746664fde6
Merge commit '2cd20b72ed' into concedo_experimental
...
# Conflicts:
# CONTRIBUTING.md
# docs/backend/CANN.md
# docs/backend/SYCL.md
# docs/backend/snapdragon/README.md
# docs/backend/snapdragon/windows.md
# docs/build.md
# docs/multimodal/MobileVLM.md
# docs/ops.md
# docs/ops/WebGPU.csv
# examples/debug/README.md
# examples/llama.vim
# examples/model-conversion/README.md
# examples/sycl/README.md
# ggml/src/ggml-cpu/amx/mmq.cpp
# ggml/src/ggml-cpu/arch/x86/repack.cpp
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp-drv.cpp
# ggml/src/ggml-hexagon/htp/flash-attn-ops.c
# ggml/src/ggml-hexagon/htp/hvx-base.h
# ggml/src/ggml-hexagon/htp/hvx-copy.h
# ggml/src/ggml-hexagon/htp/hvx-inverse.h
# ggml/src/ggml-hexagon/htp/hvx-reduce.h
# ggml/src/ggml-hexagon/htp/matmul-ops.c
# ggml/src/ggml-hexagon/htp/rope-ops.c
# ggml/src/ggml-hexagon/htp/worker-pool.c
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/cpy.cl
# ggml/src/ggml-sycl/common.hpp
# ggml/src/ggml-sycl/quants.hpp
# ggml/src/ggml-sycl/softmax.cpp
# ggml/src/ggml-vulkan/CMakeLists.txt
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# scripts/pr2wt.sh
# scripts/server-bench.py
# scripts/snapdragon/windows/run-cli.ps1
# tests/test-alloc.cpp
# tests/test-backend-ops.cpp
# tests/test-chat.cpp
# tools/cli/cli.cpp
# tools/completion/README.md
# tools/cvector-generator/cvector-generator.cpp
# tools/imatrix/README.md
# tools/perplexity/README.md
# tools/server/public_simplechat/readme.md
# tools/server/tests/README.md
2026-03-10 22:11:08 +08:00
Sigbjørn Skjæret
ec947d2b16
common : fix incorrect uses of stoul ( #20313 )
2026-03-10 11:40:26 +01:00