Commit graph

1089 commits

Author SHA1 Message Date
Concedo
6adcd0b5db Merge commit '34df42f7be' into concedo_experimental
# Conflicts:
#	README.md
#	ggml/src/ggml-hexagon/ggml-hexagon.cpp
#	ggml/src/ggml-hexagon/htp/CMakeLists.txt
#	ggml/src/ggml-hexagon/htp/act-ops.c
#	ggml/src/ggml-hexagon/htp/binary-ops.c
#	ggml/src/ggml-hexagon/htp/cpy-ops.c
#	ggml/src/ggml-hexagon/htp/get-rows-ops.c
#	ggml/src/ggml-hexagon/htp/htp-msg.h
#	ggml/src/ggml-hexagon/htp/htp-ops.h
#	ggml/src/ggml-hexagon/htp/hvx-arith.h
#	ggml/src/ggml-hexagon/htp/hvx-base.h
#	ggml/src/ggml-hexagon/htp/hvx-inverse.h
#	ggml/src/ggml-hexagon/htp/hvx-utils.h
#	ggml/src/ggml-hexagon/htp/main.c
#	ggml/src/ggml-hexagon/htp/rope-ops.c
#	ggml/src/ggml-hexagon/htp/set-rows-ops.c
#	ggml/src/ggml-hexagon/htp/softmax-ops.c
#	ggml/src/ggml-hexagon/htp/unary-ops.c
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	tests/test-backend-ops.cpp
#	tools/cli/cli.cpp
#	tools/server/webui/src/lib/components/app/chat/ChatScreen/ChatScreen.svelte
2026-03-10 22:20:04 +08:00
Concedo
746664fde6 Merge commit '2cd20b72ed' into concedo_experimental
# Conflicts:
#	CONTRIBUTING.md
#	docs/backend/CANN.md
#	docs/backend/SYCL.md
#	docs/backend/snapdragon/README.md
#	docs/backend/snapdragon/windows.md
#	docs/build.md
#	docs/multimodal/MobileVLM.md
#	docs/ops.md
#	docs/ops/WebGPU.csv
#	examples/debug/README.md
#	examples/llama.vim
#	examples/model-conversion/README.md
#	examples/sycl/README.md
#	ggml/src/ggml-cpu/amx/mmq.cpp
#	ggml/src/ggml-cpu/arch/x86/repack.cpp
#	ggml/src/ggml-hexagon/ggml-hexagon.cpp
#	ggml/src/ggml-hexagon/htp-drv.cpp
#	ggml/src/ggml-hexagon/htp/flash-attn-ops.c
#	ggml/src/ggml-hexagon/htp/hvx-base.h
#	ggml/src/ggml-hexagon/htp/hvx-copy.h
#	ggml/src/ggml-hexagon/htp/hvx-inverse.h
#	ggml/src/ggml-hexagon/htp/hvx-reduce.h
#	ggml/src/ggml-hexagon/htp/matmul-ops.c
#	ggml/src/ggml-hexagon/htp/rope-ops.c
#	ggml/src/ggml-hexagon/htp/worker-pool.c
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-opencl/kernels/cpy.cl
#	ggml/src/ggml-sycl/common.hpp
#	ggml/src/ggml-sycl/quants.hpp
#	ggml/src/ggml-sycl/softmax.cpp
#	ggml/src/ggml-vulkan/CMakeLists.txt
#	ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	scripts/pr2wt.sh
#	scripts/server-bench.py
#	scripts/snapdragon/windows/run-cli.ps1
#	tests/test-alloc.cpp
#	tests/test-backend-ops.cpp
#	tests/test-chat.cpp
#	tools/cli/cli.cpp
#	tools/completion/README.md
#	tools/cvector-generator/cvector-generator.cpp
#	tools/imatrix/README.md
#	tools/perplexity/README.md
#	tools/server/public_simplechat/readme.md
#	tools/server/tests/README.md
2026-03-10 22:11:08 +08:00
Piotr Wilkin (ilintar)
f5ddcd1696
Checkpoint every n tokens: squash (#20087) 2026-03-06 11:39:26 +01:00
Aleksander Grygier
f6235a41ef
webui: Agentic Loop + MCP Client with support for Tools, Resources and Prompts (#18655) 2026-03-06 10:00:39 +01:00
Sigbjørn Skjæret
b5ed0e058c
cli : add command and file auto-completion (#19985) 2026-03-05 10:47:28 +01:00
Marcel Petrick
92f7da00b4
chore : correct typos [no ci] (#20041)
* fix(docs): correct typos found during code review

Non-functional changes only:
- Fixed minor spelling mistakes in comments
- Corrected typos in user-facing strings
- No variables, logic, or functional code was modified.

Signed-off-by: Marcel Petrick <mail@marcelpetrick.it>

* Update docs/backend/CANN.md

Co-authored-by: Aaron Teo <taronaeo@gmail.com>

* Revert "Auxiliary commit to revert individual files from 846d1c301281178efbc6ce6060ad34c1ebe45af8"

This reverts commit 02fcf0c7db661d5ff3eff96b2b2db9fdb7213256.

* Update tests/test-backend-ops.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update tests/test-backend-ops.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Signed-off-by: Marcel Petrick <mail@marcelpetrick.it>
Co-authored-by: Aaron Teo <taronaeo@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-03-05 08:50:21 +01:00
Concedo
4e358265a3 Merge commit '8387ffb28d' into concedo_experimental
# Conflicts:
#	docs/backend/VirtGPU.md
#	docs/backend/ZenDNN.md
#	ggml/src/ggml-cpu/amx/amx.cpp
#	ggml/src/ggml-cpu/amx/mmq.cpp
#	ggml/src/ggml-sycl/add-id.cpp
#	ggml/src/ggml-virtgpu/backend/backend-dispatched-backend.cpp
#	ggml/src/ggml-virtgpu/backend/backend-dispatched-buffer-type.cpp
#	ggml/src/ggml-virtgpu/backend/backend-dispatched-buffer.cpp
#	ggml/src/ggml-virtgpu/backend/backend-dispatched.cpp
#	ggml/src/ggml-virtgpu/backend/backend-dispatched.gen.h
#	ggml/src/ggml-virtgpu/backend/backend-dispatched.h
#	ggml/src/ggml-virtgpu/backend/backend-virgl-apir.h
#	ggml/src/ggml-virtgpu/backend/backend.cpp
#	ggml/src/ggml-virtgpu/backend/shared/api_remoting.h
#	ggml/src/ggml-virtgpu/backend/shared/apir_backend.gen.h
#	ggml/src/ggml-virtgpu/backend/shared/apir_backend.h
#	ggml/src/ggml-virtgpu/backend/shared/apir_cs.h
#	ggml/src/ggml-virtgpu/backend/shared/apir_cs_ggml.h
#	ggml/src/ggml-virtgpu/backend/shared/apir_cs_rpc.h
#	ggml/src/ggml-virtgpu/ggml-backend-buffer-type.cpp
#	ggml/src/ggml-virtgpu/ggml-backend-device.cpp
#	ggml/src/ggml-virtgpu/ggml-backend-reg.cpp
#	ggml/src/ggml-virtgpu/ggml-backend.cpp
#	ggml/src/ggml-virtgpu/ggml-remoting.h
#	ggml/src/ggml-virtgpu/include/apir_hw.h
#	ggml/src/ggml-virtgpu/regenerate_remoting.py
#	ggml/src/ggml-virtgpu/virtgpu-forward-backend.cpp
#	ggml/src/ggml-virtgpu/virtgpu-forward-buffer-type.cpp
#	ggml/src/ggml-virtgpu/virtgpu-forward-buffer.cpp
#	ggml/src/ggml-virtgpu/virtgpu-forward-device.cpp
#	ggml/src/ggml-virtgpu/virtgpu-forward-impl.h
#	ggml/src/ggml-virtgpu/virtgpu-forward.gen.h
#	ggml/src/ggml-virtgpu/virtgpu.cpp
#	ggml/src/ggml-virtgpu/virtgpu.h
#	ggml/src/ggml-zendnn/CMakeLists.txt
#	ggml/src/ggml-zendnn/ggml-zendnn.cpp
#	src/CMakeLists.txt
#	tests/CMakeLists.txt
#	tests/test-tokenizer-0.sh
#	tools/cli/README.md
#	tools/completion/README.md
#	tools/imatrix/imatrix.cpp
#	tools/server/README.md
2026-02-28 12:45:16 +08:00
Pascal
2e7e638523
server : support multiple model aliases via comma-separated --alias (#19926)
* server : support multiple model aliases via comma-separated --alias

* server : update --alias description and regenerate docs

* server : multiple model aliases and tags

- address review feedback from ngxson
- --alias accepts comma-separated values (std::set, no duplicates)
- --tags for informational metadata (not used for routing)
- aliases resolve transparently in router via get_meta/has_model
- /v1/models exposes aliases and tags fields

* regenerate docs

* nits

* server : use first alias as model_name for backward compat

address review feedback from ngxson

* server : add single-model test for aliases and tags
2026-02-27 07:05:23 +01:00
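A minimal sketch of the comma-separated alias parsing described above (the helper name and surrounding code are illustrative, not the actual argument-parsing code; the std::set detail is from the commit message):

```cpp
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: split a comma-separated --alias value into a set,
// so duplicate aliases collapse automatically.
static std::set<std::string> parse_alias_list(const std::string & value) {
    std::set<std::string> aliases;
    std::stringstream ss(value);
    std::string item;
    while (std::getline(ss, item, ',')) {
        if (!item.empty()) {
            aliases.insert(item);
        }
    }
    return aliases;
}
```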
Eric Zhang
9b62913b40
jinja : correct default size for string slices (#19913) 2026-02-26 12:28:09 +01:00
Concedo
749a606374 whisper broke 2026-02-26 16:45:04 +08:00
Concedo
7e53bfd28d Merge commit '2b6dfe824d' into concedo_experimental
# Conflicts:
#	.github/workflows/release.yml
#	examples/save-load-state/save-load-state.cpp
#	src/llama-context.cpp
#	tools/cli/cli.cpp
2026-02-26 15:07:23 +08:00
ddh0
832aa94762
common : add more aliases for sampler CLI params (#19797)
* common : add more aliases for sampler CLI params
2026-02-25 16:34:25 +01:00
Daniel Bevenius
2b6dfe824d
llama : remove write/read of output ids/logits/embeddings (#18862)
* llama : remove write/read of output ids/logits/embeddings

This commit removes the write/read of output ids, logits and
embeddings from the llama context state.

Refs: https://github.com/ggml-org/llama.cpp/pull/18862#issuecomment-3756330941

* completion : add replaying of session state

This commit updates the session handling in the completion tool to handle
the fact that logits are no longer stored in the session file. Instead, we
need to replay the last token to get the logits for sampling.

* common : add common_prompt_batch_decode function

This commit adds a new function which is responsible for decoding a prompt
and optionally handling the saving of session data.

* update save-state.cpp to use llama_state_load_file

This commit updates the save-load-state example to utilize the new
llama_state_load_file function for loading the model state from a file.
It also replays the last token after loading, since the state is now
stored before the last token is processed.

* examples : set n_seq_max = 2 for ctx3

This commit updates the save-load-state example to set the n_seq_max
parameter to 2 when initializing the ctx3 context.

The motivation for this change is that with n_parallel/n_seq_max set to 1
the context only supports one sequence, but the test later tries to
use a second sequence, which results in the following error:
```console
main : loaded state with 4 tokens
main : seq 0 copied, 225760 bytes
main : kv cache cleared
find_slot: seq_id=1 >= n_seq_max=1 Try using a bigger --parallel value
state_read_meta: failed to find available cells in kv cache
```
This seems to only happen for recurrent/hybrid models.
2026-02-23 07:04:30 +01:00
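The replay step described above looks roughly like this (a sketch against the public llama.h state API; error handling and context setup are omitted, and the token-buffer size is arbitrary):

```cpp
#include "llama.h"
#include <vector>

// Sketch: load a saved session, then re-decode the last token so the
// context has fresh logits for sampling (logits are no longer serialized).
static bool load_and_replay(llama_context * ctx, const char * path) {
    std::vector<llama_token> tokens(1024);
    size_t n_tokens = 0;
    if (!llama_state_load_file(ctx, path, tokens.data(), tokens.size(), &n_tokens) || n_tokens == 0) {
        return false;
    }
    // replay the last token to regenerate the logits
    llama_batch batch = llama_batch_get_one(&tokens[n_tokens - 1], 1);
    return llama_decode(ctx, batch) == 0;
}
```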
Xuan-Son Nguyen
5452d736f8
jinja: correct stats for tojson and string filters (#19785) 2026-02-22 21:08:23 +01:00
Aldehir Rojas
ed4837891d
common : fix improper trimming in XML parser on complete message (#19805)
Co-authored-by: Jules LEIDELINGER <11395311+julio75012@users.noreply.github.com>
2026-02-22 17:34:54 +01:00
Concedo
d06700687f Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/rocm.Dockerfile
#	.github/workflows/release.yml
#	CMakeLists.txt
#	ggml/src/ggml-cuda/common.cuh
#	scripts/sync_vendor.py
#	tests/test-chat.cpp
2026-02-22 09:33:13 +08:00
Aldehir Rojas
94b0200a01
common : merge qwen3-coder and nemotron nano 3 parsers (#19765)
* common : migrate qwen3-coder to PEG parsing variant

* cont : add JSON parameter test
2026-02-20 23:22:22 +01:00
Concedo
e626de2430 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	docs/ops.md
#	docs/ops/WebGPU.csv
#	embd_res/templates/stepfun-ai-Step-3.5-Flash.jinja
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	ggml/src/ggml-webgpu/wgsl-shaders/unary.wgsl
#	src/CMakeLists.txt
#	tests/test-backend-ops.cpp
#	tests/test-chat.cpp
#	tools/mtmd/CMakeLists.txt
2026-02-20 15:16:26 +08:00
Concedo
07c45ced56 Merge commit 'c78e682245' into concedo_experimental
# Conflicts:
#	src/models/qwen35.cpp
#	src/models/qwen35moe.cpp
2026-02-20 14:41:32 +08:00
Concedo
9eb9e4eb83 Merge commit '8a70973557' into concedo_experimental
# Conflicts:
#	docs/backend/CANN.md
#	docs/backend/SYCL.md
#	examples/model-conversion/scripts/utils/tensor-info.py
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-opencl/kernels/expm1.cl
#	ggml/src/ggml-opencl/kernels/mean.cl
#	ggml/src/ggml-opencl/kernels/softplus.cl
#	ggml/src/ggml-opencl/kernels/sum_rows.cl
#	ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	ggml/src/ggml-webgpu/wgsl-shaders/common_decls.tmpl
#	ggml/src/ggml-webgpu/wgsl-shaders/embed_wgsl.py
#	ggml/src/ggml-webgpu/wgsl-shaders/get_rows.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_decls.tmpl
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_reg_tile.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_subgroup_matrix.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_vec.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/scale.wgsl
#	tools/server/webui/src/lib/components/app/chat/ChatScreen/ChatScreen.svelte
2026-02-20 14:36:49 +08:00
Jesse Posner
3dadc88b58
common : fix Step-3.5-Flash format detection and thinking support (#19635)
* common : fix Step-3.5-Flash format detection and thinking support

Step-3.5-Flash uses the same XML-style tool call format as Qwen3-Coder
(<tool_call><function=...><parameter=...>) but its Jinja template lacks
the bare <function> and plural <parameters> markers that the detection
logic previously required. This caused it to fall through to Hermes 2
Pro, which doesn't call func_args_not_string(), so arguments stayed as
JSON strings and templates using arguments|items crashed.

Additionally, the Qwen3-Coder-XML format handler had no thinking support.
Models like Step-3.5-Flash that unconditionally emit <think> in their
generation prompt need the same thinking_forced_open handling that
Nemotron v3 and Hermes 2 Pro already have, otherwise reasoning_content
is never separated from content in API responses.

Changes:
- Relax Qwen3-Coder XML detection to only require the 3 shared markers
- Tighten Nemotron v3 branch to also require bare <function> and plural
  <parameters>, preventing Step-3.5-Flash from being misrouted via <think>
- Add thinking_forced_open support to Qwen3-Coder-XML init function
- Add <think>/</think> to preserved tokens
- Fix build_grammar_xml_tool_call to handle thinking_forced_open in the
  grammar root rule, allowing </think> before tool calls
- Add Step-3.5-Flash chat template and format detection test

Builds on: https://github.com/ggml-org/llama.cpp/pull/19283

* chat : route Step-3.5-Flash to Nemotron v3 PEG parser, add tests

Step-3.5-Flash uses the same XML tool call format as Qwen3-Coder and
Nemotron 3 Nano (<tool_call>/<function=...>/<parameter=...>) but with
unconditional <think> output. Route it to the Nemotron v3 PEG parser
for streaming and schema-aware parameter parsing.

Detection: templates with <think> + XML tool tags use Nemotron v3 PEG
parser; templates without <think> (Qwen3-Coder) use GBNF grammar.

Tests cover: basic messages, tool calls with/without thinking content,
parallel tool calls, code string parameters, optional </parameter>
closing tags, and JSON schema response format.

* chat : remove dead thinking code from qwen3_coder_xml

Remove thinking handling code that became unreachable after routing
Step-3.5-Flash to the Nemotron v3 PEG parser. Qwen3-Coder has no
<think> in its template, so the thinking_forced_open logic, preserved
tokens, and grammar prefix were dead paths.
2026-02-19 22:40:52 +01:00
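The resulting routing rule reduces to something like this (illustrative pseudologic, not the actual common/chat.cpp detection code; the marker checks stand in for the real template inspection):

```cpp
#include <string>

enum chat_format { FORMAT_OTHER, FORMAT_QWEN3_CODER_XML, FORMAT_NEMOTRON_V3 };

// XML tool-call templates that also force <think> go to the Nemotron v3
// PEG parser; those without <think> keep the Qwen3-Coder GBNF path.
static chat_format detect_xml_format(const std::string & tmpl) {
    auto has = [&](const char * marker) {
        return tmpl.find(marker) != std::string::npos;
    };
    if (!(has("<tool_call>") && has("<function=") && has("<parameter="))) {
        return FORMAT_OTHER;
    }
    return has("<think>") ? FORMAT_NEMOTRON_V3 : FORMAT_QWEN3_CODER_XML;
}
```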
abhijitb11
39e4b1dc9b
common : fix gpt-oss Jinja error when assistant message has both content and thinking with tool calls (#19704) 2026-02-19 14:59:20 -06:00
Tarek Dakhran
c5897995a7
mtmd : chat : Fix extra \n between text and media marker (#19595)
* mtmd : chat : Fix extra \n between text and media marker

Thanks to @tugot17 for detecting and reporting the issue.

For vision models (e.g. LFM2.5-VL-1.6B and Qwen/Qwen3-VL-4B-Instruct) `llama-mtmd-cli` produces identical output to the HF implementation.

However `llama-server` doesn't. I traced it down to an extra newline
inserted after `<__media__>`.

This happens in `to_json_oaicompat`, which treats media markers as text
and joins all parts with a `\n` separator.

This PR introduces a new type `media_marker` and uses it for media markers.
Extra logic is added to prevent the insertion of newlines before and after
media markers.

With this change the number of input tokens is identical to the HF
implementation, and as a result the output is also identical.

I explored other ways to address the issue:
* completely remove the `\n` between text parts in `to_json_oaicompat`
* merge text messages in server-common.cpp before sending them to `to_json_oaicompat`

Please propose alternative ways of fixing this issue.

* Refactor to use explicit per-type ifs

* Update common/chat.cpp

Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>

* Update common_chat_templates_apply_legacy

---------

Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>
2026-02-19 12:18:57 +01:00
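The separator rule described in this commit amounts to the following (hypothetical part type; the real change lives in `to_json_oaicompat`):

```cpp
#include <string>
#include <vector>

// Hypothetical content part: either plain text or a media marker.
struct msg_part { bool is_media_marker; std::string text; };

// Join with "\n" only between two text parts, never before or after a
// media marker, mirroring the behavior described above.
static std::string join_parts(const std::vector<msg_part> & parts) {
    std::string out;
    bool prev_was_text = false;
    for (const auto & p : parts) {
        if (prev_was_text && !p.is_media_marker) {
            out += "\n";
        }
        out += p.text;
        prev_was_text = !p.is_media_marker;
    }
    return out;
}
```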
Piotr Wilkin (ilintar)
8a70973557
Add Jinja support for "indent" string filter (#19529)
* Add partial Jinja support for "indent" string filter

* Fully implement indent

* Add tests for all width variants.

* Update tests/test-jinja.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Fix getline ignoring trailing newlines

* Update common/jinja/value.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* fix first indent condition

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-02-19 00:25:52 +01:00
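For reference, Jinja's indent filter pads every line after the first with `width` spaces (the first line only when `first` is true). A standalone sketch of those semantics, leaving out the blank-line option and the trailing-newline handling this PR had to fix:

```cpp
#include <sstream>
#include <string>

// Sketch of Jinja's indent(width, first) semantics.
static std::string jinja_indent(const std::string & s, int width, bool first = false) {
    const std::string pad(width, ' ');
    std::istringstream in(s);
    std::string line, out;
    bool is_first = true;
    while (std::getline(in, line)) {
        if (!is_first) out += "\n";
        if (!is_first || first) out += pad;
        out += line;
        is_first = false;
    }
    return out;
}
```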
Adrien Gallouët
a569bda445
common : make small string helpers as inline functions (#19693)
Also use string_view where it makes sense and fix some corner cases.

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-02-18 08:03:01 +01:00
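An example of the kind of helper meant here, written as an inline function over std::string_view (illustrative; not quoted from the diff):

```cpp
#include <string_view>

// Inline, allocation-free prefix check using std::string_view.
static inline bool string_starts_with(std::string_view str, std::string_view prefix) {
    return str.size() >= prefix.size() && str.compare(0, prefix.size(), prefix) == 0;
}
```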
Concedo
b6bb9c914e Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/winget.yml
#	CMakeLists.txt
#	common/CMakeLists.txt
#	examples/model-conversion/scripts/causal/run-org-model.py
#	ggml/src/ggml-cpu/CMakeLists.txt
#	tools/perplexity/perplexity.cpp
#	tools/server/CMakeLists.txt
2026-02-17 19:41:28 +08:00
Adrien Gallouët
65cede7c70
build : cleanup library linking logic (#19665)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-02-17 08:36:45 +01:00
Ivan Chikish
cceb1b4e33
common : inline functions (#18639) 2026-02-16 17:52:24 +02:00
Concedo
8d61ff4530 revert "build : remove LLAMA_HTTPLIB option (#19623)" 2026-02-16 18:32:29 +08:00
Concedo
dbe0604dfc tools broken currently 2026-02-16 18:10:42 +08:00
Concedo
72f7e01b27 Merge commit '01d8eaa28d' into concedo_experimental
# Conflicts:
#	build-xcframework.sh
#	scripts/sync_vendor.py
#	tests/test-backend-ops.cpp
#	tools/mtmd/CMakeLists.txt
#	tools/rpc/rpc-server.cpp
2026-02-16 15:36:59 +08:00
Adrien Gallouët
9e118b97c4
build : remove LLAMA_HTTPLIB option (#19623)
This option was introduced as a workaround because cpp-httplib could not
build on visionOS. Since that has been fixed and cpp-httplib now compiles
on all platforms, we can remove the option and simplify many things.

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-02-15 15:38:50 +01:00
iMil
badba89320
NetBSD build support (#19589) 2026-02-14 09:47:01 +01:00
agent-enemy-2
2d8015e8a4
llama : update LoRA API. + fix excessive graph reserves (#19280)
* Refactoring to use new llama_put_adapter_loras

* cont : alternative lora API

---------

Co-authored-by: Jake Chavis <jakechavis6@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-02-14 10:06:27 +02:00
Concedo
45dc155530 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/ISSUE_TEMPLATE/010-bug-compilation.yml
#	.github/ISSUE_TEMPLATE/011-bug-results.yml
#	AGENTS.md
#	SECURITY.md
#	ggml/src/ggml-hexagon/htp/flash-attn-ops.c
#	ggml/src/ggml-hexagon/htp/main.c
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-opencl/kernels/cvt.cl
#	scripts/sync_vendor.py
#	src/unicode.cpp
#	tests/test-backend-ops.cpp
#	tools/cli/cli.cpp
2026-02-14 12:44:16 +08:00
Adrien Gallouët
b48e80f677
common : update download code (#19573)
* common : remove legacy .json to .etag migration code

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* common : simplify common_download_file_single_online

This commit also force a redownload if the file exists
but has no .etag file.

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

---------

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-02-13 15:10:46 +01:00
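The redownload condition described above reduces to a small filesystem check (a sketch assuming the .etag sidecar naming from the commit message; the actual download code does more):

```cpp
#include <filesystem>
#include <string>

// Sketch: a cached file without its .etag sidecar is treated as stale,
// forcing a redownload.
static bool needs_redownload(const std::string & path) {
    namespace fs = std::filesystem;
    return !fs::exists(path) || !fs::exists(path + ".etag");
}
```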
Concedo
bff3fd3e34 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	common/common.cpp
#	docs/backend/snapdragon/README.md
#	ggml/src/ggml-hexagon/htp/htp-ops.h
#	ggml/src/ggml-hexagon/htp/matmul-ops.c
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	scripts/pr2wt.sh
#	tests/test-backend-ops.cpp
#	tools/server/README.md
2026-02-13 14:00:45 +08:00
Concedo
55524e160b temp merge, not working 2026-02-13 12:11:26 +08:00
Georgi Gerganov
338085c69e
args : add -kvu to llama-parallel (#19577) 2026-02-12 21:52:41 +02:00
Concedo
261d78eaaa Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	README.md
#	docs/speculative.md
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/ggml-cann.cpp
#	tests/CMakeLists.txt
#	tests/test-backend-ops.cpp
#	tools/mtmd/clip.cpp
2026-02-12 18:05:20 +08:00
Adrien Gallouët
4ae1b7517a
common : replace deprecated codecvt using parse_utf8_codepoint (#19517)
Signed-off-by: Adrien Gallouët <adrien@gallouet.fr>
2026-02-12 07:27:52 +01:00
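Replacing codecvt typically means decoding UTF-8 by hand. A minimal sketch of one-codepoint decoding (illustrative only; the real parse_utf8_codepoint helper may differ in name and signature, and overlong/surrogate validation is omitted):

```cpp
#include <cstdint>
#include <string_view>
#include <utility>

// Decode one UTF-8 codepoint at `pos`, returning {codepoint, byte length};
// a length of 0 signals malformed input.
static std::pair<uint32_t, size_t> parse_codepoint(std::string_view s, size_t pos) {
    if (pos >= s.size()) return {0, 0};
    const uint8_t b0 = (uint8_t) s[pos];
    size_t len; uint32_t cp;
    if      (b0 < 0x80)           { len = 1; cp = b0; }
    else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
    else return {0, 0};
    if (pos + len > s.size()) return {0, 0};
    for (size_t i = 1; i < len; ++i) {
        const uint8_t b = (uint8_t) s[pos + i];
        if ((b & 0xC0) != 0x80) return {0, 0};
        cp = (cp << 6) | (b & 0x3F);
    }
    return {cp, len};
}
```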
Daniel Bevenius
3136a849db
common : remove unused token util functions (#19506)
This commit removes two unused functions `common_lcp` and `common_lcs`.
The last usage of these functions was removed in
commit 33eff40240 ("server : vision support
via libmtmd"), and they are no longer used anywhere in the codebase.
2026-02-11 17:41:35 +01:00
Adrien Gallouët
0c1f39a9ae
common : improve download error reporting (#19491)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-02-11 09:27:55 +01:00
thecaptain789
8ee538ce73
llama : correct typos 'occured' and 'occurences' (#19414)
Co-authored-by: thecaptain789 <thecaptain789@users.noreply.github.com>
2026-02-11 07:05:31 +01:00
Xuan-Son Nguyen
98e57ca422
chat: fix case where template accepts type content only (#19419)
* chat: fix case where template accepts type content only

* rm stray log

* reuse render_message_to_json
2026-02-09 22:14:12 +01:00
Sascha Rogmann
292f6908cd
spec : remove check rate (#19377)
* spec: remove parameter spec-ngram-check-rate

* spec : renamed statistics vars

* spec : add n_call_begin, n_call_accept

* spec : don't enable key-map-stats
2026-02-09 15:30:50 +02:00
Concedo
a0a78dacc4 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	docs/ops.md
#	docs/ops/SYCL.csv
#	ggml/src/ggml-sycl/element_wise.cpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	pyproject.toml
#	requirements/requirements-convert_legacy_llama.txt
#	src/CMakeLists.txt
#	src/llama-vocab.cpp
#	tests/test-backend-ops.cpp
2026-02-07 15:54:02 +08:00
Georgi Gerganov
dfde5993ea
common : add common_speculative_is_compat() (#19270)
* llama : add llama_memory_can_rm_suffix()

* Revert "llama : add llama_memory_can_rm_suffix()"

This reverts commit d30e59b62a15ef4266a6503e3f4eba770aec001b.

* spec : check if the target context is compatible for spec decoding
2026-02-06 16:47:22 +02:00
Concedo
157fac7bd0 Merge commit 'c342c3b93d' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	CODEOWNERS
#	scripts/sync_vendor.py
2026-02-05 22:23:05 +08:00
Xuan-Son Nguyen
e0c93af2a0
debug: make common_debug_print_tensor readable (#19331)
* debug: make common_debug_print_tensor readable

* editorconfig
2026-02-04 17:55:31 +01:00