Adrien Gallouët
463b6a963c
tools : enable kvu in perplexity for hellaswag, winogrande, multiple-choice ( #19954 )
...
llama-perplexity -hf unsloth/Qwen3-0.6B-GGUF:Q4_K_M -f winogrande-debiased-eval.csv --winogrande
winogrande_score : tokenizing selected tasks
winogrande_score : calculating winogrande score over selected tasks.
split_equal: sequential split is not supported when there are coupled sequences in the input batch (you may need to use the -kvu flag)
decode: failed to find a memory slot for batch of size 46
failed to decode the batch, n_batch = 2048, ret = 1
winogrande_score: llama_decode() failed
same for hellaswag:
split_equal: sequential split is not supported when there are coupled sequences in the input batch (you may need to use the -kvu flag)
decode: failed to find a memory slot for batch of size 99
failed to decode the batch, n_batch = 2048, ret = 1
hellaswag_score: llama_decode() failed
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-13 21:25:57 +01:00
ZeroV0LT
f17b3be63f
llama : fix pooling assertion crash in chunked GDN detection path ( #20468 )
...
* llama : fix pooling assertion crash in chunked GDN detection path
The chunked fused Gated Delta Net detection in sched_reserve() calls
graph_reserve(16*n_seqs, n_seqs, n_outputs, ...) where n_outputs = n_seqs.
This creates a dimension mismatch in build_pooling() for embedding models
with mean/rank pooling: build_inp_mean() creates a tensor with shape
[n_tokens=16*n_seqs, ...] while t_embd is reduced to [n_outputs=n_seqs, ...]
via out_ids, causing ggml_mul_mat to assert on ggml_can_mul_mat(a, b).
Fix: pass n_tokens as n_outputs in the chunked GDN graph reservation,
matching the pattern used by the pp/tg worst-case reservations.
Regression introduced by #20340 (d28961d ).
Same class of bug as #12517 , fixed by #12545 .
* server : add mean pooling tests to embedding test suite
Add test_embedding_pooling_mean and test_embedding_pooling_mean_multiple
to cover the --pooling mean codepath, which was previously untested.
These tests would have caught the regression introduced by #20340 where
build_pooling() crashes with a ggml_mul_mat assertion due to mismatched
dimensions in the chunked GDN detection path.
---------
Co-authored-by: Domenico Crupi <domenico@zerovolt.it>
2026-03-13 20:53:42 +02:00
SoftwareRenderer
d7ba99c485
server: reset counter related to kill-switch on client error ( #20513 )
...
* server: reset kill-switch on client error
This avoids triggering a server kill switch.
If the client sends a request that exceeds the configured context size, an appropriate HTTP 400 response is provided and no tokens are generated.
However since no tokens are generated, update_slots() increments n_empty_consecutive. If the client sends 3 such messages in a row, the server terminates.
* moved counter reset as per recommendation
* cont : minor
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-03-13 19:58:09 +02:00
Concedo
04915d99ee
Merge commit ' 451ef08432' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# README.md
# docs/ops.md
# docs/ops/Vulkan.csv
# src/llama-model-loader.cpp
# src/llama-model.cpp
# src/llama.cpp
# tests/CMakeLists.txt
# tests/peg-parser/test-basic.cpp
# tests/peg-parser/test-json-parser.cpp
# tests/peg-parser/test-python-dict-parser.cpp
# tests/peg-parser/test-unicode.cpp
# tests/test-chat-auto-parser.cpp
# tests/test-chat-peg-parser.cpp
# tests/test-chat.cpp
# tools/CMakeLists.txt
2026-03-13 23:33:37 +08:00
Concedo
d2c911884d
Merge commit ' 213c4a0b81' into concedo_experimental
...
# Conflicts:
# CODEOWNERS
# common/CMakeLists.txt
# common/chat-peg-parser.cpp
# common/chat.cpp
# docs/backend/SYCL.md
# docs/development/parsing.md
# docs/ops.md
# docs/ops/SYCL.csv
# embd_res/templates/Apriel-1.6-15b-Thinker-fixed.jinja
# embd_res/templates/Bielik-11B-v3.0-Instruct.jinja
# embd_res/templates/GLM-4.7-Flash.jinja
# embd_res/templates/LFM2-8B-A1B.jinja
# embd_res/templates/StepFun3.5-Flash.jinja
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-sycl/CMakeLists.txt
# ggml/src/ggml-sycl/backend.hpp
# ggml/src/ggml-sycl/common.hpp
# ggml/src/ggml-sycl/convert.cpp
# ggml/src/ggml-sycl/convert.hpp
# ggml/src/ggml-sycl/count-equal.cpp
# ggml/src/ggml-sycl/dpct/helper.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/presets.hpp
# ggml/src/ggml-sycl/softmax.cpp
# ggml/src/ggml-sycl/vecdotq.hpp
# models/templates/Apertus-8B-Instruct.jinja
# models/templates/CohereForAI-c4ai-command-r7b-12-2024-tool_use.jinja
# models/templates/Qwen-QwQ-32B.jinja
# models/templates/Qwen3-Coder.jinja
# models/templates/deepseek-ai-DeepSeek-R1-Distill-Llama-8B.jinja
# models/templates/deepseek-ai-DeepSeek-R1-Distill-Qwen-32B.jinja
# models/templates/deepseek-ai-DeepSeek-V3.1.jinja
# models/templates/fireworks-ai-llama-3-firefunction-v2.jinja
# models/templates/moonshotai-Kimi-K2.jinja
# models/templates/unsloth-Apriel-1.5.jinja
# tests/CMakeLists.txt
# tests/peg-parser/test-basic.cpp
# tests/peg-parser/tests.h
# tests/test-backend-ops.cpp
# tests/test-chat-peg-parser.cpp
# tests/test-chat-template.cpp
# tests/test-chat.cpp
# tests/test-json-schema-to-grammar.cpp
# tests/test-peg-parser.cpp
# tools/CMakeLists.txt
# tools/cli/cli.cpp
2026-03-13 21:35:56 +08:00
Daniel Bevenius
8f974d2392
mtmd : rename mtmd_get_audio_bitrate to mtmd_get_audio_sample_rate ( #20105 )
...
This commit renames the the function `mtmd_get_audio_bitrate` to
`mtmd_get_audio_sample_rate` to better reflect its purpose.
The motivation for this is that the function currently returns the audio
sample rate, not the bitrate (sample_rate × bit_depth × channels), and
that is how it is used in the code as well.
This is a breaking change, but I believe mtmd is still in
experimental/development phase so it might be alright to simply rename.
2026-03-13 12:30:02 +01:00
Piotr Wilkin (ilintar)
0e810413bb
tests : use reasoning instead of reasoning_budget in server tests ( #20432 )
2026-03-12 13:41:01 +01:00
Pascal
de190154c8
New conversations now auto-select the first loaded model ( #20403 )
...
* webui: auto-select first loaded model for new conversations in router mode
* chore: update webui build output
2026-03-12 09:07:05 +01:00
DAN™
fdb17643d3
model : add support for Phi4ForCausalLMV ( #20168 )
...
* Add support for Phi4ForCausalLMV.
* Fix Phi-4 vision parity (correcting SigLIP2 patch-kernel export layout) and matching HF NaFlex resize behavior in mtmd.
* Rename contants + fix tokenizer label
* Clean-ups.
* Fix GGUF export.
* Set tokenizer.ggml.pre explicitly.
* Default vocab name rather than forcing it.
* Clean-ups.
* Fix indent.
* Fix subscriptable error.
* remov overcomplicated code path
* Clean-ups.
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2026-03-12 00:25:54 +01:00
Piotr Wilkin (ilintar)
acb7c79069
common/parser: handle reasoning budget ( #20297 )
...
* v1
* Finished!
* Handlie cli
* Reasoning sampler
* Apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Less explosive terminology :)
* Add utf-8 case and tests
* common : migrate reasoning budget sampler to common
* cont : clean up
* cont : expose state and allow passing as initial state
* cont : remove unused imports
* cont : update state machine doc string
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Alde Rojas <hello@alde.dev>
2026-03-11 10:26:12 +01:00
Pascal
00de615345
Fix agentic mcp image single model ( #20339 )
...
* webui: fix MCP image attachments dropped during the agentic loop in single-model mode
* chore: update webui build output
2026-03-11 05:31:33 +01:00
Concedo
6adcd0b5db
Merge commit ' 34df42f7be' into concedo_experimental
...
# Conflicts:
# README.md
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/CMakeLists.txt
# ggml/src/ggml-hexagon/htp/act-ops.c
# ggml/src/ggml-hexagon/htp/binary-ops.c
# ggml/src/ggml-hexagon/htp/cpy-ops.c
# ggml/src/ggml-hexagon/htp/get-rows-ops.c
# ggml/src/ggml-hexagon/htp/htp-msg.h
# ggml/src/ggml-hexagon/htp/htp-ops.h
# ggml/src/ggml-hexagon/htp/hvx-arith.h
# ggml/src/ggml-hexagon/htp/hvx-base.h
# ggml/src/ggml-hexagon/htp/hvx-inverse.h
# ggml/src/ggml-hexagon/htp/hvx-utils.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hexagon/htp/rope-ops.c
# ggml/src/ggml-hexagon/htp/set-rows-ops.c
# ggml/src/ggml-hexagon/htp/softmax-ops.c
# ggml/src/ggml-hexagon/htp/unary-ops.c
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# tests/test-backend-ops.cpp
# tools/cli/cli.cpp
# tools/server/webui/src/lib/components/app/chat/ChatScreen/ChatScreen.svelte
2026-03-10 22:20:04 +08:00
Concedo
746664fde6
Merge commit ' 2cd20b72ed' into concedo_experimental
...
# Conflicts:
# CONTRIBUTING.md
# docs/backend/CANN.md
# docs/backend/SYCL.md
# docs/backend/snapdragon/README.md
# docs/backend/snapdragon/windows.md
# docs/build.md
# docs/multimodal/MobileVLM.md
# docs/ops.md
# docs/ops/WebGPU.csv
# examples/debug/README.md
# examples/llama.vim
# examples/model-conversion/README.md
# examples/sycl/README.md
# ggml/src/ggml-cpu/amx/mmq.cpp
# ggml/src/ggml-cpu/arch/x86/repack.cpp
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp-drv.cpp
# ggml/src/ggml-hexagon/htp/flash-attn-ops.c
# ggml/src/ggml-hexagon/htp/hvx-base.h
# ggml/src/ggml-hexagon/htp/hvx-copy.h
# ggml/src/ggml-hexagon/htp/hvx-inverse.h
# ggml/src/ggml-hexagon/htp/hvx-reduce.h
# ggml/src/ggml-hexagon/htp/matmul-ops.c
# ggml/src/ggml-hexagon/htp/rope-ops.c
# ggml/src/ggml-hexagon/htp/worker-pool.c
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/cpy.cl
# ggml/src/ggml-sycl/common.hpp
# ggml/src/ggml-sycl/quants.hpp
# ggml/src/ggml-sycl/softmax.cpp
# ggml/src/ggml-vulkan/CMakeLists.txt
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# scripts/pr2wt.sh
# scripts/server-bench.py
# scripts/snapdragon/windows/run-cli.ps1
# tests/test-alloc.cpp
# tests/test-backend-ops.cpp
# tests/test-chat.cpp
# tools/cli/cli.cpp
# tools/completion/README.md
# tools/cvector-generator/cvector-generator.cpp
# tools/imatrix/README.md
# tools/perplexity/README.md
# tools/server/public_simplechat/readme.md
# tools/server/tests/README.md
2026-03-10 22:11:08 +08:00
Georgi Gerganov
a7b3dee7a5
server : make 2 checkpoints near the end of the prompt ( #20288 )
...
* server : make 2 checkpoints near the end of the prompt
* cont : adjust checkpoints
2026-03-10 14:28:23 +02:00
ddh0
1dab5f5a44
llama-quant : fail early on missing imatrix, refactor type selection, code cleanup ( #19770 )
...
* quantize : imatrix-fail early + code cleanup
* fix manual override printing
it's in the preliminary loop now, so needs to be on its own line
* revert header changes per ggerganov
* remove old #includes
* clarify naming
rename `tensor_quantization` to `tensor_typo_option` to descirbe its
functionality
* fix per barto
2026-03-10 08:16:05 +02:00
Evan Huus
23fbfcb1ad
server: Parse port numbers from MCP server URLs in CORS proxy ( #20208 )
...
* Parse port numbers from MCP server URLs
* Pass scheme to http proxy for determining whether to use SSL
* Fix download on non-standard port and re-add port to logging
* add test
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2026-03-09 17:47:54 +01:00
Georgi Gerganov
96cfc4992c
server : fix checkpoints n_tokens calculation ( #20287 )
2026-03-09 16:47:06 +02:00
Georgi Gerganov
344ee2a38a
server : warn swa-full is not supported for non-SWA models ( #20291 )
2026-03-09 16:44:25 +02:00
Georgi Gerganov
d6e1556499
server : fix off-by-1 in server_tokens::size_up_to_pos() ( #20279 )
...
* server : fix off-by-1 in server_tokens::size_up_to_pos()
* cont : fix typo [no ci]
2026-03-09 16:43:38 +02:00
Georgi Gerganov
107d599952
server : add kill switch when server is stuck ( #20277 )
2026-03-09 10:33:12 +02:00
Aaron Teo
ae87863dc1
llama-bench: introduce -hf and -hff flags & use --mmap 1 by default ( #20211 )
2026-03-09 09:05:44 +08:00
Georgi Gerganov
d417bc43dd
server : do not create checkpoints right after mtmd chunks ( #20232 )
2026-03-08 22:16:46 +02:00
Johannes Gäßler
a976ff081b
llama: end-to-end tests ( #19802 )
...
* tests: add end-to-end tests per model architecture
* fixup for rebase
* fix use-after-free in llama-model-loader.cpp
* fix CI
* fix WebGPU
* fix CI
* disable CI for macOS-latest-cmake-arm64
* use expert_weights_scale only if != 0.0f
* comments
2026-03-08 12:30:21 +01:00
decahedron1
ff52ee964d
server : correct index on finish in OAI completion streams ( #20226 )
2026-03-08 10:08:57 +01:00
Piotr Wilkin (ilintar)
566059a26b
Autoparser - complete refactoring of parser architecture ( #18675 )
...
* Autoparser - full single commit squish
* Final pre-merge changes: minor fixes, Kimi 2.5 model parser
2026-03-06 21:01:00 +01:00
Tom Vaucourt
e68f2fb894
server : preserve anthropic thinking blocks in conversion ( #20120 )
...
* server : preserve anthropic thinking blocks in conversion (#20090 )
* server : add tests for anthropic thinking block conversion
---------
Co-authored-by: root <root@llamacpp.home>
2026-03-06 17:41:12 +01:00
Concedo
d20e60ddd5
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# docs/build.md
# examples/batched/batched.cpp
# examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.cpp
# examples/deprecation-warning/deprecation-warning.cpp
# examples/eval-callback/eval-callback.cpp
# examples/gen-docs/gen-docs.cpp
# examples/gguf-hash/gguf-hash.cpp
# examples/gguf/gguf.cpp
# examples/lookahead/lookahead.cpp
# examples/lookup/lookup-create.cpp
# examples/lookup/lookup-merge.cpp
# examples/lookup/lookup-stats.cpp
# examples/lookup/lookup.cpp
# examples/parallel/parallel.cpp
# examples/passkey/passkey.cpp
# examples/retrieval/retrieval.cpp
# examples/save-load-state/save-load-state.cpp
# examples/simple-chat/simple-chat.cpp
# examples/simple/simple.cpp
# examples/speculative-simple/speculative-simple.cpp
# examples/speculative/speculative.cpp
# examples/sycl/ls-sycl-device.cpp
# examples/training/finetune.cpp
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-cpu/amx/common.h
# ggml/src/ggml-cpu/kleidiai/kernels.cpp
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/cvt.cl
# ggml/src/ggml-opencl/kernels/gemv_noshuffle_general_q8_0_f32.cl
# ggml/src/ggml-opencl/kernels/transpose.cl
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_reg_tile.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_subgroup_matrix.wgsl
# scripts/get-wikitext-2.sh
# tests/test-backend-ops.cpp
# tools/batched-bench/batched-bench.cpp
# tools/cvector-generator/cvector-generator.cpp
# tools/export-lora/export-lora.cpp
# tools/imatrix/imatrix.cpp
# tools/llama-bench/llama-bench.cpp
# tools/perplexity/perplexity.cpp
# tools/rpc/rpc-server.cpp
# tools/tokenize/tokenize.cpp
2026-03-06 21:19:49 +08:00
Concedo
abcca8c0f9
do not use the mxfp4 repack - repack must be synced again from before this commit if it's ever to be used in future. this will break compilation with older w64devkit
2026-03-06 21:07:41 +08:00
JustCommitRandomness
2fbc3b2ae5
Adjust int types in format strings ( #2009 )
...
* tweak format sting types
This may not be all of them, but it's the ones which warn on OpenBSD
* complete the changes needed to fix the format string specifers
* avoid using inttypes, directly cast to size_t (u64 usually) instead
---------
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2026-03-06 19:06:18 +08:00
Piotr Wilkin (ilintar)
f5ddcd1696
Checkpoint every n tokens: squash ( #20087 )
2026-03-06 11:39:26 +01:00
Aleksander Grygier
f6235a41ef
webui: Agentic Loop + MCP Client with support for Tools, Resources and Prompts ( #18655 )
2026-03-06 10:00:39 +01:00
Roj234
f7db3f3789
cli : Don't clear system prompt when using '/clear' ( #20067 )
...
* Enhance /clear command to include system prompt
Add system prompt to messages when clearing chat history.
* Use lambda
2026-03-06 06:41:11 +01:00
Sigbjørn Skjæret
b5ed0e058c
cli : add command and file auto-completion ( #19985 )
2026-03-05 10:47:28 +01:00
Aleksander Grygier
5e335ba113
webui: Improvements for Models Selector UI ( #20066 )
2026-03-05 08:52:22 +01:00
Marcel Petrick
92f7da00b4
chore : correct typos [no ci] ( #20041 )
...
* fix(docs): correct typos found during code review
Non-functional changes only:
- Fixed minor spelling mistakes in comments
- Corrected typos in user-facing strings
- No variables, logic, or functional code was modified.
Signed-off-by: Marcel Petrick <mail@marcelpetrick.it>
* Update docs/backend/CANN.md
Co-authored-by: Aaron Teo <taronaeo@gmail.com>
* Revert "Auxiliary commit to revert individual files from 846d1c301281178efbc6ce6060ad34c1ebe45af8"
This reverts commit 02fcf0c7db661d5ff3eff96b2b2db9fdb7213256.
* Update tests/test-backend-ops.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update tests/test-backend-ops.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Signed-off-by: Marcel Petrick <mail@marcelpetrick.it>
Co-authored-by: Aaron Teo <taronaeo@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-03-05 08:50:21 +01:00
Sigbjørn Skjæret
d969e933e1
tools : add missing clocale include in mtmd-cli [no ci] ( #20107 )
2026-03-04 14:18:04 +01:00
SamareshSingh
cb8f4fa3f8
Fix locale-dependent float printing in GGUF metadata ( #17331 )
...
* Set C locale for consistent float formatting across all binaries.
* Add C locale setting to all tools binaries
Add std::setlocale(LC_NUMERIC, "C") to all 16 binaries in the tools/
directory to ensure consistent floating-point formatting.
* Apply suggestion from @JohannesGaessler
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2026-03-04 09:30:40 +01:00
standby24x7
54910bd4f3
completion : Fix a typo in warning message ( #20082 )
...
resuse -> reuse
2026-03-04 06:44:49 +01:00
Concedo
4e358265a3
Merge commit ' 8387ffb28d' into concedo_experimental
...
# Conflicts:
# docs/backend/VirtGPU.md
# docs/backend/ZenDNN.md
# ggml/src/ggml-cpu/amx/amx.cpp
# ggml/src/ggml-cpu/amx/mmq.cpp
# ggml/src/ggml-sycl/add-id.cpp
# ggml/src/ggml-virtgpu/backend/backend-dispatched-backend.cpp
# ggml/src/ggml-virtgpu/backend/backend-dispatched-buffer-type.cpp
# ggml/src/ggml-virtgpu/backend/backend-dispatched-buffer.cpp
# ggml/src/ggml-virtgpu/backend/backend-dispatched.cpp
# ggml/src/ggml-virtgpu/backend/backend-dispatched.gen.h
# ggml/src/ggml-virtgpu/backend/backend-dispatched.h
# ggml/src/ggml-virtgpu/backend/backend-virgl-apir.h
# ggml/src/ggml-virtgpu/backend/backend.cpp
# ggml/src/ggml-virtgpu/backend/shared/api_remoting.h
# ggml/src/ggml-virtgpu/backend/shared/apir_backend.gen.h
# ggml/src/ggml-virtgpu/backend/shared/apir_backend.h
# ggml/src/ggml-virtgpu/backend/shared/apir_cs.h
# ggml/src/ggml-virtgpu/backend/shared/apir_cs_ggml.h
# ggml/src/ggml-virtgpu/backend/shared/apir_cs_rpc.h
# ggml/src/ggml-virtgpu/ggml-backend-buffer-type.cpp
# ggml/src/ggml-virtgpu/ggml-backend-device.cpp
# ggml/src/ggml-virtgpu/ggml-backend-reg.cpp
# ggml/src/ggml-virtgpu/ggml-backend.cpp
# ggml/src/ggml-virtgpu/ggml-remoting.h
# ggml/src/ggml-virtgpu/include/apir_hw.h
# ggml/src/ggml-virtgpu/regenerate_remoting.py
# ggml/src/ggml-virtgpu/virtgpu-forward-backend.cpp
# ggml/src/ggml-virtgpu/virtgpu-forward-buffer-type.cpp
# ggml/src/ggml-virtgpu/virtgpu-forward-buffer.cpp
# ggml/src/ggml-virtgpu/virtgpu-forward-device.cpp
# ggml/src/ggml-virtgpu/virtgpu-forward-impl.h
# ggml/src/ggml-virtgpu/virtgpu-forward.gen.h
# ggml/src/ggml-virtgpu/virtgpu.cpp
# ggml/src/ggml-virtgpu/virtgpu.h
# ggml/src/ggml-zendnn/CMakeLists.txt
# ggml/src/ggml-zendnn/ggml-zendnn.cpp
# src/CMakeLists.txt
# tests/CMakeLists.txt
# tests/test-tokenizer-0.sh
# tools/cli/README.md
# tools/completion/README.md
# tools/imatrix/imatrix.cpp
# tools/server/README.md
2026-02-28 12:45:16 +08:00
Roj234
3e6ab244ad
server: Add pragma once to server-context.h ( #19944 )
2026-02-27 18:28:36 +01:00
Sami Kama
5596a35791
server: Mirroring /v1/responses to /responses to match /v1/chat/completions pattern ( #19873 )
2026-02-28 00:44:42 +08:00
Pascal
2e7e638523
server : support multiple model aliases via comma-separated --alias ( #19926 )
...
* server : support multiple model aliases via comma-separated --alias
* server : update --alias description and regenerate docs
* server : multiple model aliases and tags
- address review feedback from ngxson
- --alias accepts comma-separated values (std::set, no duplicates)
- --tags for informational metadata (not used for routing)
- aliases resolve transparently in router via get_meta/has_model
- /v1/models exposes aliases and tags fields
* regenerate docs
* nits
* server : use first alias as model_name for backward compat
address review feedback from ngxson
* server : add single-model test for aliases and tags
2026-02-27 07:05:23 +01:00
Georgi Gerganov
37964f44f9
mtmd : fix padding of n_tokens ( #19930 )
2026-02-26 18:39:49 +02:00
Georgi Gerganov
01cd448b8c
server : fix ctx checkpoint restore logic ( #19924 )
2026-02-26 18:20:16 +02:00
drrros
efba35a860
server: fix load-on-startup not respected in ini file ( #19897 )
...
Check Pre-Tokenizer Hashes / pre-tokenizer-hashes (push) Waiting to run
Python check requirements.txt / check-requirements (push) Waiting to run
Python Type-Check / pyright type-check (push) Waiting to run
Co-authored-by: Roman Marchenko <r.marchenko@ideco.ru>
2026-02-26 12:32:31 +01:00
Maximilian Werk
66287bdaac
model : add Jina Embeddings v5 Nano (partial EuroBERT) support ( #19826 )
...
* WIP: Add EuroBERT support with autoformatting changes
This commit includes:
- EuroBERT model implementation for GGUF conversion
- C++ backend support for EuroBERT architecture
- Unintended autoformatting changes to Python files
Saving before reverting formatting-only changes.
* feat: add back eos assert when not last token pooling
* feat: removed duplicated code and cleanup
* feat: removed not working architectures and unnecessary check
* fix: typo
* fix: dynamic pooling config
* feat: added an example model for eurobert
* feat: proper llama-vocab implementation for jina-v5
* fix: removed unnecessary comments
2026-02-26 12:14:09 +01:00
yggdrasil75
bd72300591
server : fix typo in server README.md ( #19900 )
...
fix typo
2026-02-26 11:26:16 +01:00
Concedo
749a606374
whisper broke
2026-02-26 16:45:04 +08:00
Concedo
44182ebefe
Merge commit ' 8c2c0108dd' into concedo_experimental
...
# Conflicts:
# examples/model-conversion/Makefile
# examples/model-conversion/scripts/utils/inspect-org-model.py
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/act-ops.c
# ggml/src/ggml-hexagon/htp/get-rows-ops.c
# ggml/src/ggml-hexagon/htp/hex-dma.h
# ggml/src/ggml-hexagon/htp/htp-ops.h
# ggml/src/ggml-hexagon/htp/matmul-ops.c
# ggml/src/ggml-hexagon/htp/rope-ops.c
# ggml/src/ggml-hexagon/htp/set-rows-ops.c
# ggml/src/ggml-hexagon/htp/softmax-ops.c
# ggml/src/ggml-hexagon/htp/unary-ops.c
# scripts/snapdragon/adb/run-cli.sh
# scripts/snapdragon/adb/run-completion.sh
# scripts/snapdragon/adb/run-mtmd.sh
# scripts/snapdragon/windows/run-cli.ps1
# scripts/sync_vendor.py
# tests/test-backend-sampler.cpp
2026-02-26 16:30:37 +08:00
Concedo
7e53bfd28d
Merge commit ' 2b6dfe824d' into concedo_experimental
...
# Conflicts:
# .github/workflows/release.yml
# examples/save-load-state/save-load-state.cpp
# src/llama-context.cpp
# tools/cli/cli.cpp
2026-02-26 15:07:23 +08:00