koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-11 13:11:49 +00:00

Author	SHA1	Message	Date
Concedo	ef854f002e	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/python-type-check.yml # AGENTS.md # CONTRIBUTING.md # examples/model-conversion/scripts/embedding/run-original-model.py # examples/model-conversion/scripts/utils/compare_tokens.py # examples/pydantic_models_to_grammar.py # ggml/src/ggml-rpc/ggml-rpc.cpp # pyrightconfig.json # scripts/compare-llama-bench.py # scripts/jinja/jinja-tester.py # scripts/server-bench.py # tests/test-grammar-integration.cpp # tests/test-grammar-parser.cpp # tests/test-llama-grammar.cpp # tests/test-tokenizer-random.py # tools/cli/README.md # tools/completion/README.md # tools/llama-bench/llama-bench.cpp # tools/server/README.md	2026-03-22 23:39:13 +08:00
ddh0	3306dbaef7	misc : prefer ggml-org models in docs and examples (#20827 ) * misc : prefer ggml-org models in docs and examples Prefer referring to known-good quantizations under ggml-org rather than 3rd-party uploaders. * remove accidentally committed file	2026-03-21 22:00:26 +01:00
Sigbjørn Skjæret	29b28a9824	ci : switch from pyright to ty (#20826 ) * type fixes * switch to ty * tweak rules * tweak more rules * more tweaks * final tweak * use common import-not-found rule	2026-03-21 08:54:34 +01:00
Concedo	6054bacadd	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/ai-issues.yml # CONTRIBUTING.md # docs/autoparser.md # docs/ops.md # docs/ops/Metal.csv # ggml/src/ggml-cann/aclnn_ops.cpp # ggml/src/ggml-cann/ggml-cann.cpp # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-hexagon/ggml-hexagon.cpp # ggml/src/ggml-hexagon/htp/CMakeLists.txt # ggml/src/ggml-hexagon/htp/hex-dma.h # ggml/src/ggml-hexagon/htp/hex-utils.h # ggml/src/ggml-hexagon/htp/htp-ctx.h # ggml/src/ggml-hexagon/htp/htp-msg.h # ggml/src/ggml-hexagon/htp/htp_iface.idl # ggml/src/ggml-hexagon/htp/hvx-base.h # ggml/src/ggml-hexagon/htp/main.c # ggml/src/ggml-hip/CMakeLists.txt # models/templates/Apriel-1.6-15b-Thinker-fixed.jinja # models/templates/deepseek-ai-DeepSeek-R1-Distill-Qwen-32B.jinja # models/templates/deepseek-ai-DeepSeek-V3.1.jinja # models/templates/llama-cpp-deepseek-r1.jinja # models/templates/meetkai-functionary-medium-v3.1.jinja # scripts/fetch_server_test_models.py # scripts/snapdragon/adb/run-cli.sh # scripts/snapdragon/adb/run-completion.sh # scripts/snapdragon/adb/run-mtmd.sh # scripts/snapdragon/adb/run-tool.sh # tests/test-chat-auto-parser.cpp # tests/test-chat-peg-parser.cpp # tests/test-chat.cpp # tools/cli/cli.cpp # tools/server/README.md	2026-03-21 12:06:01 +08:00
Concedo	98f099aecc	Merge commit '`c1258830b2`' into concedo_experimental # Conflicts: # docs/docker.md # docs/ops.md # docs/ops/WebGPU.csv # ggml/src/ggml-cann/aclnn_ops.cpp # ggml/src/ggml-cann/ggml-cann.cpp # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp # ggml/src/ggml-webgpu/ggml-webgpu.cpp # ggml/src/ggml-webgpu/wgsl-shaders/get_rows.wgsl # ggml/src/ggml-webgpu/wgsl-shaders/row_norm.wgsl # ggml/src/ggml-webgpu/wgsl-shaders/unary.wgsl	2026-03-21 12:00:52 +08:00
Piotr Wilkin (ilintar)	b1c70e2e54	common/parser: fix nasty bug causing subtle corruption of generation prompt (#20825 )	2026-03-21 00:19:04 +01:00
Xuan-Son Nguyen	fb78ad29bb	server: (doc) clarify in-scope and out-scope features (#20794 ) * server: (doc) clarify in-scope and out-scope features * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2026-03-20 14:03:50 +01:00
Georgi Gerganov	ab9d4c3678	server : improve mtmd ctx checkpoints (#20726 ) * server : improve mtmd ctx checkpoints * server : fix off-by-one in pos_min_thold	2026-03-20 11:13:12 +02:00
Ben Racicot	c1b911654a	server: fix router mode deadlock on child crash and TOCTOU race in models_max (#20763 ) Two bugs in `server_models::load()` that affect router mode reliability: Bug 1: Deadlock when child process crashes When a child process is killed (e.g., SIGKILL from OS code signature validation), the monitoring thread deadlocks on `stopping_thread.join()` because the stopping_thread's wait predicate (`is_stopping`) is never satisfied — the model name was never inserted into `stopping_models`. `update_status()` is never reached and the model stays stuck in LOADING state permanently. Fix: extend the stopping_thread's wait predicate to also wake when the child process is no longer alive (`!subprocess_alive()`). When woken by a dead child, the thread skips the shutdown sequence and returns immediately. The original `stopping_models.erase()` logic is preserved for normal unloads. Bug 2: TOCTOU race bypasses `--models-max` (ref #20137) `unload_lru()` is called outside the mutex, then `load()` acquires the lock afterward. Under concurrent requests, multiple threads observe capacity and all proceed to load, exceeding the limit. Fix: re-check capacity under the lock after `unload_lru()` returns. If another thread filled the slot in the window between `unload_lru()` and the lock acquisition, reject with an error instead of silently exceeding the limit.	2026-03-19 22:16:05 +01:00
Tomeamis	b739738dad	docs: Update server README to reflect PR #20297 (#20560 )	2026-03-19 21:28:44 +01:00
Ryan Goulden	26c9ce1288	server: Add cached_tokens info to oaicompat responses (#19361 ) * tests : fix fetch_server_test_models.py * server: to_json_oaicompat cached_tokens Adds OpenAI and Anthropic compatible information about the number of cached prompt tokens used in a response.	2026-03-19 19:09:33 +01:00
Piotr Wilkin (ilintar)	5e54d51b19	common/parser: add proper reasoning tag prefill reading (#20424 ) * Implement proper prefill extraction * Refactor cli parameters, update docs, move reasoning budget sampler part to common/reasoning-budget.cpp * Update tools/server/server-task.cpp * refactor: move grammars to variant, remove grammar_external, handle exception internally * Make code less C++y Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2026-03-19 16:58:21 +01:00
Pascal	4065c1a3a6	Server becomes the source of truth for sampling parameter defaults (#20558 ) * webui: make server the source of truth for sampling defaults * webui: fix Custom badge for sampling parameters * webui: log user overrides after server sync * chore: update webui build output * fix: Default values for sampling settings config object * chore: update webui build output --------- Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>	2026-03-19 13:20:39 +01:00
Xuan-Son Nguyen	1e64534570	mtmd: add clip_graph::build_mm() (#20751 ) * clip: add build_mm() * apply to all models * add TODO for bias overload	2026-03-19 13:11:39 +01:00
Pascal	cd708db0cc	WebUI: Persist the on/off state of the MCP servers for new conversations (#20750 ) * webui: add persistent storage for MCP server on/off state in new chats * webui: simplify MCP enabled checks, remove dead server.enabled fallback * chore: update webui build output * chore: update webui build output --------- Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>	2026-03-19 12:54:06 +01:00
Aleksander Grygier	512bba6ee0	webui: Improve model parsing logic + add unit tests (#20749 ) * add tests for model id parser * add test case having activated params * add structured tests for model id parser * add ToDo * feat: Improve model parsing logic + tests * chore: update webui build output --------- Co-authored-by: bluemoehre <bluemoehre@gmx.de>	2026-03-19 12:25:50 +01:00
Concedo	48f914e374	Merge branch 'upstream' into concedo_experimental # Conflicts: # ci/run.sh # ggml/CMakeLists.txt # ggml/src/ggml-cpu/arch/riscv/repack.cpp # ggml/src/ggml-cpu/arch/x86/repack.cpp # ggml/src/ggml-cpu/repack.cpp # ggml/src/ggml-hexagon/ggml-hexagon.cpp # ggml/src/ggml-hexagon/htp/CMakeLists.txt # ggml/src/ggml-hexagon/htp/htp-msg.h # ggml/src/ggml-hexagon/htp/htp-ops.h # ggml/src/ggml-hexagon/htp/hvx-base.h # ggml/src/ggml-hexagon/htp/hvx-exp.h # ggml/src/ggml-hexagon/htp/hvx-sigmoid.h # ggml/src/ggml-hexagon/htp/main.c # ggml/src/ggml-hexagon/htp/softmax-ops.c # ggml/src/ggml-hexagon/htp/unary-ops.c # ggml/src/ggml-webgpu/ggml-webgpu.cpp # scripts/sync-ggml.last # tests/test-backend-sampler.cpp # tests/test-chat.cpp # tests/test-jinja.cpp # tools/cli/cli.cpp	2026-03-19 02:23:06 +08:00
crsawyer	5744d7ec43	Rebuild index.html.gz (#20724 )	2026-03-18 18:49:57 +01:00
Julien Chaumond	48e61238e1	webui: improve tooltip wording for attachment requirements (#20688 ) * webui: improve tooltip wording for attachment requirements Co-Authored-By: Claude <Agents+claude@huggingface.co> * chore: update webui build output * chore: update webui build output --------- Co-authored-by: Claude <Agents+claude@huggingface.co>	2026-03-18 14:01:02 +01:00
Aleksander Grygier	7ab321d40d	webui: Fix duplicated messages on q param (#20715 ) * fix: Remove duplicate message sending on `?q` param * chore: update webui build output	2026-03-18 10:32:43 +01:00
Concedo	ded5486d52	Merge commit '`ab0bb93748`' into concedo_experimental # Conflicts: # .github/workflows/build-apple.yml # .github/workflows/build-sanitize.yml # .github/workflows/build-vulkan.yml # .github/workflows/build.yml # .github/workflows/copilot-setup-steps.yml # .github/workflows/release.yml	2026-03-18 01:22:34 +08:00
Piotr Wilkin (ilintar)	d2ecd2d1cf	common/parser: add `--skip-chat-parsing` to force a pure content parser. (#20289 ) * Add `--force-pure-content` to force a pure content parser. * Update common/arg.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Change parameter name [no ci] --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-03-17 16:16:43 +01:00
Georgi Gerganov	8cc2d81264	server : fix ctx checkpoint invalidation (#20671 )	2026-03-17 15:21:14 +02:00
Concedo	f31b040941	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/labeler.yml # .github/workflows/build-self-hosted.yml # benches/nemotron/nemotron-dgx-spark.md # docs/ops.md # docs/ops/SYCL.csv # ggml/src/ggml-cpu/kleidiai/kleidiai.cpp # ggml/src/ggml-sycl/backend.hpp # ggml/src/ggml-sycl/element_wise.cpp # ggml/src/ggml-sycl/element_wise.hpp # ggml/src/ggml-sycl/ggml-sycl.cpp # scripts/sync-ggml.last # tests/test-jinja.cpp # tests/test-llama-archs.cpp	2026-03-17 14:05:23 +08:00
Concedo	9084527b36	Merge commit '`67a2209fab`' into concedo_experimental # Conflicts: # .github/workflows/build-cache.yml # .github/workflows/build-cross.yml # .github/workflows/build-self-hosted.yml # .github/workflows/build.yml # .github/workflows/python-lint.yml # .github/workflows/release.yml # .github/workflows/server-self-hosted.yml # .github/workflows/server-webui.yml # .github/workflows/server.yml # CODEOWNERS # ggml/src/ggml-sycl/gated_delta_net.cpp # scripts/sync_vendor.py # tools/cli/cli.cpp	2026-03-17 11:11:25 +08:00
Piotr Wilkin (ilintar)	2e4a6edd4a	tools/server: support refusal content for Responses API (#20285 ) * Support refusal content for Responses API * Update tools/server/server-common.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update tools/server/server-common.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-03-17 01:42:04 +01:00
Pascal	dddca026bf	webui: add model information dialog to router mode (#20600 ) * webui: add model information dialog to router mode * webui: add "Available models" section header in model list * webui: remove nested scrollbar from chat template in model info dialog * chore: update webui build output * feat: UI improvements * refactor: Cleaner rendering + UI docs * chore: update webui build output --------- Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>	2026-03-16 15:38:11 +01:00
Aleksander Grygier	67a2209fab	webui: Add MCP CORS Proxy detection logic & UI (#20167 ) * refactor: MCP store cleanup * feat: Add MCP proxy availability detection * fix: Sidebar icon * chore: update webui build output * chore: Formatting * chore: update webui build output * chore: Update package lock * chore: update webui build output * chore: update webui build output * chore: update webui build output	2026-03-16 13:05:36 +01:00
Pascal	d65c4f2dc9	Fix model selector locked to first loaded model with multiple models (#20580 ) * webui: fix model selector being locked to first loaded model When multiple models are loaded, the auto-select effect would re-fire on every loadedModelIds change, overriding the user's manual model selection. Guard with selectedModelId so auto-select only kicks in when no model is chosen yet. * chore: update webui build output	2026-03-16 12:04:06 +01:00
Woof Dog	d8c331c0af	webui: use date in more human readable exported filename (#19939 ) * webui: use date in exported filename Move conversation naming and export to utils update index.html.gz * webui: move literals to message export constants file * webui: move export naming and download back to the conversation store * chore: update webui build output * webui: add comments to some constants * chore: update webui build output	2026-03-16 11:18:13 +01:00
Piotr Wilkin (ilintar)	9e2e2198b0	tools/cli: fix disable reasoning (#20606 )	2026-03-15 22:40:53 +01:00
Georgi Gerganov	88915cb55c	server : fix wait in test_cancel_requests() test (#20601 ) * server : fix wait in test_cancel_requests() test * codeowners : add team for server tests	2026-03-15 20:54:37 +02:00
Concedo	f3d2f58fa8	note: smartcache is broken for rnn currently	2026-03-15 11:31:47 +08:00
Concedo	b1c500ae2b	Merge commit '`2948e6049a`' into concedo_experimental # Conflicts: # .github/workflows/build.yml # CONTRIBUTING.md # docs/backend/VirtGPU/development.md # docs/ops.md # docs/ops/WebGPU.csv # embd_res/templates/GigaChat3-10B-A1.8B.jinja # embd_res/templates/GigaChat3.1-10B-A1.8B.jinja # ggml/src/ggml-hip/CMakeLists.txt # ggml/src/ggml-opencl/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp # ggml/src/ggml-webgpu/ggml-webgpu.cpp # scripts/sync_vendor.py # tests/CMakeLists.txt # tests/test-backend-ops.cpp # tests/test-chat.cpp # tests/test-grammar-integration.cpp # tests/test-quantize-fns.cpp	2026-03-15 11:21:24 +08:00
Concedo	67c9798d0b	Merge commit '`3ca19b0e9f`' into concedo_experimental # Conflicts: # .github/workflows/build.yml # common/CMakeLists.txt # common/chat-peg-parser.cpp # docs/backend/SYCL.md # docs/ops.md # docs/ops/SYCL.csv # ggml/src/ggml-sycl/common.hpp # ggml/src/ggml-sycl/convert.hpp # ggml/src/ggml-sycl/element_wise.cpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-sycl/norm.cpp # ggml/src/ggml-sycl/rope.cpp # ggml/src/ggml-sycl/rope.hpp # ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp # ggml/src/ggml-webgpu/ggml-webgpu.cpp # ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_decls.tmpl # ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_reg_tile.wgsl # ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_vec.wgsl # scripts/compare-llama-bench.py # scripts/sync_vendor.py # tests/CMakeLists.txt # tools/cli/cli.cpp	2026-03-15 11:11:31 +08:00
Xuan-Son Nguyen	94d0262277	mtmd: add llama-mtmd-debug binary (#20508 ) * mtmd: add llama-mtmd-debug binary * adapt * fixes * fix compile error * fix windows compile error * rm legacy clip_debug_encode() * add MTMD_API to fix build	2026-03-14 15:52:29 +01:00
Chedrian07	710878a7dd	webui: restore code preview iframe origin isolation (#20477 )	2026-03-14 11:28:28 +01:00
Concedo	1802b09e6f	Merge branch 'upstream' into concedo_experimental # Conflicts: # docs/build.md # docs/ops.md # docs/ops/CPU.csv # ggml/src/ggml-cpu/kleidiai/kernels.cpp # ggml/src/ggml-cpu/kleidiai/kleidiai.cpp # ggml/src/ggml-cpu/repack.cpp # ggml/src/ggml-cpu/repack.h # src/llama-quant.cpp # tests/test-json-schema-to-grammar.cpp	2026-03-14 17:56:16 +08:00
Concedo	ff3f8533d3	Merge commit '`c96f608d98`' into concedo_experimental # Conflicts: # CONTRIBUTING.md # docs/ops.md # docs/ops/Vulkan.csv # models/templates/LFM2-8B-A1B.jinja # tests/peg-parser/test-python-dict-parser.cpp # tests/peg-parser/test-unicode.cpp # tests/test-chat-peg-parser.cpp # tests/test-chat.cpp # tools/llama-bench/llama-bench.cpp	2026-03-14 17:14:34 +08:00
Adrien Gallouët	463b6a963c	tools : enable kvu in perplexity for hellaswag, winogrande, multiple-choice (#19954 ) llama-perplexity -hf unsloth/Qwen3-0.6B-GGUF:Q4_K_M -f winogrande-debiased-eval.csv --winogrande winogrande_score : tokenizing selected tasks winogrande_score : calculating winogrande score over selected tasks. split_equal: sequential split is not supported when there are coupled sequences in the input batch (you may need to use the -kvu flag) decode: failed to find a memory slot for batch of size 46 failed to decode the batch, n_batch = 2048, ret = 1 winogrande_score: llama_decode() failed same for hellaswag: split_equal: sequential split is not supported when there are coupled sequences in the input batch (you may need to use the -kvu flag) decode: failed to find a memory slot for batch of size 99 failed to decode the batch, n_batch = 2048, ret = 1 hellaswag_score: llama_decode() failed Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-03-13 21:25:57 +01:00
ZeroV0LT	f17b3be63f	llama : fix pooling assertion crash in chunked GDN detection path (#20468 ) * llama : fix pooling assertion crash in chunked GDN detection path The chunked fused Gated Delta Net detection in sched_reserve() calls graph_reserve(16*n_seqs, n_seqs, n_outputs, ...) where n_outputs = n_seqs. This creates a dimension mismatch in build_pooling() for embedding models with mean/rank pooling: build_inp_mean() creates a tensor with shape [n_tokens=16*n_seqs, ...] while t_embd is reduced to [n_outputs=n_seqs, ...] via out_ids, causing ggml_mul_mat to assert on ggml_can_mul_mat(a, b). Fix: pass n_tokens as n_outputs in the chunked GDN graph reservation, matching the pattern used by the pp/tg worst-case reservations. Regression introduced by #20340 (`d28961d`). Same class of bug as #12517, fixed by #12545. * server : add mean pooling tests to embedding test suite Add test_embedding_pooling_mean and test_embedding_pooling_mean_multiple to cover the --pooling mean codepath, which was previously untested. These tests would have caught the regression introduced by #20340 where build_pooling() crashes with a ggml_mul_mat assertion due to mismatched dimensions in the chunked GDN detection path. --------- Co-authored-by: Domenico Crupi <domenico@zerovolt.it>	2026-03-13 20:53:42 +02:00
SoftwareRenderer	d7ba99c485	server: reset counter related to kill-switch on client error (#20513 ) * server: reset kill-switch on client error This avoids triggering a server kill switch. If the client sends a request that exceeds the configured context size, an appropriate HTTP 400 response is provided and no tokens are generated. However since no tokens are generated, update_slots() increments n_empty_consecutive. If the client sends 3 such messages in a row, the server terminates. * moved counter reset as per recommendation * cont : minor --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2026-03-13 19:58:09 +02:00
Concedo	04915d99ee	Merge commit '`451ef08432`' into concedo_experimental # Conflicts: # .github/workflows/build.yml # README.md # docs/ops.md # docs/ops/Vulkan.csv # src/llama-model-loader.cpp # src/llama-model.cpp # src/llama.cpp # tests/CMakeLists.txt # tests/peg-parser/test-basic.cpp # tests/peg-parser/test-json-parser.cpp # tests/peg-parser/test-python-dict-parser.cpp # tests/peg-parser/test-unicode.cpp # tests/test-chat-auto-parser.cpp # tests/test-chat-peg-parser.cpp # tests/test-chat.cpp # tools/CMakeLists.txt	2026-03-13 23:33:37 +08:00
Concedo	d2c911884d	Merge commit '`213c4a0b81`' into concedo_experimental # Conflicts: # CODEOWNERS # common/CMakeLists.txt # common/chat-peg-parser.cpp # common/chat.cpp # docs/backend/SYCL.md # docs/development/parsing.md # docs/ops.md # docs/ops/SYCL.csv # embd_res/templates/Apriel-1.6-15b-Thinker-fixed.jinja # embd_res/templates/Bielik-11B-v3.0-Instruct.jinja # embd_res/templates/GLM-4.7-Flash.jinja # embd_res/templates/LFM2-8B-A1B.jinja # embd_res/templates/StepFun3.5-Flash.jinja # ggml/src/ggml-opencl/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-sycl/CMakeLists.txt # ggml/src/ggml-sycl/backend.hpp # ggml/src/ggml-sycl/common.hpp # ggml/src/ggml-sycl/convert.cpp # ggml/src/ggml-sycl/convert.hpp # ggml/src/ggml-sycl/count-equal.cpp # ggml/src/ggml-sycl/dpct/helper.hpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-sycl/presets.hpp # ggml/src/ggml-sycl/softmax.cpp # ggml/src/ggml-sycl/vecdotq.hpp # models/templates/Apertus-8B-Instruct.jinja # models/templates/CohereForAI-c4ai-command-r7b-12-2024-tool_use.jinja # models/templates/Qwen-QwQ-32B.jinja # models/templates/Qwen3-Coder.jinja # models/templates/deepseek-ai-DeepSeek-R1-Distill-Llama-8B.jinja # models/templates/deepseek-ai-DeepSeek-R1-Distill-Qwen-32B.jinja # models/templates/deepseek-ai-DeepSeek-V3.1.jinja # models/templates/fireworks-ai-llama-3-firefunction-v2.jinja # models/templates/moonshotai-Kimi-K2.jinja # models/templates/unsloth-Apriel-1.5.jinja # tests/CMakeLists.txt # tests/peg-parser/test-basic.cpp # tests/peg-parser/tests.h # tests/test-backend-ops.cpp # tests/test-chat-peg-parser.cpp # tests/test-chat-template.cpp # tests/test-chat.cpp # tests/test-json-schema-to-grammar.cpp # tests/test-peg-parser.cpp # tools/CMakeLists.txt # tools/cli/cli.cpp	2026-03-13 21:35:56 +08:00
Daniel Bevenius	8f974d2392	mtmd : rename mtmd_get_audio_bitrate to mtmd_get_audio_sample_rate (#20105 ) This commit renames the function `mtmd_get_audio_bitrate` to `mtmd_get_audio_sample_rate` to better reflect its purpose. The motivation for this is that the function currently returns the audio sample rate, not the bitrate (sample_rate × bit_depth × channels), and that is how it is used in the code as well. This is a breaking change, but I believe mtmd is still in experimental/development phase so it might be alright to simply rename.	2026-03-13 12:30:02 +01:00
Piotr Wilkin (ilintar)	0e810413bb	tests : use `reasoning` instead of `reasoning_budget` in server tests (#20432 )	2026-03-12 13:41:01 +01:00
Pascal	de190154c8	New conversations now auto-select the first loaded model (#20403 ) * webui: auto-select first loaded model for new conversations in router mode * chore: update webui build output	2026-03-12 09:07:05 +01:00
DAN™	fdb17643d3	model : add support for Phi4ForCausalLMV (#20168 ) * Add support for Phi4ForCausalLMV. * Fix Phi-4 vision parity (correcting SigLIP2 patch-kernel export layout) and matching HF NaFlex resize behavior in mtmd. * Rename constants + fix tokenizer label * Clean-ups. * Fix GGUF export. * Set tokenizer.ggml.pre explicitly. * Default vocab name rather than forcing it. * Clean-ups. * Fix indent. * Fix subscriptable error. * remove overcomplicated code path * Clean-ups. --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2026-03-12 00:25:54 +01:00
Piotr Wilkin (ilintar)	acb7c79069	common/parser: handle reasoning budget (#20297 ) * v1 * Finished! * Handle cli * Reasoning sampler * Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Less explosive terminology :) * Add utf-8 case and tests * common : migrate reasoning budget sampler to common * cont : clean up * cont : expose state and allow passing as initial state * cont : remove unused imports * cont : update state machine doc string --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> Co-authored-by: Alde Rojas <hello@alde.dev>	2026-03-11 10:26:12 +01:00
Pascal	00de615345	Fix agentic mcp image single model (#20339 ) * webui: fix MCP image attachments dropped during the agentic loop in single-model mode * chore: update webui build output	2026-03-11 05:31:33 +01:00
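Note on c1b911654a (Bug 2): the commit message describes re-checking capacity under the lock after `unload_lru()` returns, so that concurrent loads cannot exceed `--models-max`. The following is a minimal Python sketch of that pattern only, not the actual C++ code in `server_models::load()`. The class `ModelRegistry`, its methods, and the error text are hypothetical names invented for illustration.

```python
import threading


class ModelRegistry:
    """Hypothetical stand-in for a router that caps concurrently loaded models."""

    def __init__(self, models_max):
        self.models_max = models_max
        self._lock = threading.Lock()
        self._loaded = []  # least recently used first

    def _unload_lru(self):
        # Frees a slot in its own critical section, separate from the one in load().
        with self._lock:
            while self._loaded and len(self._loaded) >= self.models_max:
                self._loaded.pop(0)

    def load(self, name):
        self._unload_lru()
        with self._lock:
            if name in self._loaded:
                return
            # Re-check under the lock: another thread may have taken the free
            # slot in the window between _unload_lru() and this acquisition.
            if len(self._loaded) >= self.models_max:
                raise RuntimeError("model limit reached")
            self._loaded.append(name)

    def loaded(self):
        with self._lock:
            return list(self._loaded)
```

Without the second check, several threads could each observe a free slot and all append, which is the race the commit message describes. With it, a losing thread gets an error instead of silently exceeding the limit.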


836 commits