Concedo
a3a5897d93
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/intel.Dockerfile
# .github/workflows/python-type-check.yml
# embd_res/templates/Qwen3.5-4B.jinja
# examples/model-conversion/scripts/causal/compare-logits.py
# examples/model-conversion/scripts/utils/check-nmse.py
# examples/model-conversion/scripts/utils/compare_tokens.py
# examples/model-conversion/scripts/utils/semantic_check.py
# examples/sycl/build.sh
# examples/sycl/run-llama2.sh
# ggml/src/ggml-hexagon/htp/flash-attn-ops.c
# ggml/src/ggml-hexagon/htp/hex-dma.h
# ggml/src/ggml-hexagon/htp/rope-ops.c
# scripts/gen-unicode-data.py
# tests/test-chat.cpp
2026-03-30 21:41:19 +08:00
Concedo
42ad89cd86
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/cann.Dockerfile
# .devops/cpu.Dockerfile
# .devops/llama-cli-cann.Dockerfile
# .devops/nix/package.nix
# .github/workflows/build-android.yml
# .github/workflows/build-cann.yml
# .github/workflows/build-msys.yml
# .github/workflows/docker.yml
# .github/workflows/editorconfig.yml
# .github/workflows/gguf-publish.yml
# .github/workflows/python-lint.yml
# .github/workflows/release.yml
# CMakeLists.txt
# docs/backend/CANN.md
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c
# ggml/src/ggml-hexagon/htp/htp-ctx.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hexagon/htp/matmul-ops.c
# ggml/src/ggml-rpc/ggml-rpc.cpp
# scripts/sync_vendor.py
# tests/test-chat-auto-parser.cpp
# tests/test-chat.cpp
# tests/test-json-schema-to-grammar.cpp
# tests/test-reasoning-budget.cpp
# tools/cli/cli.cpp
# tools/server/CMakeLists.txt
# tools/server/README.md
2026-03-30 20:45:38 +08:00
Aleksander Grygier
389c7d4955
webui: Fix branching logic on edit message ( #21175 )
...
Check Pre-Tokenizer Hashes / pre-tokenizer-hashes (push) Has been cancelled
Python check requirements.txt / check-requirements (push) Has been cancelled
Python Type-Check / python type-check (push) Has been cancelled
* fix: Branching logic + small refactor
* chore: update webui build output
2026-03-30 14:40:50 +02:00
Sigbjørn Skjæret
e2eb39e81c
ci : bump ty to 0.0.26 ( #21156 )
...
* fix incorrect type ignore comments
* bump ty to 0.0.26
2026-03-30 09:29:15 +02:00
Xuan-Son Nguyen
abf9a62161
server: wrap headers for mcp proxy ( #21072 )
...
* server: wrap headers for mcp proxy
* Update tools/server/server-cors-proxy.h
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* fix build
* chore: update webui build output
* chore: update webui build output
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2026-03-30 08:59:16 +02:00
BlueMöhre
968189729f
WebUI: Replace illegal nested button elements ( #21026 )
...
* remove/replace nested button elements
* map rest props to outer element
* solve TODO
* chore: update webui build output
2026-03-28 17:57:59 +01:00
Georgi Gerganov
edfb440a2f
server : fix processing of multiple back-to-back mtmd chunks ( #21107 )
2026-03-28 16:27:36 +02:00
Adrien Gallouët
3d66da1809
ci : gracefully shut down the server ( #21110 )
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-28 14:49:57 +01:00
Woof Dog
82b703f8bc
Document custom default webui preferences in server README ( #19771 )
2026-03-28 14:19:16 +01:00
Aleksander Grygier
51a84efc53
webui: Conversation forking + branching improvements ( #21021 )
...
* refactor: Make `DialogConfirmation` extensible with children slot
* feat: Add conversation forking logic
* feat: Conversation forking UI
* feat: Update delete/edit dialogs and logic for forks
* refactor: Improve Chat Sidebar UX and add MCP Servers entry
* refactor: Cleanup
* feat: Update message in place when editing leaf nodes
* chore: Cleanup
* chore: Cleanup
* chore: Cleanup
* chore: Cleanup
* chore: Cleanup
* chore: Cleanup
* refactor: Post-review improvements
* chore: update webui build output
* test: Update Storybook test
* chore: update webui build output
* chore: update webui build output
2026-03-28 13:38:15 +01:00
Adrien Gallouët
b0f0dd3e51
vendor : update cpp-httplib to 0.40.0 ( #21100 )
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-28 08:59:44 +01:00
Sigbjørn Skjæret
c46758d28f
cli : add /glob command ( #21084 )
...
* add /glob command
* output error when max files reached
* support globbing outside curdir
2026-03-28 02:33:04 +01:00
Adrien Gallouët
5c1a7b8355
server : add custom socket options to disable SO_REUSEPORT ( #21056 )
...
* server : add custom socket options to disable SO_REUSEPORT
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Add --reuse-port
$ strace -e trace=setsockopt,bind build/bin/llama-server -lv 2 --reuse-port
setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
setsockopt(3, SOL_SOCKET, SO_REUSEPORT, [1], 4) = 0
bind(3, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
$ strace -e trace=setsockopt,bind build/bin/llama-server -lv 2
setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(3, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Update tools/server/README.md (llama-gen-docs)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Fix windows
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
---------
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-28 01:12:43 +01:00
Aldehir Rojas
59d840209a
common : inhibit lazy grammar sampler while reasoning is active ( #20970 )
...
* common : inhibit grammar while reasoning budget is active
* cont : update force_pos in accept
* cont : fix tests
* cont : tweak should apply logic
* cont : return early not using grammar sampler
* Add tests
* cont : prevent backend sampling when reasoning budget enabled
* cont : fix typo
---------
Co-authored-by: Piotr Wilkin <piotr.wilkin@syndatis.com>
2026-03-27 18:30:40 +01:00
Concedo
3ec6381123
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build-self-hosted.yml
# .github/workflows/build.yml
# .github/workflows/copilot-setup-steps.yml
# .github/workflows/gguf-publish.yml
# ci/run.sh
# docs/backend/OPENVINO.md
# examples/llama.android/lib/src/main/cpp/ai_chat.cpp
# ggml/src/ggml-sycl/add-id.cpp
# requirements/requirements-pydantic.txt
# tests/test-gguf.cpp
# tests/test-jinja.cpp
# tests/test-llama-archs.cpp
# tools/gguf-split/README.md
# tools/llama-bench/llama-bench.cpp
2026-03-28 01:18:20 +08:00
Kusha Gharahi
ff934e29bc
server: Introduce LLAMA_BUILD_WEBUI build flag to allow disabling the embedded web ui ( #20158 )
...
* introduce LLAMA_SERVER_NO_WEBUI
* LLAMA_SERVER_NO_WEBUI → LLAMA_BUILD_WEBUI
* LLAMA_BUILD_WEBUI ON by default not based on LLAMA_STANDALONE
* MIssed this
* Add useWebUi to package.nix
2026-03-27 17:25:55 +01:00
Aleksander Grygier
e6f6770515
webui: Improve Chat Messages initial scroll + auto-scroll logic + add lazy loading with transitions to content blocks ( #20999 )
...
* refactor: Always use agentic content renderer for Assistant Message
* feat: Improve initial scroll + auto-scroll logic + implement fade in action for content blocks
* chore: update webui build output
2026-03-27 17:01:36 +01:00
AN Long
48cda24c11
server: remove the verbose_prompt parameter ( #21059 )
...
* server: respect the verbose_prompt parameter
* Revert "server: respect the verbose_prompt parameter"
This reverts commit 8ed885cf375b2c8ba641c661f3667df70b9797f4.
* Remove --verbose-prompt parameter from llama-server
* Using set_examples instead of set_excludes
2026-03-27 13:36:13 +02:00
Xuan-Son Nguyen
20197b6fe3
server: add built-in tools backend support ( #20898 )
...
* wip: server_tools
* refactor
* displayName -> display_name
* snake_case everywhere
* rm redundant field
* change arg to --tools all
* add readme mention
* llama-gen-docs
2026-03-27 10:07:11 +01:00
Pascal
d0fa2c9fbb
Send reasoning content back to the model across turns via the reasoning_content API field ( #21036 )
...
* webui: send reasoning_content back to model in context
Preserve assistant reasoning across turns by extracting it from
internal tags and sending it as a separate reasoning_content field
in the API payload. The server and Jinja templates handle native
formatting (e.g. <think> tags for Qwen, GLM, DeepSeek...).
Adds "Exclude reasoning from context" toggle in Settings > Developer
(off by default, so reasoning is preserved). Includes unit tests.
* webui: add syncable parameter for excludeReasoningFromContext
* chore: update webui build output
2026-03-27 08:17:35 +01:00
Concedo
c00fe0af5a
Merge commit ' 9f102a1407' into concedo_experimental
...
# Conflicts:
# .devops/intel.Dockerfile
# .github/ISSUE_TEMPLATE/010-bug-compilation.yml
# .github/ISSUE_TEMPLATE/011-bug-results.yml
# .github/pull_request_template.md
# CODEOWNERS
# README.md
# common/CMakeLists.txt
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/binary-ops.c
# ggml/src/ggml-hexagon/htp/hex-dma.c
# ggml/src/ggml-hexagon/htp/hex-dma.h
# ggml/src/ggml-hexagon/htp/hex-dump.h
# ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c
# ggml/src/ggml-hexagon/htp/hvx-utils.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hexagon/htp/ssm-conv.c
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/cvt.cl
# ggml/src/ggml-rpc/ggml-rpc.cpp
# scripts/snapdragon/adb/run-bench.sh
# scripts/sync_vendor.py
# tests/test-backend-ops.cpp
# tools/llama-bench/llama-bench.cpp
2026-03-25 23:45:41 +08:00
Concedo
8a6c41dc5c
Merge commit ' 841bc203e2' into concedo_experimental
...
# Conflicts:
# .github/workflows/ai-issues.yml
# embd_res/templates/HuggingFaceTB-SmolLM3-3B.jinja
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/aclnn_ops.h
# ggml/src/ggml-cann/common.h
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cuda/CMakeLists.txt
# ggml/src/ggml-hip/CMakeLists.txt
# ggml/src/ggml-musa/CMakeLists.txt
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/cvt.cl
# ggml/src/ggml-openvino/ggml-openvino.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# tests/test-chat-auto-parser.cpp
# tests/test-jinja.cpp
# tools/cli/README.md
# tools/completion/README.md
# tools/server/README.md
2026-03-25 22:49:53 +08:00
Aleksander Grygier
69e0ecef06
webui: Fix editing assistant message without branching ( #20944 )
...
* fix: Editing assistant response without branching
* chore: update webui build output
2026-03-25 12:47:33 +02:00
Pascal
062cca58fc
Add SLEEPING status to the WebUI model selector ( #20949 )
...
* webui: handle sleeping model status, fix favourite -> favorite
* Update tools/server/webui/src/lib/components/app/models/ModelsSelectorOption.svelte
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
* Update tools/server/webui/src/lib/components/app/models/ModelsSelectorOption.svelte
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
* webui: fix optional event parameter in sleeping model onclick
* typo
* webui: restore orange sleeping indicator dot with hover unload
* chore: update webui build output
* webui: move stopPropagation into ActionIcon onclick, remove svelte-ignore
* chore: update webui build output
* webui: fix favourite -> favorite (UK -> US spelling) everywhere
Address review feedback from WhyNotHugo
* chore: update webui build output
---------
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2026-03-25 11:02:32 +01:00
BlueMöhre
a94fdb090a
WebUI: fix edit msg form textarea height ( #20830 )
...
* autoresize textarea on mount
* allow textarea to grow to same height as rendered messages
* add UI build file
2026-03-24 13:17:45 +01:00
Adrien Gallouët
8c7957ca33
common : add standard Hugging Face cache support ( #20775 )
...
* common : add standard Hugging Face cache support
- Use HF API to find all files
- Migrate all manifests to hugging face cache at startup
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Check with the quant tag
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Cleanup
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Improve error handling and report API errors
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Restore common_cached_model_info and align mmproj filtering
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Prefer main when getting cached ref
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Use cached files when HF API fails
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Use final_path..
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Check all inputs
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
---------
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-24 07:30:33 +01:00
Aleksander Grygier
11fb11b901
webui: Improve chat form positioning ( #20901 )
2026-03-23 14:30:55 +01:00
Eric Zhang
841bc203e2
docs : rerun llama-gen-docs to include new CLI args ( #20892 )
2026-03-23 12:33:38 +01:00
Xuan-Son Nguyen
31a5cf4c3f
server: use httplib dynamic threads ( #20817 )
...
* server: use httplib dynamic threads
* change to n_threads_http + 1024
2026-03-23 12:22:46 +01:00
Pascal
c44a932cf4
webui: fix --webui-config-file settings not applied on load ( #20823 )
...
* webui: fix --webui-config-file settings not applied on load
* chore: update webui build output
2026-03-23 11:25:35 +01:00
Xuan-Son Nguyen
49bfddeca1
server: allow router to report child instances sleep status ( #20849 )
...
* server: allow router to report child instances sleep status
* refactor
* move sleeping to state
* nits
2026-03-22 18:33:52 +01:00
Concedo
ef854f002e
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/python-type-check.yml
# AGENTS.md
# CONTRIBUTING.md
# examples/model-conversion/scripts/embedding/run-original-model.py
# examples/model-conversion/scripts/utils/compare_tokens.py
# examples/pydantic_models_to_grammar.py
# ggml/src/ggml-rpc/ggml-rpc.cpp
# pyrightconfig.json
# scripts/compare-llama-bench.py
# scripts/jinja/jinja-tester.py
# scripts/server-bench.py
# tests/test-grammar-integration.cpp
# tests/test-grammar-parser.cpp
# tests/test-llama-grammar.cpp
# tests/test-tokenizer-random.py
# tools/cli/README.md
# tools/completion/README.md
# tools/llama-bench/llama-bench.cpp
# tools/server/README.md
2026-03-22 23:39:13 +08:00
Evgeny Kurnevsky
81bc4d3ddc
server: fix Host header ( #20843 )
...
It should include port when it's not default.
2026-03-22 22:29:22 +08:00
ddh0
3306dbaef7
misc : prefer ggml-org models in docs and examples ( #20827 )
...
Check Pre-Tokenizer Hashes / pre-tokenizer-hashes (push) Has been cancelled
Python check requirements.txt / check-requirements (push) Has been cancelled
Python Type-Check / python type-check (push) Has been cancelled
* misc : prefer ggml-org models in docs and examples
Prefer referring to known-good quantizations under ggml-org rather than
3rd-party uploaders.
* remove accidentally committed file
2026-03-21 22:00:26 +01:00
Sigbjørn Skjæret
29b28a9824
ci : switch from pyright to ty ( #20826 )
...
* type fixes
* switch to ty
* tweak rules
* tweak more rules
* more tweaks
* final tweak
* use common import-not-found rule
2026-03-21 08:54:34 +01:00
Concedo
6054bacadd
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/ai-issues.yml
# CONTRIBUTING.md
# docs/autoparser.md
# docs/ops.md
# docs/ops/Metal.csv
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/CMakeLists.txt
# ggml/src/ggml-hexagon/htp/hex-dma.h
# ggml/src/ggml-hexagon/htp/hex-utils.h
# ggml/src/ggml-hexagon/htp/htp-ctx.h
# ggml/src/ggml-hexagon/htp/htp-msg.h
# ggml/src/ggml-hexagon/htp/htp_iface.idl
# ggml/src/ggml-hexagon/htp/hvx-base.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hip/CMakeLists.txt
# models/templates/Apriel-1.6-15b-Thinker-fixed.jinja
# models/templates/deepseek-ai-DeepSeek-R1-Distill-Qwen-32B.jinja
# models/templates/deepseek-ai-DeepSeek-V3.1.jinja
# models/templates/llama-cpp-deepseek-r1.jinja
# models/templates/meetkai-functionary-medium-v3.1.jinja
# scripts/fetch_server_test_models.py
# scripts/snapdragon/adb/run-cli.sh
# scripts/snapdragon/adb/run-completion.sh
# scripts/snapdragon/adb/run-mtmd.sh
# scripts/snapdragon/adb/run-tool.sh
# tests/test-chat-auto-parser.cpp
# tests/test-chat-peg-parser.cpp
# tests/test-chat.cpp
# tools/cli/cli.cpp
# tools/server/README.md
2026-03-21 12:06:01 +08:00
Concedo
98f099aecc
Merge commit ' c1258830b2' into concedo_experimental
...
# Conflicts:
# docs/docker.md
# docs/ops.md
# docs/ops/WebGPU.csv
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-webgpu/wgsl-shaders/get_rows.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/row_norm.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/unary.wgsl
2026-03-21 12:00:52 +08:00
Piotr Wilkin (ilintar)
b1c70e2e54
common/parser: fix nasty bug causing subtle corruption of generation prompt ( #20825 )
Check Pre-Tokenizer Hashes / pre-tokenizer-hashes (push) Waiting to run
Python check requirements.txt / check-requirements (push) Waiting to run
Python Type-Check / pyright type-check (push) Waiting to run
Update Operations Documentation / update-ops-docs (push) Has been cancelled
2026-03-21 00:19:04 +01:00
Xuan-Son Nguyen
fb78ad29bb
server: (doc) clarify in-scope and out-scope features ( #20794 )
...
* server: (doc) clarify in-scope and out-scope features
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-03-20 14:03:50 +01:00
Georgi Gerganov
ab9d4c3678
server : improve mtmd ctx checkpoints ( #20726 )
...
* server : improve mtmd ctx checkpoints
* server : fix off-by-one in pos_min_thold
2026-03-20 11:13:12 +02:00
Ben Racicot
c1b911654a
server: fix router mode deadlock on child crash and TOCTOU race in models_max ( #20763 )
...
Two bugs in `server_models::load()` that affect router mode reliability:
**Bug 1: Deadlock when child process crashes**
When a child process is killed (e.g., SIGKILL from OS code signature
validation), the monitoring thread deadlocks on `stopping_thread.join()`
because the stopping_thread's wait predicate (`is_stopping`) is never
satisfied — the model name was never inserted into `stopping_models`.
`update_status()` is never reached and the model stays stuck in LOADING
state permanently.
Fix: extend the stopping_thread's wait predicate to also wake when the
child process is no longer alive (`!subprocess_alive()`). When woken by
a dead child, the thread skips the shutdown sequence and returns
immediately. The original `stopping_models.erase()` logic is preserved
for normal unloads.
**Bug 2: TOCTOU race bypasses `--models-max` (ref #20137 )**
`unload_lru()` is called outside the mutex, then `load()` acquires the
lock afterward. Under concurrent requests, multiple threads observe
capacity and all proceed to load, exceeding the limit.
Fix: re-check capacity under the lock after `unload_lru()` returns.
If another thread filled the slot in the window between `unload_lru()`
and the lock acquisition, reject with an error instead of silently
exceeding the limit.
2026-03-19 22:16:05 +01:00
Tomeamis
b739738dad
docs: Update server README to reflect PR #20297 ( #20560 )
2026-03-19 21:28:44 +01:00
Ryan Goulden
26c9ce1288
server: Add cached_tokens info to oaicompat responses ( #19361 )
...
* tests : fix fetch_server_test_models.py
* server: to_json_oaicompat cached_tokens
Adds OpenAI and Anthropic compatible information about the
number of cached prompt tokens used in a response.
2026-03-19 19:09:33 +01:00
Piotr Wilkin (ilintar)
5e54d51b19
common/parser: add proper reasoning tag prefill reading ( #20424 )
...
* Implement proper prefill extraction
* Refactor cli parameters, update docs, move reasoning budget sampler part to common/reasoning-budget.cpp
* Update tools/server/server-task.cpp
* refactor: move grammars to variant, remove grammar_external, handle exception internally
* Make code less C++y
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-03-19 16:58:21 +01:00
Pascal
4065c1a3a6
Server becomes the source of truth for sampling parameter defaults ( #20558 )
...
* webui: make server the source of truth for sampling defaults
* webui: fix Custom badge for sampling parameters
* webui: log user overrides after server sync
* chore: update webui build output
* fix: Default values for sampling settings config object
* chore: update webui build output
---------
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2026-03-19 13:20:39 +01:00
Pascal
cd708db0cc
WebUI: Persist the on/off state of the MCP servers for new conversations ( #20750 )
...
* webui: add persistent storage for MCP server on/off state in new chats
* webui: simplify MCP enabled checks, remove dead server.enabled fallback
* chore: update webui build output
* chore: update webui build output
---------
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2026-03-19 12:54:06 +01:00
Aleksander Grygier
512bba6ee0
webui: Improve model parsing logic + add unit tests ( #20749 )
...
* add tests for model id parser
* add test case having activated params
* add structured tests for model id parser
* add ToDo
* feat: Improve model parsing logic + tests
* chore: update webui build output
---------
Co-authored-by: bluemoehre <bluemoehre@gmx.de>
2026-03-19 12:25:50 +01:00
Concedo
48f914e374
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# ci/run.sh
# ggml/CMakeLists.txt
# ggml/src/ggml-cpu/arch/riscv/repack.cpp
# ggml/src/ggml-cpu/arch/x86/repack.cpp
# ggml/src/ggml-cpu/repack.cpp
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/CMakeLists.txt
# ggml/src/ggml-hexagon/htp/htp-msg.h
# ggml/src/ggml-hexagon/htp/htp-ops.h
# ggml/src/ggml-hexagon/htp/hvx-base.h
# ggml/src/ggml-hexagon/htp/hvx-exp.h
# ggml/src/ggml-hexagon/htp/hvx-sigmoid.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hexagon/htp/softmax-ops.c
# ggml/src/ggml-hexagon/htp/unary-ops.c
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# scripts/sync-ggml.last
# tests/test-backend-sampler.cpp
# tests/test-chat.cpp
# tests/test-jinja.cpp
# tools/cli/cli.cpp
2026-03-19 02:23:06 +08:00
crsawyer
5744d7ec43
Rebuild index.html.gz ( #20724 )
2026-03-18 18:49:57 +01:00
Julien Chaumond
48e61238e1
webui: improve tooltip wording for attachment requirements ( #20688 )
...
* webui: improve tooltip wording for attachment requirements
Co-Authored-By: Claude <Agents+claude@huggingface.co>
* chore: update webui build output
* chore: update webui build output
---------
Co-authored-by: Claude <Agents+claude@huggingface.co>
2026-03-18 14:01:02 +01:00