Concedo
718dc159b6
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# CMakeLists.txt
# docs/speculative.md
# ggml/src/ggml-cuda/CMakeLists.txt
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c
# ggml/src/ggml-hexagon/htp/hmx-ops.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hexagon/htp/matmul-ops.c
# ggml/src/ggml-hexagon/htp/rope-ops.c
# ggml/src/ggml-hexagon/htp/ssm-conv.c
# ggml/src/ggml-opencl/ggml-opencl.cpp
# scripts/snapdragon/adb/run-bench.sh
# scripts/snapdragon/adb/run-cli.sh
# scripts/snapdragon/adb/run-completion.sh
# scripts/snapdragon/adb/run-mtmd.sh
# scripts/snapdragon/windows/run-bench.ps1
# scripts/snapdragon/windows/run-cli.ps1
# scripts/snapdragon/windows/run-completion.ps1
# scripts/snapdragon/windows/run-mtmd.ps1
# src/llama-vocab.cpp
# tests/test-backend-ops.cpp
# tools/batched-bench/CMakeLists.txt
# tools/batched-bench/batched-bench.cpp
# tools/cli/CMakeLists.txt
# tools/cli/README.md
# tools/cli/cli.cpp
# tools/completion/CMakeLists.txt
# tools/completion/README.md
# tools/llama-bench/CMakeLists.txt
# tools/llama-bench/llama-bench.cpp
# tools/mtmd/CMakeLists.txt
# tools/mtmd/tests/test-deepseek-ocr.py
# tools/mtmd/tests/tests-requirements.txt
# tools/perplexity/CMakeLists.txt
# tools/perplexity/perplexity.cpp
# tools/quantize/CMakeLists.txt
# tools/server/CMakeLists.txt
# tools/server/README.md
# ty.toml
2026-05-21 23:47:21 +08:00
Concedo
54af9aada9
Merge commit 'e6b4acfe86' into concedo_experimental
...
# Conflicts:
# .devops/cann.Dockerfile
# .devops/cpu.Dockerfile
# .devops/cuda.Dockerfile
# .devops/intel.Dockerfile
# .devops/musa.Dockerfile
# .devops/openvino.Dockerfile
# .devops/rocm.Dockerfile
# .devops/s390x.Dockerfile
# .devops/vulkan.Dockerfile
# tools/mtmd/clip.cpp
# tools/mtmd/clip.h
2026-05-21 23:31:32 +08:00
Adrien Gallouët
1d7ab2b947
app : add batched-bench, fit-params, quantize & perplexity ( #23459 )
...
* app : add batched-bench, fit-params, quantize & perplexity
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Add missing main.cpp
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Add EOL
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
---------
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-05-21 10:29:44 +03:00
Aleksander Grygier
5e932a1c8d
ui: Improve Git Hooks for UI development ( #23403 )
...
* refactor: Improve Git Hooks for UI development
* fix: Address review comments
* fix: Use absolute git path for `/hooks`
Co-authored-by: Pascal <admin@serveurperso.com>
---------
Co-authored-by: Pascal <admin@serveurperso.com>
2026-05-21 08:27:50 +02:00
wendadawen
6a257d4463
mtmd, model : merge HunyuanOCR into HunyuanVL and fix OCR vision precision ( #23329 )
...
- HunyuanOCR shares the same HF arch and vision layout as HunyuanVL but was split into a separate path that skipped the +0.1 bilinear sampler used by the HF reference.
- Collapse OCR into the HUNYUANVL projector + HUNYUAN_VL text arch
2026-05-21 00:35:37 +02:00
stduhpf
3a479c9132
ui: Add max image size option ( #22849 )
...
* webui: Add max image size option
* remove magic numbers
* support all image formats
* use const
* Move regex to match b64 images to constants
* use SETTINGS_KEYS to get max image resolution setting
* Do not touch the image if already under the size threshold
2026-05-21 00:00:09 +02:00
Saba Fallah
a8681a0ed2
mtmd : DeepSeek-OCR image processing fixes, img_tool::resize padding refactor ( #23345 )
...
* mtmd : deepseek-ocr fixes, improvements and refactoring
- image processing changes to achieve full parity with Pillow (reference impl)
- SAM mask casting only when flash-attn is on
- SAM refactor (build_sam() extracted so deepseek-ocr-2 can reuse it)
- llama-chat changes to fix server/WebUI issue (new media_markers_first())
- adapted test-chat-template and added test cases for deepseek-ocr
- changed regression test for deepseek-ocr to use CER+chrF scores for ground-truth comparison; removed embedding-model
- ty.toml ignore unresolved-import for tools/mtmd/tests/**
* image-text reordering fix removed
* refactor bool add_padding + pad_rounding enum into a single pad_style enum
2026-05-20 17:37:10 +02:00
Aleksander Grygier
6ce96713de
feat: Add WAV MIME type variants and improve audio format detection ( #23396 )
2026-05-20 16:55:24 +02:00
Adrien Gallouët
29f1482221
app : introduce the llama unified executable ( #23296 )
...
* app : introduce the llama unified executable
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Use serve for server
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Hide completion and bench, add help command
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Remove STATIC
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Use -impl targets instead of -lib
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Revert "Remove STATIC"
This reverts commit cc44caccb9902b34a3531633edac911e5b3d65cd.
---------
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-05-20 13:22:22 +02:00
Aleksander Grygier
e6b4acfe86
refactor: Move text attachments up before the message content in chat completions payload ( #23406 )
2026-05-20 13:04:01 +02:00
Concedo
7d987af23a
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/cann.Dockerfile
# .devops/cpu.Dockerfile
# .devops/cuda.Dockerfile
# .devops/intel.Dockerfile
# .devops/llama-cli-cann.Dockerfile
# .devops/musa.Dockerfile
# .devops/openvino.Dockerfile
# .devops/rocm.Dockerfile
# .devops/s390x.Dockerfile
# .devops/vulkan.Dockerfile
# .github/ISSUE_TEMPLATE/011-bug-results.yml
# .github/ISSUE_TEMPLATE/019-bug-misc.yml
# .github/workflows/build-and-test-snapdragon.yml
# .github/workflows/docker.yml
# .github/workflows/server-self-hosted.yml
# .github/workflows/ui-ci.yml
# .pi/gg/SYSTEM.md
# README.md
# common/arg.cpp
# docs/backend/SYCL.md
# docs/backend/snapdragon/CMakeUserPresets.json
# docs/backend/snapdragon/README.md
# docs/speculative.md
# examples/save-load-state/save-load-state.cpp
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/CMakeLists.txt
# ggml/src/ggml-hexagon/htp/htp-ctx.h
# ggml/src/ggml-hexagon/htp/htp-ops.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hexagon/htp/rope-ops.c
# ggml/src/ggml-hexagon/htp/unary-ops.c
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/cvt.cl
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-webgpu/wgsl-shaders/gated_delta_net.wgsl
# tools/cli/README.md
# tools/server/README.md
2026-05-20 18:48:34 +08:00
Xuan-Son Nguyen
e2b129e1bf
mtmd: fit_params now take into account mmproj ( #21489 )
...
* mtmd: fit_params now take into account mmproj
* rename alloc_compute_meta to reserve_compute_meta
* rm unused functions
* add ggml_backend_dev_t support
* add debug log
2026-05-20 11:27:44 +02:00
Aleksander Grygier
5028447384
ui: Refactor isMobile as reactive value in viewport store ( #23330 )
...
* refactor: `isMobile` as reactive value in `viewport` store
* refactor: Use Svelte media query for the viewport store
2026-05-20 10:52:00 +02:00
Aleksander Grygier
585080d310
fix: Div wrapper no pointer events on hidden ( #23390 )
2026-05-20 09:46:31 +02:00
Aleksander Grygier
67ace021da
refactor: Chat Screen UI rendering ( #23333 )
2026-05-19 22:38:42 +02:00
Johannes Gäßler
7256fce047
common: fix --fit verbosity with --verbosity 4 ( #23282 )
2026-05-19 21:33:23 +02:00
Georgi Gerganov
d14ce3dab4
llama : MTP clean-up ( #23269 )
...
* llama : disable equal splits for recurrent memory with partial rollback
* spec : re-enable p-min with MTP drafts
* spec : re-enable ngram spec in combination with RS rollback
* spec : fix ngram-map-* params
* spec : fix acceptance logic in combined ngram + draft configs
* graph : fix reuse for combined `token` + `embd` batches
* spec : log parameters for each speculative implementation
- add LOG_INF in each constructor with implementation type and parameters
- extract device string logic into common_speculative_get_devices_str()
- move 'adding speculative implementation' log from init into constructors
Assisted-by: llama.cpp:local pi
* spec : extend --spec-default with ngram-map-k4v
Assisted-by: llama.cpp:local pi
* minor : fix n_embd log
* args : update draft.n_max == 3 + regen docs
* spec : relax ngram-mod rejection thold to 0.25 @ 5 low
* logs : improve
* docs : update speculative decoding CLI argument documentation
- Add missing draft model CPU scheduling and tensor override parameters
- Update --spec-type to include all available types (excluding draft-eagle3 WIP)
- Fix default values to match implementation (n_max=3, n_min=0, p_min=0.0)
- Remove deprecated options (spec-draft-ctx-size, spec-draft-replace)
- Add environment variables for new parameters
Assisted-by: llama.cpp:local pi
* arg : step-back on adding k4v to the default spec config
* cont : fix name
2026-05-19 15:32:58 +03:00
Aleksander Grygier
6db130445d
ui: Bump packages + address build warnings ( #23300 )
...
* chore: Update vulnerable packages
* chore: Formatting
* refactor: Update Tailwind CSS imports
* ci: Use `ubuntu-latest` for Unit/E2E UI tests
* chore: Bump package
* fix: Add missing tag
* refactor: Enums files naming
2026-05-19 10:16:04 +02:00
Pascal
ccee426426
server-context: guarantee there is at least 1 token to decode ( #23280 )
2026-05-19 09:49:01 +03:00
Georgi Gerganov
3c81c8deea
server : print graphs reused in slot timings ( #23279 )
...
Add graphs reused counter to the per-slot timing output, printed via
llama_perf_context().
Assisted-by: llama.cpp:local pi
Co-authored-by: ggerganov <ggerganov@users.noreply.github.com>
2026-05-19 09:46:58 +03:00
Aleksander Grygier
3a9c1b854d
ui: Update KaTeX package and clean up logs from sass warnings ( #23275 )
...
* ui: migrate katex imports to @use to resolve SCSS deprecation warnings
* ci: Use `ubuntu-slim` for CI (UI) workflow
2026-05-18 16:26:01 +02:00
Aleksander Grygier
b9a2170fce
feat: add scroll-to-bottom button to chat + prevent forced scroll down ( #23270 )
2026-05-18 16:17:21 +02:00
Aleksander Grygier
1ff0fc1384
ui: Refactor models store, MCP service, and gate logs behind VITE_DEBUG ( #23236 )
...
* refactor: Scope console logs to `DEV` + `VITE_DEBUG` env vars
* refactor: skip MCP proxy probe when no server requires it
* refactor: suppress expected disconnect errors during MCP client shutdown
* refactor: Deduplicate requests
* refactor: deduplicate model fetching across ROUTER and MODEL modes
* refactor: Clean up models logic
* chore: Add `.env.example` file
* refactor: replace client-side CORS proxy probe with server status flag
* refactor: Post-review fixes
* test: add vitest client setup with API fetch mocks
2026-05-18 16:09:40 +02:00
Concedo
fecf2dc3fa
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/server-self-hosted.yml
# CMakeLists.txt
# CODEOWNERS
# ci/run.sh
# cmake/llama-config.cmake.in
# common/chat.cpp
# examples/sycl/start-svr.sh
# examples/sycl/test.sh
# examples/sycl/win-start-svr.bat
# examples/sycl/win-test.bat
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/vecdotq.hpp
# ggml/src/ggml-vulkan/CMakeLists.txt
# scripts/wc2wt.sh
# tests/test-backend-ops.cpp
# tests/test-chat.cpp
2026-05-18 21:27:23 +08:00
Aleksander Grygier
a135ec0baa
ui: Centralize monospace font styles in app.css ( #23272 )
2026-05-18 15:10:14 +02:00
Martin Andersson
232f466583
webui: fix Tailwind v4 utility classes missing when built via cmake ( #23253 )
2026-05-18 14:08:02 +02:00
Aldehir Rojas
87589042ca
cmake : fix LLAMA_BUILD_UI logic ( #23190 )
2026-05-17 14:42:26 -04:00
Aman Gupta
3e12fbdea5
llama: avoid copying logits during prompt decode in MTP ( #23198 )
...
* llama: avoid copying logits during prompt decode in MTP
* review: update comment
* llama-graph: call set_output for t_h_pre_norm
2026-05-17 23:30:25 +08:00
Aldehir Rojas
39cf5d6191
common : delegate assistant continuation to underlying template handlers ( #23089 )
...
* common : delegate assistant continuation to template handler
* server : implement echo parameter to exclude assistant prefill in the response
* server : fix tests for prefill
* server : use existing llama template
* cont : clean up
2026-05-17 13:36:05 +02:00
Rares Vernica
1a68ec9378
server : honor --embd-normalize CLI arg ( #23125 )
...
The --embd-normalize flag was registered only for the embedding and debug
examples, so llama-server rejected it and the /embedding handler used a
hard-coded default of 2 (L2). Add LLAMA_EXAMPLE_SERVER to the flag's
example set and read params.embd_normalize as the handler's default. The
per-request "embd_normalize" body field continues to override.
2026-05-17 09:39:04 +03:00
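The precedence described in 1a68ec9378 (the per-request `embd_normalize` field overrides the server-wide `--embd-normalize` default) can be illustrated with a small sketch. This is not the server's C++ code: the function names `resolve_embd_normalize` and `normalize` are hypothetical, and only mode 2 (L2), the one the commit message names, is implemented.

```python
import math
from typing import List, Sequence


def resolve_embd_normalize(body: dict, cli_default: int = 2) -> int:
    """Pick the normalization mode: request body first, CLI default otherwise.

    `cli_default` stands in for the value parsed from --embd-normalize
    (hypothetical plumbing; the real handler reads params.embd_normalize).
    """
    value = body.get("embd_normalize")
    return cli_default if value is None else int(value)


def normalize(vec: Sequence[float], mode: int) -> List[float]:
    """Apply L2 normalization for mode 2; other modes are out of scope here."""
    if mode != 2:
        raise NotImplementedError("this sketch only covers mode 2 (L2)")
    norm = math.sqrt(sum(x * x for x in vec))
    # Leave an all-zero vector untouched instead of dividing by zero.
    return [x / norm for x in vec] if norm > 0.0 else list(vec)


if __name__ == "__main__":
    mode = resolve_embd_normalize({}, cli_default=2)
    print(mode, normalize([3.0, 4.0], mode))
```

A request without the field falls back to the CLI value, while a request that sets `"embd_normalize"` keeps its own choice.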
Concedo
1e828ccabf
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# common/common.cpp
# ggml/CMakeLists.txt
# scripts/sync-ggml.last
# scripts/sync_vendor.py
# src/llama-context.cpp
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tools/cli/README.md
# tools/completion/README.md
# tools/server/README.md
2026-05-17 11:26:18 +08:00
Judd
4f13cb7424
webui: support video files as input ( #22830 )
2026-05-17 02:13:44 +02:00
Xuan-Son Nguyen
b64739ea39
server: (router) alloc tmp buffer on heap ( #23159 )
2026-05-16 23:42:16 +02:00
Pascal
64b38b561b
server: skip device enumeration in router mode to avoid creating CUDA primary context ( #23137 )
2026-05-16 21:21:06 +02:00
Concedo
9203b6a051
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/labeler.yml
# .github/workflows/build-self-hosted.yml
# .github/workflows/release.yml
# .github/workflows/server-sanitize.yml
# .github/workflows/server-self-hosted.yml
# .github/workflows/server.yml
# .github/workflows/ui-build.yml
# .github/workflows/ui-ci.yml
# .github/workflows/ui-publish.yml
# .gitignore
# CMakeLists.txt
# CODEOWNERS
# scripts/ui-download.cmake
# scripts/xxd.cmake
# tests/test-backend-ops.cpp
# tests/test-reasoning-budget.cpp
# tools/CMakeLists.txt
# tools/server/CMakeLists.txt
# tools/server/README.md
2026-05-16 22:56:33 +08:00
Aleksander Grygier
0253fb21f5
ui: Add request timeout for MCP tool calls ( #23138 )
...
* feat: Add request timeout for MCP tool calls in llama-ui
* feat: MCP Settings tab with max timeout setting
2026-05-16 15:20:27 +02:00
Holger Voormann
25b1bc9c2f
ui: Correct links in tools/ui/README.md [no ci] ( #23139 )
...
In `tools/ui/README.md`, update the relative links, now that the `README.md` file has been moved from `tools/server/webui/` to `tools/ui/`.
See 59778f0196.
2026-05-16 14:42:38 +02:00
Aman Gupta
255582687b
llama + spec: MTP Support ( #22673 )
...
* spec: support MTP
* fix batch size
* rename files
* cont : simplify (#7 )
* MTP: clean-up (#9 )
* MTP: clean-up
* review: use llama_context_type instead of llama_graph_type
* review: remove llama_model_has_mtp
* review: fix convert issues
* convert: fix pycheck
* review: formatting
* use `mtp-` for identifying mtp models
* convert: fix mtp conversion
* mtp -> draft-mtp
* remove unused llama_arch
* add need_embd in speculative
* llama: allow partial seq_rm for GDN models for speculative decoding
Currently, speculative decoding has to restart from a checkpoint
when some draft tokens are not accepted, which wastes work by
running the target again. This PR adds the ability to roll back up to
`draft_max` by storing the GDN intermediates.
* fix pending state
* vulkan: add GDN partial rollback
* meta: extend check to axis 1
* metal: add GDN partial rollback
Extend the gated delta net kernel to store intermediate states for
partial rollback support on the Metal backend.
- Add K (snapshot slot count) as a function constant
- Read input state from slot 0 of the 3D state tensor
- Write intermediate states to different slots during token loop
- For K=1, maintain backward-compatible single-slot behavior
Ref: 8c05923630
Assisted-by: llama.cpp:local pi
* delta_net_base: use ggml_pad instead of new_tensor
* review: add need_rs_seq
* review: rename part_bounded to n_rs
* review: deslop comments
* review: rename, add asserts
* server : adjust checkpoint logic (#11 )
* server : adjust checkpoint logic
* cont : rm asserts
* server-context: fix early exit
* spec : fix compatibility with n-gram and add TODOs (#13 )
* metal : cleanup
* llama : fix faulty bitwise check in recurrent memory
* server : disable RS-based MTP in combination with other spec types
* spec : add TODOs
* cont : fix comment
* cont : update comment
* common : fix logic for ngram + mtp compat
* llama-memory: enable checkpointing with partial rollback
* cont: add test-case for loading into a dirty ctx
* llama-memory-recurrent: clear rs_idx in clear
* download: fix mtp path
* llama-arch: fix enorm op
* docs: update docs
* conversion: fix type annotations
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-05-16 20:06:23 +08:00
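The partial-rollback idea from 255582687b (store the intermediate recurrent states so rejected draft tokens can be undone without re-running from a checkpoint) can be sketched with a toy recurrence. Everything here is an assumption for illustration: the class `RecurrentState`, the scalar update rule, and the snapshot layout are hypothetical and do not mirror the actual GDN kernels or tensor shapes.

```python
from typing import List, Sequence


class RecurrentState:
    """Toy recurrent state: s <- decay * s + x for each token value x."""

    def __init__(self, decay: float = 0.5) -> None:
        self.decay = decay
        self.state = 0.0
        self.snapshots: List[float] = []

    def step(self, x: float) -> None:
        self.state = self.decay * self.state + x

    def decode_draft(self, tokens: Sequence[float]) -> None:
        """Process draft tokens, keeping one snapshot per position.

        snapshots[0] is the state before any draft token,
        snapshots[i] is the state after the first i draft tokens.
        """
        self.snapshots = [self.state]
        for x in tokens:
            self.step(x)
            self.snapshots.append(self.state)

    def rollback(self, n_accepted: int) -> None:
        """Restore the state reached after the first n_accepted draft tokens."""
        if not 0 <= n_accepted < len(self.snapshots):
            raise ValueError("n_accepted is outside the stored snapshot range")
        self.state = self.snapshots[n_accepted]
        self.snapshots = []


if __name__ == "__main__":
    rs = RecurrentState()
    rs.step(1.0)                      # prompt token
    rs.decode_draft([2.0, 4.0, 8.0])  # three draft tokens
    rs.rollback(1)                    # only the first draft token was accepted
    print(rs.state)
```

The trade-off in this sketch is memory for compute: one extra stored state per draft position replaces a second pass over the accepted tokens.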
kubawoo
b81c2cdd74
ui: Fix handling of MCP resource template parameters ( #23117 )
...
* Fix handling of MCP resource template parameters
* Fix formatting for uri-template.test.ts
---------
Co-authored-by: kuba <kuba@laptop.local.net>
2026-05-16 13:25:41 +02:00
viggy
1428004808
webui : [ChatFormActionAdd][a11y] fix accessibility issues in add menu trigger and items ( #22736 )
...
* fix tab order on attach button, and don't focus on disabled menu item
* add a11y tests
2026-05-16 12:00:46 +02:00
Pascal
366c5e2a3b
ui: untrack settings sync in props effect to prevent reactive loop ( #23127 )
2026-05-16 11:25:34 +02:00
Aleksander Grygier
59778f0196
ui: Restructure repo to use tools/ui folder and ui / UI / llama-ui / LLAMA_UI naming ( #23064 )
...
* webui: Move static build output from `tools/server/public` to `build/ui` directory
* refactor: Move to `tools/ui`
* refactor: rename CMake variables and preprocessor defines
- Rename LLAMA_BUILD_WEBUI -> LLAMA_BUILD_UI (old kept as deprecated)
- Rename LLAMA_USE_PREBUILT_WEBUI -> LLAMA_USE_PREBUILT_UI (old kept as deprecated)
- Backward compat: old vars auto-forward to new ones with DEPRECATION warning
- Rename internal vars: WEBUI_SOURCE -> UI_SOURCE, WEBUI_SOURCE_DIR -> UI_SOURCE_DIR, etc.
- Rename HF bucket: LLAMA_WEBUI_HF_BUCKET -> LLAMA_UI_HF_BUCKET
- Emit both LLAMA_BUILD_WEBUI and LLAMA_BUILD_UI preprocessor defines
- Emit both LLAMA_WEBUI_DEFAULT_ENABLED and LLAMA_UI_DEFAULT_ENABLED
* refactor: rename CLI flags (--webui -> --ui) with backward compat
- Add --ui/--no-ui (old --webui/--no-webui kept as deprecated aliases)
- Add --ui-config (old --webui-config kept as deprecated alias)
- Add --ui-config-file (old --webui-config-file kept as deprecated alias)
- Add --ui-mcp-proxy/--no-ui-mcp-proxy (old --webui-mcp-proxy kept as deprecated)
- Add new env vars: LLAMA_ARG_UI, LLAMA_ARG_UI_CONFIG, LLAMA_ARG_UI_CONFIG_FILE, LLAMA_ARG_UI_MCP_PROXY
- C++ struct fields: params.ui, params.ui_config_json, params.ui_mcp_proxy added alongside old fields
- Backward compat: old fields synced to new ones in g_params_to_internals
* refactor: update C++ server internals with backward compat
- Rename json_webui_settings -> json_ui_settings (both kept in server_context_meta)
- Rename params.webui usage -> params.ui (both synced, old still works)
- JSON API emits both "ui"/"ui_settings" and "webui"/"webui_settings" keys
- Server routes use params.ui_mcp_proxy || params.webui_mcp_proxy
- Preprocessor guards use #if defined(LLAMA_BUILD_UI) || defined(LLAMA_BUILD_WEBUI)
* refactor: rename CI/CD workflows, artifacts, and build script
- Rename webui-build.yml -> ui-build.yml; artifact webui-build -> ui-build
- Rename webui-publish.yml -> ui-publish.yml; var HF_BUCKET_WEBUI_STATIC_OUTPUT -> HF_BUCKET_UI_STATIC_OUTPUT
- Rename server-webui.yml -> server-ui.yml; job webui-build/checks -> ui-build/checks
- Update server.yml: job/artifact refs webui-build -> ui-build
- Update release.yml: all webui-build/publish refs -> ui-build/publish; HF_TOKEN_WEBUI_STATIC_OUTPUT -> HF_TOKEN_UI_STATIC_OUTPUT
- Update server-self-hosted.yml: webui-build -> ui-build
- Update build-self-hosted.yml: HF_WEBUI_VERSION -> HF_UI_VERSION
- Rename webui-download.cmake -> ui-download.cmake (internal refs updated)
- Update labeler.yml: server/webui -> server/ui path label
* docs: update CODEOWNERS and server README docs
- Update CODEOWNERS: team ggml-org/llama-webui -> ggml-org/llama-ui, path /tools/server/webui/ -> /tools/ui/
- Update server README.md: CLI tables show --ui flags with deprecated --webui aliases
- Update server README-dev.md: "WebUI" -> "UI", paths updated to tools/ui/
* fix: Small fixes for UI build
* fix: CMake.txt syntax
* chore: Formatting
* fix: `.editorconfig` for llama-ui
* chore: Formatting
* refactor: Use `APP_NAME` in Error route
* refactor: Cleanup
* refactor: Single migration service
* make llama-ui a linkable target
* fix: UI Build output
* fix: Missing change
* fix: separate llama-ui npm build output into build/tools/ui/dist subfolder + use cmake npm build instead of downloading ui-build.yml artifacts in CI
* refactor: UI workflows cleanup
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2026-05-16 02:02:40 +02:00
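The backward-compatibility scheme in 59778f0196 (old names kept as deprecated and forwarded to the new ones with a warning) follows a common pattern, sketched below. This is an illustration rather than the project's CMake or C++ logic: `forward_deprecated` is a hypothetical helper, and the rule that an explicitly set new name wins over the old one is an assumption the commit message does not state.

```python
import warnings
from typing import Dict

# Old name -> new name, taken from the rename list in the commit message.
DEPRECATED_ALIASES: Dict[str, str] = {
    "LLAMA_BUILD_WEBUI": "LLAMA_BUILD_UI",
    "LLAMA_USE_PREBUILT_WEBUI": "LLAMA_USE_PREBUILT_UI",
}


def forward_deprecated(options: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of `options` with deprecated names forwarded to new ones."""
    resolved = dict(options)
    for old, new in DEPRECATED_ALIASES.items():
        if old in resolved:
            warnings.warn(
                f"{old} is deprecated, use {new} instead",
                DeprecationWarning,
                stacklevel=2,
            )
            # Assumption: an explicitly set new name takes precedence.
            resolved.setdefault(new, resolved[old])
    return resolved


if __name__ == "__main__":
    warnings.simplefilter("always")
    print(forward_deprecated({"LLAMA_BUILD_WEBUI": "OFF"}))
```

Keeping the old key in the result mirrors the commit's approach of emitting both the old and the new names during the transition.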
Julien Chaumond
6831fe470c
docs: document usage object in server timings response ( #23110 )
...
* docs: document `usage` object in server timings response
Co-Authored-By: julien-agent <Agents+cyolo@huggingface.co>
* Apply suggestion from @julien-c
---------
Co-authored-by: julien-agent <Agents+cyolo@huggingface.co>
2026-05-15 19:33:12 +02:00
Xuan-Son Nguyen
72e60f500d
mtmd: add chunks and fix preproc for qwen3a ( #23073 )
...
* mtmd: add chunks and fix preproc for qwen3a
* add attn_mask
* limit mtmd_chunk size (avoid blow up memory)
* correct audio tokens
* re-order the set_input case
* remove attn_mask
2026-05-15 19:32:47 +02:00
Pascal
8be1786707
webui: fix theme from --webui-config-file not applied on first load (fresh localStorage) ( #22902 )
2026-05-15 19:25:38 +02:00
Pascal
d528444580
webui: preserve partial response on streaming error ( #23090 )
2026-05-15 11:18:11 +02:00
Concedo
da2cc90723
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/labeler.yml
# .github/workflows/build-and-test-snapdragon.yml
# .github/workflows/build-self-hosted.yml
# .github/workflows/release.yml
# .github/workflows/server-self-hosted.yml
# .github/workflows/server-webui.yml
# .github/workflows/server.yml
# .gitignore
# CMakeLists.txt
# CONTRIBUTING.md
# README.md
# ggml/src/ggml-cuda/fattn.cu
# ggml/src/ggml-hexagon/htp/cpy-ops.c
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# grammars/README.md
# scripts/snapdragon/qdc/run_qdc_jobs.py
# scripts/snapdragon/qdc/tests/run_backend_ops_posix.py
# scripts/snapdragon/qdc/tests/run_bench_tests_posix.py
# scripts/snapdragon/qdc/tests/utils.py
# tests/test-backend-ops.cpp
# tests/test-chat.cpp
# tools/server/CMakeLists.txt
# tools/server/README.md
# tools/server/webui/src/lib/components/app/server/ServerLoadingSplash.svelte
# tools/server/webui/src/routes/(chat)/chat/[id]/+page.svelte
# ty.toml
2026-05-15 17:09:48 +08:00
Sid Shaytay
91e84fed64
Support for Codex CLI by skipping unsupported Responses tools ( #23041 )
...
* Support for Codex CLI by skipping unsupported Responses tools
* Warn on skipped Responses tools and preserve gpt-oss apply_patch rejection
* Revert gpt-oss apply_patch special handling
2026-05-15 09:03:24 +02:00
Aleksander Grygier
0c3e4fccca
fix: Propagate version tag to WebUI asset download in self-hosted CI ( #23051 )
...
* fix: Propagate version tag to WebUI asset download in self-hosted CI
* refactor: Apply suggestions from @CISC
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* fix: Skip npm build when Node.js is not installed
Avoid 'no such file or directory' errors on CI runners that lack
Node.js. Check if npm is available via find_program before attempting
npm install + npm run build. Falls back to HF Bucket download.
* fix: Use + separator for ASSETS list to fix Windows build
Replace fragile \; escaping with a + separator when passing the
WebUI asset list via -DASSETS to the download script. On Windows,
the \; escaping was not reliably preserved through the CMake build
system, causing all asset filenames to be concatenated into one
(e.g., 'index.html;bundle.js;bundle.css;loading.html' as a single
file), which broke the HF Bucket download and subsequent xxd.cmake
step.
+ is safe because it is not special in cmd.exe (unlike | which is a
pipe operator), not special in CMake's -D argument parser, and not
a valid Windows filename character. CMakeLists.txt joins assets
with + and webui-download.cmake splits them back via regex.
* fix: Validate HF_WEBUI_VERSION environment variable with regex
Add input validation for the HF_WEBUI_VERSION env var to prevent
CMake list separator or path-traversal issues in stamp filenames
and download URLs. Rejects non-conforming characters early.
* fix: Remove 'latest' fallback for HF_WEBUI_VERSION
When needs.determine-tag.outputs.tag_name is empty, let CMake's
default resolution handle it (empty -> git-based version lookup)
instead of falling back to 'latest'. This ensures the sentinel
stamp file is consistent with CMake's resolution logic.
* fix: Demote checksum verification failure to warning instead of hard gate
* fix: End line character
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-05-14 17:57:20 +02:00
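Two of the fixes in 0c3e4fccca (joining the asset list with `+` instead of an escaped `;`, and validating the version string before it reaches filenames and URLs) can be sketched as follows. The real logic lives in CMake; this Python version only shows the round trip. The helper names and the exact version pattern are hypothetical, since the commit message does not give the regex it uses.

```python
import re
from typing import List, Sequence

ASSET_SEPARATOR = "+"

# Hypothetical allow-list: letters, digits, dot, underscore, hyphen.
VERSION_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def join_assets(assets: Sequence[str]) -> str:
    """Join asset names into one argument, refusing names that hold the separator."""
    for name in assets:
        if ASSET_SEPARATOR in name:
            raise ValueError(f"asset name contains '{ASSET_SEPARATOR}': {name}")
    return ASSET_SEPARATOR.join(assets)


def split_assets(joined: str) -> List[str]:
    """Split the joined argument back into names, dropping empty pieces."""
    return [part for part in re.split(r"\+", joined) if part]


def is_valid_version(value: str) -> bool:
    """Reject separators, slashes, and '..' sequences before building paths."""
    return ".." not in value and VERSION_RE.fullmatch(value) is not None


if __name__ == "__main__":
    arg = join_assets(["index.html", "bundle.js", "bundle.css", "loading.html"])
    print(arg)
    print(split_assets(arg))
    print(is_valid_version("b1234"), is_valid_version("../etc"))
```

Because the separator never needs escaping, the list survives being passed through a shell and a `-D` argument as a single plain token.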
Aleksander Grygier
253ba110bc
webui: Move static build output from repo code to HF Bucket ( #22937 )
...
* ci: add workflow to publish webui to Hugging Face bucket
* ci: add webui release job to release workflow
* ci: test webui release job
* chore: Return to default minification strategy for build output files
* ci: extract webui build into separate workflow and job
* chore: Ignore webui static output + clean up references
* chore: Delete legacy webui static output
* chore: Ignore webui build static output
* fix: Workflow
* fix: Versioning naming
* chore: Update package name
* test: Test CI fix
* refactor: Naming
* server: implement webui build strategy with HF Bucket support
* chore: Remove test workflow
* chore: Use WebUI build workflow call in other workflows
* server: HF Buckets fallback for WebUI build
* refactor: App name variable
* refactor: Naming
* fix: Retrieve loading.html
* fix: workflow syntax
* fix: Rewrite malformed release.yml
* fix: Req param
* test: Re-add missing Playwright installation for CI tests
* refactor: Logic & security improvements
* refactor: Retrieve publishing jobs and DRY the workflows
* fix: Test workflow syntax
* fix: Upstream Release Tag for test workflow
* chore: Remove test workflow
* ci: Run WebUI jobs on `ubuntu-24.04-arm`
* refactor: Post-CR cleanup
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
* refactor: CI cleanup
* refactor: Cleanup
* test: Test workflow
* refactor: use LLAMA_BUILD_NUMBER instead of LLAMA_BUILD_TAG for HF Bucket webui downloads
* server: add fallback mechanism for HF Bucket webui downloads from latest directory
* fix: Incorrect argument order in file(SHA256) calls for checksum verification
* refactor: Use cmake script for handling the HF Bucket download on build time
* feat: support local npm build for WebUI assets
* refactor: add `HF_ENABLED` flag to control WebUI build/download provisioning
* refactor: Cleanup
* chore: Remove test workflow
* fix: remove s390x from release workflow
* fix: add webui-build dependency to ubuntu-22-rocm and windows-hip
* Revert "fix: remove s390x from release workflow"
This reverts commit debcfffa9bc1e3112eae41f2d29741b682e4eb19.
* fix: Release workflow file
* fix: Proper release tag used for HF Bucket upload
* fix: Remove duplicate steps in release workflow
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-05-14 13:21:41 +02:00