Commit graph

844 commits

Author SHA1 Message Date
Georgi Gerganov
bbce619adb
cmake : add install() for impl libraries + fix apple builds (#23511)
* pi : update

* ci : fix ios build

* ci : fix andoroid

* ci : fix apple builds

* cmake : add install() for impl libraries

Add install(TARGETS <target> LIBRARY) for all -impl libraries that were
changed from STATIC to shared (controlled by BUILD_SHARED_LIBS) in
commit bb28c1fe2. Without this, cmake --install fails to copy the shared
libraries, causing runtime errors like:

  llama-server: error while loading shared libraries: libllama-server-impl.so

Ref: https://github.com/ggml-org/llama.cpp/issues/23494#issuecomment-4512912515

Assisted-by: llama.cpp:local pi

* ci : fix xcframework build
2026-05-22 11:46:26 +03:00
Georgi Gerganov
bb28c1fe24
cmake : remove STATIC from impl libraries, enable LLAMA_BUILD_APP by default (#23462)
* cmake : remove STATIC from impl libraries, allow BUILD_SHARED_LIBS control

Remove explicit STATIC from all -impl libraries (server, cli, completion, bench,
batched-bench, fit-params, quantize, perplexity) so BUILD_SHARED_LIBS controls
shared vs static linkage.

Add WINDOWS_EXPORT_ALL_SYMBOLS ON for proper DLL export on Windows.

Assisted-by: llama.cpp:local pi

* cmake : enable LLAMA_BUILD_APP by default

Assisted-by: llama.cpp:local pi

* ci : disable app in build-cmake-pkg.yml
2026-05-21 21:13:59 +03:00
ScrewTSW
b65bb4baae
server: expose prompt token counts in /slots endpoint (#23454)
Add n_prompt_tokens, n_prompt_tokens_processed, and n_prompt_tokens_cache
to the /slots JSON response. These fields are already tracked internally
but were not exposed, making it impossible for clients to monitor prompt
evaluation progress during processing.
2026-05-21 13:29:13 +02:00
Aman Gupta
52fb93a2bd
server : free draft/MTP resources on sleep to fix VRAM leak (#23461)
The destroy() function in server_context_impl only cleaned up the main
model and context (via llama_init.reset()) but did not free the speculative
decoder (spec), draft context (ctx_dft), or draft model (model_dft).

For MTP (Multi-Token Prediction) models, ctx_dft holds GPU-allocated
resources (KV cache, compute buffers) that are not freed when entering
the sleeping state. On each sleep/resume cycle, new resources are
allocated without the old ones being freed, leading to a VRAM leak
that eventually crashes the server with out-of-memory errors.

Fix by explicitly resetting spec, ctx_dft, and model_dft in destroy()
before resetting llama_init, ensuring proper cleanup order to avoid
use-after-free.

ref: https://github.com/ggml-org/llama.cpp/issues/23395

Assisted-by: llama.cpp:local pi
2026-05-21 16:11:11 +08:00
Pascal
c9021714e8
server: re-inject subcommand when router spawns children under unified binary (#23442) 2026-05-21 10:09:19 +02:00
Adrien Gallouët
1d7ab2b947
app : add batched-bench, fit-params, quantize & perplexity (#23459)
Some checks are pending
Python Type-Check / python type-check (push) Waiting to run
* app : add batched-bench, fit-params, quantize & perplexity

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* Add missing main.cpp

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* Add EOL

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

---------

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-05-21 10:29:44 +03:00
Aleksander Grygier
5e932a1c8d
ui: Improve Git Hooks for UI development (#23403)
* refactor: Improve Git Hooks for UI development

* fix: Address review comments

* fix: Use absolute git path for `/hooks`

Co-authored-by: Pascal <admin@serveurperso.com>

---------

Co-authored-by: Pascal <admin@serveurperso.com>
2026-05-21 08:27:50 +02:00
wendadawen
6a257d4463
mtmd, model : merge HunyuanOCR into HunyuanVL and fix OCR vision precision (#23329)
- HunyuanOCR shares the same HF arch and vision layout as HunyuanVL butwas split into a separate path that skipped the +0.1 bilinear sampler used by the HF reference.
- Collapse OCR into the HUNYUANVL projector + HUNYUAN_VL text arch
2026-05-21 00:35:37 +02:00
stduhpf
3a479c9132
ui: Add max image size option (#22849)
* webui: Add max image size option

* remove magic numbers

* support all image formats

* use const

* Move regex to match b64 images to constants

* use SETTINGS_KEYS to get max image resolution setting

* Do not touch the image if already under the size threshold
2026-05-21 00:00:09 +02:00
Saba Fallah
a8681a0ed2
mtmd : DeepSeek-OCR image processing fixes, img_tool::resize padding refactor (#23345)
* mtmd : deepseek-ocr fixes, improvements and refactoring

- image processing changes to achieve full parity with Pillow (reference impl)
- SAM mask casting only when flash-attn is on
- SAM refactor (build_sam() extracted so deepseek-ocr-2 can reuse it)
- llama-chat changes to fix server/WebUI issue (new media_markers_first())
- adapted test-chat-template and added test cases for deepseek-ocr
- changed regression test for deepseek-ocr to use CER+chrF scores for ground-truth comparison; removed embedding-model
- ty.toml ignore unresolved-import for tools/mtmd/tests/**

* image-text reordering fix removed

* refactor bool add_padding + pad_rounding enum into a single pad_style enum
2026-05-20 17:37:10 +02:00
Aleksander Grygier
6ce96713de
feat: Add WAV MIME type variants and improve audio format detection (#23396) 2026-05-20 16:55:24 +02:00
Adrien Gallouët
29f1482221
app : introduce the llama unified executable (#23296)
* app : introduce the llama unified executable

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* Use serve for server

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* Hide completion and bench, add help command

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* Remove STATIC

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* Use -impl targets instead of -lib

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* Revert "Remove STATIC"

This reverts commit cc44caccb9902b34a3531633edac911e5b3d65cd.

---------

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-05-20 13:22:22 +02:00
Aleksander Grygier
e6b4acfe86
refactor: Move text attachments up before the message content in chat completions payload (#23406) 2026-05-20 13:04:01 +02:00
Xuan-Son Nguyen
e2b129e1bf
mtmd: fit_params now take into account mmproj (#21489)
* mtmd: fit_params now take into account mmproj

* rename alloc_compute_meta to reserve_compute_meta

* rm unused functions

* add ggml_backend_dev_t support

* add debug log
2026-05-20 11:27:44 +02:00
Aleksander Grygier
5028447384
ui: Refactor isMobile as reactive value in viewport store (#23330)
* refactor: `isMobile` as reactive value in `viewport` store

* refactor: Use Svelte media query for the viewport store
2026-05-20 10:52:00 +02:00
Aleksander Grygier
585080d310
fix: Div wrapper no pointer events on hidden (#23390)
Some checks failed
Python Type-Check / python type-check (push) Waiting to run
Python check requirements.txt / check-requirements (push) Has been cancelled
Check Pre-Tokenizer Hashes / pre-tokenizer-hashes (push) Has been cancelled
2026-05-20 09:46:31 +02:00
Aleksander Grygier
67ace021da
refactor: Chat Screen UI rendering (#23333) 2026-05-19 22:38:42 +02:00
Johannes Gäßler
7256fce047
common: fix --fit verbosity with --verbosity 4 (#23282) 2026-05-19 21:33:23 +02:00
Georgi Gerganov
d14ce3dab4
llama : MTP clean-up (#23269)
* llama : disable equal splits for recurrent memory with partial rollback

* spec : re-enable p-min with MTP drafts

* spec : re-enable ngram spec in combination with RS rollback

* spec : fix ngram-map-* params

* spec : fix acceptance logic in combined ngram + draft configs

* graph : fix reuse for combined `token` + `embd` batches

* spec : log parameters for each speculative implementation

- add LOG_INF in each constructor with implementation type and parameters
- extract device string logic into common_speculative_get_devices_str()
- move 'adding speculative implementation' log from init into constructors

Assisted-by: llama.cpp:local pi

* spec : extend --spec-default with ngram-map-k4v

Assisted-by: llama.cpp:local pi

* minor : fix n_embd log

* args : update draft.n_max == 3 + regen docs

* spec : relax ngram-mod rejection thold to 0.25 @ 5 low

* logs : improve

* docs : update speculative decoding CLI argument documentation

- Add missing draft model CPU scheduling and tensor override parameters
- Update --spec-type to include all available types (excluding draft-eagle3 WIP)
- Fix default values to match implementation (n_max=3, n_min=0, p_min=0.0)
- Remove deprecated options (spec-draft-ctx-size, spec-draft-replace)
- Add environment variables for new parameters

Assisted-by: llama.cpp:local pi

* arg : step-back on adding k4v to the default spec config

* cont : fix name
2026-05-19 15:32:58 +03:00
Aleksander Grygier
6db130445d
ui: Bump packages + address build warnings (#23300)
* chore: Update vulnerable packages

* chore: Formatting

* refactor: Update Tailwind CSS imports

* ci: Use `ubuntu-latest` for Unit/E2E UI tests

* chore: Bump package

* fix: Add missing tag

* refactor: Enums files naming
2026-05-19 10:16:04 +02:00
Pascal
ccee426426
server-context: guarantee there is at least 1 token to decode (#23280) 2026-05-19 09:49:01 +03:00
Georgi Gerganov
3c81c8deea
server : print graphs reused in slot timings (#23279)
Add graphs reused counter to the per-slot timing output, printed via
llama_perf_context().

Assisted-by: llama.cpp:local pi

Co-authored-by: ggerganov <ggerganov@users.noreply.github.com>
2026-05-19 09:46:58 +03:00
Aleksander Grygier
3a9c1b854d
ui: Update KaTeX package and clean up logs from sass warnings (#23275)
* ui: migrate katex imports to @use to resolve SCSS deprecation warnings

* ci: Use `ubuntu-slim` for CI (UI) workflow
2026-05-18 16:26:01 +02:00
Aleksander Grygier
b9a2170fce
feat: add scroll-to-bottom button to chat + prevent forced scroll down (#23270) 2026-05-18 16:17:21 +02:00
Aleksander Grygier
1ff0fc1384
ui: Refactor models store, MCP service, and gate logs behind VITE_DEBUG (#23236)
* refactor: Scope console logs to `DEV` + `VITE_DEBUG` env vars

* refactor: skip MCP proxy probe when no server requires it

* refactor: suppress expected disconnect errors during MCP client shutdown

* refactor: Deduplicate requests

* refactor: deduplicate model fetching across ROUTER and MODEL modes

* refactor: Clean up models logic

* chore: Add `.env.example` file

* refactor: replace client-side CORS proxy probe with server status flag

* refactor: Post-review fixes

* test: add vitest client setup with API fetch mocks
2026-05-18 16:09:40 +02:00
Aleksander Grygier
a135ec0baa
ui: Centralize monospace font styles in app.css (#23272)
Some checks failed
Python Type-Check / python type-check (push) Has been cancelled
2026-05-18 15:10:14 +02:00
Martin Andersson
232f466583
webui: fix Tailwind v4 utility classes missing when built via cmake (#23253) 2026-05-18 14:08:02 +02:00
Aldehir Rojas
87589042ca
cmake : fix LLAMA_BUILD_UI logic (#23190) 2026-05-17 14:42:26 -04:00
Aman Gupta
3e12fbdea5
llama: avoid copying logits during prompt decode in MTP (#23198)
* llama: avoid copying logits during prompt decode in MTP

* review: update comment

* llama-graph: call set_output for t_h_pre_norm
2026-05-17 23:30:25 +08:00
Aldehir Rojas
39cf5d6191
common : delegate assistant continuation to underlying template handlers (#23089)
* common : delegate assistant continuation to template handler

* server : implement echo parameter to exclude assistant prefill in the response

* server : fix tests for prefill

* server : use existing llama template

* cont : clean up
2026-05-17 13:36:05 +02:00
Rares Vernica
1a68ec9378
server : honor --embd-normalize CLI arg (#23125)
The --embd-normalize flag was registered only for the embedding and debug
examples, so llama-server rejected it and the /embedding handler used a
hard-coded default of 2 (L2). Add LLAMA_EXAMPLE_SERVER to the flag's
example set and read params.embd_normalize as the handler's default. The
per-request "embd_normalize" body field continues to override.
2026-05-17 09:39:04 +03:00
Judd
4f13cb7424
webui: support video files as input (#22830) 2026-05-17 02:13:44 +02:00
Xuan-Son Nguyen
b64739ea39
server: (router) alloc tmp buffer on heap (#23159) 2026-05-16 23:42:16 +02:00
Pascal
64b38b561b
server: skip device enumeration in router mode to avoid creating CUDA primary context (#23137) 2026-05-16 21:21:06 +02:00
Aleksander Grygier
0253fb21f5
ui: Add request timeout for MCP tool calls (#23138)
Some checks failed
Check Pre-Tokenizer Hashes / pre-tokenizer-hashes (push) Has been cancelled
Python check requirements.txt / check-requirements (push) Has been cancelled
Python Type-Check / python type-check (push) Has been cancelled
* feat: Add request timeout for MCP tool calls in llama-ui

* feat: MCP Settings tab with max timeout setting
2026-05-16 15:20:27 +02:00
Holger Voormann
25b1bc9c2f
ui: Correct links in tools/ui/README.md [no ci] (#23139)
In `tools/ui/README.md`, update the relative links, now that the `README.md` file has been moved from `tools/server/webui/` to `tools/ui/`.

See 59778f0196.
2026-05-16 14:42:38 +02:00
Aman Gupta
255582687b
llama + spec: MTP Support (#22673)
* spec: support MTP

* fix batch size

* rename files

* cont : simplify (#7)

* MTP: clean-up (#9)

* MTP: clean-up

* review: use llama_context_type instead of llama_graph_type

* review: remove llama_model_has_mtp

* review: fix convert issues

* convert: fix pycheck

* review: formatting

* use `mtp-` for identifying mtp models

* convert: fix mtp conversion

* mtp -> draft-mtp

* remove unused llama_arch

* add need_embd in speculative

* llama: allow partial seq_rm for GDN models for speculative decoding

Currently speculative checkpoint needs to restart from a checkpoint
after some draft tokens are not accepted, this leads to some wastage in
running the target again. This PR adds the ability to rollback upto
`draft_max` by storing the GDN intermediates.

* fix pending state

* vulkan: add GDN partial rollback

* meta: extend check to axis 1

* metal: add GDN partial rollback

Extend the gated delta net kernel to store intermediate states for
partial rollback support on the Metal backend.

- Add K (snapshot slot count) as a function constant
- Read input state from slot 0 of the 3D state tensor
- Write intermediate states to different slots during token loop
- For K=1, maintain backward-compatible single-slot behavior

Ref: 8c05923630

Assisted-by: llama.cpp:local pi

* delta_net_base: use ggml_pad instead of new_tensor

* review: add need_rs_seq

* review: rename part_bounded to n_rs

* review: deslop comments

* review: rename, add asserts

* server : adjust checkpoint logic (#11)

* server : adjust checkpoint logic

* cont : rm asserts

* server-context: fix early exit

* spec : fix compatibility with n-gram and add TODOs (#13)

* metal : cleanup

* llama : fix faulty bitwise check in recurrent memory

* server : disable RS-based MTP in combination with other spec types

* spec : add TODOs

* cont : fix comment

* cont : update comment

* common : fix logic for ngram + mtp compat

* llama-memory: enable checkpointing with partial rollback

* cont: add test-case for loading into a dirty ctx

* llama-memory-recurrent: clear rs_idx in clear

* download: fix mtp path

* llama-arch: fix enorm op

* docs: update docs

* conversion: fix type annotations

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-05-16 20:06:23 +08:00
kubawoo
b81c2cdd74
ui: Fix handling of MCP resource template parameters (#23117)
* Fix handling of MCP resource template parameters

* Fix formatting for uri-template.test.ts

---------

Co-authored-by: kuba <kuba@laptop.local.net>
2026-05-16 13:25:41 +02:00
viggy
1428004808
webui : [ChatFormActionAdd][a11y] fix accessibility issues in add menu trigger and items (#22736)
* fix tab order on attach button, and dont focus on disabled mennu item

* add a11y tests
2026-05-16 12:00:46 +02:00
Pascal
366c5e2a3b
ui: untrack settings sync in props effect to prevent reactive loop (#23127)
Some checks are pending
Check Pre-Tokenizer Hashes / pre-tokenizer-hashes (push) Waiting to run
Python check requirements.txt / check-requirements (push) Waiting to run
Python Type-Check / python type-check (push) Waiting to run
2026-05-16 11:25:34 +02:00
Aleksander Grygier
59778f0196
ui: Restructure repo to use tools/ui folder and ui / UI / llama-ui / LLAMA_UI naming (#23064)
* webui: Move static build output from `tools/server/public` to `build/ui` directory

* refactor: Move to `tools/ui`

* refactor: rename CMake variables and preprocessor defines

- Rename LLAMA_BUILD_WEBUI -> LLAMA_BUILD_UI (old kept as deprecated)
- Rename LLAMA_USE_PREBUILT_WEBUI -> LLAMA_USE_PREBUILT_UI (old kept as deprecated)
- Backward compat: old vars auto-forward to new ones with DEPRECATION warning
- Rename internal vars: WEBUI_SOURCE -> UI_SOURCE, WEBUI_SOURCE_DIR -> UI_SOURCE_DIR, etc.
- Rename HF bucket: LLAMA_WEBUI_HF_BUCKET -> LLAMA_UI_HF_BUCKET
- Emit both LLAMA_BUILD_WEBUI and LLAMA_BUILD_UI preprocessor defines
- Emit both LLAMA_WEBUI_DEFAULT_ENABLED and LLAMA_UI_DEFAULT_ENABLED

* refactor: rename CLI flags (--webui -> --ui) with backward compat

- Add --ui/--no-ui (old --webui/--no-webui kept as deprecated aliases)
- Add --ui-config (old --webui-config kept as deprecated alias)
- Add --ui-config-file (old --webui-config-file kept as deprecated alias)
- Add --ui-mcp-proxy/--no-ui-mcp-proxy (old --webui-mcp-proxy kept as deprecated)
- Add new env vars: LLAMA_ARG_UI, LLAMA_ARG_UI_CONFIG, LLAMA_ARG_UI_CONFIG_FILE, LLAMA_ARG_UI_MCP_PROXY
- C++ struct fields: params.ui, params.ui_config_json, params.ui_mcp_proxy added alongside old fields
- Backward compat: old fields synced to new ones in g_params_to_internals

* refactor: update C++ server internals with backward compat

- Rename json_webui_settings -> json_ui_settings (both kept in server_context_meta)
- Rename params.webui usage -> params.ui (both synced, old still works)
- JSON API emits both "ui"/"ui_settings" and "webui"/"webui_settings" keys
- Server routes use params.ui_mcp_proxy || params.webui_mcp_proxy
- Preprocessor guards use #if defined(LLAMA_BUILD_UI) || defined(LLAMA_BUILD_WEBUI)

* refactor: rename CI/CD workflows, artifacts, and build script

- Rename webui-build.yml -> ui-build.yml; artifact webui-build -> ui-build
- Rename webui-publish.yml -> ui-publish.yml; var HF_BUCKET_WEBUI_STATIC_OUTPUT -> HF_BUCKET_UI_STATIC_OUTPUT
- Rename server-webui.yml -> server-ui.yml; job webui-build/checks -> ui-build/checks
- Update server.yml: job/artifact refs webui-build -> ui-build
- Update release.yml: all webui-build/publish refs -> ui-build/publish; HF_TOKEN_WEBUI_STATIC_OUTPUT -> HF_TOKEN_UI_STATIC_OUTPUT
- Update server-self-hosted.yml: webui-build -> ui-build
- Update build-self-hosted.yml: HF_WEBUI_VERSION -> HF_UI_VERSION
- Rename webui-download.cmake -> ui-download.cmake (internal refs updated)
- Update labeler.yml: server/webui -> server/ui path label

* docs: update CODEOWNERS and server README docs

- Update CODEOWNERS: team ggml-org/llama-webui -> ggml-org/llama-ui, path /tools/server/webui/ -> /tools/ui/
- Update server README.md: CLI tables show --ui flags with deprecated --webui aliases
- Update server README-dev.md: "WebUI" -> "UI", paths updated to tools/ui/

* fix: Small fixes for UI build

* fix: CMake.txt syntax

* chore: Formatting

* fix: `.editorconfig` for llama-ui

* chore: Formatting

* refactor: Use `APP_NAME` in Error route

* refactor: Cleanup

* refactor: Single migration service

* make llama-ui a linkable target

* fix: UI Build output

* fix: Missing change

* fix: separate llama-ui npm build output into build/tools/ui/dist subfolder + use cmake npm build instead of downloading ui-build.yml artifacts in CI

* refactor: UI workflows cleanup

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2026-05-16 02:02:40 +02:00
Julien Chaumond
6831fe470c
docs: document usage object in server timings response (#23110)
* docs: document `usage` object in server timings response

Co-Authored-By: julien-agent <Agents+cyolo@huggingface.co>

* Apply suggestion from @julien-c

---------

Co-authored-by: julien-agent <Agents+cyolo@huggingface.co>
2026-05-15 19:33:12 +02:00
Xuan-Son Nguyen
72e60f500d
mtmd: add chunks and fix preproc for qwen3a (#23073)
* mtmd: add chunks and fix preproc for qwen3a

* add attn_mask

* limit mtmd_chunk size (avoid blow up memory)

* correct audio tokens

* re-order the set_input case

* remove attn_mask
2026-05-15 19:32:47 +02:00
Pascal
8be1786707
webui: fix theme from --webui-config-file not applied on first load (fresh localStorage) (#22902) 2026-05-15 19:25:38 +02:00
Pascal
d528444580
webui: preserve partial response on streaming error (#23090) 2026-05-15 11:18:11 +02:00
Sid Shaytay
91e84fed64
Support for Codex CLI by skipping unsupported Responses tools (#23041)
Some checks are pending
Python Type-Check / python type-check (push) Waiting to run
* Support for Codex CLI by skipping unsupported Responses tools

* Warn on skipped Responses tools and preserve gpt-oss apply_patch rejection

* Revert gpt-oss apply_patch special handling
2026-05-15 09:03:24 +02:00
Aleksander Grygier
0c3e4fccca
fix: Propagate version tag to WebUI asset download in self-hosted CI (#23051)
* fix: Propagate version tag to WebUI asset download in self-hosted CI

* refactor: Apply suggestions from @CISC

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* fix: Skip npm build when Node.js is not installed

Avoid 'no such file or directory' errors on CI runners that lack
Node.js. Check if npm is available via find_program before attempting
npm install + npm run build. Falls back to HF Bucket download.

* fix: Use + separator for ASSETS list to fix Windows build

Replace fragile \; escaping with a + separator when passing the
WebUI asset list via -DASSETS to the download script. On Windows,
the \; escaping was not reliably preserved through the CMake build
system, causing all asset filenames to be concatenated into one
(e.g., 'index.html;bundle.js;bundle.css;loading.html' as a single
file), which broke the HF Bucket download and subsequent xxd.cmake
step.

+ is safe because it is not special in cmd.exe (unlike | which is a
pipe operator), not special in CMake's -D argument parser, and not
a valid Windows filename character. CMakeLists.txt joins assets
with + and webui-download.cmake splits them back via regex.

* fix: Validate HF_WEBUI_VERSION environment variable with regex

Add input validation for the HF_WEBUI_VERSION env var to prevent
CMake list separator or path-traversal issues in stamp filenames
and download URLs. Rejects non-conforming characters early.

* fix: Remove 'latest' fallback for HF_WEBUI_VERSION

When needs.determine-tag.outputs.tag_name is empty, let CMake's
default resolution handle it (empty -> git-based version lookup)
instead of falling back to 'latest'. This ensures the sentinel
stamp file is consistent with CMake's resolution logic.

* fix: Demote checksum verification failure to warning instead of hard gate

* fix: End line character

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-05-14 17:57:20 +02:00
Aleksander Grygier
253ba110bc
webui: Move static build output from repo code to HF Bucket (#22937)
* ci: add workflow to publish webui to Hugging Face bucket

* ci: add webui release job to release workflow

* ci: test webui release job

* chore: Return to default minification strategy for build output files

* ci: extract webui build into separate workflow and job

* chore: Ignore webui static output + clean up references

* chore: Delete legacy webui static output

* chore: Ignore webui build static output

* fix: Workflow

* fix: Versioning naming

* chore: Update package name

* test: Test CI fix

* refactor: Naming

* server: implement webui build strategy with HF Bucket support

* chore: Remove test workflow

* chore: Use WebUI build workflow call in other workflows

* server: HF Buckets fallback for WebUI build

* refactor: App name variable

* refactor: Naming

* fix: Retrieve loading.html

* fix: workflow syntax

* fix: Rewrite malformed release.yml

* fix: Req param

* test: Re-add missing Playwright installation for CI tests

* refactor: Logic & security improvements

* refactor: Retrieve publishing jobs and DRY the workflows

* fix: Test workflow syntax

* fix: Upstream Release Tag for test workflow

* chore: Remove test workflow

* ci: Run WebUI jobs on `ubuntu-24.04-arm`

* refactor: Post-CR cleanup

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* refactor: CI cleanup

* refactor: Cleanup

* test: Test workflow

* refactor: use LLAMA_BUILD_NUMBER instead of LLAMA_BUILD_TAG for HF Bucket webui downloads

* server: add fallback mechanism for HF Bucket webui downloads from latest directory

* fix: Incorrect argument order in file(SHA256) calls for checksum verification

* refactor: Use cmake script for handling the HF Bucket download on build time

* feat: support local npm build for WebUI assets

* refactor: add `HF_ENABLED` flag to control WebUI build/download provisioning

* refactor: Cleanup

* chore: Remove test workflow

* fix: remove s390x from release workflow

* fix: add webui-build dependency to ubuntu-22-rocm and windows-hip

* Revert "fix: remove s390x from release workflow"

This reverts commit debcfffa9bc1e3112eae41f2d29741b682e4eb19.

* fix: Release workflow file

* fix: Proper release tag used for HF Bucket upload

* fix: Remove duplicate steps in release workflow

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-05-14 13:21:41 +02:00
Georgi Gerganov
67b2b7f2f2
logs : reduce (#23021)
Some checks failed
Python Type-Check / python type-check (push) Waiting to run
Check Pre-Tokenizer Hashes / pre-tokenizer-hashes (push) Has been cancelled
Python check requirements.txt / check-requirements (push) Has been cancelled
Update Operations Documentation / update-ops-docs (push) Has been cancelled
* logs : reduce

* args : fix envs

* server : fix build

* common : print verbosity level at start

* server : clean-up logs

* server : print prompt processing timings + sampling params

* minor : whitespaces
2026-05-14 13:05:52 +03:00
Aleksander Grygier
320a6a44a5
fix: Autoscroll detection (#23026) 2026-05-14 08:09:29 +02:00