Concedo
9e9497f0cc
Merge remote-tracking branch 'origin/upstream' into concedo_experimental
...
# Conflicts:
# examples/save-load-state/save-load-state.cpp
# ggml/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c
# ggml/src/ggml-hexagon/htp/matmul-ops.c
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/gemm_noshuffle_q4_0_f32.cl
# ggml/src/ggml-opencl/kernels/gemm_noshuffle_q8_0_f32.cl
# ggml/src/ggml-opencl/kernels/gemv_noshuffle_q4_0_f32.cl
# ggml/src/ggml-opencl/kernels/gemv_noshuffle_q4_0_f32_spec.cl
# ggml/src/ggml-opencl/kernels/gemv_noshuffle_q8_0_f32.cl
# ggml/src/ggml-rpc/ggml-rpc.cpp
# scripts/sync-ggml.last
# scripts/sync_vendor.py
# src/llama-graph.cpp
# tests/test-backend-ops.cpp
# tests/test-state-restore-fragmented.cpp
2026-05-06 21:20:06 +08:00
Concedo
7240da764a
Merge commit '935a340292' into concedo_experimental
...
# Conflicts:
# examples/diffusion/CMakeLists.txt
# scripts/server-test-function-call.py
# src/llama-model.cpp
# src/models/gemma4.cpp
# tests/test-chat.cpp
# tests/test-reasoning-budget.cpp
# tools/server/README.md
2026-05-06 21:02:25 +08:00
Aleksander Grygier
e3e3f8e46a
webui: Remove Google Favicons & Improve MCP Information logic & UI ( #22719 )
...
* refactor: Remove Google favicon utility
* fix: MCP Server favicon
* refactor: Cleanup
* refactor: MCP Server Information
* fix: Fix MCP Settings UI
* refactor: Cleanup
2026-05-06 11:12:27 +02:00
viggy
07eaf919ed
add tabindex and aria-hidden ( #22699 )
2026-05-06 09:21:58 +02:00
Georgi Gerganov
2bacb1eb77
server : validate --tools CLI argument against known tool names ( #22538 )
...
Previously, unknown tool names passed via --tools were silently ignored.
Now the server validates each tool name at startup and exits with an
error if an unrecognized tool is specified, listing the available tools.
Assisted-by: llama.cpp:local pi
2026-05-05 06:35:27 +03:00
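The startup check described in 2bacb1eb77 could look roughly like the sketch below. It is not the server's code: the function names, the comma-separated argument format, and the contents of the tool list are assumptions made for illustration (only `get_datetime` and the `all` value are mentioned elsewhere in this log).

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical registry. Only "get_datetime" is named in this log;
// the other two entries are placeholders.
static const std::vector<std::string> k_known_tools = {
    "get_datetime", "example_tool_a", "example_tool_b",
};

// Split a comma-separated list, skipping empty items (format is an assumption).
std::vector<std::string> split_csv(const std::string & s) {
    std::vector<std::string> out;
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, ',')) {
        if (!item.empty()) {
            out.push_back(item);
        }
    }
    return out;
}

// Return the requested tools, or throw with the list of available tools
// if any name is not recognized (instead of silently ignoring it).
std::vector<std::string> validate_tools(const std::string & arg) {
    if (arg == "all") {
        return k_known_tools;
    }
    std::vector<std::string> requested = split_csv(arg);
    for (const std::string & name : requested) {
        const bool known = std::find(k_known_tools.begin(), k_known_tools.end(), name) != k_known_tools.end();
        if (!known) {
            std::string msg = "unknown tool '" + name + "', available tools:";
            for (const std::string & t : k_known_tools) {
                msg += " " + t;
            }
            throw std::invalid_argument(msg);
        }
    }
    return requested;
}
```

A caller would catch the exception at startup, print the message, and exit with a non-zero status, which matches the "exits with an error" behavior the commit describes.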
Georgi Gerganov
d6e7b033a4
llama : add option to save memory in device buffers ( #22679 )
...
* llama : add option to save memory in device buffers
* tests : extend llama-save-load-state
2026-05-05 06:35:07 +03:00
Xuan-Son Nguyen
935a340292
server: implement /models?reload=1 ( #21848 )
2026-05-04 16:23:26 +02:00
JusteLeo
36a694c965
webui : fix circular dependency between chat.service.ts and models.svelte.ts ( #22625 )
2026-05-04 13:38:10 +02:00
Piotr Wilkin (ilintar)
a4701c98f7
common/autoparser: fixes for newline handling / forced tool calls ( #22654 )
...
* chat/autoparser: the fixes
* Move optspace() to chat-peg-parser, comment out server tests invalidated due to content now allowed with forced tool calls.
* Trim whitespace on apply instead
2026-05-04 13:18:11 +02:00
Evan Huus
c84e6d6db5
server: Add a simple get_datetime server tool ( #22649 )
2026-05-04 12:19:41 +02:00
Concedo
2905c6254f
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .pi/gg/SYSTEM.md
# docs/speculative.md
# ggml/src/ggml-virtgpu/virtgpu-shm.cpp
# ggml/src/ggml-virtgpu/virtgpu.cpp
# ggml/src/ggml-virtgpu/virtgpu.h
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-webgpu/wgsl-shaders/row_norm.wgsl
# tools/cli/README.md
# tools/completion/README.md
# tools/server/README.md
2026-05-04 15:36:13 +08:00
Nick Towle
fa8feaed34
webui: restore missing settings ( #22666 )
2026-05-04 09:04:07 +02:00
Georgi Gerganov
846262d787
docs : update speculative decoding parameters after refactor ( #22397 ) ( #22539 )
...
* docs : update speculative decoding parameters after refactor (#22397 )
Update docs/speculative.md to reflect the new parameter naming scheme
introduced in PR #22397 :
- Replace --draft-max/--draft-min with --spec-draft-n-max/--spec-draft-n-min
- Replace --spec-ngram-size-n/m with per-implementation variants
- Add documentation for all new --spec-ngram-*- parameters
- Update all example commands
Assisted-by: llama.cpp:local pi
* pi : add rule to use gh CLI for GitHub resources
Assisted-by: llama.cpp:local pi
* docs : run llama-gen-docs
* arg : fix typo
2026-05-04 08:52:07 +03:00
Georgi Gerganov
0754b7b6fe
server : avoid checkpoint data host copies ( #22558 )
...
* server : avoid checkpoint data host copies
* llama : refactor llama_io_read_i
2026-05-02 18:03:25 +03:00
Concedo
7c70187e26
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/ISSUE_TEMPLATE/010-bug-compilation.yml
# .github/ISSUE_TEMPLATE/011-bug-results.yml
# .github/ISSUE_TEMPLATE/019-bug-misc.yml
# .github/ISSUE_TEMPLATE/020-enhancement.yml
# .github/ISSUE_TEMPLATE/030-research.yml
# .github/ISSUE_TEMPLATE/040-refactor.yml
# ggml/CMakeLists.txt
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-hexagon/CMakeLists.txt
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/CMakeLists.txt
# ggml/src/ggml-hexagon/htp/cmake-toolchain.cmake
# ggml/src/ggml-hexagon/htp/flash-attn-ops.c
# ggml/src/ggml-hexagon/htp/hex-utils.h
# ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c
# ggml/src/ggml-hexagon/htp/hmx-ops.h
# ggml/src/ggml-hexagon/htp/hmx-utils.h
# ggml/src/ggml-hexagon/htp/hvx-base.h
# ggml/src/ggml-hexagon/htp/hvx-copy.h
# ggml/src/ggml-hexagon/htp/hvx-exp.h
# ggml/src/ggml-hexagon/htp/unary-ops.c
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/cvt.cl
# ggml/src/ggml-rpc/ggml-rpc.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-virtgpu/ggml-backend.cpp
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_vec.wgsl
# ggml/src/ggml-zdnn/ggml-zdnn.cpp
# ggml/src/ggml-zendnn/ggml-zendnn.cpp
# scripts/sync-ggml.last
# tests/test-backend-ops.cpp
2026-05-02 18:07:50 +08:00
Aleksander Grygier
ab6120cde5
webui: Spring Cleaning Refactor v1 ( #22505 )
...
* wip: server_tools
* feat: Integrate with `/tools` endpoint
* feat: Builtin + MCP + JSON Schema Tools WIP
* refactor
* displayName -> display_name
* snake_case everywhere
* rm redundant field
* feat: Improvements
* chore: update webui build output
* refactor: Updates after server updates
* chore: update webui build output
* change arg to --tools all
* feat: UI improvements
* chore: update webui build output
* add readme mention
* llama-gen-docs
* chore: update webui build output
* chore: update webui build output
* chore: update webui build output
* feat: Reorganize settings sections
* feat: Separate dialogs for MCP Servers Settings and Import/Export
* feat: WIP
* feat: WIP
* feat: WIP
* feat: WIP
* feat: WIP
* feat: WIP
* WIP on allozaur/20677-webui-server-tools
* feat: UI improvements
* chore: Update package lock
* chore: Run `npm audit fix`
* feat: UI WIP
* feat: UI
* refactor: Desktop Icon Strip DRY
* feat: Cleaner rendering and transition for ChatScreen
* feat: UI improvements
* feat: UI improvement
* feat: Remove MCP Server "enable" switch from Tools submenu
* chore: Run `npm audit fix`
* feat: WIP
* feat: Logic improvements
* refactor: Cleanup
* refactor: DRY
* test: Fix Chat Sidebar UI Tests
* chore: Update package lock
* refactor: Cleanup
* feat: Chat Message Action Card with Continue and Permission flow implementations
* feat: Add agentic steering messages, draft messages and improve chat UX
* fix: Search results UI
* test: Fix unit test
* feat: UI/UX improvements
* refactor: Simplify `useToolsPanel` access in components
* feat: Implement Processing Info Context API
* feat: Implement 'Go back to chat' functionality for settings
* feat: Enhance MCP Server management in Chat Form Attachments
* style: Minor UI and branding adjustments
* chore: Update webui static build output
* chore: Formatting, linting & type checks
* feat: Draft messages logic
* feat: UI improvements
* feat: Steering Messages improvements
* refactor: Cleanup
* refactor: Cleanup
* feat: Improve UI
* refactor: Settings navigation hook
* refactor: DRY code
* refactor: DRY ChatMessageUser UI components
* refactor: Desktop Icon Strip DRY
* refactor: Tools & permissions
* fix: Navigation condition
* refactor: Cleanup
* refactor: Cleanup
* refactor: Cleanup
* fix: preserve reasoning_content in agentic flow
* refactor: Storybook cleanup
* refactor: isInViewport util function
* refactor: Rename globally `onClick` to `onclick`
* chore: `npm audit fix`
* refactor: Action Icon usage
* refactor: Naming
* refactor: JS in `class` directive
* refactor: Chat components cleanup WIP
* refactor: Components structure
* refactor: Cleanup WIP
* feat: New ChatAttachmentsPreview component
* feat: UI improvements
* feat: UI improvements
* refactor: Cleanup
* refactor: ChatAttachmentsPreview UI/UX
* refactor: Remove dead code
* refactor: Cleanup
* fix: Model Name aliases displaying
* feat: Shortcut improvements
* refactor: Chat Message
* feat: Move Import/Export to settings
* refactor: Cleanup
* refactor: Cleanup
* refactor: Cleanup
* refactor: Cleanup
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2026-05-01 18:36:29 +02:00
Concedo
61478cbf4a
Merge commit 'c20c44514a' into concedo_experimental
...
# Conflicts:
# .github/workflows/python-type-check.yml
# examples/speculative/speculative.cpp
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/htp-ctx.h
# ggml/src/ggml-hexagon/htp/htp-ops.h
# ggml/src/ggml-hexagon/htp/htp_iface.idl
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_decls.tmpl
# scripts/jinja/jinja-tester.py
# scripts/snapdragon/adb/run-cli.sh
# scripts/snapdragon/adb/run-completion.sh
# scripts/sync_vendor.py
# tests/test-backend-ops.cpp
2026-05-01 00:07:46 +08:00
Concedo
37073bc13d
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# ggml/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-cuda/mmq.cuh
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# scripts/sync-ggml.last
# tests/test-backend-ops.cpp
# tests/test-log.cpp
2026-04-30 17:37:52 +08:00
Georgi Gerganov
80afa33aad
spec : fix draft model checkpoints ( #22521 )
...
* spec : fix draft model checkpoints
* cont : clean-up
* cont : gate the ngram-mod reset warning behind verbose flag
2026-04-30 08:32:18 +03:00
Concedo
45f8ff49bb
Merge commit '52e5f0a5c1' into concedo_experimental
...
# Conflicts:
# examples/gen-docs/gen-docs.cpp
# examples/lookup/lookup-create.cpp
# examples/lookup/lookup-stats.cpp
# examples/lookup/lookup.cpp
# examples/speculative-simple/speculative-simple.cpp
# examples/speculative/speculative.cpp
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/aclnn_ops.h
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-rpc/ggml-rpc.cpp
# ggml/src/ggml-vulkan/ggml-vulkan.cpp
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-webgpu/wgsl-shaders/binary.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/get_rows.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_decls.tmpl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_vec.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/rms_norm_mul.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/ssm_scan.wgsl
# tests/test-arg-parser.cpp
# tests/test-backend-ops.cpp
# tests/test-chat.cpp
# tests/test-reasoning-budget.cpp
# tools/llama-bench/llama-bench.cpp
# tools/rpc/rpc-server.cpp
# tools/server/webui/src/lib/components/app/chat/ChatScreen/ChatScreen.svelte
# tools/server/webui/src/lib/components/app/chat/ChatSidebar/ChatSidebar.svelte
# tools/server/webui/src/routes/(chat)/+page.svelte
2026-04-29 22:27:36 +08:00
Georgi Gerganov
683c5acb90
spec : discard last drafted token with low prob ( #22506 )
2026-04-29 17:00:00 +03:00
Pascal
59237bfbbc
webui: fix slow mic stop and WAV encode ( #22480 )
...
* webui: instant mic stop, race-free recorder restart
* webui: faster WAV PCM encode via hoisted channels and Int16Array
* chore: update webui build output
* webui: drop setTimeout(0) hack and harden cancelRecording
* chore: update webui build output
2026-04-29 12:58:35 +02:00
Aleksander Grygier
f42e29fdf1
webui: Server tools ( #21237 )
...
* wip: server_tools
* feat: Integrate with `/tools` endpoint
* feat: Builtin + MCP + JSON Schema Tools WIP
* refactor
* displayName -> display_name
* snake_case everywhere
* rm redundant field
* feat: Improvements
* chore: update webui build output
* refactor: Updates after server updates
* chore: update webui build output
* change arg to --tools all
* feat: UI improvements
* chore: update webui build output
* add readme mention
* llama-gen-docs
* chore: update webui build output
* chore: update webui build output
* chore: update webui build output
* feat: Reorganize settings sections
* feat: Separate dialogs for MCP Servers Settings and Import/Export
* feat: WIP
* feat: WIP
* feat: WIP
* feat: WIP
* feat: WIP
* feat: WIP
* WIP on allozaur/20677-webui-server-tools
* feat: UI improvements
* chore: Update package lock
* chore: Run `npm audit fix`
* feat: UI WIP
* feat: UI
* refactor: Desktop Icon Strip DRY
* feat: Cleaner rendering and transition for ChatScreen
* feat: UI improvements
* feat: UI improvement
* feat: Remove MCP Server "enable" switch from Tools submenu
* chore: Run `npm audit fix`
* feat: WIP
* feat: Logic improvements
* refactor: Cleanup
* refactor: DRY
* test: Fix Chat Sidebar UI Tests
* chore: Update package lock
* refactor: Cleanup
* feat: Chat Message Action Card with Continue and Permission flow implementations
* feat: Add agentic steering messages, draft messages and improve chat UX
* fix: Search results UI
* test: Fix unit test
* feat: UI/UX improvements
* refactor: Simplify `useToolsPanel` access in components
* feat: Implement Processing Info Context API
* feat: Implement 'Go back to chat' functionality for settings
* feat: Enhance MCP Server management in Chat Form Attachments
* style: Minor UI and branding adjustments
* chore: Update webui static build output
* chore: Formatting, linting & type checks
* feat: Draft messages logic
* feat: UI improvements
* feat: Steering Messages improvements
* refactor: Cleanup
* refactor: Cleanup
* feat: Improve UI
* refactor: Settings navigation hook
* refactor: DRY code
* refactor: DRY ChatMessageUser UI components
* refactor: Desktop Icon Strip DRY
* refactor: Tools & permissions
* fix: Navigation condition
* refactor: Cleanup
* refactor: Cleanup
* refactor: Cleanup
* fix: preserve reasoning_content in agentic flow
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2026-04-28 14:35:49 +03:00
Georgi Gerganov
14e733e36f
spec : refactor params ( #22397 )
...
* spec : refactor params
* cont : fix
* cont : rename "sparam" to "sampling"
* cont : add spec params category
* cont : add info about removed arguments
* cont : skip param length check for spec params
* cont : adapt server tests
2026-04-28 09:07:33 +03:00
Aman Gupta
516e8d7a8a
server: use pos_next instead of n_tokens for m-rope ( #22439 )
2026-04-28 08:41:00 +03:00
tha80
983ca8992e
server: (router) Forward form-data to model server ( Fixes #22044 ) ( #22118 )
...
* This commit enables the router to forward form-data to model server.
Fixes #22044 (enabling to use the /v1/audio/transcriptions in router mode)
* Applied the suggestion from Copilot's first comment: using the non-throwing json::parse overload.
* Addressed Copilot's third comment by extending the files representation to also include filename and content-type
* Addressed Copilot's fourth comment by making the RNG thread_local
* Changed variable body from std::string to std::ostringstream in build_multipart_body
as suggested by ngxson in https://github.com/ggml-org/llama.cpp/pull/22118#discussion_r3127099053
* Added sanitize_field lambda in build_multipart_body for key, filename and content_type
as suggested by ngxson in https://github.com/ggml-org/llama.cpp/pull/22118#discussion_r3127104647
* explicitly checking if value/item is string before calling value/item.get<std::string>()
as requested by ngxson in https://github.com/ggml-org/llama.cpp/pull/22118#discussion_r3127111279
* Added double quote to the sanitize lambda and throw on json parse failure
---------
Co-authored-by: Ralph Paßgang <ralph@trust-it.de>
2026-04-27 23:55:00 +02:00
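For readers unfamiliar with the multipart changes listed in 983ca8992e, here is a minimal sketch of building a form-data body with a `std::ostringstream` and a field sanitizer. The names `build_multipart_body` and `sanitize_field` come from the commit message; the `form_file` struct, the signatures, and the exact set of stripped characters (CR, LF, double quote) are assumptions, not the router's actual code.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical representation of an uploaded file part.
struct form_file {
    std::string key;
    std::string filename;
    std::string content_type;
    std::string data;
};

// Drop CR, LF and double quotes so a value cannot break out of a
// quoted header parameter or inject extra header lines.
std::string sanitize_field(const std::string & s) {
    std::string out;
    for (char c : s) {
        if (c != '\r' && c != '\n' && c != '"') {
            out += c;
        }
    }
    return out;
}

// Assemble the body; the caller supplies the boundary string.
std::string build_multipart_body(
        const std::vector<std::pair<std::string, std::string>> & fields,
        const std::vector<form_file> & files,
        const std::string & boundary) {
    std::ostringstream body;
    for (const auto & kv : fields) {
        body << "--" << boundary << "\r\n"
             << "Content-Disposition: form-data; name=\"" << sanitize_field(kv.first) << "\"\r\n\r\n"
             << kv.second << "\r\n";
    }
    for (const form_file & f : files) {
        body << "--" << boundary << "\r\n"
             << "Content-Disposition: form-data; name=\"" << sanitize_field(f.key)
             << "\"; filename=\"" << sanitize_field(f.filename) << "\"\r\n"
             << "Content-Type: " << sanitize_field(f.content_type) << "\r\n\r\n"
             << f.data << "\r\n";
    }
    body << "--" << boundary << "--\r\n"; // closing delimiter
    return body.str();
}
```

Streaming into an `std::ostringstream` avoids repeated reallocation of a growing `std::string`, which is presumably why the review suggested it; the boundary would normally be generated randomly per request.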
Concedo
340b22283e
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/intel.Dockerfile
# .github/workflows/build-android.yml
# .github/workflows/build.yml
# .github/workflows/release.yml
# .gitignore
# docs/backend/SYCL.md
# docs/backend/snapdragon/README.md
# examples/model-conversion/scripts/causal/convert-model.sh
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/CMakeLists.txt
# ggml/src/ggml-hexagon/htp/hex-utils.h
# ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c
# ggml/src/ggml-hexagon/htp/htp-ctx.h
# ggml/src/ggml-hexagon/htp/htp-ops.h
# ggml/src/ggml-hexagon/htp/htp_iface.idl
# ggml/src/ggml-hexagon/htp/hvx-base.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hexagon/htp/matmul-ops.c
# ggml/src/ggml-hexagon/libggml-htp.inf
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/mmvq.cpp
# ggml/src/ggml-sycl/mmvq.hpp
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-webgpu/wgsl-shaders/flash_attn.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/flash_attn_vec_blk.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/flash_attn_vec_split.wgsl
# scripts/server-test-structured.py
# scripts/snapdragon/adb/run-bench.sh
# scripts/snapdragon/adb/run-cli.sh
# scripts/snapdragon/adb/run-completion.sh
# scripts/snapdragon/adb/run-mtmd.sh
# scripts/snapdragon/adb/run-tool.sh
# scripts/snapdragon/qdc/requirements.txt
# scripts/snapdragon/windows/run-bench.ps1
# scripts/snapdragon/windows/run-cli.ps1
# scripts/snapdragon/windows/run-completion.ps1
# scripts/snapdragon/windows/run-mtmd.ps1
# scripts/snapdragon/windows/run-tool.ps1
# tests/test-backend-ops.cpp
# tools/cli/cli.cpp
# ty.toml
2026-04-25 12:13:14 +08:00
Piotr Wilkin (ilintar)
0adede866d
parser: fix structured output bug ( #22302 )
...
* fix very stupid structured output bug
* Things just cannot be too easy.
2026-04-24 23:19:55 +02:00
Georgi Gerganov
ffdd983fb8
server : fix swa-full logic ( #22288 )
2026-04-24 10:17:37 +03:00
Yes You Can Have Your Own
793d0a7931
server: rename debug tags to match --cache-idle-slots naming ( #22292 )
2026-04-24 09:28:44 +03:00
srkizer
185cbff6f1
server : convert_anthropic_to_oai: also copy chat_template_kwargs ( #22154 )
2026-04-23 13:32:46 -05:00
Song Li
c78fb909b2
server: fix heap-buffer-overflow from negative n_discard (CVE-2026-21869) ( #22267 )
...
* server: clamp n_discard to non-negative at JSON parse boundary (CVE-2026-21869)
A negative n_discard from client JSON causes heap-buffer-overflow in
update_slots() context-shift loop (CWE-787, CVSS 8.8). Clamp to 0 at
ingress; n_discard=0 already triggers auto-discard (n_left/2).
Ref: GHSA-8947-pfff-2f3c
* cont : cleaner
* cont : cleanerer
* cont : cleanest
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-04-23 18:39:07 +02:00
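The fix in c78fb909b2 amounts to clamping a client-supplied integer before it reaches the context-shift arithmetic. The sketch below illustrates that idea only; the helper names are hypothetical, and the upper bound against `n_left` is an assumption of this sketch rather than something stated in the commit message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Clamp at the parse boundary: a negative value from client JSON becomes 0.
inline int32_t sanitize_n_discard(int32_t requested) {
    return std::max<int32_t>(requested, 0);
}

// 0 means "auto": discard half of the tokens left after the kept prefix
// (n_left / 2, as the commit message notes). The min() cap is an assumption.
inline int32_t effective_n_discard(int32_t n_discard, int32_t n_left) {
    assert(n_left >= 0);
    const int32_t d = sanitize_n_discard(n_discard);
    return d == 0 ? n_left / 2 : std::min(d, n_left);
}
```

With the clamp in place, a request carrying `n_discard = -100` behaves the same as one carrying `0`, so the shift loop never sees a negative count.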
kvc0
c807c6e3b0
server: (anthropic API) fix prefix caching ( #21793 )
...
When testing claude code against llama.cpp, I noticed that only
n_past 18577 was used even when context was 60k or more. The log
in llama-server says:
```
slot update_slots: id 3 | task 10342 | old: ... ; cch= | defa0;You are
slot update_slots: id 3 | task 10342 | new: ... ; cch= | 1c8b4;
```
I observed that the cch value changed every time. Reading about that,
the x-anthropic-billing-header system message seems to be specially
handled inside of the anthropic api. I could remove it, but there
is a meaningful string sometimes included at the end. So instead,
I just replace the changing cch checksum with fffff.
I'm treating this as an anthropic message body API detail - I think this
is the right way to do this, but by all means please correct me!
It's always 5 hexadecimal characters, but I've written the replacement
defensively in case they change the protocol.
2026-04-23 17:45:02 +02:00
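The checksum replacement described in c807c6e3b0 can be sketched with a regular expression. This is an illustration, not the committed code: the function name is hypothetical, and matching "one or more hex digits after `cch=`" is one guess at what "written defensively" means.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Replace the changing hex checksum after "cch=" with a fixed value so that
// otherwise identical prompts share a cacheable prefix.
std::string normalize_cch(const std::string & text) {
    static const std::regex re("cch=[0-9a-fA-F]+");
    return std::regex_replace(text, re, "cch=fffff");
}
```

Applied to the two log lines quoted in the commit message, both `cch=defa0;` and `cch=1c8b4;` become `cch=fffff;`, so the text that follows the checksum no longer invalidates the cached prefix.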
Tarek Dakhran
550d684bd1
server: Enable transcriptions API for LFM2-Audio ( #22000 )
2026-04-23 10:47:26 +02:00
Concedo
0755f27372
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/openvino.Dockerfile
# .github/workflows/build-self-hosted.yml
# .github/workflows/build.yml
# common/chat.cpp
# docs/backend/OPENVINO.md
# examples/speculative-simple/speculative-simple.cpp
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/CMakeLists.txt
# ggml/src/ggml-hexagon/htp/htp-ctx.h
# ggml/src/ggml-hexagon/htp/htp-ops.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hexagon/libggml-htp.inf
# ggml/src/ggml-openvino/ggml-decoder.cpp
# ggml/src/ggml-openvino/ggml-openvino-extra.cpp
# ggml/src/ggml-openvino/ggml-openvino.cpp
# ggml/src/ggml-openvino/ggml-quants.cpp
# ggml/src/ggml-openvino/openvino/op/rope.cpp
# ggml/src/ggml-openvino/openvino/op_table.cpp
# ggml/src/ggml-openvino/openvino/op_table.h
# ggml/src/ggml-openvino/openvino/translate_session.cpp
# ggml/src/ggml-openvino/openvino/utils.cpp
# ggml/src/ggml-openvino/openvino/utils.h
# ggml/src/ggml-openvino/utils.cpp
# ggml/src/ggml-openvino/utils.h
# ggml/src/ggml-sycl/common.hpp
# ggml/src/ggml-sycl/convert.cpp
# ggml/src/ggml-sycl/convert.hpp
# ggml/src/ggml-sycl/gemm.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/set_rows.cpp
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# scripts/sync_vendor.py
# tests/CMakeLists.txt
# tests/test-chat.cpp
# tools/cli/cli.cpp
# tools/mtmd/CMakeLists.txt
# tools/server/CMakeLists.txt
2026-04-23 00:55:05 +08:00
Piotr Wilkin (ilintar)
8bccdbbff9
chat: fix parallel_tool_calls default setting based on model capabilities, add tests for parallel tool calls and structured outputs ( #22217 )
...
* chat: fix parallel_tool_calls default setting based on model capabilities, add tests for parallel tool calls and structured outputs
* Fix ty errors.
* Fix flake8 err
2026-04-22 18:10:56 +02:00
Georgi Gerganov
bcb5eeb645
speculative-simple : add checkpoint support ( #22227 )
...
* speculative-simple : add checkpoint support
* cont : fix build
2026-04-22 15:44:45 +03:00
Xuan-Son Nguyen
17f6245168
server: ignore reasoning content from transcription api ( #21905 )
2026-04-22 12:10:50 +02:00
Ethan Turner
750579ff14
common: Refactoring sampler parameters ( #20429 ) ( #22233 )
...
This change refactors the reasoning_budget_message parameter from the
common params into the sampling parameters specifically. It also removes
the reasoning_budget common parameter and standardizes on the existing
reasoning_budget_tokens parameter in the sampling configuration.
Issue: https://github.com/ggml-org/llama.cpp/issues/20429
Original PR: https://github.com/ggml-org/llama.cpp/pull/20297
2026-04-22 10:40:19 +02:00
Piotr Wilkin (ilintar)
134d6e54d4
common/chat, server: refactor, move all conversion functions to common, add tests ( #20690 )
...
* Refactor conversion functions
2026-04-22 10:28:45 +02:00
Xuan-Son Nguyen
04fe84b69d
server: allow cancel loading model ( #21814 )
2026-04-22 00:26:09 +02:00
Concedo
19a12bb080
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# CODEOWNERS
# common/CMakeLists.txt
# ggml/CMakeLists.txt
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-webgpu/wgsl-shaders/common_decls.tmpl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_vec.wgsl
# scripts/sync-ggml.last
# tools/cli/cli.cpp
# tools/llama-bench/llama-bench.cpp
# tools/perplexity/perplexity.cpp
2026-04-21 18:53:03 +08:00
Georgi Gerganov
cfe9838d26
fit-params : refactor + add option to output estimated memory per device ( #22171 )
...
* fit-params : add option to output estimated memory per device
* cont : minor
* cont : refactor
* cont : move fit params implementation to libcommon
* cont : header
* cont : headers
* cont : codeowners
2026-04-21 09:54:36 +03:00
xris99
ff6b1062af
server : fix hardcoded proxy connection timeout in router mode ( #18760 ) ( #22003 )
...
Fixes: https://github.com/ggml-org/llama.cpp/issues/18760
Co-authored-by: Christian <christian@example.com>
2026-04-21 06:41:14 +02:00
Georgi Gerganov
cf8b0dbda9
server : remove /api endpoints ( #22165 )
...
* server : remove /api endpoints
* cont : remove /api/tags
2026-04-20 20:41:19 +03:00
Concedo
cd6788007e
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build-cross.yml
# .github/workflows/build-self-hosted.yml
# .github/workflows/release.yml
# examples/llama.android/lib/src/main/cpp/CMakeLists.txt
# ggml/CMakeLists.txt
# ggml/src/ggml-rpc/CMakeLists.txt
# ggml/src/ggml-rpc/ggml-rpc.cpp
# ggml/src/ggml-sycl/mmvq.cpp
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# scripts/sync_vendor.py
# tests/test-chat.cpp
# tests/test-mtmd-c-api.c
# tools/server/README.md
2026-04-20 20:19:11 +08:00
Georgi Gerganov
de71b5f81c
server : refactor "use checkpoint" logic ( #22114 )
2026-04-20 08:42:37 +03:00
Yes You Can Have Your Own
9d49acb2a7
server: rename --clear-idle to --cache-idle-slots ( #21741 )
2026-04-20 08:30:24 +03:00
Sascha Rogmann
455d8e4be8
server : speculative checkpointing ( #19493 )
...
* server : speculative decoding using checkpoints
* server : fix draft check with checkpoints
* server : rename spec vars
* server : log levels
* server : refactored spec logic to speculative.cpp
* server : renamed spec checkpoints option
* server : fix spec checkpoints, logging
* speculative : checkpoints with draft model, logging
* server : n_tokens_cur and create_checkpoint in draft
* server : fix server_speculative_callback (slot.id)
* spec : fix ngram-map/begin idx_last_check
* spec : init ckpt (begin() wasn't called)
* chore: update webui build output
* server : restore sampler in spec checkpoint and clear mem
* cont : avoid --spec-use-checkpoints argument
* cont : remove server_prompt_checkpoint_with_size
* spec : rename (leave_draft_state)
* cont : clean-up
* cont : do not ignore partial drafts even if they are short
* cont : spec callback owned by session
* cont : simplify
* cont : avoid empty speculative session
* cont : simplify
* cont : simplify
* cont : enable mtmd speculative decoding
* cont : keep the spec sampler alive
* cont : simplify
* cont : fix nullptr deref + draft checkpoints
* cont : remove common_speculative_accept_response
* cont : remove callback
* cont : simplify
* cont : minor
* cont : simplify
* cont : fix accepted number
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-04-19 10:24:06 +03:00
Cetarthoriphros
9e5647affa
server: Expose media_tag on /props endpoint. ( #22028 )
2026-04-19 00:27:17 +02:00