Commit graph

689 commits

Author SHA1 Message Date
Concedo
7c70187e26 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/ISSUE_TEMPLATE/010-bug-compilation.yml
#	.github/ISSUE_TEMPLATE/011-bug-results.yml
#	.github/ISSUE_TEMPLATE/019-bug-misc.yml
#	.github/ISSUE_TEMPLATE/020-enhancement.yml
#	.github/ISSUE_TEMPLATE/030-research.yml
#	.github/ISSUE_TEMPLATE/040-refactor.yml
#	ggml/CMakeLists.txt
#	ggml/src/ggml-cann/ggml-cann.cpp
#	ggml/src/ggml-hexagon/CMakeLists.txt
#	ggml/src/ggml-hexagon/ggml-hexagon.cpp
#	ggml/src/ggml-hexagon/htp/CMakeLists.txt
#	ggml/src/ggml-hexagon/htp/cmake-toolchain.cmake
#	ggml/src/ggml-hexagon/htp/flash-attn-ops.c
#	ggml/src/ggml-hexagon/htp/hex-utils.h
#	ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c
#	ggml/src/ggml-hexagon/htp/hmx-ops.h
#	ggml/src/ggml-hexagon/htp/hmx-utils.h
#	ggml/src/ggml-hexagon/htp/hvx-base.h
#	ggml/src/ggml-hexagon/htp/hvx-copy.h
#	ggml/src/ggml-hexagon/htp/hvx-exp.h
#	ggml/src/ggml-hexagon/htp/unary-ops.c
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-opencl/kernels/cvt.cl
#	ggml/src/ggml-rpc/ggml-rpc.cpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	ggml/src/ggml-virtgpu/ggml-backend.cpp
#	ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_vec.wgsl
#	ggml/src/ggml-zdnn/ggml-zdnn.cpp
#	ggml/src/ggml-zendnn/ggml-zendnn.cpp
#	scripts/sync-ggml.last
#	tests/test-backend-ops.cpp
2026-05-02 18:07:50 +08:00
Aleksander Grygier
ab6120cde5
webui: Spring Cleaning Refactor v1 (#22505)
* wip: server_tools

* feat: Integrate with `/tools` endpoint

* feat: Builtin + MCP + JSON Schema Tools WIP

* refactor

* displayName -> display_name

* snake_case everywhere

* rm redundant field

* feat: Improvements

* chore: update webui build output

* refactor: Updates after server updates

* chore: update webui build output

* change arg to --tools all

* feat: UI improvements

* chore: update webui build output

* add readme mention

* llama-gen-docs

* chore: update webui build output

* chore: update webui build output

* chore: update webui build output

* feat: Reorganize settings sections

* feat: Separate dialogs for MCP Servers Settings and Import/Export

* feat: WIP

* feat: WIP

* feat: WIP

* feat: WIP

* feat: WIP

* feat: WIP

* WIP on allozaur/20677-webui-server-tools

* feat: UI improvements

* chore: Update package lock

* chore: Run `npm audit fix`

* feat: UI WIP

* feat: UI

* refactor: Desktop Icon Strip DRY

* feat: Cleaner rendering and transition for ChatScreen

* feat: UI improvements

* feat: UI improvement

* feat: Remove MCP Server "enable" switch from Tools submenu

* chore: Run `npm audit fix`

* feat: WIP

* feat: Logic improvements

* refactor: Cleanup

* refactor: DRY

* test: Fix Chat Sidebar UI Tests

* chore: Update package lock

* refactor: Cleanup

* feat: Chat Message Action Card with Continue and Permission flow implementations

* feat: Add agentic steering messages, draft messages and improve chat UX

* fix: Search results UI

* test: Fix unit test

* feat: UI/UX improvements

* refactor: Simplify `useToolsPanel` access in components

* feat: Implement Processing Info Context API

* feat: Implement 'Go back to chat' functionality for settings

* feat: Enhance MCP Server management in Chat Form Attachments

* style: Minor UI and branding adjustments

* chore: Update webui static build output

* chore: Formatting, linting & type checks

* feat: Draft messages logic

* feat: UI improvements

* feat: Steering Messages improvements

* refactor: Cleanup

* refactor: Cleanup

* feat: Improve UI

* refactor: Settings navigation hook

* refactor: DRY code

* refactor: DRY ChatMessageUser UI components

* refactor: Desktop Icon Strip DRY

* refactor: Tools & permissions

* fix: Navigation condition

* refactor: Cleanup

* refactor: Cleanup

* refactor: Cleanup

* fix: preserve reasoning_content in agentic flow

* refactor: Storybook cleanup

* refactor: isInViewport util function

* refactor: Rename globally `onClick` to `onclick`

* chore: `npm audit fix`

* refactor: Action Icon usage

* refactor: Naming

* refactor: JS in `class` directive

* refactor: Chat components cleanup WIP

* refactor: Components structure

* refactor: Cleanup WIP

* feat: New ChatAttachmentsPreview component

* feat: UI improvements

* feat: UI improvements

* refactor: Cleanup

* refactor: ChatAttachmentsPreview UI/UX

* refactor: Remove dead code

* refactor: Cleanup

* fix: Model Name aliases displaying

* feat: Shortcut improvements

* refactor: Chat Message

* feat: Move Import/Export to settings

* refactor: Cleanup

* refactor: Cleanup

* refactor: Cleanup

* refactor: Cleanup

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2026-05-01 18:36:29 +02:00
Concedo
61478cbf4a Merge commit 'c20c44514a' into concedo_experimental
# Conflicts:
#	.github/workflows/python-type-check.yml
#	examples/speculative/speculative.cpp
#	ggml/src/ggml-hexagon/ggml-hexagon.cpp
#	ggml/src/ggml-hexagon/htp/htp-ctx.h
#	ggml/src/ggml-hexagon/htp/htp-ops.h
#	ggml/src/ggml-hexagon/htp/htp_iface.idl
#	ggml/src/ggml-hexagon/htp/main.c
#	ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_decls.tmpl
#	scripts/jinja/jinja-tester.py
#	scripts/snapdragon/adb/run-cli.sh
#	scripts/snapdragon/adb/run-completion.sh
#	scripts/sync_vendor.py
#	tests/test-backend-ops.cpp
2026-05-01 00:07:46 +08:00
Concedo
37073bc13d Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	ggml/CMakeLists.txt
#	ggml/src/ggml-cpu/CMakeLists.txt
#	ggml/src/ggml-cuda/mmq.cuh
#	ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	scripts/sync-ggml.last
#	tests/test-backend-ops.cpp
#	tests/test-log.cpp
2026-04-30 17:37:52 +08:00
Georgi Gerganov
80afa33aad
spec : fix draft model checkpoints (#22521)
* spec : fix draft model checkpoints

* cont : clean-up

* cont : gate the ngram-mod reset warning behind verbose flag
2026-04-30 08:32:18 +03:00
Concedo
45f8ff49bb Merge commit '52e5f0a5c1' into concedo_experimental
# Conflicts:
#	examples/gen-docs/gen-docs.cpp
#	examples/lookup/lookup-create.cpp
#	examples/lookup/lookup-stats.cpp
#	examples/lookup/lookup.cpp
#	examples/speculative-simple/speculative-simple.cpp
#	examples/speculative/speculative.cpp
#	ggml/src/CMakeLists.txt
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/aclnn_ops.h
#	ggml/src/ggml-cann/ggml-cann.cpp
#	ggml/src/ggml-rpc/ggml-rpc.cpp
#	ggml/src/ggml-vulkan/ggml-vulkan.cpp
#	ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	ggml/src/ggml-webgpu/wgsl-shaders/binary.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/get_rows.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_decls.tmpl
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_vec.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/rms_norm_mul.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/ssm_scan.wgsl
#	tests/test-arg-parser.cpp
#	tests/test-backend-ops.cpp
#	tests/test-chat.cpp
#	tests/test-reasoning-budget.cpp
#	tools/llama-bench/llama-bench.cpp
#	tools/rpc/rpc-server.cpp
#	tools/server/webui/src/lib/components/app/chat/ChatScreen/ChatScreen.svelte
#	tools/server/webui/src/lib/components/app/chat/ChatSidebar/ChatSidebar.svelte
#	tools/server/webui/src/routes/(chat)/+page.svelte
2026-04-29 22:27:36 +08:00
Georgi Gerganov
683c5acb90
spec : discard last drafted token with low prob (#22506)
2026-04-29 17:00:00 +03:00
Pascal
59237bfbbc
webui: fix slow mic stop and WAV encode (#22480)
* webui: instant mic stop, race-free recorder restart

* webui: faster WAV PCM encode via hoisted channels and Int16Array

* chore: update webui build output

* webui: drop setTimeout(0) hack and harden cancelRecording

* chore: update webui build output
2026-04-29 12:58:35 +02:00
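
The actual change above is in the TypeScript webui; purely as an illustration of the encode step it optimizes, here is a minimal C++ sketch of float-to-16-bit PCM conversion with the channel buffers resolved once up front ("hoisted") instead of per sample. Names and structure are assumptions, not the merged code:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Each float sample in [-1, 1] is clamped and scaled to signed 16-bit PCM.
// Resolving the channel buffers once up front keeps the inner loop free of
// per-sample lookups and allocations.
static std::vector<int16_t> encode_pcm16(const std::vector<std::vector<float>> & channels) {
    const size_t n_ch      = channels.size();
    const size_t n_samples = n_ch > 0 ? channels[0].size() : 0;

    std::vector<int16_t> out(n_ch * n_samples);
    for (size_t i = 0; i < n_samples; ++i) {
        for (size_t c = 0; c < n_ch; ++c) {
            const float s = std::clamp(channels[c][i], -1.0f, 1.0f);
            // asymmetric scale: the int16_t range is [-32768, 32767]
            out[i * n_ch + c] = (int16_t)(s < 0.0f ? s * 32768.0f : s * 32767.0f);
        }
    }
    return out;
}
```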
Aleksander Grygier
f42e29fdf1
webui: Server tools (#21237)
* wip: server_tools

* feat: Integrate with `/tools` endpoint

* feat: Builtin + MCP + JSON Schema Tools WIP

* refactor

* displayName -> display_name

* snake_case everywhere

* rm redundant field

* feat: Improvements

* chore: update webui build output

* refactor: Updates after server updates

* chore: update webui build output

* change arg to --tools all

* feat: UI improvements

* chore: update webui build output

* add readme mention

* llama-gen-docs

* chore: update webui build output

* chore: update webui build output

* chore: update webui build output

* feat: Reorganize settings sections

* feat: Separate dialogs for MCP Servers Settings and Import/Export

* feat: WIP

* feat: WIP

* feat: WIP

* feat: WIP

* feat: WIP

* feat: WIP

* WIP on allozaur/20677-webui-server-tools

* feat: UI improvements

* chore: Update package lock

* chore: Run `npm audit fix`

* feat: UI WIP

* feat: UI

* refactor: Desktop Icon Strip DRY

* feat: Cleaner rendering and transition for ChatScreen

* feat: UI improvements

* feat: UI improvement

* feat: Remove MCP Server "enable" switch from Tools submenu

* chore: Run `npm audit fix`

* feat: WIP

* feat: Logic improvements

* refactor: Cleanup

* refactor: DRY

* test: Fix Chat Sidebar UI Tests

* chore: Update package lock

* refactor: Cleanup

* feat: Chat Message Action Card with Continue and Permission flow implementations

* feat: Add agentic steering messages, draft messages and improve chat UX

* fix: Search results UI

* test: Fix unit test

* feat: UI/UX improvements

* refactor: Simplify `useToolsPanel` access in components

* feat: Implement Processing Info Context API

* feat: Implement 'Go back to chat' functionality for settings

* feat: Enhance MCP Server management in Chat Form Attachments

* style: Minor UI and branding adjustments

* chore: Update webui static build output

* chore: Formatting, linting & type checks

* feat: Draft messages logic

* feat: UI improvements

* feat: Steering Messages improvements

* refactor: Cleanup

* refactor: Cleanup

* feat: Improve UI

* refactor: Settings navigation hook

* refactor: DRY code

* refactor: DRY ChatMessageUser UI components

* refactor: Desktop Icon Strip DRY

* refactor: Tools & permissions

* fix: Navigation condition

* refactor: Cleanup

* refactor: Cleanup

* refactor: Cleanup

* fix: preserve reasoning_content in agentic flow

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2026-04-28 14:35:49 +03:00
Georgi Gerganov
14e733e36f
spec : refactor params (#22397)
* spec : refactor params

* cont : fix

* cont : rename "sparam" to "sampling"

* cont : add spec params category

* cont : add info about removed arguments

* cont : skip param length check for spec params

* cont : adapt server tests
2026-04-28 09:07:33 +03:00
Aman Gupta
516e8d7a8a
server: use pos_next instead of n_tokens for m-rope (#22439) 2026-04-28 08:41:00 +03:00
tha80
983ca8992e
server: (router) Forward form-data to model server (Fixes #22044) (#22118)
* This commit enables the router to forward form-data to the model server.
Fixes #22044 (enables use of the /v1/audio/transcriptions endpoint in router mode)

* * Applied the suggestion from Copilot's first comment: using the non-throwing json::parse overload.
* Addressed Copilot's third comment by extending the files representation to also include filename and content-type
* Addressed Copilot's fourth comment by making the RNG thread_local

* Changed variable body from std::string to std::ostringstream in build_multipart_body
as suggested by ngxson in https://github.com/ggml-org/llama.cpp/pull/22118#discussion_r3127099053

* Added sanitize_field lambda in build_multipart_body for key, filename and content_type
as suggested by ngxson in https://github.com/ggml-org/llama.cpp/pull/22118#discussion_r3127104647

* explicitly checking if value/item is a string before calling value/item.get<std::string>()
as requested by ngxson in https://github.com/ggml-org/llama.cpp/pull/22118#discussion_r3127111279

* Added double-quote handling to the sanitize lambda and throw on JSON parse failure

---------

Co-authored-by: Ralph Paßgang <ralph@trust-it.de>
2026-04-27 23:55:00 +02:00
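
A minimal sketch of the multipart assembly discussed in this thread, covering the two review points above: an std::ostringstream body and a sanitize lambda that strips CR, LF, and double quotes from header fields. The function name, signature, and framing details are assumptions, not the merged code:

```cpp
#include <sstream>
#include <string>

// Assemble one part of a multipart/form-data body. sanitize_field strips
// CR, LF and double quotes so a crafted key, filename or content type
// cannot break out of its header line or quoted value.
static std::string build_multipart_part(const std::string & boundary,
                                        const std::string & key,
                                        const std::string & filename,
                                        const std::string & content_type,
                                        const std::string & data) {
    auto sanitize_field = [](const std::string & s) {
        std::string out;
        out.reserve(s.size());
        for (char c : s) {
            if (c != '\r' && c != '\n' && c != '"') {
                out += c;
            }
        }
        return out;
    };

    std::ostringstream body; // ostringstream instead of repeated std::string appends
    body << "--" << boundary << "\r\n"
         << "Content-Disposition: form-data; name=\"" << sanitize_field(key) << "\"";
    if (!filename.empty()) {
        body << "; filename=\"" << sanitize_field(filename) << "\"";
    }
    body << "\r\n";
    if (!content_type.empty()) {
        body << "Content-Type: " << sanitize_field(content_type) << "\r\n";
    }
    body << "\r\n" << data << "\r\n";
    return body.str();
}
```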
Concedo
340b22283e Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/intel.Dockerfile
#	.github/workflows/build-android.yml
#	.github/workflows/build.yml
#	.github/workflows/release.yml
#	.gitignore
#	docs/backend/SYCL.md
#	docs/backend/snapdragon/README.md
#	examples/model-conversion/scripts/causal/convert-model.sh
#	ggml/CMakeLists.txt
#	ggml/src/CMakeLists.txt
#	ggml/src/ggml-hexagon/ggml-hexagon.cpp
#	ggml/src/ggml-hexagon/htp/CMakeLists.txt
#	ggml/src/ggml-hexagon/htp/hex-utils.h
#	ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c
#	ggml/src/ggml-hexagon/htp/htp-ctx.h
#	ggml/src/ggml-hexagon/htp/htp-ops.h
#	ggml/src/ggml-hexagon/htp/htp_iface.idl
#	ggml/src/ggml-hexagon/htp/hvx-base.h
#	ggml/src/ggml-hexagon/htp/main.c
#	ggml/src/ggml-hexagon/htp/matmul-ops.c
#	ggml/src/ggml-hexagon/libggml-htp.inf
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	ggml/src/ggml-sycl/mmvq.cpp
#	ggml/src/ggml-sycl/mmvq.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	ggml/src/ggml-webgpu/wgsl-shaders/flash_attn.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/flash_attn_vec_blk.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/flash_attn_vec_split.wgsl
#	scripts/server-test-structured.py
#	scripts/snapdragon/adb/run-bench.sh
#	scripts/snapdragon/adb/run-cli.sh
#	scripts/snapdragon/adb/run-completion.sh
#	scripts/snapdragon/adb/run-mtmd.sh
#	scripts/snapdragon/adb/run-tool.sh
#	scripts/snapdragon/qdc/requirements.txt
#	scripts/snapdragon/windows/run-bench.ps1
#	scripts/snapdragon/windows/run-cli.ps1
#	scripts/snapdragon/windows/run-completion.ps1
#	scripts/snapdragon/windows/run-mtmd.ps1
#	scripts/snapdragon/windows/run-tool.ps1
#	tests/test-backend-ops.cpp
#	tools/cli/cli.cpp
#	ty.toml
2026-04-25 12:13:14 +08:00
Piotr Wilkin (ilintar)
0adede866d
parser: fix structured output bug (#22302)
* fix very stupid structured output bug

* Things just cannot be too easy.
2026-04-24 23:19:55 +02:00
Georgi Gerganov
ffdd983fb8
server : fix swa-full logic (#22288) 2026-04-24 10:17:37 +03:00
Yes You Can Have Your Own
793d0a7931
server: rename debug tags to match --cache-idle-slots naming (#22292) 2026-04-24 09:28:44 +03:00
srkizer
185cbff6f1
server : convert_anthropic_to_oai: also copy chat_template_kwargs (#22154) 2026-04-23 13:32:46 -05:00
Song Li
c78fb909b2
server: fix heap-buffer-overflow from negative n_discard (CVE-2026-21869) (#22267)
* server: clamp n_discard to non-negative at JSON parse boundary (CVE-2026-21869)

A negative n_discard from client JSON causes a heap-buffer-overflow in the
update_slots() context-shift loop (CWE-787, CVSS 8.8). Clamp to 0 at
ingress; n_discard=0 already triggers auto-discard (n_left/2).

Ref: GHSA-8947-pfff-2f3c

* cont : cleaner

* cont : cleanerer

* cont : cleanest

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-04-23 18:39:07 +02:00
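
A minimal sketch of the ingress clamp this commit describes, assuming nlohmann::json for the request body and an illustrative helper name; the merged code may differ in detail:

```cpp
#include <cstdint>
#include <nlohmann/json.hpp>

// Clamp at the JSON parse boundary so a negative value can never reach the
// context-shift loop in update_slots(). Zero is a safe floor: n_discard == 0
// already selects the auto-discard path (n_left/2).
static int32_t parse_n_discard(const nlohmann::json & body) {
    int32_t n_discard = body.value("n_discard", 0);
    if (n_discard < 0) {
        n_discard = 0;
    }
    return n_discard;
}
```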
kvc0
c807c6e3b0
server: (anthropic API) fix prefix caching (#21793)
When testing Claude Code against llama.cpp, I noticed that only
n_past 18577 was used even when the context was 60k or more. The log
in llama-server says:
```
slot update_slots: id  3 | task 10342 | old: ... ; cch= | defa0;You are
slot update_slots: id  3 | task 10342 | new: ... ; cch= | 1c8b4;
```
I observed that the cch value changed every time. Reading about that,
the x-anthropic-billing-header system message seems to be specially
handled inside of the anthropic api. I could remove it, but there
is a meaningful string sometimes included at the end. So instead,
I just replace the changing cch checksum with fffff.

I'm treating this as an anthropic message body API detail - I think this
is the right way to do this, but by all means please correct me!

It's always 5 hexadecimal characters, but I've written the replacement
defensively in case they change the protocol.
2026-04-23 17:45:02 +02:00
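
A hedged sketch of that replacement; the exact pattern is an assumption from the commit text, and accepting any run of hex digits rather than exactly five mirrors the defensive choice mentioned above:

```cpp
#include <regex>
#include <string>

// Rewrite the per-request cch checksum to a constant so the system message,
// and with it the cached prompt prefix, stays byte-identical across requests.
static std::string normalize_cch(const std::string & text) {
    static const std::regex cch_re("cch=[0-9a-fA-F]+");
    return std::regex_replace(text, cch_re, "cch=fffff");
}
```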
Tarek Dakhran
550d684bd1
server: Enable transcriptions API for LFM2-Audio (#22000) 2026-04-23 10:47:26 +02:00
Concedo
0755f27372 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/openvino.Dockerfile
#	.github/workflows/build-self-hosted.yml
#	.github/workflows/build.yml
#	common/chat.cpp
#	docs/backend/OPENVINO.md
#	examples/speculative-simple/speculative-simple.cpp
#	ggml/src/ggml-hexagon/ggml-hexagon.cpp
#	ggml/src/ggml-hexagon/htp/CMakeLists.txt
#	ggml/src/ggml-hexagon/htp/htp-ctx.h
#	ggml/src/ggml-hexagon/htp/htp-ops.h
#	ggml/src/ggml-hexagon/htp/main.c
#	ggml/src/ggml-hexagon/libggml-htp.inf
#	ggml/src/ggml-openvino/ggml-decoder.cpp
#	ggml/src/ggml-openvino/ggml-openvino-extra.cpp
#	ggml/src/ggml-openvino/ggml-openvino.cpp
#	ggml/src/ggml-openvino/ggml-quants.cpp
#	ggml/src/ggml-openvino/openvino/op/rope.cpp
#	ggml/src/ggml-openvino/openvino/op_table.cpp
#	ggml/src/ggml-openvino/openvino/op_table.h
#	ggml/src/ggml-openvino/openvino/translate_session.cpp
#	ggml/src/ggml-openvino/openvino/utils.cpp
#	ggml/src/ggml-openvino/openvino/utils.h
#	ggml/src/ggml-openvino/utils.cpp
#	ggml/src/ggml-openvino/utils.h
#	ggml/src/ggml-sycl/common.hpp
#	ggml/src/ggml-sycl/convert.cpp
#	ggml/src/ggml-sycl/convert.hpp
#	ggml/src/ggml-sycl/gemm.hpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	ggml/src/ggml-sycl/set_rows.cpp
#	ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	scripts/sync_vendor.py
#	tests/CMakeLists.txt
#	tests/test-chat.cpp
#	tools/cli/cli.cpp
#	tools/mtmd/CMakeLists.txt
#	tools/server/CMakeLists.txt
2026-04-23 00:55:05 +08:00
Piotr Wilkin (ilintar)
8bccdbbff9
chat: fix parallel_tool_calls default setting based on model capabilities, add tests for parallel tool calls and structured outputs (#22217)
* chat: fix parallel_tool_calls default setting based on model capabilities, add tests for parallel tool calls and structured outputs

* Fix ty errors.

* Fix flake8 err
2026-04-22 18:10:56 +02:00
Georgi Gerganov
bcb5eeb645
speculative-simple : add checkpoint support (#22227)
* speculative-simple : add checkpoint support

* cont : fix build
2026-04-22 15:44:45 +03:00
Xuan-Son Nguyen
17f6245168
server: ignore reasoning content from transcription api (#21905) 2026-04-22 12:10:50 +02:00
Ethan Turner
750579ff14
common: Refactoring sampler parameters (#20429) (#22233)
This change moves the reasoning_budget_message parameter from the
common params into the sampling parameters. It also removes
the reasoning_budget common parameter and standardizes on the existing
reasoning_budget_tokens parameter in the sampling configuration.

Issue: https://github.com/ggml-org/llama.cpp/issues/20429
Original PR: https://github.com/ggml-org/llama.cpp/pull/20297
2026-04-22 10:40:19 +02:00
Piotr Wilkin (ilintar)
134d6e54d4
common/chat, server: refactor, move all conversion functions to common, add tests (#20690)
* Refactor conversion functions
2026-04-22 10:28:45 +02:00
Xuan-Son Nguyen
04fe84b69d
server: allow cancel loading model (#21814) 2026-04-22 00:26:09 +02:00
Concedo
19a12bb080 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	CODEOWNERS
#	common/CMakeLists.txt
#	ggml/CMakeLists.txt
#	ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	ggml/src/ggml-webgpu/wgsl-shaders/common_decls.tmpl
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_vec.wgsl
#	scripts/sync-ggml.last
#	tools/cli/cli.cpp
#	tools/llama-bench/llama-bench.cpp
#	tools/perplexity/perplexity.cpp
2026-04-21 18:53:03 +08:00
Georgi Gerganov
cfe9838d26
fit-params : refactor + add option to output estimated memory per device (#22171)
* fit-params : add option to output estimated memory per device

* cont : minor

* cont : refactor

* cont : move fit params implementation to libcommon

* cont : header

* cont : headers

* cont : codeowners
2026-04-21 09:54:36 +03:00
xris99
ff6b1062af
server : fix hardcoded proxy connection timeout in router mode (#18760) (#22003)
Fixes: https://github.com/ggml-org/llama.cpp/issues/18760

Co-authored-by: Christian <christian@example.com>
2026-04-21 06:41:14 +02:00
Georgi Gerganov
cf8b0dbda9
server : remove /api endpoints (#22165)
* server : remove /api endpoints

* cont : remove /api/tags
2026-04-20 20:41:19 +03:00
Concedo
cd6788007e Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build-cross.yml
#	.github/workflows/build-self-hosted.yml
#	.github/workflows/release.yml
#	examples/llama.android/lib/src/main/cpp/CMakeLists.txt
#	ggml/CMakeLists.txt
#	ggml/src/ggml-rpc/CMakeLists.txt
#	ggml/src/ggml-rpc/ggml-rpc.cpp
#	ggml/src/ggml-sycl/mmvq.cpp
#	ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	scripts/sync_vendor.py
#	tests/test-chat.cpp
#	tests/test-mtmd-c-api.c
#	tools/server/README.md
2026-04-20 20:19:11 +08:00
Georgi Gerganov
de71b5f81c
server : refactor "use checkpoint" logic (#22114) 2026-04-20 08:42:37 +03:00
Yes You Can Have Your Own
9d49acb2a7
server: rename --clear-idle to --cache-idle-slots (#21741) 2026-04-20 08:30:24 +03:00
Sascha Rogmann
455d8e4be8
server : speculative checkpointing (#19493)
* server : speculative decoding using checkpoints

* server : fix draft check with checkpoints

* server : rename spec vars

* server : log levels

* server : refactored spec logic to speculative.cpp

* server : renamed spec checkpoints option

* server : fix spec checkpoints, logging

* speculative : checkpoints with draft model, logging

* server : n_tokens_cur and create_checkpoint in draft

* server : fix server_speculative_callback (slot.id)

* spec : fix ngram-map/begin idx_last_check

* spec : init ckpt (begin() wasn't called)

* chore: update webui build output

* server : restore sampler in spec checkpoint and clear mem

* cont : avoid --spec-use-checkpoints argument

* cont : remove server_prompt_checkpoint_with_size

* spec : rename (leave_draft_state)

* cont : clean-up

* cont : do not ignore partial drafts even if they are short

* cont : spec callback owned by session

* cont : simplify

* cont : avoid empty speculative session

* cont : simplify

* cont : simplify

* cont : enable mtmd speculative decoding

* cont : keep the spec sampler alive

* cont : simplify

* cont : fix nullptr deref + draft checkpoints

* cont : remove common_speculative_accept_response

* cont : remove callback

* cont : simplify

* cont : minor

* cont : simplify

* cont : fix accepted number

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-04-19 10:24:06 +03:00
Cetarthoriphros
9e5647affa
server: Expose media_tag on /props endpoint. (#22028) 2026-04-19 00:27:17 +02:00
Concedo
79882d669a Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build-android.yml
#	.github/workflows/build.yml
#	.github/workflows/release.yml
#	CMakeLists.txt
#	CODEOWNERS
#	common/CMakeLists.txt
#	common/common.h
#	docs/ops.md
#	docs/ops/Metal.csv
#	examples/batched/CMakeLists.txt
#	examples/convert-llama2c-to-ggml/CMakeLists.txt
#	examples/debug/CMakeLists.txt
#	examples/diffusion/CMakeLists.txt
#	examples/embedding/CMakeLists.txt
#	examples/eval-callback/CMakeLists.txt
#	examples/gen-docs/CMakeLists.txt
#	examples/idle/CMakeLists.txt
#	examples/lookahead/CMakeLists.txt
#	examples/lookup/CMakeLists.txt
#	examples/parallel/CMakeLists.txt
#	examples/passkey/CMakeLists.txt
#	examples/retrieval/CMakeLists.txt
#	examples/save-load-state/CMakeLists.txt
#	examples/speculative-simple/CMakeLists.txt
#	examples/speculative/CMakeLists.txt
#	examples/sycl/CMakeLists.txt
#	examples/training/CMakeLists.txt
#	ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c
#	ggml/src/ggml-hexagon/htp/htp-ops.h
#	ggml/src/ggml-hexagon/htp/main.c
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-opencl/kernels/cvt.cl
#	pocs/vdot/CMakeLists.txt
#	src/CMakeLists.txt
#	tests/CMakeLists.txt
#	tests/test-quantize-stats.cpp
#	tools/batched-bench/CMakeLists.txt
#	tools/cli/CMakeLists.txt
#	tools/cli/cli.cpp
#	tools/completion/CMakeLists.txt
#	tools/cvector-generator/CMakeLists.txt
#	tools/cvector-generator/cvector-generator.cpp
#	tools/export-lora/CMakeLists.txt
#	tools/gguf-split/CMakeLists.txt
#	tools/gguf-split/gguf-split.cpp
#	tools/imatrix/CMakeLists.txt
#	tools/llama-bench/CMakeLists.txt
#	tools/llama-bench/llama-bench.cpp
#	tools/mtmd/CMakeLists.txt
#	tools/perplexity/CMakeLists.txt
#	tools/quantize/CMakeLists.txt
#	tools/quantize/quantize.cpp
#	tools/results/CMakeLists.txt
#	tools/server/CMakeLists.txt
#	tools/tokenize/CMakeLists.txt
#	tools/tts/CMakeLists.txt
2026-04-17 22:37:37 +08:00
Concedo
768527b031 Merge commit '1e796eb41f' into concedo_experimental
# Conflicts:
#	.devops/nix/package.nix
#	.github/workflows/build-riscv.yml
#	.github/workflows/build-vulkan.yml
#	.github/workflows/build.yml
#	docs/backend/SYCL.md
#	docs/build.md
#	docs/development/HOWTO-add-model.md
#	embd_res/templates/Reka-Edge.jinja
#	ggml/CMakeLists.txt
#	ggml/src/ggml-rpc/CMakeLists.txt
#	ggml/src/ggml-rpc/ggml-rpc.cpp
#	ggml/src/ggml-sycl/CMakeLists.txt
#	ggml/src/ggml-sycl/convert.cpp
#	ggml/src/ggml-sycl/dequantize.hpp
#	ggml/src/ggml-sycl/dmmv.cpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	ggml/src/ggml-webgpu/wgsl-shaders/common_decls.tmpl
#	ggml/src/ggml-webgpu/wgsl-shaders/get_rows.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_decls.tmpl
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_id.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_reg_tile.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_subgroup_matrix.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_vec.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/unary.wgsl
#	tests/test-chat.cpp
#	tools/rpc/README.md
2026-04-17 21:47:29 +08:00
Georgi Gerganov
6990e2f1f7
libs : rename libcommon -> libllama-common (#21936)
* cmake : allow libcommon to be shared

* cmake : rename libcommon to libllama-common

* cont : set -fPIC for httplib

* cont : export all symbols

* cont : fix build_info exports

* libs : add libllama-common-base

* log : add common_log_get_verbosity_thold()
2026-04-17 11:11:46 +03:00
Pascal
4adac43f6f
server: tests: fetch random media marker via /apply-template (#21962) (#21980)
* server: tests: fetch random media marker via /apply-template (#21962 fix)

* server: allow pinning media marker via LLAMA_MEDIA_MARKER env var

get_media_marker() checks LLAMA_MEDIA_MARKER at first call and uses it
as-is if set, falling back to the random marker otherwise.

Tests no longer need to fetch the marker dynamically via /apply-template:
the fixture sets LLAMA_MEDIA_MARKER=<__media__> so the hardcoded prompts
work as before.

Address review feedback from ngxson

* server: make get_media_marker() thread-safe via magic statics

Use a C++11 static local with a lambda initializer instead of a global
static with an empty-check. The runtime guarantees initialization exactly
once without explicit locking.

Address review feedback from ggerganov

* nits

* nits
2026-04-16 20:46:21 +03:00
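
The magic-static pattern from that last change, sketched with a placeholder fallback (the real server generates a random marker there):

```cpp
#include <cstdlib>
#include <string>

// The lambda body runs exactly once, on the first call; C++11 guarantees
// thread-safe initialization of a function-local static, so no explicit
// lock is needed.
static const std::string & get_media_marker() {
    static const std::string marker = []() -> std::string {
        if (const char * env = std::getenv("LLAMA_MEDIA_MARKER")) {
            return env; // pinned marker, e.g. "<__media__>" from the test fixture
        }
        return "<__media__>"; // placeholder: the real server falls back to a random marker
    }();
    return marker;
}
```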
Xuan-Son Nguyen
408225bb1a
server: use random media marker (#21962)
* server: use random media marker

* nits

* remove legacy <__image__> token

* revert special char in random
2026-04-15 23:52:22 +02:00
Concedo
236ae27329 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/close-issue.yml
#	docs/multimodal.md
#	embd_res/templates/deepseek-ai-DeepSeek-V3.2.jinja
#	ggml/CMakeLists.txt
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_decls.tmpl
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_reg_tile.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_subgroup_matrix.wgsl
#	tests/peg-parser/test-gbnf-generation.cpp
#	tests/test-chat.cpp
2026-04-14 21:01:41 +08:00
Concedo
9c0b9b0bb1 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	docs/development/HOWTO-add-model.md
#	docs/multimodal.md
#	ggml/src/ggml-sycl/convert.cpp
#	ggml/src/ggml-sycl/dequantize.hpp
#	ggml/src/ggml-sycl/element_wise.cpp
#	ggml/src/ggml-sycl/gated_delta_net.cpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	ggml/src/ggml-sycl/upscale.cpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	tests/test-backend-ops.cpp
#	tests/test-llama-archs.cpp
#	tools/mtmd/CMakeLists.txt
2026-04-14 20:06:04 +08:00
Xuan-Son Nguyen
e489a5ca0e
server: support OAI /v1/audio/transcriptions API (#21863)
* server: support OAI /v1/audio/transcriptions API

* address autoreview comments

* correct default response_format value
2026-04-14 11:09:52 +02:00
Gaspard Petit
ce8fd4b1a6
server: Expose build_info in router mode (#21835) 2026-04-13 11:14:42 +02:00
Rohan Jain
974c8c94cc
webui: add setting for first-line chat titles (#21797)
* webui: add setting for first-line chat titles

Add an opt-in setting (`titleGenerationUseFirstLine`) to use the first
non-empty line of a prompt as the generated conversation title.

Previously, the complete multi-line prompt was being used, which created
long titles for complex queries. Coupled with
"Ask for confirmation before changing conversation title", the dialog
would overflow.

* Update tools/server/webui/src/lib/utils/text.ts

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* Update tools/server/webui/src/lib/utils/text.ts

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* webui: Run build to update the bundle

As requested in:
https://github.com/ggml-org/llama.cpp/pull/21797#pullrequestreview-4094935065

* webui: Fix missing import for NEWLINE_SEPARATOR

---------

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2026-04-13 09:30:46 +02:00
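
The underlying transform is small: take the first non-empty line of the prompt. An illustrative sketch (the actual implementation is TypeScript, in tools/server/webui/src/lib/utils/text.ts):

```cpp
#include <string>

// Return the first line of the prompt that contains non-whitespace
// characters; fall back to the whole prompt if there is none.
static std::string first_line_title(const std::string & prompt) {
    size_t start = 0;
    while (start < prompt.size()) {
        size_t end = prompt.find('\n', start);
        if (end == std::string::npos) {
            end = prompt.size();
        }
        const size_t first = prompt.find_first_not_of(" \t\r", start);
        if (first != std::string::npos && first < end) {
            return prompt.substr(start, end - start);
        }
        start = end + 1;
    }
    return prompt;
}
```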
Aleksander Grygier
227ed28e12
webui: MCP Diagnostics improvements (#21803)
* Add MCP Connection diagnostics and CORS hint to web-ui

* tidy up test

* webui: Refactor and improve MCP diagnostic logging

---------

Co-authored-by: evalstate <1936278+evalstate@users.noreply.github.com>
2026-04-13 07:58:38 +02:00
Aleksander Grygier
9e209c5aee
fix: Proper messages rendering for "Show raw output" (#21672) 2026-04-12 13:08:11 +02:00
Concedo
4c860ae4ae Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	common/download.cpp
#	docs/backend/OPENVINO.md
#	docs/backend/snapdragon/CMakeUserPresets.json
#	docs/backend/snapdragon/README.md
#	ggml/src/ggml-hexagon/ggml-hexagon.cpp
#	ggml/src/ggml-hexagon/htp/act-ops.c
#	ggml/src/ggml-hexagon/htp/argsort-ops.c
#	ggml/src/ggml-hexagon/htp/binary-ops.c
#	ggml/src/ggml-hexagon/htp/cpy-ops.c
#	ggml/src/ggml-hexagon/htp/cumsum-ops.c
#	ggml/src/ggml-hexagon/htp/flash-attn-ops.c
#	ggml/src/ggml-hexagon/htp/get-rows-ops.c
#	ggml/src/ggml-hexagon/htp/hex-utils.h
#	ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c
#	ggml/src/ggml-hexagon/htp/hmx-ops.h
#	ggml/src/ggml-hexagon/htp/htp-ctx.h
#	ggml/src/ggml-hexagon/htp/htp-ops.h
#	ggml/src/ggml-hexagon/htp/htp_iface.idl
#	ggml/src/ggml-hexagon/htp/main.c
#	ggml/src/ggml-hexagon/htp/matmul-ops.c
#	ggml/src/ggml-hexagon/htp/repeat-ops.c
#	ggml/src/ggml-hexagon/htp/rope-ops.c
#	ggml/src/ggml-hexagon/htp/set-rows-ops.c
#	ggml/src/ggml-hexagon/htp/softmax-ops.c
#	ggml/src/ggml-hexagon/htp/ssm-conv.c
#	ggml/src/ggml-hexagon/htp/sum-rows-ops.c
#	ggml/src/ggml-hexagon/htp/unary-ops.c
#	ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	ggml/src/ggml-webgpu/wgsl-shaders/common_decls.tmpl
#	ggml/src/ggml-webgpu/wgsl-shaders/flash_attn.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/get_rows.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_decls.tmpl
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_vec.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/unary.wgsl
#	models/templates/google-gemma-4-31B-it-interleaved.jinja
#	models/templates/google-gemma-4-31B-it.jinja
#	scripts/snapdragon/adb/run-bench.sh
#	scripts/snapdragon/adb/run-cli.sh
#	scripts/snapdragon/adb/run-completion.sh
#	scripts/snapdragon/adb/run-tool.sh
#	scripts/snapdragon/windows/run-bench.ps1
#	scripts/snapdragon/windows/run-cli.ps1
#	scripts/snapdragon/windows/run-mtmd.ps1
#	scripts/snapdragon/windows/run-tool.ps1
#	tests/test-backend-ops.cpp
#	tests/test-chat.cpp
#	tools/llama-bench/llama-bench.cpp
2026-04-11 11:19:32 +08:00
Concedo
a165a73120 Merge commit 'd6f3030047' into concedo_experimental
# Conflicts:
#	examples/model-conversion/scripts/causal/run-casual-gen-embeddings-org.py
#	examples/model-conversion/scripts/utils/semantic_check.py
#	ggml/CMakeLists.txt
#	ggml/src/CMakeLists.txt
#	ggml/src/ggml-cann/ggml-cann.cpp
#	ggml/src/ggml-cpu/amx/amx.cpp
#	ggml/src/ggml-cuda/CMakeLists.txt
#	ggml/src/ggml-hexagon/ggml-hexagon.cpp
#	ggml/src/ggml-hip/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-openvino/ggml-openvino.cpp
#	ggml/src/ggml-rpc/ggml-rpc.cpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	ggml/src/ggml-virtgpu/ggml-backend-buffer.cpp
#	ggml/src/ggml-virtgpu/ggml-backend.cpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	ggml/src/ggml-zdnn/ggml-zdnn.cpp
#	ggml/src/ggml-zendnn/ggml-zendnn.cpp
#	pyproject.toml
#	requirements/requirements-convert_legacy_llama.txt
#	requirements/requirements-tool_bench.txt
#	src/llama-model.cpp
#	src/llama.cpp
#	tests/test-llama-archs.cpp
#	tests/test-tokenizer-0.py
#	tests/test-tokenizer-random.py
#	tools/llama-bench/llama-bench.cpp
#	tools/perplexity/perplexity.cpp
2026-04-11 11:10:55 +08:00