Commit graph

466 commits

Author SHA1 Message Date
LostRuins Concedo
008405e8fd Merge commit '7aaeedc098' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	docs/ops.md
#	docs/ops/SYCL.csv
#	docs/ops/Vulkan.csv
#	ggml/src/ggml-cann/acl_tensor.cpp
#	ggml/src/ggml-cann/acl_tensor.h
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/aclnn_ops.h
#	ggml/src/ggml-cann/common.h
#	ggml/src/ggml-cann/ggml-cann.cpp
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-opencl/kernels/rms_norm.cl
#	ggml/src/ggml-sycl/element_wise.cpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	tests/test-backend-ops.cpp
2025-11-17 23:33:09 +08:00
LostRuins Concedo
3fe0e39b62 Merge commit '4dca015b7e' into concedo_experimental
# Conflicts:
#	.github/copilot-instructions.md
#	README.md
#	docs/ops.md
#	docs/ops/CPU.csv
#	docs/ops/CUDA.csv
#	docs/ops/Vulkan.csv
#	ggml/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp
#	src/CMakeLists.txt
#	tests/test-backend-ops.cpp
2025-11-16 18:33:58 +08:00
Georgi Gerganov
5b2093becc
server : handle context overflow during decode (#17267)
* server : handle context overflow during decode

* server : minor refactor
2025-11-16 09:23:37 +02:00
Aleksander Grygier
22e1ce2f81
webui: Fix clickability around chat processing statistics UI (#17278)
* fix: Better pointer events handling in chat processing info elements

* chore: update webui build output
2025-11-15 22:41:41 +01:00
Pascal
1411d9275a
webui: add OAI-Compat Harmony tool-call streaming visualization and persistence in chat UI (#16618)
* webui: add OAI-Compat Harmony tool-call live streaming visualization and persistence in chat UI

- Purely visual and diagnostic change, no effect on model context, prompt
  construction, or inference behavior

- Captured assistant tool call payloads during streaming and non-streaming
  completions, and persisted them in chat state and storage for downstream use

- Exposed parsed tool call labels beneath the assistant's model info line
  with graceful fallback when parsing fails

- Added tool call badges beneath assistant responses that expose JSON tooltips
  and copy their payloads when clicked, matching the existing model badge styling

- Added a user-facing setting to toggle tool call visibility to the Developer
  settings section directly under the model selector option

* webui: remove scroll listener causing unnecessary layout updates (model selector)

* Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* chore: npm run format & update webui build output

* chore: update webui build output

---------

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2025-11-15 21:09:32 +01:00
Ankur Verma
c7b7db0445
mtmd-cli: Avoid logging to stdout for model loading messages in mtmd-cli (#17277) 2025-11-15 12:41:16 +01:00
Xuan-Son Nguyen
9b17d74ab7
mtmd: add mtmd_log_set (#17268) 2025-11-14 15:56:19 +01:00
Georgi Gerganov
d396b43748
server : fix "can batch with" bug (#17263) 2025-11-14 14:03:45 +02:00
Aleksander Grygier
f1bad23f88
Better UX for handling multiple attachments in WebUI (#17246) 2025-11-14 01:19:08 +01:00
Xuan-Son Nguyen
c4abcb2457
server: fixing naming conflict res_error (#17243) 2025-11-13 20:53:47 +01:00
LostRuins Concedo
26e9090088 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/cann.Dockerfile
#	.devops/cpu.Dockerfile
#	.devops/cuda.Dockerfile
#	.devops/intel.Dockerfile
#	.devops/musa.Dockerfile
#	.devops/nix/package.nix
#	.devops/rocm.Dockerfile
#	.devops/vulkan.Dockerfile
#	.github/workflows/build.yml
#	.github/workflows/server.yml
#	CMakeLists.txt
#	build-xcframework.sh
#	ci/run.sh
#	common/CMakeLists.txt
#	common/download.cpp
#	docs/backend/CANN.md
#	docs/ops.md
#	docs/ops/SYCL.csv
#	ggml/src/CMakeLists.txt
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/aclnn_ops.h
#	ggml/src/ggml-cann/ggml-cann.cpp
#	ggml/src/ggml-cpu/CMakeLists.txt
#	ggml/src/ggml-cpu/kleidiai/kernels.cpp
#	ggml/src/ggml-cpu/kleidiai/kernels.h
#	ggml/src/ggml-cpu/kleidiai/kleidiai.cpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	scripts/sync_vendor.py
#	src/CMakeLists.txt
#	tests/test-backend-ops.cpp
#	tests/test-rope.cpp
#	tools/mtmd/CMakeLists.txt
#	tools/rpc/CMakeLists.txt
#	tools/server/CMakeLists.txt
2025-11-13 15:45:50 +08:00
Aleksander Grygier
8e878f0cb4
Update packages + upgrade Storybook to v10 (#17201)
* chore: Update packages + upgrade Storybook to v10

* fix: Increase timeout for UI tests
2025-11-12 19:01:48 +01:00
Xuan-Son Nguyen
00c94083b3
server: (refactor) implement generator-based API for task results (#17174)
* server: (refactor) implement generator-based API for task results

* improve

* moving some code

* fix "Response ended prematurely"

* add sink.done before return false

* rm redundant check

* rm unused var

* rename generator --> reader
2025-11-12 18:50:52 +01:00
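The "add sink.done before return false" bullet refers to cpp-httplib's chunked content provider: ending the stream without calling sink.done() is what produces the "Response ended prematurely" error this commit fixes. A minimal sketch of the reader-driven pattern, with task_result_reader as a hypothetical stand-in for the server's actual reader type:

```cpp
#include "httplib.h" // vendor/cpp-httplib
#include <memory>
#include <string>

// Hypothetical reader: yields one chunk per call, false when exhausted.
struct task_result_reader {
    int remaining = 3;
    bool next(std::string & out) {
        if (remaining-- <= 0) return false;
        out = "data: {...}\n\n";
        return true;
    }
};

void stream_results(httplib::Response & res) {
    auto reader = std::make_shared<task_result_reader>();
    res.set_chunked_content_provider("text/event-stream",
        [reader](size_t /*offset*/, httplib::DataSink & sink) {
            std::string chunk;
            if (reader->next(chunk)) {
                return sink.write(chunk.data(), chunk.size());
            }
            sink.done();  // close the chunked stream cleanly first...
            return false; // ...then signal that there is no more data
        });
}
```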
Xuan-Son Nguyen
ee8dd5c658
server: move res_error/res_ok to static function (#17167) 2025-11-12 14:17:24 +01:00
Adrien Gallouët
78010a0d52
cmake : move OpenSSL linking to vendor/cpp-httplib (#17177)
* cmake : move OpenSSL linking to vendor/cpp-httplib

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* bring back httplib 0.27.0

* add -DLLAMA_HTTPLIB

* update cmake config for visionos

---------

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-11-12 12:32:50 +01:00
Xuan-Son Nguyen
1d45b4228f
vendor: split httplib to cpp/h files (#17150)
* vendor: split httplib to cpp/h files

* move defines

* include httplib if curl is not used

* add TODO

* fix build ios

* fix build visionos instead
2025-11-11 13:32:58 +01:00
Mike Abbott
4a5b8aff40
cmake : add version to all shared object files (#17091)
When compiling llama.cpp in Yocto, it fails QA checks because the generated .so files aren't versioned. This applies a version to all generated .so files, allowing the package to build without errors.
2025-11-11 13:19:50 +02:00
Nicolas B. Pierron
d2d626938a
Install rpc-server when GGML_RPC is ON. (#17149) 2025-11-11 10:53:59 +00:00
LostRuins Concedo
5125c0b879 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/vulkan.Dockerfile
#	ggml/src/ggml-cpu/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-opencl/kernels/set_rows.cl
#	ggml/src/ggml-vulkan/ggml-vulkan.cpp
#	ggml/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp
#	tests/test-backend-ops.cpp
#	tools/batched-bench/batched-bench.cpp
2025-11-11 17:10:11 +08:00
Gabe Goodhart
0c74f32632
memory: Hybrid context shift (#17009)
* feat(memory): Only fail partial erasure of recurrent tail

The recurrent state is always assumed to be the state as of the last update
from the final token in the sequence. When doing a partial erasure, if the
range does not include the final token, the erasure can be considered a
success since any memory used for the sequence prior to the final token
(which is no memory) has been successfully removed.

There is one potential case that this doesn't address: pruning the cache
to remove sensitive data from the context. This wouldn't work for partial
removal in the middle of an attention cache either, since the KV state is
linearly dependent and states at later sequence positions would still be
based on the state derived from the sensitive data, even if that data is
no longer cached. So I don't think this case is relevant, but it is worth
noting that the semantics of this change for a partial erasure in the
middle of the cache are essentially "my context is already compressed"
and not "all trace of the removed tokens has been removed."

https://github.com/ggml-org/llama.cpp/issues/16768
Branch: HybridContextShift-16768

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(main): Check the output of seq_rm for prefix matching

This prefix matching is explicitly attempting to remove the tokens at the
end of the sequence that don't match. This is the operation that can't be
performed on a recurrent cache due to the state being updated in place, so
if this removal fails, we need to clear the whole cache.

https://github.com/ggml-org/llama.cpp/issues/16768
Branch: HybridContextShift-16768

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(memory): Fix condition for partial erasure failure if p0 > pos

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

Co-authored-by: compilade <git@compilade.net>

* style: Fix extra parens

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* fix(main.cpp): Set n_matching_session_tokens to 0 on cache clear

https://github.com/ggml-org/llama.cpp/issues/16768
Branch: HybridContextShift-16768

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

---------

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Co-authored-by: compilade <git@compilade.net>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-10 17:14:23 +02:00
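For reference, a minimal sketch of the erasure rule described in the first commit message above, using hypothetical names (the real logic lives in llama.cpp's recurrent memory implementation):

```cpp
#include <cstdint>

// Hypothetical stand-in for a recurrent sequence: only the state as of the
// last processed token exists, and it is updated in place.
struct recurrent_seq {
    int64_t pos = -1; // position of the last token folded into the state
};

// Erase positions [p0, p1) from the sequence.
bool seq_rm(recurrent_seq & seq, int64_t p0, int64_t p1) {
    if (seq.pos < p0 || seq.pos >= p1) {
        // The range excludes the final token: no memory is stored for those
        // positions, so there is nothing to remove -- report success.
        return true;
    }
    // The range covers the final token: an in-place state cannot be rolled
    // back to an earlier position, so fail and let the caller clear the
    // whole sequence (as the main.cpp fix above does, resetting
    // n_matching_session_tokens to 0).
    return false;
}
```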
Georgi Gerganov
f914544b16
batched-bench : add "separate text gen" mode (#17103) 2025-11-10 12:59:29 +02:00
Xuan-Son Nguyen
4b13a684c5
mtmd: fix patch_size initialized to random value in audio models (#17128)
* mtmd: fix patch_size initialized to random value in audio models

* add default hparams
2025-11-10 11:41:05 +01:00
LostRuins Concedo
df6e303fd3 merge https://github.com/ggml-org/llama.cpp/pull/17128 2025-11-10 11:24:04 +08:00
LostRuins Concedo
d02cb1b117 Revert "fix divide by zero error"
This reverts commit 6cce98eca5.
2025-11-10 11:22:50 +08:00
LostRuins Concedo
6cce98eca5 fix divide by zero error 2025-11-10 01:38:55 +08:00
Georgi Gerganov
b8595b16e6
mtmd : fix embedding size for image input (#17123) 2025-11-09 18:31:02 +02:00
Georgi Gerganov
cb1adf8851
server : handle failures to restore host cache (#17078)
* server : handle failures to restore host cache

* server : add tests for the prompt cache
2025-11-09 14:27:05 +02:00
LostRuins Concedo
60a74bdd89 make tool calling work with jinja. but still need to fix qwen omni first (+1 squashed commits)
Squashed commits:

[e394da61e] make tool calling work with jinja. but still need to fix qwen omni first
2025-11-09 16:56:14 +08:00
chansikpark
333f2595a3
webui: fix keyboard shortcuts for new chat & edit chat title (#17007) 2025-11-08 20:52:35 +01:00
LostRuins Concedo
4fc022a51f revert qwen vl warmup size 2025-11-09 02:24:49 +08:00
LostRuins Concedo
d6a2ad8455 still not really working right 2025-11-09 01:57:48 +08:00
LostRuins Concedo
e6ca0aa8d0 Merge commit '2f0c2db43e' into concedo_experimental
# Conflicts:
#	.github/labeler.yml
#	README.md
#	docs/backend/OPENCL.md
#	docs/ops.md
#	docs/ops/CUDA.csv
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	ggml/src/ggml-webgpu/wgsl-shaders/set_rows.tmpl.wgsl
#	scripts/sync-ggml.last
#	src/CMakeLists.txt
#	tools/server/README.md
2025-11-08 23:27:59 +08:00
Aidan
eeee367de5
server: fix correct time_ms calculation in prompt_progress (#17093)
* fix: correct time_ms calculation in send_partial_response

The time_ms field was incorrectly calculated: the division was happening
before the subtraction, leading to incorrect values.

Before: (ggml_time_us() - slot.t_start_process_prompt / 1000)
After:  (ggml_time_us() - slot.t_start_process_prompt) / 1000

* docs : document time_ms field in prompt_progress
2025-11-08 15:12:11 +02:00
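The underlying bug is pure operator precedence. A standalone illustration with made-up timestamps (the real code uses ggml_time_us() and the slot's prompt-processing start time):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    const int64_t now_us   = 5'000'000; // pretend ggml_time_us() result
    const int64_t start_us = 2'000'000; // pretend slot.t_start_process_prompt

    // Buggy: '/' binds tighter than '-', so only the start time is scaled.
    const int64_t wrong_ms = now_us - start_us / 1000;   // 4998000
    // Fixed: subtract first, then convert microseconds to milliseconds.
    const int64_t right_ms = (now_us - start_us) / 1000; // 3000

    printf("wrong: %lld ms, right: %lld ms\n",
           (long long) wrong_ms, (long long) right_ms);
    return 0;
}
```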
LostRuins Concedo
64a1cd95a7 fixed missing headers 2025-11-08 11:09:49 +08:00
LostRuins Concedo
dfb0966ed2 not working 2025-11-08 10:49:10 +08:00
LostRuins Concedo
fdcb281a3a Merge commit '2f966b8ed8' into concedo_experimental
# Conflicts:
#	.github/workflows/release.yml
#	docs/docker.md
#	ggml/src/CMakeLists.txt
#	ggml/src/ggml-cpu/CMakeLists.txt
#	tests/test-backend-ops.cpp
#	tests/test-thread-safety.cpp
#	tools/batched-bench/batched-bench.cpp
#	tools/mtmd/clip.cpp
2025-11-08 10:34:17 +08:00
LostRuins Concedo
7061cd1cc9 Merge commit 'e4a71599e5' into concedo_experimental
# Conflicts:
#	CODEOWNERS
#	tools/mtmd/clip.cpp
2025-11-08 10:28:49 +08:00
Georgi Gerganov
7956bb4d7f
bench : cache the llama_context state at computed depth (#16944)
* bench : cache llama_context state at depth

* cont : handle failures to restore the old state

* cont : print information when the state is being reused
2025-11-07 21:23:11 +02:00
Sigbjørn Skjæret
9008027aa3
hparams : add n_embd_inp() to support extended embed (#16928)
* add n_embd_full to support extended embed

* don't change output

* rename to n_embd_inp

* restore n_embd where applicable
2025-11-07 19:27:58 +01:00
Georgi Gerganov
16bcc1259d
kv-cache : pad the cache size to 256 for performance (#17046)
* kv-cache : pad the size of the small SWA cache for performance

* context : pad the total context to 256

* cont : future-proof the swa pad

* server : adjust test params to new logic
2025-11-07 20:03:25 +02:00
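The padding here is round-up-to-a-multiple arithmetic, the same idea as ggml's GGML_PAD macro. A tiny sketch:

```cpp
#include <cstdint>
#include <cstdio>

// Round x up to the next multiple of n (cf. GGML_PAD in ggml.h).
static uint32_t pad_to(uint32_t x, uint32_t n) {
    return (x + n - 1) / n * n;
}

int main() {
    // A requested context of 4000 tokens becomes 4096 after padding to 256,
    // keeping the cache size aligned for the kernels.
    printf("%u -> %u\n", 4000u, pad_to(4000, 256)); // 4000 -> 4096
    return 0;
}
```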
Georgi Gerganov
8c0d6bb455
server : print the samplers chain for each request (#17070) 2025-11-07 12:24:47 +02:00
Georgi Gerganov
b7f9010d24
server : disable checkpoints with mtmd (#17045) 2025-11-06 12:09:29 +02:00
Xuan-Son Nguyen
4882f0ff78
clip: implement minicpm-v sinusoidal embd using GGML (#17036)
* clip: implement minicpm-v sinusoidal embd using GGML

* fix repeat op
2025-11-06 11:02:54 +01:00
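For context, the embedding being ported to GGML ops here is the familiar sinusoidal position embedding. A CPU-side sketch of the textbook formula; the exact sin/cos layout used by minicpm-v is an assumption:

```cpp
#include <cmath>
#include <vector>

// Textbook sinusoidal position embedding (interleaved sin/cos layout is
// assumed; the commit expresses an equivalent computation as GGML graph ops).
std::vector<float> sinusoidal_embd(int pos, int d) {
    std::vector<float> e(d);
    for (int i = 0; i < d; i += 2) {
        const double freq = std::pow(10000.0, -(double) i / d);
        e[i] = (float) std::sin(pos * freq);
        if (i + 1 < d) {
            e[i + 1] = (float) std::cos(pos * freq);
        }
    }
    return e;
}
```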
Xuan-Son Nguyen
92bb84f775
mtmd: allow QwenVL to process larger image by default (#17020) 2025-11-05 14:26:49 +01:00
Georgi Gerganov
13b339bcd9
server : do not default to multiple slots with speculative decoding (#17017)
* server : do not default to multiple slots with speculative decoding

* cont : fix
2025-11-05 14:32:55 +02:00
Xuan-Son Nguyen
2f0c2db43e
mtmd: improve struct initialization (#16981) 2025-11-05 11:26:37 +01:00
손희준
fd2f84f468
docs: Clarify the endpoint that webui uses (#17001) 2025-11-05 11:20:28 +01:00
Georgi Gerganov
66d8eccd42
server : do context shift only while generating (#17000) 2025-11-04 19:21:36 +02:00
Aleksander Grygier
e7da30b584
fix: Viewing multiple PDF attachments (#16974) 2025-11-03 18:53:26 +01:00
Georgi Gerganov
48bd26501b
server : add props.model_alias (#16943)
* server : add props.model_alias

* webui : npm run format
2025-11-03 14:38:23 +01:00