Concedo
5440ca4794
rolling builds
2026-03-31 13:12:11 +08:00
Concedo
0afcf4bc6d
up jimver cuda toolkit version
2026-03-30 21:43:29 +08:00
Concedo
894591da7c
increase ctx size slider
2026-03-30 21:41:31 +08:00
Concedo
a3a5897d93
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/intel.Dockerfile
# .github/workflows/python-type-check.yml
# embd_res/templates/Qwen3.5-4B.jinja
# examples/model-conversion/scripts/causal/compare-logits.py
# examples/model-conversion/scripts/utils/check-nmse.py
# examples/model-conversion/scripts/utils/compare_tokens.py
# examples/model-conversion/scripts/utils/semantic_check.py
# examples/sycl/build.sh
# examples/sycl/run-llama2.sh
# ggml/src/ggml-hexagon/htp/flash-attn-ops.c
# ggml/src/ggml-hexagon/htp/hex-dma.h
# ggml/src/ggml-hexagon/htp/rope-ops.c
# scripts/gen-unicode-data.py
# tests/test-chat.cpp
2026-03-30 21:41:19 +08:00
Concedo
9864d46389
add password for musicui
2026-03-30 21:03:12 +08:00
Concedo
42ad89cd86
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/cann.Dockerfile
# .devops/cpu.Dockerfile
# .devops/llama-cli-cann.Dockerfile
# .devops/nix/package.nix
# .github/workflows/build-android.yml
# .github/workflows/build-cann.yml
# .github/workflows/build-msys.yml
# .github/workflows/docker.yml
# .github/workflows/editorconfig.yml
# .github/workflows/gguf-publish.yml
# .github/workflows/python-lint.yml
# .github/workflows/release.yml
# CMakeLists.txt
# docs/backend/CANN.md
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c
# ggml/src/ggml-hexagon/htp/htp-ctx.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hexagon/htp/matmul-ops.c
# ggml/src/ggml-rpc/ggml-rpc.cpp
# scripts/sync_vendor.py
# tests/test-chat-auto-parser.cpp
# tests/test-chat.cpp
# tests/test-json-schema-to-grammar.cpp
# tests/test-reasoning-budget.cpp
# tools/cli/cli.cpp
# tools/server/CMakeLists.txt
# tools/server/README.md
2026-03-30 20:45:38 +08:00
Aleksander Grygier
389c7d4955
webui: Fix branching logic on edit message ( #21175 )
...
* fix: Branching logic + small refactor
* chore: update webui build output
2026-03-30 14:40:50 +02:00
Concedo
923d5fc5d0
warning: clip_image_preprocess has been moved, now you must manually copy init_vision from mtmd into clip.cpp's setup_init_vision_shim_kcpp
2026-03-30 20:39:55 +08:00
Aman Gupta
278521c33a
llama-model-loader: print warning when using overrides with mmap ( #20978 )
...
* llama-model-loader: use pinned memory for tensor overrides
* change to warning
2026-03-30 17:40:17 +08:00
Sigbjørn Skjæret
e2eb39e81c
ci : bump ty to 0.0.26 ( #21156 )
...
* fix incorrect type ignore comments
* bump ty to 0.0.26
2026-03-30 09:29:15 +02:00
Xuan-Son Nguyen
abf9a62161
server: wrap headers for mcp proxy ( #21072 )
...
* server: wrap headers for mcp proxy
* Update tools/server/server-cors-proxy.h
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* fix build
* chore: update webui build output
* chore: update webui build output
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2026-03-30 08:59:16 +02:00
Sigbjørn Skjæret
7c203670f8
add missing ROPE_FACTORS_LONG/SHORT for MiniCPM ( #21150 )
2026-03-29 19:45:40 +02:00
Gaurav Garg
ec16a072f0
Optimize MOE GEMV kernel for BS > 1. ( #20905 )
...
* Optimize MOE GEMV kernel for BS > 1.
The previous MOE kernel for BS > 1 had too many thread blocks (nrows_x, nchannels_dst, ncols_dst), with very little work per block. A block of (32, 4) was computing the inner dot product for a single row.
The new mul_mat_vec_q_moe kernel is dedicated to the MoE multi-token case, with grid (ceil(nrows_x/rpb), nchannels_dst) and block (warp_size, ncols_dst). Each warp handles two rows independently with warp-level reduction only (no shared memory sync).
This change does not increase compilation time, as only a single template instance is needed per type. It also simplifies the original GEMV kernel and gets rid of the `is_multi_token_id` specialization.
* Remove em-dashes
* Cherry-pick changes from @am17an PR https://github.com/ggml-org/llama.cpp/pull/20885 to enable small_k optimization only for cases where it benefits
Increase max batch size for MMVQ kernels for MUL_MAT_ID to 8
* Make the max batch size for MOE GEMV kernel configurable based on GPU arch and datatype
---------
Co-authored-by: Aman Gupta <amangupta052@gmail.com>
2026-03-29 18:35:18 +02:00
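
The launch-geometry change described in the commit above can be illustrated with a little arithmetic. This is a minimal sketch, not code from the PR: the function names, the tensor shapes, and the `rows_per_block` value (the commit's `rpb`, whose actual value is not stated) are all hypothetical and chosen only to show how the number of thread blocks shrinks.

```python
import math


def old_block_count(nrows_x, nchannels_dst, ncols_dst):
    """Thread blocks in the previous layout: grid (nrows_x, nchannels_dst, ncols_dst)."""
    return nrows_x * nchannels_dst * ncols_dst


def new_block_count(nrows_x, nchannels_dst, rows_per_block):
    """Thread blocks in the new layout: grid (ceil(nrows_x / rpb), nchannels_dst)."""
    return math.ceil(nrows_x / rows_per_block) * nchannels_dst


# Hypothetical shapes, for illustration only.
nrows_x, nchannels_dst, ncols_dst = 4096, 8, 4
rows_per_block = 2  # assumed value of rpb; the commit message does not give it

old = old_block_count(nrows_x, nchannels_dst, ncols_dst)
new = new_block_count(nrows_x, nchannels_dst, rows_per_block)
print(f"old grid: {old} blocks, new grid: {new} blocks")
```

Under these assumed shapes the new grid launches fewer, larger blocks, because the `ncols_dst` dimension moves from the grid into the block (warp_size, ncols_dst) and each block covers several rows.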
Concedo
4fc3c28f1a
reasoning output parsing improvements
2026-03-29 23:35:24 +08:00
Max Krasnyansky
f5d1c4179f
hexagon: dma optimizations (mostly fixing regressions) ( #21137 )
...
* hex-fa: add simple dma cache for Mask
I noticed that we were refetching the mask rows over and over.
This simple cache avoids that.
* hex-dma: unset in-order desc bit which caused significant perf regression
We don't rely on true in-order processing of the DMA descriptors anywhere.
Turns out this mode caused a significant regression of around 3-4 TPS during token gen.
* hex-rope: update comment to clarify that we don't need in-order DMA completions
2026-03-29 06:40:13 -07:00
Concedo
4a09f3805b
prepare for breaking merge
2026-03-29 14:09:29 +08:00
Davi Henrique Linhares
2405d59cb6
devops: including compute-runtime for intel.Dockerfile ( #21076 )
2026-03-29 13:34:03 +08:00
Neo Zhang
afe65aa282
[SYCL] Enhance build script to use half cores to build, avoid OS hang ( #21093 )
...
* use half of the cores to build, avoid OS hang
* reduce the amount of output text to shorten test time
* avoid returning 0
2026-03-29 09:02:45 +08:00
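
The "half the cores, but never zero" rule from the commit above can be sketched as follows. The real change lives in a shell build script; this Python version is only an illustration, and the helper name `build_jobs` is hypothetical. It assumes that "avoid returning 0" refers to the job count on machines with a single core.

```python
import os


def build_jobs(cpu_count=None):
    """Return half of the available cores, but never fewer than one job."""
    if cpu_count is None:
        # os.cpu_count() may return None when the count cannot be determined.
        cpu_count = os.cpu_count() or 1
    return max(1, cpu_count // 2)


print(f"parallel build jobs: {build_jobs()}")
```

The `max(1, ...)` clamp is the important part: integer division of 1 by 2 gives 0, which would otherwise produce an invalid job count.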
Sigbjørn Skjæret
65097181e4
fix **/x glob matching ( #21129 )
2026-03-28 22:27:38 +01:00
Piotr Wilkin (ilintar)
98ae0a0d36
common/parser: fix handling of tool definition with missing properties key ( #21128 )
2026-03-28 20:41:32 +01:00
Sigbjørn Skjæret
3a14a542f5
common : add character class support to glob_match ( #21111 )
...
* add character class support to glob_match
* remove pointless reference
2026-03-28 19:57:37 +01:00
Concedo
df6b7b5fdb
Merge branch 'concedo_experimental' of https://github.com/LostRuins/koboldcpp into concedo_experimental
2026-03-29 01:25:07 +08:00
Concedo
3eedde8ab5
Merge commit 'ded446b34c' into concedo_experimental
...
# Conflicts:
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# tests/test-backend-ops.cpp
2026-03-29 01:24:31 +08:00
Wagner Bruna
9223f41320
sd: call SetCircularAxesAll directly ( #2078 )
2026-03-29 01:17:48 +08:00
Concedo
8760d22a84
switch back to newly updated jimver github cuda toolkit
2026-03-29 01:17:11 +08:00
Concedo
aac220f7e3
Merge commit '0fac87b157' into concedo_experimental
...
# Conflicts:
# .github/workflows/build-android.yml
# .github/workflows/hip-quality-check.yml
# docs/multimodal.md
# scripts/hip/gcn-cdna-vgpr-check.py
# scripts/snapdragon/windows/run-bench.ps1
# scripts/snapdragon/windows/run-cli.ps1
# scripts/snapdragon/windows/run-tool.ps1
# tests/test-backend-ops.cpp
# tests/test-llama-archs.cpp
# tools/imatrix/imatrix.cpp
# tools/mtmd/CMakeLists.txt
2026-03-29 01:14:33 +08:00
BlueMöhre
968189729f
WebUI: Replace illegal nested button elements ( #21026 )
...
* remove/replace nested button elements
* map rest props to outer element
* solve TODO
* chore: update webui build output
2026-03-28 17:57:59 +01:00
Concedo
674b7f5eee
indicate support for claude messages api
2026-03-29 00:57:58 +08:00
Adrien
e397d3885c
common/json-schema: fix: handle non-capturing groups (?:...) in JSON schema pattern converter ( #21124 )
...
The regex-to-grammar converter in _visit_pattern() crashes with SIGSEGV
when a JSON schema "pattern" field contains a non-capturing group (?:...).
Root cause: when the parser sees '(' followed by '?', it pushes a warning
but does not advance past '?:'. The recursive transform() call then
interprets '?' as a quantifier and calls seq.back() on an empty vector,
causing undefined behavior.
This commonly occurs when serving OpenAI-compatible tool calls from
clients that include complex regex patterns in their JSON schemas (e.g.,
date validation patterns like ^(?:(?:\d\d[2468][048]|...)-02-29|...)$).
The fix:
- Skip '?:' after '(' to treat non-capturing groups as regular groups
- For unsupported syntax (?=, ?!, etc.), skip to matching ')' safely,
handling escaped characters to avoid miscounting parenthesis depth
- Adjust the ')' unbalanced-parentheses check using direct char
comparisons instead of substr
- Add test cases for non-capturing groups (C++ only, as the JS/Python
implementations do not yet support this syntax)
2026-03-28 17:55:38 +01:00
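
The scanning logic described in the commit above (skip `?:` after `(`, and skip unsupported groups up to the matching `)` while honouring escapes) can be sketched in a few lines. This is not the converter's code: the actual fix is in C++ inside `_visit_pattern()`, and the function names below are hypothetical. The sketch also assumes that unsupported groups such as `(?=...)` are simply dropped, and it ignores parentheses inside character classes.

```python
def skip_to_matching_paren(pattern, i):
    """Return the index just past the ')' closing a group whose body starts at i."""
    depth = 1
    while i < len(pattern):
        c = pattern[i]
        if c == "\\":
            i += 2  # skip the escaped character so '\(' and '\)' are not counted
            continue
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unbalanced parentheses")


def strip_group_syntax(pattern):
    """Rewrite '(?:' as '(' and drop other '(?...)' groups entirely."""
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\" and i + 1 < len(pattern):
            out.append(pattern[i:i + 2])  # copy escape sequences verbatim
            i += 2
        elif c == "(" and pattern.startswith("?:", i + 1):
            out.append("(")  # treat a non-capturing group as a regular group
            i += 3
        elif c == "(" and pattern.startswith("?", i + 1):
            i = skip_to_matching_paren(pattern, i + 1)  # unsupported syntax
        else:
            out.append(c)
            i += 1
    return "".join(out)


print(strip_group_syntax(r"^(?:(?:\d\d)-02-29)$"))
```

Advancing past `?:` before recursing is what prevents the `?` from being read as a quantifier applied to an empty sequence, which is the crash the commit describes.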
Concedo
e3b7905e1c
added anthropic messages api support
2026-03-29 00:55:32 +08:00
Concedo
5ad9e3ee31
crude openai responses streaming
2026-03-29 00:16:30 +08:00
Aldehir Rojas
e6f2ec01ff
common : add reasoning_format = none support to gpt-oss ( #21094 )
2026-03-28 09:33:39 -05:00
Georgi Gerganov
edfb440a2f
server : fix processing of multiple back-to-back mtmd chunks ( #21107 )
2026-03-28 16:27:36 +02:00
Adrien Gallouët
3d66da1809
ci : gracefully shut down the server ( #21110 )
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-28 14:49:57 +01:00
Woof Dog
82b703f8bc
Document custom default webui preferences in server README ( #19771 )
2026-03-28 14:19:16 +01:00
Concedo
94b266a6b0
musicui fix reset defaults
2026-03-28 21:09:40 +08:00
Aleksander Grygier
51a84efc53
webui: Conversation forking + branching improvements ( #21021 )
...
* refactor: Make `DialogConfirmation` extensible with children slot
* feat: Add conversation forking logic
* feat: Conversation forking UI
* feat: Update delete/edit dialogs and logic for forks
* refactor: Improve Chat Sidebar UX and add MCP Servers entry
* refactor: Cleanup
* feat: Update message in place when editing leaf nodes
* chore: Cleanup
* chore: Cleanup
* chore: Cleanup
* chore: Cleanup
* chore: Cleanup
* chore: Cleanup
* refactor: Post-review improvements
* chore: update webui build output
* test: Update Storybook test
* chore: update webui build output
* chore: update webui build output
2026-03-28 13:38:15 +01:00
Concedo
1e787cd03a
improve responses api
2026-03-28 18:42:15 +08:00
Concedo
f768b2a4bd
whatever, i tried
2026-03-28 17:32:07 +08:00
Adrien Gallouët
b0f0dd3e51
vendor : update cpp-httplib to 0.40.0 ( #21100 )
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-28 08:59:44 +01:00
Ruben Ortlam
0eb4764182
vulkan: add noncontiguous GLU support ( #21081 )
...
* vulkan: add noncontiguous GLU support
* fix compile issue
2026-03-28 08:44:56 +01:00
Piotr Wilkin (ilintar)
1f5d15e665
common/parser: fix reasoning whitespace bugs + extra parser tests ( #21085 )
...
* fix whitespace reasoning issues + add reconstruction tests
* Proper fix
* fix Nemotron autoparser test expectations to include newline in marker
2026-03-28 07:29:26 +01:00
Concedo
f80fdd4314
updated sdui
2026-03-28 11:24:03 +08:00
Concedo
547659fdbf
allow planning music with llm (+1 squashed commits)
...
Squashed commits:
[9a3bbf072] allow planning music with llm
2026-03-28 11:19:39 +08:00
Sigbjørn Skjæret
c46758d28f
cli : add /glob command ( #21084 )
...
* add /glob command
* output error when max files reached
* support globbing outside curdir
2026-03-28 02:33:04 +01:00
Ts-sound
bf934f28db
docker : fix and enable ARM64 image build ( #20929 )
...
* CI: fix ARM64 image build error & enable compilation
* Update .github/workflows/docker.yml
Co-authored-by: Aaron Teo <taronaeo@gmail.com>
* CI: revert ggml/src/ggml-cpu/CMakeLists.txt
* Update .github/workflows/docker.yml
Co-authored-by: Aaron Teo <taronaeo@gmail.com>
* CI: update runs-on to ubuntu24.04, and update ARM64 build image ( ubuntu_version: "24.04")
* CI: change cpu.Dockerfile gcc to 14;
* CI : cpu.Dockerfile , update pip install .
* Update .github/workflows/docker.yml
Co-authored-by: Aaron Teo <taronaeo@gmail.com>
---------
Co-authored-by: Aaron Teo <taronaeo@gmail.com>
2026-03-28 01:45:09 +01:00
Adrien Gallouët
5c1a7b8355
server : add custom socket options to disable SO_REUSEPORT ( #21056 )
...
* server : add custom socket options to disable SO_REUSEPORT
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Add --reuse-port
$ strace -e trace=setsockopt,bind build/bin/llama-server -lv 2 --reuse-port
setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
setsockopt(3, SOL_SOCKET, SO_REUSEPORT, [1], 4) = 0
bind(3, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
$ strace -e trace=setsockopt,bind build/bin/llama-server -lv 2
setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(3, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Update tools/server/README.md (llama-gen-docs)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Fix windows
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
---------
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-28 01:12:43 +01:00
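
The socket options visible in the strace output above can be reproduced with Python's standard `socket` module. This is a minimal sketch of the same sequence of `setsockopt` calls, not the server's implementation; the helper name `make_listener` and its `reuse_port` parameter are hypothetical stand-ins for the behaviour of the `--reuse-port` flag.

```python
import socket


def make_listener(host="127.0.0.1", port=0, reuse_port=False):
    """Create a listening TCP socket; SO_REUSEPORT is set only on request."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    if reuse_port:
        # SO_REUSEPORT is not defined on every platform (for example Windows).
        if not hasattr(socket, "SO_REUSEPORT"):
            sock.close()
            raise OSError("SO_REUSEPORT is not available on this platform")
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))  # port 0 lets the OS pick a free port
    sock.listen()
    return sock


listener = make_listener()
print("listening on", listener.getsockname())
listener.close()
```

Leaving `SO_REUSEPORT` off by default means a second process cannot silently bind the same address and port, which matches the default trace shown in the commit message.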
Aldehir Rojas
59d840209a
common : inhibit lazy grammar sampler while reasoning is active ( #20970 )
...
* common : inhibit grammar while reasoning budget is active
* cont : update force_pos in accept
* cont : fix tests
* cont : tweak should apply logic
* cont : return early not using grammar sampler
* Add tests
* cont : prevent backend sampling when reasoning budget enabled
* cont : fix typo
---------
Co-authored-by: Piotr Wilkin <piotr.wilkin@syndatis.com>
2026-03-27 18:30:40 +01:00
Concedo
3ec6381123
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build-self-hosted.yml
# .github/workflows/build.yml
# .github/workflows/copilot-setup-steps.yml
# .github/workflows/gguf-publish.yml
# ci/run.sh
# docs/backend/OPENVINO.md
# examples/llama.android/lib/src/main/cpp/ai_chat.cpp
# ggml/src/ggml-sycl/add-id.cpp
# requirements/requirements-pydantic.txt
# tests/test-gguf.cpp
# tests/test-jinja.cpp
# tests/test-llama-archs.cpp
# tools/gguf-split/README.md
# tools/llama-bench/llama-bench.cpp
2026-03-28 01:18:20 +08:00
Concedo
2cdf02102e
preserve previous filename
2026-03-28 01:13:03 +08:00