Concedo
22c78f6c82
fix q3tts compile, update docs and lite
2026-03-14 23:33:18 +08:00
Concedo
1802b09e6f
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# docs/build.md
# docs/ops.md
# docs/ops/CPU.csv
# ggml/src/ggml-cpu/kleidiai/kernels.cpp
# ggml/src/ggml-cpu/kleidiai/kleidiai.cpp
# ggml/src/ggml-cpu/repack.cpp
# ggml/src/ggml-cpu/repack.h
# src/llama-quant.cpp
# tests/test-json-schema-to-grammar.cpp
2026-03-14 17:56:16 +08:00
Concedo
ff3f8533d3
Merge commit 'c96f608d98' into concedo_experimental
...
# Conflicts:
# CONTRIBUTING.md
# docs/ops.md
# docs/ops/Vulkan.csv
# models/templates/LFM2-8B-A1B.jinja
# tests/peg-parser/test-python-dict-parser.cpp
# tests/peg-parser/test-unicode.cpp
# tests/test-chat-peg-parser.cpp
# tests/test-chat.cpp
# tools/llama-bench/llama-bench.cpp
2026-03-14 17:14:34 +08:00
Concedo
8b9594b6ea
wip router mode
2026-03-14 17:07:05 +08:00
Concedo
1d067933f0
claude fixes for ace step, idk man who am i to argue with an agi
2026-03-14 12:27:26 +08:00
Concedo
349fc744e9
cleanup, fixed a regression in music gen with codes due to instruct prompt change
2026-03-14 11:32:47 +08:00
Concedo
6143a75426
improve autofit padding heuristics
2026-03-14 00:36:52 +08:00
Concedo
04915d99ee
Merge commit '451ef08432' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# README.md
# docs/ops.md
# docs/ops/Vulkan.csv
# src/llama-model-loader.cpp
# src/llama-model.cpp
# src/llama.cpp
# tests/CMakeLists.txt
# tests/peg-parser/test-basic.cpp
# tests/peg-parser/test-json-parser.cpp
# tests/peg-parser/test-python-dict-parser.cpp
# tests/peg-parser/test-unicode.cpp
# tests/test-chat-auto-parser.cpp
# tests/test-chat-peg-parser.cpp
# tests/test-chat.cpp
# tools/CMakeLists.txt
2026-03-13 23:33:37 +08:00
Concedo
d2c911884d
Merge commit '213c4a0b81' into concedo_experimental
...
# Conflicts:
# CODEOWNERS
# common/CMakeLists.txt
# common/chat-peg-parser.cpp
# common/chat.cpp
# docs/backend/SYCL.md
# docs/development/parsing.md
# docs/ops.md
# docs/ops/SYCL.csv
# embd_res/templates/Apriel-1.6-15b-Thinker-fixed.jinja
# embd_res/templates/Bielik-11B-v3.0-Instruct.jinja
# embd_res/templates/GLM-4.7-Flash.jinja
# embd_res/templates/LFM2-8B-A1B.jinja
# embd_res/templates/StepFun3.5-Flash.jinja
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-sycl/CMakeLists.txt
# ggml/src/ggml-sycl/backend.hpp
# ggml/src/ggml-sycl/common.hpp
# ggml/src/ggml-sycl/convert.cpp
# ggml/src/ggml-sycl/convert.hpp
# ggml/src/ggml-sycl/count-equal.cpp
# ggml/src/ggml-sycl/dpct/helper.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/presets.hpp
# ggml/src/ggml-sycl/softmax.cpp
# ggml/src/ggml-sycl/vecdotq.hpp
# models/templates/Apertus-8B-Instruct.jinja
# models/templates/CohereForAI-c4ai-command-r7b-12-2024-tool_use.jinja
# models/templates/Qwen-QwQ-32B.jinja
# models/templates/Qwen3-Coder.jinja
# models/templates/deepseek-ai-DeepSeek-R1-Distill-Llama-8B.jinja
# models/templates/deepseek-ai-DeepSeek-R1-Distill-Qwen-32B.jinja
# models/templates/deepseek-ai-DeepSeek-V3.1.jinja
# models/templates/fireworks-ai-llama-3-firefunction-v2.jinja
# models/templates/moonshotai-Kimi-K2.jinja
# models/templates/unsloth-Apriel-1.5.jinja
# tests/CMakeLists.txt
# tests/peg-parser/test-basic.cpp
# tests/peg-parser/tests.h
# tests/test-backend-ops.cpp
# tests/test-chat-peg-parser.cpp
# tests/test-chat-template.cpp
# tests/test-chat.cpp
# tests/test-json-schema-to-grammar.cpp
# tests/test-peg-parser.cpp
# tools/CMakeLists.txt
# tools/cli/cli.cpp
2026-03-13 21:35:56 +08:00
Concedo
4189508ef3
qwen3tts support 1.7b model
2026-03-13 21:15:24 +08:00
Concedo
a13641c00c
tts loader fixes
2026-03-13 18:33:10 +08:00
Concedo
0a38237ff5
original qwen3tts files
2026-03-13 15:24:18 +08:00
Concedo
4427bab37e
cover mode is now working
2026-03-13 14:55:39 +08:00
Concedo
84734eb409
better audio runtime reload
2026-03-13 14:02:56 +08:00
Concedo
8f23b8d81e
wip on ref audio, but it compiles
2026-03-12 23:46:10 +08:00
Concedo
d5a4c17e14
mp3 not default
2026-03-12 21:42:59 +08:00
Concedo
3fd9648726
added mp3 support
2026-03-12 21:00:50 +08:00
Concedo
3092694d2e
better resampler
2026-03-12 16:49:53 +08:00
Wagner Bruna
796f7bdeff
sd: fix LoRA multiplier logic to switch to at_runtime mode (#2029)
...
`0. in inputs.lora_multipliers` didn't work because the C array has
variable length.
Also fixed a few corner cases related to the default multipliers
(mainly to ensure robustness against future changes, since in most
cases the multiplier list is already sanitized by a previous
function).
2026-03-12 15:36:51 +08:00
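The membership-test problem described in commit 796f7bdeff above can be illustrated with a minimal sketch. It assumes the multipliers reach Python as a ctypes pointer plus a separate element count; the helper name `any_zero_multiplier` and that layout are assumptions for illustration, not the project's actual code.

```python
import ctypes


def any_zero_multiplier(multipliers_ptr, count):
    """Return True if any of the first `count` floats is zero.

    Hypothetical helper: a bare C pointer carries no length, so a plain
    `0. in ptr` test has no well-defined end. Slicing with an explicit
    stop yields an ordinary Python list of exactly `count` values.
    """
    values = multipliers_ptr[0:count]
    return any(v == 0.0 for v in values)


# Build a small float buffer and view it through a length-less pointer.
buf = (ctypes.c_float * 3)(1.0, 0.0, 0.5)
ptr = ctypes.cast(buf, ctypes.POINTER(ctypes.c_float))
print(any_zero_multiplier(ptr, 3))
```

Passing the count explicitly keeps the check bounded to the entries that are actually populated.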
Concedo
318a5486ce
duration
2026-03-12 15:33:51 +08:00
Concedo
5b22858dbd
updated docs
2026-03-12 00:20:20 +08:00
Concedo
3cc6e2ea17
make stereo default
2026-03-12 00:10:25 +08:00
Concedo
211d4fe632
lots of tweaks for ace step
2026-03-11 23:57:52 +08:00
Concedo
ecc4865244
improves code output quality
2026-03-10 23:07:52 +08:00
Concedo
8095bf9807
include overhead from music models
2026-03-10 22:52:20 +08:00
Concedo
6adcd0b5db
Merge commit '34df42f7be' into concedo_experimental
...
# Conflicts:
# README.md
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/CMakeLists.txt
# ggml/src/ggml-hexagon/htp/act-ops.c
# ggml/src/ggml-hexagon/htp/binary-ops.c
# ggml/src/ggml-hexagon/htp/cpy-ops.c
# ggml/src/ggml-hexagon/htp/get-rows-ops.c
# ggml/src/ggml-hexagon/htp/htp-msg.h
# ggml/src/ggml-hexagon/htp/htp-ops.h
# ggml/src/ggml-hexagon/htp/hvx-arith.h
# ggml/src/ggml-hexagon/htp/hvx-base.h
# ggml/src/ggml-hexagon/htp/hvx-inverse.h
# ggml/src/ggml-hexagon/htp/hvx-utils.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hexagon/htp/rope-ops.c
# ggml/src/ggml-hexagon/htp/set-rows-ops.c
# ggml/src/ggml-hexagon/htp/softmax-ops.c
# ggml/src/ggml-hexagon/htp/unary-ops.c
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# tests/test-backend-ops.cpp
# tools/cli/cli.cpp
# tools/server/webui/src/lib/components/app/chat/ChatScreen/ChatScreen.svelte
2026-03-10 22:20:04 +08:00
Concedo
746664fde6
Merge commit '2cd20b72ed' into concedo_experimental
...
# Conflicts:
# CONTRIBUTING.md
# docs/backend/CANN.md
# docs/backend/SYCL.md
# docs/backend/snapdragon/README.md
# docs/backend/snapdragon/windows.md
# docs/build.md
# docs/multimodal/MobileVLM.md
# docs/ops.md
# docs/ops/WebGPU.csv
# examples/debug/README.md
# examples/llama.vim
# examples/model-conversion/README.md
# examples/sycl/README.md
# ggml/src/ggml-cpu/amx/mmq.cpp
# ggml/src/ggml-cpu/arch/x86/repack.cpp
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp-drv.cpp
# ggml/src/ggml-hexagon/htp/flash-attn-ops.c
# ggml/src/ggml-hexagon/htp/hvx-base.h
# ggml/src/ggml-hexagon/htp/hvx-copy.h
# ggml/src/ggml-hexagon/htp/hvx-inverse.h
# ggml/src/ggml-hexagon/htp/hvx-reduce.h
# ggml/src/ggml-hexagon/htp/matmul-ops.c
# ggml/src/ggml-hexagon/htp/rope-ops.c
# ggml/src/ggml-hexagon/htp/worker-pool.c
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/cpy.cl
# ggml/src/ggml-sycl/common.hpp
# ggml/src/ggml-sycl/quants.hpp
# ggml/src/ggml-sycl/softmax.cpp
# ggml/src/ggml-vulkan/CMakeLists.txt
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# scripts/pr2wt.sh
# scripts/server-bench.py
# scripts/snapdragon/windows/run-cli.ps1
# tests/test-alloc.cpp
# tests/test-backend-ops.cpp
# tests/test-chat.cpp
# tools/cli/cli.cpp
# tools/completion/README.md
# tools/cvector-generator/cvector-generator.cpp
# tools/imatrix/README.md
# tools/perplexity/README.md
# tools/server/public_simplechat/readme.md
# tools/server/tests/README.md
2026-03-10 22:11:08 +08:00
Concedo
c8800ed16c
gcc path fix
2026-03-10 21:40:32 +08:00
Ray Xu
8d880ac012
examples : fix empty items in json_schema_to_grammar.py [no ci] (#19968)
...
* Fix logic for retrieving schema items in `json_schema_to_grammar.py`
If `schema['items']` is `{}` and `prefixItems` is not in `schema`, the original code raises an error, because `{}` is falsy.
I think if `schema['items']` is `{}`, then items should just be `{}`
* Apply suggestion from @CISC
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Add tests for arrays with empty items
Add two unit tests to `tests/test-json-schema-to-grammar.cpp` that validate handling of arrays when 'items' is an empty schema and when 'prefixItems' is present alongside an empty 'items'. Both tests expect the same generated grammar, ensuring the JSON Schema->grammar conversion treats an empty 'items' schema (and the presence of 'prefixItems') correctly and covering this edge case.
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-03-10 14:38:18 +01:00
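The falsy-`{}` pitfall described in commit 8d880ac012 above can be reproduced in a few lines. This is a sketch: the function names `resolve_items_buggy` and `resolve_items` are hypothetical, and the "buggy" form is an assumed reconstruction of the pattern the message describes, not a quote of the script.

```python
def resolve_items_buggy(schema):
    # `{}` is falsy, so `or` falls through to the prefixItems lookup,
    # which raises KeyError when that key is absent.
    return schema.get("items") or schema["prefixItems"]


def resolve_items(schema):
    # Test for key presence instead of truthiness, so an empty
    # `items` schema is returned as-is.
    if "items" in schema:
        return schema["items"]
    return schema["prefixItems"]


schema = {"type": "array", "items": {}}
print(resolve_items(schema))
```

Checking membership rather than truthiness keeps "present but empty" distinct from "absent".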
Concedo
b06dd2606e
ruff: linting
2026-03-10 21:32:36 +08:00
a3894281
0f1e9d14cc
docs: update CPU backend ops to mark POOL_1D as supported (#20304)
2026-03-10 21:31:24 +08:00
Wagner Bruna
3f42ed1af7
support for customizing LoRA multipliers through the sdapi (#1982)
...
* fix corner case in sd_oai_transform_params
Also fix typo in the function name.
* support for customizing loaded LoRA multipliers
The `sdloramult` flag now accepts a list of multipliers, one for each
LoRA. If all multipliers are non-zero, LoRAs load as before, with no extra
VRAM usage or performance impact.
If any LoRA has a multiplier of 0, we switch to `at_runtime` mode, and these
LoRAs will be available to multiplier changes via the `lora` sdapi field and
show up in the `sdapi/v1/loras` endpoint. All LoRAs are still preloaded on
startup, and cached to avoid file reloads.
If the list of multipliers is shorter than the list of LoRAs, the multiplier
list is extended with the first multiplier (1.0 by default), to keep it
compatible with the previous behavior.
* support for `<lora:name:multiplier>` prompt syntax and metadata
* add a few tests for sanitize_lora_multipliers
2026-03-10 21:29:39 +08:00
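The multiplier rules described in commit 3f42ed1af7 above (pad a short list with its first entry, 1.0 by default, and switch modes when any multiplier is 0) can be sketched as follows. The function names `pad_multipliers` and `needs_runtime_mode` are hypothetical and are not the project's `sanitize_lora_multipliers`; truncating an over-long list is also an assumption.

```python
def pad_multipliers(multipliers, n_loras, default=1.0):
    """Return one multiplier per LoRA.

    A short list is extended with its first entry; an empty list
    falls back to `default`. Extra entries are dropped (assumption).
    """
    first = multipliers[0] if multipliers else default
    padded = list(multipliers[:n_loras])
    padded += [first] * (n_loras - len(padded))
    return padded


def needs_runtime_mode(multipliers):
    # Any zero multiplier means that LoRA must stay adjustable later,
    # which the commit message ties to the `at_runtime` mode.
    return any(m == 0 for m in multipliers)


mults = pad_multipliers([0.8, 0.0], n_loras=3)
print(mults, needs_runtime_mode(mults))
```

Padding with the first entry means a single value still applies to every LoRA, which matches the backward-compatibility goal stated in the message.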
Concedo
eafb5ff4c5
autofit improvement e.g. for strix (+1 squashed commits)
...
Squashed commits:
[6f6fd59c3] autofit improvement e.g. for strix
2026-03-10 21:20:02 +08:00
Georgi Gerganov
1274fbee9e
models : fix assert in mamba2 (cont) (#20335)
...
* models : fix assert in mamba2 (cont)
* cont : add n_group mod
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-03-10 15:00:08 +02:00
Georgi Gerganov
a7b3dee7a5
server : make 2 checkpoints near the end of the prompt (#20288)
...
* server : make 2 checkpoints near the end of the prompt
* cont : adjust checkpoints
2026-03-10 14:28:23 +02:00
Sigbjørn Skjæret
ec947d2b16
common : fix incorrect uses of stoul (#20313)
2026-03-10 11:40:26 +01:00
Charles Xu
0cd4f4720b
kleidiai : support for concurrent sme and neon kernel execution (#20070)
2026-03-10 09:25:25 +02:00
Taimur Ahmad
af237f3026
ggml-cpu: add RVV repack GEMM and GEMV for quantization types (#19121)
...
* ggml-cpu: add rvv ggml_quantize_mat_4x8 for q8_0
Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>
* ggml-cpu: add rvv repacking for iq4_nl
* ggml-cpu: add generic impl for iq4_nl gemm/gemv
* ggml-cpu: add rvv repacking for q8_0
* ggml-cpu: refactor; add rvv repacking for q4_0, q4_K
* ggml-cpu: refactor; add rvv repacking for q2_K
Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>
* ggml-cpu: refactor rvv repack
---------
Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>
2026-03-10 08:49:52 +02:00
Julian Pscheid
1a5631beaa
metal: handle command buffer failures gracefully in synchronize (#20306)
...
Replace GGML_ABORT("fatal error") in ggml_metal_synchronize() with
error flag + return. This aligns synchronize error handling with
graph_compute, which already returns GGML_STATUS_FAILED for the same
condition.
When a command buffer fails (e.g., iOS GPU access revocation during
backgrounding, macOS eGPU disconnect, OOM), the backend enters an
error state instead of killing the host process. Subsequent
graph_compute calls return GGML_STATUS_FAILED immediately. Recovery
requires recreating the backend.
Failed extra command buffers are properly released on the error path
to avoid Metal object leaks.
2026-03-10 08:32:24 +02:00
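The error-state behaviour described in commit 1a5631beaa above (record the failure instead of aborting, fail subsequent compute calls fast, release buffers on the error path) can be modelled with a small language-neutral sketch. `BackendSketch`, `Status`, and the callable-based "command buffers" are all hypothetical stand-ins, not the Metal backend's API.

```python
from enum import Enum


class Status(Enum):
    SUCCESS = 0
    FAILED = 1


class BackendSketch:
    """Toy backend that enters an error state instead of aborting."""

    def __init__(self):
        self.has_error = False
        self.pending = []  # callables returning True on success

    def graph_compute(self, work):
        # Once in the error state, refuse new work immediately.
        if self.has_error:
            return Status.FAILED
        self.pending.append(work)
        return Status.SUCCESS

    def synchronize(self):
        for work in self.pending:
            if not work():
                # Record the failure and stop; the host process survives.
                self.has_error = True
                break
        # Drop every pending buffer, including on the error path.
        self.pending.clear()


backend = BackendSketch()
backend.graph_compute(lambda: False)   # simulated command buffer failure
backend.synchronize()
print(backend.has_error, backend.graph_compute(lambda: True))
```

In this model the only way out of the error state is to construct a new `BackendSketch`, mirroring the "recovery requires recreating the backend" note in the message.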
ddh0
1dab5f5a44
llama-quant : fail early on missing imatrix, refactor type selection, code cleanup (#19770)
...
* quantize : imatrix-fail early + code cleanup
* fix manual override printing
it's in the preliminary loop now, so needs to be on its own line
* revert header changes per ggerganov
* remove old #includes
* clarify naming
rename `tensor_quantization` to `tensor_type_option` to describe its
functionality
* fix per barto
2026-03-10 08:16:05 +02:00
Concedo
500a1ab466
disable smartcache if slots is zero
2026-03-10 08:57:31 +08:00
Aldehir Rojas
c96f608d98
common: consolidate PEG string parsers (#20263)
...
* common : consolidate PEG string parsers
* cont : fix json_string_content()
2026-03-10 00:29:21 +01:00
Xuan-Son Nguyen
0842b9b465
model: fix step3.5 n_rot (#20318)
2026-03-09 23:42:24 +01:00
Xuan-Son Nguyen
59db9a357d
llama: dynamic head_dim and n_rot for SWA (#20301)
...
* llama: dynamic head_dim and n_rot for SWA
* also add gguf_writer wrappers
* fix build
* build_rope_shift arg reorder
2026-03-09 22:22:39 +01:00
Evan Huus
23fbfcb1ad
server: Parse port numbers from MCP server URLs in CORS proxy (#20208)
...
* Parse port numbers from MCP server URLs
* Pass scheme to http proxy for determining whether to use SSL
* Fix download on non-standard port and re-add port to logging
* add test
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2026-03-09 17:47:54 +01:00
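The URL handling described in commit 23fbfcb1ad above (take the port from the URL when present, and use the scheme to decide on SSL) can be sketched with the Python standard library. The server itself is not written in Python; `split_target` and `DEFAULT_PORTS` are hypothetical names used only to illustrate the idea.

```python
from urllib.parse import urlsplit

# Well-known default ports, used only when the URL omits one.
DEFAULT_PORTS = {"http": 80, "https": 443}


def split_target(url):
    """Return (host, port, use_ssl) for an http(s) URL."""
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    # Prefer the explicit port; fall back to the scheme's default.
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port, parts.scheme == "https"


print(split_target("http://localhost:8123/mcp"))
print(split_target("https://example.com/mcp"))
```

Returning the SSL decision alongside host and port keeps the proxy from inferring it from the port number alone.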
Concedo
2bd6b87d5b
remove a file
2026-03-09 23:08:53 +08:00
Concedo
ee96e71bae
don't resample audio
2026-03-09 22:53:55 +08:00
Paul Flynn
e22cd0aa15
metal : extend mul_mv_ext to BF16, Q2_K, Q3_K (#20250)
...
Enable mul_mv_ext small-batch kernels (BS 2-8) for BF16, Q2_K,
and Q3_K quantization types. These types previously fell through
to the slower single-row mul_mv path.
BF16 uses the float4 dequantize path (like F16). Q2_K and Q3_K
use the float4x4 K-quant path (like Q4_K/Q5_K/Q6_K).
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 16:48:12 +02:00
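The dispatch rule described in commit e22cd0aa15 above can be pictured as a small lookup. This sketch lists only the types named in the message, and the function `pick_path` and the returned labels are hypothetical; the real kernel table may cover more types.

```python
# Types named in the commit message, grouped by dequantize path.
FLOAT4_TYPES = {"F16", "BF16"}
FLOAT4X4_TYPES = {"Q2_K", "Q3_K", "Q4_K", "Q5_K", "Q6_K"}


def pick_path(tensor_type, batch_size):
    """Return an illustrative label for the matrix-vector path."""
    # The small-batch kernels apply to batch sizes 2 through 8.
    if 2 <= batch_size <= 8:
        if tensor_type in FLOAT4_TYPES:
            return "mul_mv_ext (float4)"
        if tensor_type in FLOAT4X4_TYPES:
            return "mul_mv_ext (float4x4)"
    # Anything else uses whatever path the backend would otherwise pick.
    return "fallback"


print(pick_path("BF16", 4))
print(pick_path("Q3_K", 8))
```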
Georgi Gerganov
96cfc4992c
server : fix checkpoints n_tokens calculation (#20287)
2026-03-09 16:47:06 +02:00
Georgi Gerganov
ed0007aa32
metal : add upscale (#20284)
2026-03-09 16:45:11 +02:00