Concedo
349fc744e9
cleanup, fixed a regression in music gen with codes due to instruct prompt change
2026-03-14 11:32:47 +08:00
Concedo
6143a75426
improve autofit padding heuristics
2026-03-14 00:36:52 +08:00
Concedo
04915d99ee
Merge commit '451ef08432' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# README.md
# docs/ops.md
# docs/ops/Vulkan.csv
# src/llama-model-loader.cpp
# src/llama-model.cpp
# src/llama.cpp
# tests/CMakeLists.txt
# tests/peg-parser/test-basic.cpp
# tests/peg-parser/test-json-parser.cpp
# tests/peg-parser/test-python-dict-parser.cpp
# tests/peg-parser/test-unicode.cpp
# tests/test-chat-auto-parser.cpp
# tests/test-chat-peg-parser.cpp
# tests/test-chat.cpp
# tools/CMakeLists.txt
2026-03-13 23:33:37 +08:00
Concedo
d2c911884d
Merge commit '213c4a0b81' into concedo_experimental
...
# Conflicts:
# CODEOWNERS
# common/CMakeLists.txt
# common/chat-peg-parser.cpp
# common/chat.cpp
# docs/backend/SYCL.md
# docs/development/parsing.md
# docs/ops.md
# docs/ops/SYCL.csv
# embd_res/templates/Apriel-1.6-15b-Thinker-fixed.jinja
# embd_res/templates/Bielik-11B-v3.0-Instruct.jinja
# embd_res/templates/GLM-4.7-Flash.jinja
# embd_res/templates/LFM2-8B-A1B.jinja
# embd_res/templates/StepFun3.5-Flash.jinja
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-sycl/CMakeLists.txt
# ggml/src/ggml-sycl/backend.hpp
# ggml/src/ggml-sycl/common.hpp
# ggml/src/ggml-sycl/convert.cpp
# ggml/src/ggml-sycl/convert.hpp
# ggml/src/ggml-sycl/count-equal.cpp
# ggml/src/ggml-sycl/dpct/helper.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/presets.hpp
# ggml/src/ggml-sycl/softmax.cpp
# ggml/src/ggml-sycl/vecdotq.hpp
# models/templates/Apertus-8B-Instruct.jinja
# models/templates/CohereForAI-c4ai-command-r7b-12-2024-tool_use.jinja
# models/templates/Qwen-QwQ-32B.jinja
# models/templates/Qwen3-Coder.jinja
# models/templates/deepseek-ai-DeepSeek-R1-Distill-Llama-8B.jinja
# models/templates/deepseek-ai-DeepSeek-R1-Distill-Qwen-32B.jinja
# models/templates/deepseek-ai-DeepSeek-V3.1.jinja
# models/templates/fireworks-ai-llama-3-firefunction-v2.jinja
# models/templates/moonshotai-Kimi-K2.jinja
# models/templates/unsloth-Apriel-1.5.jinja
# tests/CMakeLists.txt
# tests/peg-parser/test-basic.cpp
# tests/peg-parser/tests.h
# tests/test-backend-ops.cpp
# tests/test-chat-peg-parser.cpp
# tests/test-chat-template.cpp
# tests/test-chat.cpp
# tests/test-json-schema-to-grammar.cpp
# tests/test-peg-parser.cpp
# tools/CMakeLists.txt
# tools/cli/cli.cpp
2026-03-13 21:35:56 +08:00
Concedo
4189508ef3
qwen3tts support 1.7b model
2026-03-13 21:15:24 +08:00
Concedo
a13641c00c
tts loader fixes
2026-03-13 18:33:10 +08:00
Concedo
0a38237ff5
original qwen3tts files
2026-03-13 15:24:18 +08:00
Concedo
4427bab37e
cover mode is now working
2026-03-13 14:55:39 +08:00
Concedo
84734eb409
better audio runtime reload
2026-03-13 14:02:56 +08:00
Concedo
8f23b8d81e
wip on ref audio, but it compiles
2026-03-12 23:46:10 +08:00
Concedo
d5a4c17e14
mp3 not default
2026-03-12 21:42:59 +08:00
Concedo
3fd9648726
added mp3 support
2026-03-12 21:00:50 +08:00
Concedo
3092694d2e
better resampler
2026-03-12 16:49:53 +08:00
Wagner Bruna
796f7bdeff
sd: fix LoRA multiplier logic to switch to at_runtime mode (#2029)
...
`0. in inputs.lora_multipliers` didn't work because the C array has a
variable length.
Also fixed a few corner cases related to the default multipliers
(mainly to ensure robustness against future changes, since in most
cases the multiplier list is already sanitized by a previous
function).
2026-03-12 15:36:51 +08:00
Concedo
318a5486ce
duration
2026-03-12 15:33:51 +08:00
Concedo
5b22858dbd
updated docs
2026-03-12 00:20:20 +08:00
Concedo
3cc6e2ea17
make stereo default
2026-03-12 00:10:25 +08:00
Concedo
211d4fe632
lots of tweaks for ace step
2026-03-11 23:57:52 +08:00
Concedo
ecc4865244
improves code output quality
2026-03-10 23:07:52 +08:00
Concedo
8095bf9807
include overhead from music models
2026-03-10 22:52:20 +08:00
Concedo
6adcd0b5db
Merge commit '34df42f7be' into concedo_experimental
...
# Conflicts:
# README.md
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/CMakeLists.txt
# ggml/src/ggml-hexagon/htp/act-ops.c
# ggml/src/ggml-hexagon/htp/binary-ops.c
# ggml/src/ggml-hexagon/htp/cpy-ops.c
# ggml/src/ggml-hexagon/htp/get-rows-ops.c
# ggml/src/ggml-hexagon/htp/htp-msg.h
# ggml/src/ggml-hexagon/htp/htp-ops.h
# ggml/src/ggml-hexagon/htp/hvx-arith.h
# ggml/src/ggml-hexagon/htp/hvx-base.h
# ggml/src/ggml-hexagon/htp/hvx-inverse.h
# ggml/src/ggml-hexagon/htp/hvx-utils.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hexagon/htp/rope-ops.c
# ggml/src/ggml-hexagon/htp/set-rows-ops.c
# ggml/src/ggml-hexagon/htp/softmax-ops.c
# ggml/src/ggml-hexagon/htp/unary-ops.c
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# tests/test-backend-ops.cpp
# tools/cli/cli.cpp
# tools/server/webui/src/lib/components/app/chat/ChatScreen/ChatScreen.svelte
2026-03-10 22:20:04 +08:00
Concedo
746664fde6
Merge commit '2cd20b72ed' into concedo_experimental
...
# Conflicts:
# CONTRIBUTING.md
# docs/backend/CANN.md
# docs/backend/SYCL.md
# docs/backend/snapdragon/README.md
# docs/backend/snapdragon/windows.md
# docs/build.md
# docs/multimodal/MobileVLM.md
# docs/ops.md
# docs/ops/WebGPU.csv
# examples/debug/README.md
# examples/llama.vim
# examples/model-conversion/README.md
# examples/sycl/README.md
# ggml/src/ggml-cpu/amx/mmq.cpp
# ggml/src/ggml-cpu/arch/x86/repack.cpp
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp-drv.cpp
# ggml/src/ggml-hexagon/htp/flash-attn-ops.c
# ggml/src/ggml-hexagon/htp/hvx-base.h
# ggml/src/ggml-hexagon/htp/hvx-copy.h
# ggml/src/ggml-hexagon/htp/hvx-inverse.h
# ggml/src/ggml-hexagon/htp/hvx-reduce.h
# ggml/src/ggml-hexagon/htp/matmul-ops.c
# ggml/src/ggml-hexagon/htp/rope-ops.c
# ggml/src/ggml-hexagon/htp/worker-pool.c
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/cpy.cl
# ggml/src/ggml-sycl/common.hpp
# ggml/src/ggml-sycl/quants.hpp
# ggml/src/ggml-sycl/softmax.cpp
# ggml/src/ggml-vulkan/CMakeLists.txt
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# scripts/pr2wt.sh
# scripts/server-bench.py
# scripts/snapdragon/windows/run-cli.ps1
# tests/test-alloc.cpp
# tests/test-backend-ops.cpp
# tests/test-chat.cpp
# tools/cli/cli.cpp
# tools/completion/README.md
# tools/cvector-generator/cvector-generator.cpp
# tools/imatrix/README.md
# tools/perplexity/README.md
# tools/server/public_simplechat/readme.md
# tools/server/tests/README.md
2026-03-10 22:11:08 +08:00
Concedo
c8800ed16c
gcc path fix
2026-03-10 21:40:32 +08:00
Concedo
b06dd2606e
ruff: linting
2026-03-10 21:32:36 +08:00
Wagner Bruna
3f42ed1af7
support for customizing LoRA multipliers through the sdapi (#1982)
...
* fix corner case in sd_oai_transform_params
Also fix typo in the function name.
* support for customizing loaded LoRA multipliers
The `sdloramult` flag now accepts a list of multipliers, one for each
LoRA. If all multipliers are non-zero, LoRAs load as before, with no extra
VRAM usage or performance impact.
If any LoRA has a multiplier of 0, we switch to `at_runtime` mode, and these
LoRAs become available for multiplier changes via the `lora` sdapi field and
show up in the `sdapi/v1/loras` endpoint. All LoRAs are still preloaded on
startup, and cached to avoid file reloads.
If the list of multipliers is shorter than the list of LoRAs, the multiplier
list is extended with the first multiplier (1.0 by default), to keep it
compatible with the previous behavior.
* support for `<lora:name:multiplier>` prompt syntax and metadata
* add a few tests for sanitize_lora_multipliers
2026-03-10 21:29:39 +08:00
Concedo
eafb5ff4c5
autofit improvement e.g. for strix (+1 squashed commit)
...
Squashed commits:
[6f6fd59c3] autofit improvement e.g. for strix
2026-03-10 21:20:02 +08:00
Concedo
500a1ab466
disable smartcache if slots is zero
2026-03-10 08:57:31 +08:00
Concedo
2bd6b87d5b
remove a file
2026-03-09 23:08:53 +08:00
Concedo
ee96e71bae
don't resample audio
2026-03-09 22:53:55 +08:00
Aldehir Rojas
451ef08432
common : gracefully handle incomplete output (#20191)
...
* common : handle incomplete UTF-8 at end of input in PEG parser
* cont : if reached end prematurely, emit needs_more_input to propagate partial output
* cont: refactor peg parse context to add lenient flag
* cont : remove partial flag, keep lenient flag
2026-03-08 17:17:02 +01:00
Piotr Wilkin (ilintar)
9b24886f78
Fix compile bug (#20203)
...
* Fix compile bug
* Update common/chat-auto-parser-helpers.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-03-08 17:15:49 +01:00
Piotr Wilkin (ilintar)
62b8143ad2
Fix structured outputs (#20223)
...
* Fix structured outputs
* Update common/chat-auto-parser-generator.cpp
Co-authored-by: Aldehir Rojas <hello@alde.dev>
---------
Co-authored-by: Aldehir Rojas <hello@alde.dev>
2026-03-08 17:14:43 +01:00
Concedo
45c74da08b
adjust ace step, still wip on caption rework
2026-03-09 00:11:48 +08:00
JustCommitRandomness
9ddd74111f
OpenBSD changes for vulkan backend (#2026)
...
* OpenBSD also needs alloca.h
* Changes to compile vulkan backend with OpenBSD
* Update README.md
tweak details for OpenBSD vulkan backend
* Update README.md
2026-03-08 20:41:36 +08:00
GiantPrince
d088d5b74f
ggml-vulkan: Add ELU op support (#20183)
...
* ggml-Vulkan: add ELU support
* ggml-Vulkan: remove extra spaces and variables
* ggml-Vulkan: fix format issue
* ggml-Vulkan: fix format issue
* fix whitespace issue
* Update Vulkan.csv and ops.md
2026-03-08 12:38:17 +01:00
Jeff Bolz
cd18a50ea5
vulkan: Fix data races in coopmat1 mul_mat(_id) (#20084)
...
* vulkan: Fix data races in coopmat1 mul_mat(_id)
Add barriers between coopmat store and regular loads. We sort of got away with
this because it was the same subgroup accessing the values, but it's still a
race and may not work.
* switch to subgroup control barriers
2026-03-08 12:33:48 +01:00
Johannes Gäßler
a976ff081b
llama: end-to-end tests (#19802)
...
* tests: add end-to-end tests per model architecture
* fixup for rebase
* fix use-after-free in llama-model-loader.cpp
* fix CI
* fix WebGPU
* fix CI
* disable CI for macOS-latest-cmake-arm64
* use expert_weights_scale only if != 0.0f
* comments
2026-03-08 12:30:21 +01:00
Christopher Maher
a95047979a
readme : update infra list (#20212)
2026-03-08 12:42:28 +02:00
Piotr Wilkin (ilintar)
b283f6d5b3
Revert to OAI-compatible args (#20213)
...
* Revert to OAI-compatible args
* Apply workaround::func_args_not_string
2026-03-08 11:33:03 +01:00
decahedron1
ff52ee964d
server : correct index on finish in OAI completion streams (#20226)
2026-03-08 10:08:57 +01:00
Concedo
270d4ad2c1
fixed a typo
2026-03-08 12:56:08 +08:00
Concedo
73fc5c4767
handle jinja exceptions
2026-03-08 12:12:02 +08:00
Neo Zhang
213c4a0b81
[SYCL] support Flash Attention for fp32/fp16/Q4/Q5/Q8 (#20190)
...
* support flash-attention for fp32/fp16/Q4/Q5/Q8
* rm warning
* update for JIT
2026-03-08 12:00:07 +08:00
Concedo
41df8b09e5
jinjatools now works mostly well
2026-03-08 11:55:22 +08:00
Concedo
a981d1ece9
updated lite
2026-03-08 02:33:18 +08:00
Wagner Bruna
9158bd8b4d
sd: sync to master-520-d950627 (#2006)
...
* sd: sync to master-509-4cdfff5
* sd: Anima support
* sd: sync to master-514-5792c66
* sd: additional workaround for Anima .safetensors model
* sd: sync to master-517-ba35dd7
* sd: sync to master-520-d950627
2026-03-08 01:23:03 +08:00
Concedo
ebe44e7819
modify q3tts loader
2026-03-08 00:53:33 +08:00
Concedo
0df18d2ae2
fixed single token bans
2026-03-07 22:50:53 +08:00
Concedo
a40038d8e6
further reverse the mxfp4 changes
2026-03-07 22:42:22 +08:00
Aman Gupta
c5a778891b
ggml: add GATED_DELTA_NET op (#19504)
...
* ggml: add GATED_DELTA_NET op
* remove the transpose
* add KDA
* add qwen35 dense
* llama : check for fused gated delta net backend support
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-03-07 15:41:10 +08:00