Commit graph

11821 commits

Author SHA1 Message Date
Concedo
a1305ffff9 still not working 2026-02-26 10:48:21 +08:00
Concedo
5c5fe55f7d bump kv overrides max (+1 squashed commits)
Squashed commits:

[9bc8212a0] bump kv overrides max
2026-02-26 00:24:53 +08:00
Concedo
d8746a851f still bugged 2026-02-26 00:07:04 +08:00
Concedo
8a3ccfcba5 some fixes but some issues 2026-02-25 23:41:32 +08:00
Concedo
0eafc3cf2d ace step lowvram mode done, improved 2026-02-24 23:12:26 +08:00
Concedo
11a85d62fc lowvram for music lm 2026-02-24 22:21:17 +08:00
Concedo
aa58d1ed3b all working, but needs to optimize vram 2026-02-24 21:55:57 +08:00
Concedo
488c431331 not yet working 2026-02-24 17:47:50 +08:00
Concedo
0fd7d2c0e5 ace step diffusion loading 2026-02-24 15:24:15 +08:00
Concedo
749536f464 fixed wav header wrong size 2026-02-24 01:13:44 +08:00
askmyteapot
062e361968
Update ace-qwen3.cpp to build on MSVC (#1992)
Need to include <sstream>; otherwise the build fails with many errors like the ones below:

```
C:\koboldcpp\otherarch\acestep\ace-qwen3.cpp(1278,9): error C2297: '<<': not valid as right operand has type 'const char [26]' [C:\koboldcpp\build\music_adapter.vcxproj]
  (compiling source file '../otherarch/acestep/music_adapter.cpp')

C:\koboldcpp\otherarch\acestep\ace-qwen3.cpp(1278,9): error C2679: binary '<<': no operator found which takes a right-hand operand of type 'std::string' (or there is no acceptable conversion) [C:\koboldcpp\build\music_adapter.vcxproj]
  (compiling source file '../otherarch/acestep/music_adapter.cpp')
      C:\Program Files (x86)\Microsoft Visual Studio\18\BuildTools\VC\Tools\MSVC\14.50.35717\include\__msvc_int128.hpp(753,46):
      could be 'std::_Unsigned128 std::operator <<(const std::_Unsigned128 &,const std::_Base128 &) noexcept' [found using argument-dependent lookup]
          C:\koboldcpp\otherarch\acestep\ace-qwen3.cpp(1278,9):
          'std::_Unsigned128 std::operator <<(const std::_Unsigned128 &,const std::_Base128 &) noexcept': cannot convert argument 2 from 'std::string' to 'const std::_Base128 &'
              C:\koboldcpp\otherarch\acestep\ace-qwen3.cpp(1278,57):
              Reason: cannot convert from 'std::string' to 'const std::_Base128'
              C:\koboldcpp\otherarch\acestep\ace-qwen3.cpp(1278,57):
              No user-defined-conversion operator available that can perform this conversion, or the operator cannot be called
```
2026-02-23 23:03:07 +08:00
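The MSVC fix above comes down to a single include. Below is a minimal sketch of the kind of code that needs it. It assumes (the commit does not show line 1278) that the failing statement streams string literals and `std::string` values into a `std::ostringstream`. `describe_track` and its parameters are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <sstream>  // defines std::ostringstream; do not rely on other headers pulling it in
#include <string>

// Hypothetical helper: streams a literal, a std::string and an int into a
// string stream, then returns the accumulated text.
std::string describe_track(const std::string & name, int seconds) {
    assert(seconds >= 0);  // illustrative precondition only
    std::ostringstream out;
    out << "track: " << name << " (" << seconds << "s)";
    return out.str();
}
```

Standard library implementations differ in which headers they include transitively, so code like this can compile with one toolchain and still fail with another until `<sstream>` is included explicitly.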
Concedo
5311997581 updated ace step cpp 2026-02-23 23:01:10 +08:00
Concedo
2e713cfff5 fixed compile issue, trying out 8bit pcm 2026-02-23 21:19:03 +08:00
Wagner Bruna
a6c0a224b2
sd: sync to master-506-c9cd497 (#1991) 2026-02-23 17:35:59 +08:00
Concedo
06c0ffaead with am17an fix for henk to test 2026-02-23 17:30:19 +08:00
Concedo
c2b0cb26a8 ace step codes api 2026-02-23 14:04:45 +08:00
Concedo
d100c8660e added Tlacuilo 2026-02-23 10:48:56 +08:00
Concedo
4be93db21c ace step codes generation now working 2026-02-23 00:27:26 +08:00
Concedo
71d42fae85 Revert "Revert "Revert "cuda : enable CUDA graphs for MMID 1 <= BS <= 4 (#19645)"""
This reverts commit edc04f3f7d.
2026-02-22 23:18:53 +08:00
Concedo
13db5aee9e stub files for loading ace step 2026-02-22 23:15:08 +08:00
Concedo
37ae068dee set default to GPU test 2026-02-22 17:03:43 +08:00
Concedo
fdf868f397 add ace step cpp license info 2026-02-22 13:24:28 +08:00
Concedo
5cd6e50eab initial files for ace step 2026-02-22 13:22:24 +08:00
Concedo
ac70ca35dd preliminary patches for acestep.cpp 2026-02-22 12:50:08 +08:00
Wagner Bruna
19588f18ea
sd: relax size restrictions for DiT models (#1986)
Round image dimensions to the specific multiple required by each
DiT model, which range from 32 (certain Wan models) to 1 (Chroma
Radiance), with most requiring multiples of 8 or 16. Unet models
keep being rounded to multiples of 64.

Current sd.cpp rounds the sizes internally; but it always rounds
up, so we still need to round on our side to apply image size
restrictions, and to trigger VAE tiling correctly.

Also, remove a legacy test that could abort a generation with
unsupported image sizes: it'd never run, because it was applied
after the image side adjustments.
2026-02-22 11:00:10 +08:00
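The rounding described in the commit above can be sketched as follows. This is an illustration only, not the project's code: `round_up_to_multiple` and `round_down_to_multiple` are hypothetical helpers, and the message does not say which direction the wrapper rounds in, so both are shown. The multiples (64 for Unet models, between 32 and 1 for DiT models) are the ones quoted in the message.

```cpp
#include <cassert>

// Round a non-negative size n up to the nearest multiple of m (m > 0).
inline int round_up_to_multiple(int n, int m) {
    assert(n >= 0 && m > 0);
    return ((n + m - 1) / m) * m;
}

// Round a non-negative size n down to the nearest multiple of m (m > 0).
inline int round_down_to_multiple(int n, int m) {
    assert(n >= 0 && m > 0);
    return (n / m) * m;
}
```

With a multiple of 64, a requested width of 1000 rounds up to 1024 and down to 960. With a multiple of 16 it rounds up to 1008, and with a multiple of 1 it is left unchanged.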
Concedo
0a87f5501e updated sdui, fix img imports 2026-02-22 10:49:55 +08:00
Concedo
73f3ffaeb7 fix followup tool call check with assistant prefills 2026-02-22 10:33:00 +08:00
Concedo
edc04f3f7d Revert "Revert "cuda : enable CUDA graphs for MMID 1 <= BS <= 4 (#19645)""
This reverts commit 131e3cb17a.
2026-02-22 09:33:25 +08:00
Concedo
d06700687f Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/rocm.Dockerfile
#	.github/workflows/release.yml
#	CMakeLists.txt
#	ggml/src/ggml-cuda/common.cuh
#	scripts/sync_vendor.py
#	tests/test-chat.cpp
2026-02-22 09:33:13 +08:00
Mario Limonciello
35715657cb
Update ROCm docker container to 7.2 release (#19418)
Also update architectures
2026-02-21 21:53:39 +01:00
Mario Limonciello
f75c4e8bf5
Add a build target to generate ROCm artifacts using ROCm 7.2 (#19433)
This builds the following targets:
 * gfx1151
 * gfx1150
 * gfx1200
 * gfx1201
 * gfx1100
 * gfx1101
 * gfx1030
 * gfx908
 * gfx90a
 * gfx942
2026-02-21 19:56:26 +01:00
Concedo
78b4b87e54 fixed compile issue for tts on ci (+1 squashed commits)
Squashed commits:

[d6f778499] fixed compile issue for tts on ci
2026-02-22 02:28:11 +08:00
Adrien Gallouët
99156f3a5f
vendor : update cpp-httplib to 0.33.1 (#19778)
Signed-off-by: Adrien Gallouët <adrien@gallouet.fr>
2026-02-21 19:12:31 +01:00
Concedo
7068a74998 tts upstream bugfix 2026-02-22 00:46:03 +08:00
Concedo
313d37a602 cache used voices 2026-02-22 00:43:57 +08:00
Concedo
5536fb29f2 add some default voices for qwen3tts 2026-02-21 23:45:15 +08:00
Gaurav Garg
a0c91e8f9f
Improve CUDA graph capture (#19754)
* Improve CUDA graph capture

Currently, CUDA graphs are eagerly enabled on the first call to ggml_backend_cuda_graph_compute. If the graph properties keep changing (4+ consecutive updates), the graph is permanently disabled. This is suboptimal because:

- The first call always incurs CUDA graph capture overhead even if the graph is unstable
- Once permanently disabled, CUDA graphs never re-enable even after the graph stabilizes (e.g., switching from prompt processing to decode)

The new approach delays CUDA graph activation until warmup completes: the same cgraph must be called at least twice with matching properties before CUDA graph capture begins. This avoids wasted capture overhead on volatile graphs and allows graphs to become eligible once they stabilize.
This also fixes issues such as https://github.com/ggml-org/llama.cpp/discussions/19708

* Update ggml/src/ggml-cuda/ggml-cuda.cu

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Remove EM dashes

* Update ggml/src/ggml-cuda/ggml-cuda.cu

Co-authored-by: Aman Gupta <amangupta052@gmail.com>

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Co-authored-by: Aman Gupta <amangupta052@gmail.com>
2026-02-21 15:09:36 +05:30
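The warmup rule in the commit above can be sketched as a small state tracker. This is not the ggml-cuda implementation: `GraphWarmupTracker` is a hypothetical class, the 64-bit `fingerprint` stands in for whatever graph properties are actually compared, and treating the second consecutive matching call as the point where capture becomes eligible is one reading of "called at least twice with matching properties".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tracker: capture becomes eligible only after the same
// fingerprint has been observed on `required_matches` consecutive calls.
class GraphWarmupTracker {
public:
    explicit GraphWarmupTracker(int required_matches = 2) : required_(required_matches) {
        assert(required_matches >= 1);
    }

    // Record one compute call; returns true when the graph is stable enough to capture.
    bool observe(std::uint64_t fingerprint) {
        if (streak_ > 0 && fingerprint == last_) {
            ++streak_;              // same properties as the previous call
        } else {
            last_   = fingerprint;  // properties changed: restart the warmup
            streak_ = 1;
        }
        return streak_ >= required_;
    }

private:
    int           required_;
    int           streak_ = 0;
    std::uint64_t last_   = 0;
};
```

While the fingerprint keeps changing, `observe` keeps returning false and no capture cost is paid. Once the fingerprint repeats, it returns true, and a later change only restarts the warmup rather than disabling capture for good.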
Concedo
2db018a1d7 qwen3tts support reference audio 2026-02-21 17:30:21 +08:00
crsawyer
07968d53e4
fix: UI single model selection in router mode (#19767) 2026-02-21 09:28:39 +01:00
Concedo
72219fdbf5 basic qwen3 tts working 2026-02-21 12:03:53 +08:00
Concedo
1af7095cb5 add qwen3 tts repo files 2026-02-21 10:54:55 +08:00
Concedo
ad0618e351 bump defaults, updated lite, fixed glm4.7 autoguess template 2026-02-21 08:51:53 +08:00
Mengsheng Wu
ba3b9c8844
hexagon : fix build release (#19444) (#19587) 2026-02-20 16:40:00 -08:00
Aldehir Rojas
94b0200a01
common : merge qwen3-coder and nemotron nano 3 parsers (#19765)
* common : migrate qwen3-coder to PEG parsing variant

* cont : add JSON parameter test
2026-02-20 23:22:22 +01:00
Concedo
131e3cb17a Revert "cuda : enable CUDA graphs for MMID 1 <= BS <= 4 (#19645)"
This reverts commit ad8207af77.
2026-02-20 21:34:17 +08:00
Concedo
81065fd801 fix ci build error 2026-02-20 21:32:07 +08:00
Taimur Ahmad
b908baf182
ggml-cpu: add RVV vec dot kernels for quantization types (#18784)
* ggml-cpu: add rvv vec_dot for iq2_s

Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>

* ggml-cpu: add rvv vec_dot for iq3_s

Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>

* ggml-cpu: add rvv vec_dot for tq1_0, tq2_0

Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>

* ggml-cpu: add rvv vec_dot for iq1_s, iq1_m

Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>

* ggml-cpu: add vlen switch for rvv vec_dot

---------

Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>
2026-02-20 13:30:07 +02:00
ddh0
492bc31978
quantize : add --dry-run option (#19526)
* clean slate for branch

* use 6 characters for tensor dims

* add --dry-run to llama-quantize

* use 6 characters for tensor dims (cont.)

* no need to re-calculate ggml_nbytes for tensor

* fix indent

* show model and quant BPW when quant completes

* add example to --help

* new function `tensor_requires_imatrix`, add courtesy warning about imatrix

* missing __func__, move imatrix flag set

* logic error

* fixup tensor_requires_imatrix

* add missing `GGML_TYPE`s

* simplify and rename `tensor_type_requires_imatrix`

* simplify for style

* add back Q2_K edge case for imatrix

* guard ftype imatrix warning

* comment ref #12557

* remove per @compilade

* remove unused `params` parameter

* move `bool dry_run` per GG

* move `bool dry_run` per GG

* Update src/llama-quant.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update src/llama-quant.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update src/llama-quant.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-02-20 09:20:16 +01:00
Concedo
e626de2430 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	docs/ops.md
#	docs/ops/WebGPU.csv
#	embd_res/templates/stepfun-ai-Step-3.5-Flash.jinja
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	ggml/src/ggml-webgpu/wgsl-shaders/unary.wgsl
#	src/CMakeLists.txt
#	tests/test-backend-ops.cpp
#	tests/test-chat.cpp
#	tools/mtmd/CMakeLists.txt
2026-02-20 15:16:26 +08:00
Concedo
07c45ced56 Merge commit 'c78e682245' into concedo_experimental
# Conflicts:
#	src/models/qwen35.cpp
#	src/models/qwen35moe.cpp
2026-02-20 14:41:32 +08:00