Commit graph

662 commits

Author SHA1 Message Date
Concedo
950676fdb7 split utils.cpp into 2 files to support sd.cpp 2026-05-04 15:04:12 +08:00
Wagner Bruna
276c651a12
sd: sync to master-593-3d6064b (#2175)
* sd: sync to master-593-3d6064b

* sd: use the same sdtype_adapter object for all builds

Since master-592-b8079e2, no sd.cpp source depends on the ggml
backend build anymore.

* sd: fix main_gpu selection

* sd: report backend devices to the Python layer
2026-05-04 14:05:34 +08:00
Wagner Bruna
25fab4113e
refactor: handle GGML_VK_VISIBLE_DEVICES at the Python level (#2179)
All the C++ handling code currently does is:
- build a comma-separated list from the info_vulkan array
- if GGML_VK_VISIBLE_DEVICES isn't set, set it to that list

Once set, GGML_VK_VISIBLE_DEVICES affects the whole process. So this
can be done in the same way at the Python level, before all loading
functions.

Caveat: load_model had the default `inputs.vulkan_info = "0"`,
so the default GPU would be "0" only when loading a text model.
2026-05-02 23:10:29 +08:00
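The flow described in the commit above can be sketched in Python. The helper name and argument format are illustrative, not koboldcpp's actual API; only the environment variable and the set-once-before-loading behavior come from the commit.

```python
import os

def apply_visible_devices(device_ids):
    # Sketch of the pattern the commit describes: set GGML_VK_VISIBLE_DEVICES
    # once at the Python level, before any native library loads.
    if os.environ.get("GGML_VK_VISIBLE_DEVICES") is None:
        os.environ["GGML_VK_VISIBLE_DEVICES"] = ",".join(str(d) for d in device_ids)
    # Once set, the variable affects the whole process, so later calls
    # leave it untouched.
    return os.environ["GGML_VK_VISIBLE_DEVICES"]
```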
Wagner Bruna
e2bdd6d7aa
sd: sync to master-591-331cfa5 (#2155)
* sd: sync to master-585-44cca3d

* sd: sync to master-587-b8bdffc

* sd: sync to master-591-331cfa5
2026-05-01 16:33:28 +08:00
Concedo
eca9f4c1df use original precision for q3tts 2026-04-30 17:28:11 +08:00
Concedo
2741d7e7bd switch back from ulaw to wav16 2026-04-30 17:27:55 +08:00
Wagner Bruna
c04832bb2b
sd: add eta support (#2164) 2026-04-25 19:04:13 +08:00
Concedo
2cde0bffd2 minor text edit 2026-04-23 20:06:04 +08:00
Wagner Bruna
bad9b61064
sd: sync to master-582-7023fc4 (#2150)
* sd: remove sampler alias handling from the C++ layer

It's already handled at the Python layer.

* sd: sync to master-580-7d33d4b

* sd: sync to master-582-7023fc4
2026-04-21 23:01:33 +08:00
Concedo
271c4c332c hack to allow kokoro to remain functional even with much higher GGML_SCHED_MAX_SPLIT_INPUTS 2026-04-19 20:40:07 +08:00
Concedo
afaf3b960e try to make kokoro take less graph size 2026-04-19 19:00:35 +08:00
Wagner Bruna
1be08b9d15
sd: report all sampler aliases and centralize name mapping (#2149)
* debug: allow loading backend libraries without normal arg parsing

This is just to be able to test backend functions directly, with e.g.:

>>> import koboldcpp
>>> koboldcpp.init_libraries()
>>> koboldcpp.sd_get_info()

* sd: report all sampler aliases and centralize name mapping
2026-04-19 01:51:42 +08:00
Concedo
0b37cb9a57 added preliminary support for reasoning budget 2026-04-18 11:56:33 +08:00
Concedo
ae292c496e handle SWA conflicting with rewind, increased default SWA padding. 2026-04-16 17:00:26 +08:00
Concedo
535df844dd touchup for min/max tokens ui 2026-04-16 14:56:22 +08:00
Llama
c592bd01da
Pass img_min_params and img_max_params to ctx_clip_params (#2133)
* Pass img_min_params and img_max_params to ctx_clip_params

These values determine the minimum and maximum size (in
tokens) of vision embeddings. The default value of -1
uses a model-dependent default size, for example for
Gemma 4 the default is a 280 token embedding. For higher
quality results (at the cost of using more memory and
slower speed) you can increase the size of the embedding
to 1120 tokens.

* Change dict to mydict to match change to method
2026-04-16 12:27:06 +08:00
Concedo
d9724a4caa kcpp musicgen - disable flash attention as it's not stable on Vulkan; due to optimizations it should still fit in 6GB in lowvram mode. 2026-04-12 18:28:30 +08:00
Concedo
7bf7b0aefc optimize lowvram for music 2026-04-12 18:17:08 +08:00
Concedo
ad6eaffd3c updated docs, adjusted acestep threads 2026-04-09 22:33:30 +08:00
Concedo
5529748a01 Merge commit 'de1aa6fa73' into concedo_experimental
# Conflicts:
#	docs/build.md
#	docs/ops.md
#	docs/ops/WebGPU.csv
#	ggml/src/ggml-sycl/dequantize.hpp
#	ggml/src/ggml-sycl/dmmv.cpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	ggml/src/ggml-sycl/mmvq.cpp
#	ggml/src/ggml-sycl/quants.hpp
#	ggml/src/ggml-sycl/vecdotq.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_decls.tmpl
#	tests/test-backend-ops.cpp
#	tests/test-quantize-fns.cpp
2026-04-09 17:16:33 +08:00
Wagner Bruna
f371bb14d4
sd: sync to master-560-e8323ca (#2082)
* sd: sync to master-540-f16a110

* tae post-merge fixes

* build fixes

* restore image mask for non-inpainting models

* sd: sync to master-551-99c1de3

* avoid nlohmann/json.hpp include diffs

* Euler A now works on Flux

* sd: sync to master-555-7397dda

avi_writer.h got removed upstream, but I've simply kept the local
copy for now.

* sd: sync to master-558-8afbeb6

* sd: sync to master-560-e8323ca
2026-04-09 14:44:59 +08:00
Concedo
6aa49b91b1 fixed acestep bad on vulkan 2026-04-08 22:22:07 +08:00
Concedo
9b02806191 updated acestep convert 2026-04-08 18:39:28 +08:00
Concedo
355f75769e acestep xl now loads and works! 2026-04-08 18:36:18 +08:00
Concedo
4b478b70fa ace step xl tentative changes (not yet working) 2026-04-08 18:00:39 +08:00
Concedo
a8841063b5 adjust q3tts chunking 2026-04-07 22:55:02 +08:00
Concedo
a1fc912452 try not to trigger the magic if input follows the jinja template exactly, for thinking models (+1 squashed commits)
Squashed commits:

[5542e81dc] try don't trigger the magic if input is following the jinja template.
2026-04-07 21:47:38 +08:00
Concedo
c8d6546a14 experimental q3tts batch test 2026-04-07 20:49:46 +08:00
Wagner Bruna
9223f41320
sd: call SetCircularAxesAll directly (#2078) 2026-03-29 01:17:48 +08:00
Concedo
2cdf02102e preserve previous filename 2026-03-28 01:13:03 +08:00
Wagner Bruna
e3c6227d46
sd: report back image generation parameters and metadata (#2062)
* sd: refactor image generation result handling

* sd: report back image generation metadata
2026-03-28 00:49:03 +08:00
Concedo
4a5c903718 sd model replacement logic: adjusted approach for easy merge 2026-03-26 21:57:42 +08:00
Concedo
efdc52fe8b q3tts custom voice support 2026-03-24 23:38:18 +08:00
Concedo
0d50cafd8b added CustomVoice support 2026-03-23 18:50:08 +08:00
Wagner Bruna
abe55fa424
sd: fix metadata for generated images (#2061)
* sd: fix metadata for generated images

* sd: refactor output image conversion
2026-03-23 17:04:32 +08:00
Alistair Stewart
5ff6cefce0
Fix music generation token stopping (#2057)
* Fix music generation token stopping for quantized models

In Phase 1 lyrics mode, the FSM transitions to CODES state after
TOKEN_THINK_END and disables itself. The quantized Q4_K_M model was
not efficiently generating TOKEN_IM_END to stop the generation,
causing it to continue until hitting the 8192 token limit.

This fix forces TOKEN_IM_END to be generated immediately after
TOKEN_THINK_END in lyrics mode, ensuring clean completion of the
planning phase without excessive token generation.

Testing shows generation now completes in ~500ms instead of 80+
seconds with timeout errors.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Clarify comment - fix applies to all models, not just quantized

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Improve fix: only force TOKEN_IM_END at token limit

Instead of forcing TOKEN_IM_END immediately after TOKEN_THINK_END,
only force it when we've reached the token limit. This allows the model
to generate lyrics after the thinking block while still preventing KV
cache exhaustion.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-03-23 17:02:14 +08:00
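The guard described in the commit above can be sketched as follows. All identifiers here are illustrative, not the actual koboldcpp code; only the rule (force the end token once the limit is hit, rather than immediately after the thinking block) comes from the commit.

```python
def next_music_token(state, generated, token_limit, proposed):
    # Sketch: in lyrics mode, once the FSM is past TOKEN_THINK_END (here the
    # "CODES" state) and the token limit is reached, force the end token
    # instead of whatever the model proposed, preventing KV cache exhaustion.
    TOKEN_IM_END = "<|im_end|>"
    if state == "CODES" and generated >= token_limit:
        return TOKEN_IM_END
    return proposed
```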
Wagner Bruna
592dedee28
sd: ensure previous generation results are cleaned up on all code paths (#2060) 2026-03-22 23:18:09 +08:00
Concedo
3bda0bf102 passthrough mode without any gens 2026-03-22 23:09:08 +08:00
Concedo
f846c83a7a pre-seed the tts so it can be shown 2026-03-22 10:36:42 +08:00
Concedo
fdfb713d91 added --sdmaingpu allowing image models to be independently placed on any gpu 2026-03-21 17:34:12 +08:00
Concedo
a3d3800f3e added passthrough mode for esrgan upscale, triggered by img2img denoise 0.0 with 1 step 2026-03-21 16:19:10 +08:00
Wagner Bruna
51187d5362
sd: support changing preloaded LoRA multipliers (#2041)
* sd: remove C++ support for enforcing fixed LoRA multipliers

The logic at the Python level is enough.

* sd: support changing preloaded LoRA multipliers

We keep the same rules as before:
- Any LoRA with multiplier 0 can be changed
- If all LoRAs have multiplier != 0, they are fixed and optimized

but tweak the corner case of LoRAs specified more than once to
allow adjusting the multiplier if the same LoRA is also specified
with a zero multiplier, as if they were two different LoRAs.

So the following keeps working as before:
- --sdlora /loras/lcm.gguf --sdloramult 1 : fixed as 1
- --sdlora /loras/lcm.gguf --sdloramult 0 : dynamic, default 0
- --sdlora /loras/ : dynamic, default 0
- --sdlora /loras/lcm.gguf /loras/lcm.gguf --sdloramult 1 1 : fixed as 2

But now we have:
- --sdlora /loras/lcm.gguf /loras/lcm.gguf --sdloramult 1 0 : dynamic, default 1
- --sdlora /loras/lcm.gguf /loras/ --sdloramult 1 : dynamic, default 1
2026-03-17 10:09:55 +08:00
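The duplicate-multiplier rules listed above can be sketched in Python. The function and data structure are illustrative; only the rules themselves (multipliers of repeated paths sum, and a zero-multiplier occurrence makes the entry dynamic) come from the commit. Directory wildcards and the real CLI parsing are not modeled.

```python
def merge_lora_args(paths, mults):
    # Sketch of the duplicate-handling rule: occurrences of the same LoRA
    # path sum their multipliers, and the entry stays dynamic (adjustable at
    # generation time) if any occurrence had multiplier 0.
    merged = {}  # path -> (total_multiplier, is_dynamic)
    for path, mult in zip(paths, mults):
        total, dynamic = merged.get(path, (0.0, False))
        merged[path] = (total + mult, dynamic or mult == 0)
    return merged
```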
Wagner Bruna
6e7b9a1549
sd: sync to master-529-630ee03 (#2040) 2026-03-17 00:23:28 +08:00
Wagner Bruna
feea014774
sd: support for dynamic LoRA loading from a directory (#2036)
* backend support for controlling LoRA cache and fixed multipliers

The generation LoRA multipliers are now added to the initial
multipliers, so e.g. a merged LCM model will behave the same as
a normal model with a preloaded LCM LoRA.

* frontend support
2026-03-16 20:39:21 +08:00
Concedo
b88fc44d0e add some debug prints 2026-03-16 16:27:49 +08:00
Concedo
2093ca4c73 ace step optimizations 2026-03-15 20:58:45 +08:00
Wagner Bruna
b437d18319
add support for cache modes to accelerate image generation (#2021)
* sd: sync to master-525-d6dd6d7

* sd: add support for cache modes for inference acceleration

* keep gendefaults as a JSON object inside the config file

* covered more invalid cases on gendefaults parsing
2026-03-15 15:27:14 +08:00
Concedo
b1c500ae2b Merge commit '2948e6049a' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	CONTRIBUTING.md
#	docs/backend/VirtGPU/development.md
#	docs/ops.md
#	docs/ops/WebGPU.csv
#	embd_res/templates/GigaChat3-10B-A1.8B.jinja
#	embd_res/templates/GigaChat3.1-10B-A1.8B.jinja
#	ggml/src/ggml-hip/CMakeLists.txt
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	scripts/sync_vendor.py
#	tests/CMakeLists.txt
#	tests/test-backend-ops.cpp
#	tests/test-chat.cpp
#	tests/test-grammar-integration.cpp
#	tests/test-quantize-fns.cpp
2026-03-15 11:21:24 +08:00
Concedo
22c78f6c82 fix q3tts compile, update docs and lite 2026-03-14 23:33:18 +08:00
Concedo
1d067933f0 claude fixes for ace step, idk man who am i to argue with an agi 2026-03-14 12:27:26 +08:00