Commit graph

634 commits

Author SHA1 Message Date
Wagner Bruna
9223f41320
sd: call SetCircularAxesAll directly (#2078) 2026-03-29 01:17:48 +08:00
Concedo
2cdf02102e preserve previous filename 2026-03-28 01:13:03 +08:00
Wagner Bruna
e3c6227d46
sd: report back image generation parameters and metadata (#2062)
* sd: refactor image generation result handling

* sd: report back image generation metadata
2026-03-28 00:49:03 +08:00
Concedo
4a5c903718 sd model replacement logic: adjusted approach for easy merge 2026-03-26 21:57:42 +08:00
Concedo
efdc52fe8b q3tts custom voice support 2026-03-24 23:38:18 +08:00
Concedo
0d50cafd8b added CustomVoice support 2026-03-23 18:50:08 +08:00
Wagner Bruna
abe55fa424
sd: fix metadata for generated images (#2061)
* sd: fix metadata for generated images

* sd: refactor output image conversion
2026-03-23 17:04:32 +08:00
Alistair Stewart
5ff6cefce0
Fix music generation token stopping (#2057)
* Fix music generation token stopping for quantized models

In Phase 1 lyrics mode, the FSM transitions to CODES state after
TOKEN_THINK_END and disables itself. The quantized Q4_K_M model was
not reliably generating TOKEN_IM_END to stop the generation,
causing it to continue until hitting the 8192 token limit.

This fix forces TOKEN_IM_END to be generated immediately after
TOKEN_THINK_END in lyrics mode, ensuring clean completion of the
planning phase without excessive token generation.

Testing shows generation now completes in ~500ms instead of 80+
seconds with timeout errors.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Clarify comment - fix applies to all models, not just quantized

* Improve fix: only force TOKEN_IM_END at token limit

Instead of forcing TOKEN_IM_END immediately after TOKEN_THINK_END,
only force it when we've reached the token limit. This allows the model
to generate lyrics after the thinking block while still preventing KV
cache exhaustion.

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-03-23 17:02:14 +08:00
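The two-stage rule from the PR above (let the model keep generating after TOKEN_THINK_END, but force TOKEN_IM_END once the token budget is reached) can be sketched as follows. The token IDs, the limit constant, and the function name are illustrative placeholders, not the project's actual code:

```cpp
#include <cstddef>

// Placeholder token ID and budget; the real values live inside the
// music-generation FSM, which is not reproduced here.
constexpr int TOKEN_IM_END = 2;        // assumed end-of-message token
constexpr std::size_t TOKEN_LIMIT = 8192; // budget mentioned in the commit

// Return the token to emit: keep the sampler's choice until the budget
// is exhausted, then force the end token so generation completes
// cleanly instead of exhausting the KV cache.
int next_token(int sampled, std::size_t n_generated) {
    if (n_generated + 1 >= TOKEN_LIMIT) {
        return TOKEN_IM_END;
    }
    return sampled;
}
```

This mirrors the final revision of the fix: the model can still produce lyrics after the thinking block, and the override fires only at the limit.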
Wagner Bruna
592dedee28
sd: ensure previous generation results are cleaned up on all code paths (#2060) 2026-03-22 23:18:09 +08:00
Concedo
3bda0bf102 passthrough mode without any gens 2026-03-22 23:09:08 +08:00
Concedo
f846c83a7a pre-seed the tts so it can be shown 2026-03-22 10:36:42 +08:00
Concedo
fdfb713d91 added --sdmaingpu allowing image models to be independently placed on any gpu 2026-03-21 17:34:12 +08:00
Concedo
a3d3800f3e added passthrough mode for esrgan upscale, triggered by img2img denoise 0.0 with 1 step 2026-03-21 16:19:10 +08:00
Wagner Bruna
51187d5362
sd: support changing preloaded LoRA multipliers (#2041)
* sd: remove C++ support for enforcing fixed LoRA multipliers

The logic at the Python level is enough.

* sd: support changing preloaded LoRA multipliers

We keep the same rules as before:
- Any LoRA with multiplier 0 can be changed
- If all LoRAs have multiplier != 0, they are fixed and optimized

but we tweak the corner case of a LoRA specified more than once: if
the same LoRA is also specified with a zero multiplier, its multiplier
can be adjusted, as if they were two different LoRAs.

So the following keeps working as before:
- --sdlora /loras/lcm.gguf --sdloramult 1 : fixed as 1
- --sdlora /loras/lcm.gguf --sdloramult 0 : dynamic, default 0
- --sdlora /loras/ : dynamic, default 0
- --sdlora /loras/lcm.gguf /loras/lcm.gguf --sdloramult 1 1 : fixed as 2

But now we have:
- --sdlora /loras/lcm.gguf /loras/lcm.gguf --sdloramult 1 0 : dynamic, default 1
- --sdlora /loras/lcm.gguf /loras/ --sdloramult 1 : dynamic, default 1
2026-03-17 10:09:55 +08:00
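The fixed-versus-dynamic rule described above reduces to a simple predicate: a preloaded LoRA set stays fixed (merged for speed) only when every multiplier is non-zero. A minimal sketch, with an assumed function name:

```cpp
#include <vector>

// True when the whole LoRA set can stay fixed (all multipliers
// non-zero); a single zero multiplier switches the set to dynamic
// mode, i.e. adjustable at runtime. Name and signature are
// illustrative, not the repository's code.
bool loras_are_fixed(const std::vector<float> &multipliers) {
    for (float m : multipliers) {
        if (m == 0.0f) {
            return false; // dynamic mode
        }
    }
    return true; // fixed mode
}
```

Against the commit's examples: `{1}` and `{1, 1}` come out fixed, while `{0}` and `{1, 0}` come out dynamic.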
Wagner Bruna
6e7b9a1549
sd: sync to master-529-630ee03 (#2040) 2026-03-17 00:23:28 +08:00
Wagner Bruna
feea014774
sd: support for dynamic LoRA loading from a directory (#2036)
* backend support for controlling LoRA cache and fixed multipliers

The generation LoRA multipliers are now added to the initial
multipliers, so e.g. a merged LCM model will behave the same as
a normal model with a preloaded LCM LoRA.

* frontend support
2026-03-16 20:39:21 +08:00
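The additive rule described above (generation-time multipliers are added on top of the preload multipliers, so a preloaded LCM LoRA at 1.0 behaves like a merged LCM model) amounts to this one-liner; the helper name is illustrative:

```cpp
// Effective strength of a LoRA at generation time: the per-request
// multiplier is added to the preload multiplier rather than replacing
// it. Illustrative helper, not code from the repository.
float effective_multiplier(float preloaded, float requested) {
    return preloaded + requested;
}
```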
Concedo
b88fc44d0e add some debug prints 2026-03-16 16:27:49 +08:00
Concedo
2093ca4c73 ace step optimizations 2026-03-15 20:58:45 +08:00
Wagner Bruna
b437d18319
add support for cache modes to accelerate image generation (#2021)
* sd: sync to master-525-d6dd6d7

* sd: add support for cache modes for inference acceleration

* keep gendefaults as a JSON object inside the config file

* covered more invalid cases on gendefaults parsing
2026-03-15 15:27:14 +08:00
Concedo
b1c500ae2b Merge commit '2948e6049a' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	CONTRIBUTING.md
#	docs/backend/VirtGPU/development.md
#	docs/ops.md
#	docs/ops/WebGPU.csv
#	embd_res/templates/GigaChat3-10B-A1.8B.jinja
#	embd_res/templates/GigaChat3.1-10B-A1.8B.jinja
#	ggml/src/ggml-hip/CMakeLists.txt
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	scripts/sync_vendor.py
#	tests/CMakeLists.txt
#	tests/test-backend-ops.cpp
#	tests/test-chat.cpp
#	tests/test-grammar-integration.cpp
#	tests/test-quantize-fns.cpp
2026-03-15 11:21:24 +08:00
Concedo
22c78f6c82 fix q3tts compile, update docs and lite 2026-03-14 23:33:18 +08:00
Concedo
1d067933f0 claude fixes for ace step, idk man who am i to argue with an agi 2026-03-14 12:27:26 +08:00
Concedo
349fc744e9 cleanup, fixed a regression in music gen with codes due to instruct prompt change 2026-03-14 11:32:47 +08:00
Concedo
4189508ef3 qwen3tts support 1.7b model 2026-03-13 21:15:24 +08:00
Concedo
a13641c00c tts loader fixes 2026-03-13 18:33:10 +08:00
Concedo
0a38237ff5 original qwen3tts files 2026-03-13 15:24:18 +08:00
Concedo
4427bab37e cover mode is now working 2026-03-13 14:55:39 +08:00
Concedo
84734eb409 better audio runtime reload 2026-03-13 14:02:56 +08:00
Concedo
8f23b8d81e wip on ref audio, but it compiles 2026-03-12 23:46:10 +08:00
Concedo
d5a4c17e14 mp3 not default 2026-03-12 21:42:59 +08:00
Concedo
3fd9648726 added mp3 support 2026-03-12 21:00:50 +08:00
Concedo
3092694d2e better resampler 2026-03-12 16:49:53 +08:00
Concedo
318a5486ce duration 2026-03-12 15:33:51 +08:00
Concedo
3cc6e2ea17 make stereo default 2026-03-12 00:10:25 +08:00
Concedo
211d4fe632 lots of tweaks for ace step 2026-03-11 23:57:52 +08:00
Concedo
ecc4865244 improves code output quality 2026-03-10 23:07:52 +08:00
Concedo
c8800ed16c gcc path fix 2026-03-10 21:40:32 +08:00
Wagner Bruna
3f42ed1af7
support for customizing LoRA multipliers through the sdapi (#1982)
* fix corner case in sd_oai_transform_params

Also fix typo in the function name.

* support for customizing loaded LoRA multipliers

The `sdloramult` flag now accepts a list of multipliers, one for each
LoRA. If all multipliers are non-zero, LoRAs load as before, with no extra
VRAM usage or performance impact.

If any LoRA has a multiplier of 0, we switch to `at_runtime` mode, and these
LoRAs will be available for multiplier changes via the `lora` sdapi field and
show up in the `sdapi/v1/loras` endpoint. All LoRAs are still preloaded on
startup, and cached to avoid file reloads.

If the list of multipliers is shorter than the list of LoRAs, the multiplier
list is extended with the first multiplier (1.0 by default), to keep it
compatible with the previous behavior.

* support for `<lora:name:multiplier>` prompt syntax and metadata

* add a few tests for sanitize_lora_multipliers
2026-03-10 21:29:39 +08:00
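The padding rule above for a multiplier list shorter than the LoRA list (extend with the first multiplier, defaulting to 1.0) could look like this sketch; the function name and types are assumptions, not the actual `sanitize_lora_multipliers`:

```cpp
#include <cstddef>
#include <vector>

// Pad the multiplier list to one entry per LoRA, repeating the first
// multiplier (1.0 when the list is empty), matching the backward
// compatible behavior described in the commit. Illustrative only.
std::vector<float> pad_multipliers(std::vector<float> mults,
                                   std::size_t n_loras) {
    const float fill = mults.empty() ? 1.0f : mults.front();
    while (mults.size() < n_loras) {
        mults.push_back(fill);
    }
    return mults;
}
```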
Concedo
ee96e71bae don't resample audio 2026-03-09 22:53:55 +08:00
Concedo
45c74da08b adjust ace step, still wip on caption rework 2026-03-09 00:11:48 +08:00
Wagner Bruna
9158bd8b4d
sd: sync to master-520-d950627 (#2006)
* sd: sync to master-509-4cdfff5

* sd: Anima support

* sd: sync to master-514-5792c66

* sd: additional workaround for Anima .safetensors model

* sd: sync to master-517-ba35dd7

* sd: sync to master-520-d950627
2026-03-08 01:23:03 +08:00
Concedo
ebe44e7819 modify q3tts loader 2026-03-08 00:53:33 +08:00
JustCommitRandomness
2fbc3b2ae5
Adjust int types in format strings (#2009)
* tweak format string types
This may not be all of them, but these are the ones that warn on OpenBSD

* complete the changes needed to fix the format string specifiers

* avoid using inttypes.h; directly cast to size_t (usually u64) instead

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2026-03-06 19:06:18 +08:00
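The cast-to-size_t approach mentioned above (avoiding the `<inttypes.h>` PRIu64 macros) can be sketched as follows; the helper is illustrative, not code from the repository, and it assumes size_t is 64-bit, as the commit itself notes ("u64 usually"):

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Format a 64-bit counter portably: casting to size_t and printing
// with %zu compiles without warnings on OpenBSD and elsewhere, unlike
// a bare %lu / %llu whose width varies by platform.
std::string format_count(uint64_t value) {
    char buf[32];
    std::snprintf(buf, sizeof(buf), "count=%zu", (size_t) value);
    return std::string(buf);
}
```

On a 32-bit platform this cast would truncate values above 2^32, which is the trade-off behind the commit's "usually" caveat.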
JustCommitRandomness
389773070f
OpenBSD also needs alloca.h (#2012) 2026-03-05 12:32:31 +08:00
Concedo
8658af1018 qwen3tts default to cpu unless gpu selected 2026-03-05 11:11:46 +08:00
Concedo
4f1b22c415 kv snapshots save and load last logits for correctness. added some text for musicui, updated docs 2026-03-04 21:57:28 +08:00
Concedo
707f7b37bf optimize pp 2026-03-03 21:02:51 +08:00
Concedo
ae67caa2f7 ace qwen rep pen for codes 2026-03-02 21:18:06 +08:00
Concedo
de9840afac qwen image max ref image size fix from 512x512 to 1024x1024 2026-03-02 21:08:52 +08:00
Concedo
b632d2ce1c print timestamp when image generated 2026-03-02 18:38:21 +08:00