Commit graph

1351 commits

Author SHA1 Message Date
Concedo
c4b1a17e1a tools debug 2026-03-19 23:13:02 +08:00
Concedo
2f63f94fd8 fix router nocertify mode 2026-03-19 12:45:19 +08:00
Concedo
8cf9ba34e9 fixed SSL in routermode 2026-03-19 12:43:11 +08:00
Concedo
15e86010d8 autofit will clear moecpu and overridetensors 2026-03-18 21:20:57 +08:00
Concedo
d85272a958 fixed wrong encoding (+1 squashed commits)
Squashed commits:

[a87d059a8] fixed wrong encoding
2026-03-17 15:54:54 +08:00
Concedo
e09ddc8fff mcp fix (+1 squashed commits)
Squashed commits:

[c5a959a07] mcp fix
2026-03-17 15:45:05 +08:00
Concedo
837fe9d832 mcp stdio fixes 2026-03-17 15:34:05 +08:00
Concedo
39f9007d12 handle notifications in mcp 2026-03-17 15:13:42 +08:00
Concedo
6d3f01d139 compact css, fix .py variable name error 2026-03-17 11:11:46 +08:00
henk717
927d3c68bb
502 Loading page (#2042)
* Proper Loading page

* Loading page wording

* Different wording
2026-03-17 10:59:44 +08:00
Wagner Bruna
51187d5362
sd: support changing preloaded LoRA multipliers (#2041)
* sd: remove C++ support for enforcing fixed LoRA multipliers

The logic at the Python level is enough.

* sd: support changing preloaded LoRA multipliers

We keep the same rules as before:
- Any LoRA with multiplier 0 can be changed
- If all LoRAs have multiplier != 0, they are fixed and optimized

but tweak the corner case of LoRAs specified more than once to
allow adjusting the multiplier if the same LoRA is also specified
with a zero multiplier, as if they were two different LoRAs.

So the following keeps working as before:
- --sdlora /loras/lcm.gguf --sdloramult 1 : fixed as 1
- --sdlora /loras/lcm.gguf --sdloramult 0 : dynamic, default 0
- --sdlora /loras/ : dynamic, default 0
- --sdlora /loras/lcm.gguf /loras/lcm.gguf --sdloramult 1 1 : fixed as 2

But now we have:
- --sdlora /loras/lcm.gguf /loras/lcm.gguf --sdloramult 1 0 : dynamic, default 1
- --sdlora /loras/lcm.gguf /loras/ --sdloramult 1 : dynamic, default 1
2026-03-17 10:09:55 +08:00
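The examples in this commit message suggest a per-path rule: multipliers for a repeated LoRA add up, and the LoRA becomes adjustable as soon as one of its entries has a zero multiplier. The sketch below is only an illustration of that reading. `classify_loras` is a hypothetical helper, not a function from the repository, and it ignores directory expansion.

```python
from collections import defaultdict

def classify_loras(paths, multipliers):
    """Return {path: (mode, default_multiplier)} for a flat LoRA list.

    Hypothetical helper: mirrors the examples in the commit message,
    not the project's actual code. Directories are not expanded.
    """
    totals = defaultdict(float)
    dynamic = set()
    for path, mult in zip(paths, multipliers):
        totals[path] += mult          # repeated entries add up
        if mult == 0:
            dynamic.add(path)         # a zero entry makes the LoRA adjustable
    return {
        path: ("dynamic" if path in dynamic else "fixed", total)
        for path, total in totals.items()
    }

fixed_twice = classify_loras(["/loras/lcm.gguf"] * 2, [1, 1])
mixed = classify_loras(["/loras/lcm.gguf"] * 2, [1, 0])
```

Under these assumptions, `fixed_twice` reports the LoRA as fixed with a combined multiplier of 2, while `mixed` reports it as dynamic with a default of 1, matching the two duplicate-entry cases listed above.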
Wagner Bruna
0c66ed863d
sd: additional validation for the LoRA list (#2043)
* sd: additional validation for the LoRA list

* sd: sanitize LoRA list before downloading
2026-03-17 10:09:10 +08:00
Concedo
ea15dfab83 added auto unload for admin mode 2026-03-16 23:56:34 +08:00
Concedo
b656b7c929 router works with singleinstance 2026-03-16 22:53:24 +08:00
Concedo
6c8c55afb5 PR cleanup 2026-03-16 21:04:16 +08:00
Wagner Bruna
feea014774
sd: support for dynamic LoRA loading from a directory (#2036)
* backend support for controlling LoRA cache and fixed multipliers

The generation LoRA multipliers are now added to the initial
multipliers, so e.g. a merged LCM model will behave the same as
a normal model with a preloaded LCM LoRA.

* frontend support
2026-03-16 20:39:21 +08:00
Concedo
310bd97972 router mode is fully functional 2026-03-15 23:07:47 +08:00
Concedo
5a65005d27 allow swap back to initial model 2026-03-15 21:35:23 +08:00
Concedo
ccd4745e0c ollama streaming emulation 2026-03-15 18:25:37 +08:00
Concedo
2e725e4f10 fixed port bug 2026-03-15 17:00:50 +08:00
Wagner Bruna
b437d18319
add support for cache modes to accelerate image generation (#2021)
* sd: sync to master-525-d6dd6d7

* sd: add support for cache modes for inference acceleration

* keep gendefaults as a JSON object inside the config file

* covered more invalid cases on gendefaults parsing
2026-03-15 15:27:14 +08:00
Concedo
3b9385a627 updated colab, wip model router 2026-03-15 00:38:29 +08:00
Concedo
8b9594b6ea wip router mode 2026-03-14 17:07:05 +08:00
Concedo
6143a75426 improve autofit padding heuristics 2026-03-14 00:36:52 +08:00
Concedo
8f23b8d81e wip on ref audio, but it compiles 2026-03-12 23:46:10 +08:00
Concedo
d5a4c17e14 mp3 not default 2026-03-12 21:42:59 +08:00
Concedo
3fd9648726 added mp3 support 2026-03-12 21:00:50 +08:00
Wagner Bruna
796f7bdeff
sd: fix LoRA multiplier logic to switch to at_runtime mode (#2029)
`0. in inputs.lora_multipliers` didn't work because the C array has
variable length.

Also fixed a few corner cases related to the default multipliers
(mainly to ensure robustness against future changes, since in most
cases the multiplier list is already sanitized by a previous
function).
2026-03-12 15:36:51 +08:00
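A membership test such as `0. in inputs.lora_multipliers` cannot work reliably when the field carries no length of its own. The sketch below assumes the field is exposed as a ctypes pointer with a separately tracked element count; both the layout and the helper name `has_zero_multiplier` are assumptions for illustration, not the repository's code.

```python
import ctypes

def has_zero_multiplier(ptr, count):
    """Check the first `count` floats behind `ptr` for a zero value.

    A ctypes pointer has no length, so the slice bound must be explicit.
    """
    return any(value == 0.0 for value in ptr[0:count])

# Hypothetical data standing in for the multiplier array.
values = (ctypes.c_float * 3)(1.0, 0.0, 0.5)
ptr = ctypes.cast(values, ctypes.POINTER(ctypes.c_float))
```

Slicing with an explicit bound copies only the valid elements into a Python list, so the check never reads past the end of the array.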
Concedo
3cc6e2ea17 make stereo default 2026-03-12 00:10:25 +08:00
Concedo
211d4fe632 lots of tweaks for ace step 2026-03-11 23:57:52 +08:00
Concedo
8095bf9807 include overhead from music models 2026-03-10 22:52:20 +08:00
Concedo
b06dd2606e ruff: linting 2026-03-10 21:32:36 +08:00
Wagner Bruna
3f42ed1af7
support for customizing LoRA multipliers through the sdapi (#1982)
* fix corner case in sd_oai_transform_params

Also fix typo in the function name.

* support for customizing loaded LoRA multipliers

The `sdloramult` flag now accepts a list of multipliers, one for each
LoRA. If all multipliers are non-zero, LoRAs load as before, with no extra
VRAM usage or performance impact.

If any LoRA has a multiplier of 0, we switch to `at_runtime` mode, and these
LoRAs will be available to multiplier changes via the `lora` sdapi field and
show up in the `sdapi/v1/loras` endpoint. All LoRAs are still preloaded on
startup, and cached to avoid file reloads.

If the list of multipliers is shorter than the list of LoRAs, the multiplier
list is extended with the first multiplier (1.0 by default), to keep it
compatible with the previous behavior.

* support for `<lora:name:multiplier>` prompt syntax and metadata

* add a few tests for sanitize_lora_multipliers
2026-03-10 21:29:39 +08:00
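The padding rule described in this commit message (extend a short multiplier list with its first entry, or 1.0 if the list is empty, then switch to `at_runtime` mode when any multiplier is 0) can be sketched as follows. `extend_multipliers` and `needs_runtime_mode` are hypothetical names, and truncating an over-long list is an assumption the commit message does not state.

```python
def extend_multipliers(loras, multipliers, default=1.0):
    """Pad `multipliers` to one entry per LoRA using its first entry.

    Hypothetical helper; truncation of extra entries is an assumption.
    """
    fill = multipliers[0] if multipliers else default
    padded = list(multipliers[:len(loras)])
    padded += [fill] * (len(loras) - len(padded))
    return padded

def needs_runtime_mode(multipliers):
    """True when at least one LoRA has a zero multiplier."""
    return any(m == 0 for m in multipliers)

padded = extend_multipliers(["a.gguf", "b.gguf", "c.gguf"], [0.5])
```

Here `padded` holds one 0.5 entry per LoRA, and `needs_runtime_mode(padded)` is false because no entry is zero, so under this reading the LoRAs would load as before.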
Concedo
eafb5ff4c5 autofit improvement e.g. for strix (+1 squashed commits)
Squashed commits:

[6f6fd59c3] autofit improvement e.g. for strix
2026-03-10 21:20:02 +08:00
Concedo
270d4ad2c1 fixed a typo 2026-03-08 12:56:08 +08:00
Concedo
73fc5c4767 handle jinja exceptions 2026-03-08 12:12:02 +08:00
Concedo
41df8b09e5 jinjatools now works mostly well 2026-03-08 11:55:22 +08:00
Concedo
2c38638b3d Merge commit '2afcdb9777' into concedo_experimental
# Conflicts:
#	scripts/sync_vendor.py
#	tests/CMakeLists.txt
2026-03-06 21:13:15 +08:00
Gustavo Rocha Dias
cbecc34667
Fix OAI-compatible token usage and unique request IDs (#2015)
* fix: token usage fix for mistral-vibe

* fix: generate unique request IDs for OAI-compatible responses

* fix: prompt_tokens reporting KV cache size instead of actual count during streaming

* fixes for PR #2015
For (1), this is not a good idea. If it returned 0 (e.g. during an error), this value may not be updated and will return the value of a previous or different request. It's better to return 0 in those cases.
For (2), this is a good idea, but we don't need that level of randomness. I'll probably swap it for a 6-digit random number instead.
For (3), the official OpenAI spec gates it behind stream_options.include_usage = true, so I'll do that too.

* missed 1 item

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2026-03-06 20:57:22 +08:00
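Point (3) above refers to the `stream_options.include_usage` field of the OpenAI chat completions request. A minimal sketch of that gating follows. `should_send_usage` is a hypothetical helper, and treating non-streaming responses as always carrying usage is an assumption, not something this commit states.

```python
def should_send_usage(body):
    """Decide whether a response should report token usage.

    `body` is the parsed JSON request. Hypothetical helper, not the
    project's actual handler.
    """
    if not body.get("stream", False):
        return True  # assumption: non-streaming replies always include usage
    options = body.get("stream_options") or {}
    return bool(options.get("include_usage", False))

streaming_with_usage = {"stream": True, "stream_options": {"include_usage": True}}
streaming_plain = {"stream": True}
```

With these inputs, only `streaming_with_usage` would receive usage data during streaming; `streaming_plain` would not.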
Concedo
8658af1018 qwen3tts default to cpu unless gpu selected 2026-03-05 11:11:46 +08:00
Concedo
5d35193749 fixed a sse stream issue 2026-03-03 21:30:28 +08:00
Concedo
7df210833e missed one case for autofit 2026-03-03 21:05:59 +08:00
Concedo
d7fb3df10a support 1 level deep admindir 2026-03-02 16:23:34 +08:00
Concedo
c9e651f7e5 updated lite, fix some cuda spams, fix qwen3tts voice loading 2026-03-01 00:41:56 +08:00
Wagner Bruna
5c40f07d4a
sd: sync to 0752cc9 (master-507-b314d80 +1) (#1999)
* sd: sync to 0752cc9 (master-507-b314d80 +1)

* sd: add flow-shift support to gendefaults
2026-02-28 12:22:32 +08:00
Concedo
14d82bb38e allow music llm and diffusion gen models to be loaded independently 2026-02-27 21:56:48 +08:00
Concedo
ba42f22fc8 stereo is working 2026-02-27 20:36:44 +08:00
Wagner Bruna
d400b37215
config file saving enhancements (#1994)
* process --exportconfig and --exporttemplate after --config

This allows using `--config oldfile.kcpps --exportconfig newfile.kcpps`
to update old config items, copy a config file with changed parameters,
download and save a remote config, etc.

* filter out command flags from the saved config files

Also indent files saved from the command line.
2026-02-26 14:55:01 +08:00
Concedo
5c5fe55f7d bump kv overrides max (+1 squashed commits)
Squashed commits:

[9bc8212a0] bump kv overrides max
2026-02-26 00:24:53 +08:00
Concedo
8a3ccfcba5 some fixes but some issues 2026-02-25 23:41:32 +08:00