Commit graph

56 commits

Author SHA1 Message Date
Wagner Bruna
25fab4113e
refactor: handle GGML_VK_VISIBLE_DEVICES at the Python level (#2179)
All the C++ handling code currently does the following:
- builds a comma-separated list from the info_vulkan array
- if GGML_VK_VISIBLE_DEVICES isn't set
  - sets GGML_VK_VISIBLE_DEVICES to the list

Once set, GGML_VK_VISIBLE_DEVICES affects the whole process. So this
can be done in the same way at the Python level, before all loading
functions.

Caveat: load_model had the default `inputs.vulkan_info = "0"`,
so the default GPU would be "0" only when loading a text model.
2026-05-02 23:10:29 +08:00
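The steps listed in the commit above can be sketched in Python as follows. This is a minimal illustration, not the project's actual code: the function name `apply_vulkan_visible_devices`, its `env` parameter, and the choice to leave the variable untouched for an empty device list are all assumptions made for the example. Only the variable name GGML_VK_VISIBLE_DEVICES and the "set it only if unset" rule come from the commit message.

```python
import os


def apply_vulkan_visible_devices(info_vulkan, env=os.environ):
    """Set GGML_VK_VISIBLE_DEVICES from a list of device indices, unless already set.

    `info_vulkan` is assumed to be an iterable of device indices (hypothetical shape).
    `env` defaults to the process environment; a plain dict can be passed for testing.
    Returns the value that is in effect afterwards, or None if nothing was set.
    """
    # Respect a value the user exported before launching the process.
    if "GGML_VK_VISIBLE_DEVICES" in env:
        return env["GGML_VK_VISIBLE_DEVICES"]

    # Build the comma-separated list, e.g. [0, 2] -> "0,2".
    devices = ",".join(str(d) for d in info_vulkan)

    # Assumption: with no devices selected, leave the variable unset.
    if not devices:
        return None

    env["GGML_VK_VISIBLE_DEVICES"] = devices
    return devices
```

Because the variable affects the whole process once set, a helper like this would need to run once before any of the loading functions are called.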
Concedo
d9724a4caa kcpp musicgen - disable flash attention as it's not stable on Vulkan. Due to optimizations, it should still fit in 6 GB in lowvram. 2026-04-12 18:28:30 +08:00
Concedo
7bf7b0aefc optimize lowvram for music 2026-04-12 18:17:08 +08:00
Concedo
ad6eaffd3c updated docs, adjusted acestep threads 2026-04-09 22:33:30 +08:00
Concedo
6aa49b91b1 fixed acestep bad on vulkan 2026-04-08 22:22:07 +08:00
Concedo
9b02806191 updated acestep convert 2026-04-08 18:39:28 +08:00
Concedo
355f75769e acestep xl now loads and works! 2026-04-08 18:36:18 +08:00
Concedo
4b478b70fa ace step xl tentative changes (not yet working) 2026-04-08 18:00:39 +08:00
Alistair Stewart
5ff6cefce0
Fix music generation token stopping (#2057)
* Fix music generation token stopping for quantized models

In Phase 1 lyrics mode, the FSM transitions to CODES state after
TOKEN_THINK_END and disables itself. The quantized Q4_K_M model was
not efficiently generating TOKEN_IM_END to stop the generation,
causing it to continue until hitting the 8192 token limit.

This fix forces TOKEN_IM_END to be generated immediately after
TOKEN_THINK_END in lyrics mode, ensuring clean completion of the
planning phase without excessive token generation.

Testing shows generation now completes in ~500ms instead of 80+
seconds with timeout errors.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Clarify comment - fix applies to all models, not just quantized

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Improve fix: only force TOKEN_IM_END at token limit

Instead of forcing TOKEN_IM_END immediately after TOKEN_THINK_END,
only force it when we've reached the token limit. This allows the model
to generate lyrics after the thinking block while still preventing KV
cache exhaustion.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-03-23 17:02:14 +08:00
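The final behaviour described in the commit above (force TOKEN_IM_END only once the token limit is reached, and only after TOKEN_THINK_END has been seen) can be sketched as a small generation loop. This is a rough illustration under stated assumptions: the token ids, the `generate` function, and the `sample` callback are hypothetical stand-ins, and the real change lives in the C++ sampling code, which is not shown in this log.

```python
# Hypothetical token ids, chosen only for this example.
TOKEN_THINK_END = 1
TOKEN_IM_END = 2


def generate(sample, token_limit):
    """Generate tokens until TOKEN_IM_END or until `token_limit` tokens are emitted.

    `sample(step)` is a hypothetical callback returning the model's next token.
    After TOKEN_THINK_END has been seen, the last available slot is overwritten
    with TOKEN_IM_END so that generation ends cleanly at the limit.
    """
    out = []
    seen_think_end = False
    while len(out) < token_limit:
        tok = sample(len(out))
        # True when this token would fill the final slot of the budget.
        last_slot = len(out) + 1 >= token_limit
        if seen_think_end and last_slot:
            tok = TOKEN_IM_END  # forced stop at the limit
        out.append(tok)
        if tok == TOKEN_THINK_END:
            seen_think_end = True
        if tok == TOKEN_IM_END:
            break
    return out
```

A model that emits TOKEN_IM_END by itself stops early as usual; the forced token only matters when the model would otherwise run up to the limit.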
Concedo
b88fc44d0e add some debug prints 2026-03-16 16:27:49 +08:00
Concedo
2093ca4c73 ace step optimizations 2026-03-15 20:58:45 +08:00
Concedo
1d067933f0 claude fixes for ace step, idk man who am i to argue with an agi 2026-03-14 12:27:26 +08:00
Concedo
349fc744e9 cleanup, fixed a regression in music gen with codes due to instruct prompt change 2026-03-14 11:32:47 +08:00
Concedo
4427bab37e cover mode is now working 2026-03-13 14:55:39 +08:00
Concedo
84734eb409 better audio runtime reload 2026-03-13 14:02:56 +08:00
Concedo
8f23b8d81e wip on ref audio, but it compiles 2026-03-12 23:46:10 +08:00
Concedo
d5a4c17e14 mp3 not default 2026-03-12 21:42:59 +08:00
Concedo
3fd9648726 added mp3 support 2026-03-12 21:00:50 +08:00
Concedo
3092694d2e better resampler 2026-03-12 16:49:53 +08:00
Concedo
318a5486ce duration 2026-03-12 15:33:51 +08:00
Concedo
3cc6e2ea17 make stereo default 2026-03-12 00:10:25 +08:00
Concedo
211d4fe632 lots of tweaks for ace step 2026-03-11 23:57:52 +08:00
Concedo
ecc4865244 improves code output quality 2026-03-10 23:07:52 +08:00
Concedo
ee96e71bae don't resample audio 2026-03-09 22:53:55 +08:00
Concedo
45c74da08b adjust ace step, still wip on caption rework 2026-03-09 00:11:48 +08:00
Concedo
ae67caa2f7 ace qwen rep pen for codes 2026-03-02 21:18:06 +08:00
Concedo
42134db6b4 finally fixed smartcache for qwen 2026-03-02 00:47:38 +08:00
Concedo
6c5a7a27af clamp music duration 2026-03-01 01:15:26 +08:00
Concedo
d643d945f5 clamp music inference steps to 100 max 2026-02-28 12:12:50 +08:00
Concedo
14d82bb38e allow music llm and diffusion gen models to be loaded independently 2026-02-27 21:56:48 +08:00
Concedo
19eb78844c audio codes working 2026-02-27 21:23:00 +08:00
Concedo
ba42f22fc8 stereo is working 2026-02-27 20:36:44 +08:00
Concedo
5a57ed8ca4 revert to 8 step 2026-02-26 22:07:01 +08:00
Concedo
173702d1a4 music lowvram indicator 2026-02-26 21:30:47 +08:00
Concedo
adebf63877 ace converter 2026-02-26 19:53:02 +08:00
Concedo
ac8f12f259 still a bit wonky 2026-02-26 17:50:49 +08:00
Concedo
81fb4d773c swap resampling function 2026-02-26 17:37:53 +08:00
Concedo
fb3f7d92bc reenable cfg 2026-02-26 14:51:15 +08:00
Concedo
b7d2fe68e7 adjust 2026-02-26 14:46:41 +08:00
Concedo
edbc4fe592 music lm finally working 2026-02-26 14:00:58 +08:00
Concedo
cf042af701 Revert "still not working"
This reverts commit a1305ffff9.
2026-02-26 10:55:55 +08:00
Concedo
a1305ffff9 still not working 2026-02-26 10:48:21 +08:00
Concedo
d8746a851f still bugged 2026-02-26 00:07:04 +08:00
Concedo
8a3ccfcba5 some fixes but some issues 2026-02-25 23:41:32 +08:00
Concedo
0eafc3cf2d ace step lowvram mode done, improved 2026-02-24 23:12:26 +08:00
Concedo
11a85d62fc lowvram for music lm 2026-02-24 22:21:17 +08:00
Concedo
aa58d1ed3b all working, but needs to optimize vram 2026-02-24 21:55:57 +08:00
Concedo
488c431331 not yet working 2026-02-24 17:47:50 +08:00
Concedo
0fd7d2c0e5 ace step diffusion loading 2026-02-24 15:24:15 +08:00
Concedo
5311997581 updated ace step cpp 2026-02-23 23:01:10 +08:00