From 5ff6cefce05904d59fe5c9d046e081e8d84ef1b7 Mon Sep 17 00:00:00 2001
From: Alistair Stewart
Date: Mon, 23 Mar 2026 09:02:14 +0000
Subject: [PATCH] Fix music generation token stopping (#2057)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Fix music generation token stopping for quantized models

In Phase 1 lyrics mode, the FSM transitions to the CODES state after
TOKEN_THINK_END and disables itself. The quantized Q4_K_M model was not
reliably generating TOKEN_IM_END to stop the generation, causing it to
continue until hitting the 8192-token limit.

This fix forces TOKEN_IM_END to be generated immediately after
TOKEN_THINK_END in lyrics mode, ensuring clean completion of the
planning phase without excessive token generation.

Testing shows generation now completes in ~500ms instead of 80+ seconds
with timeout errors.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Clarify comment - fix applies to all models, not just quantized

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Improve fix: only force TOKEN_IM_END at token limit

Instead of forcing TOKEN_IM_END immediately after TOKEN_THINK_END, only
force it when we've reached the token limit. This allows the model to
generate lyrics after the thinking block while still preventing KV
cache exhaustion.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
---
 otherarch/acestep/ace-qwen3.cpp | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/otherarch/acestep/ace-qwen3.cpp b/otherarch/acestep/ace-qwen3.cpp
index f3df01f8a..a1439209d 100644
--- a/otherarch/acestep/ace-qwen3.cpp
+++ b/otherarch/acestep/ace-qwen3.cpp
@@ -1007,6 +1007,14 @@ static std::vector generate_phase1_batch(
                 continue;
             }
         }
+
+        // Safety check: if we've reached the token limit, force TOKEN_IM_END
+        // to prevent KV cache exhaustion (FATAL: kv_len > max_seq)
+        if ((int)seqs[i].gen_tokens.size() >= max_new_tokens - 1 && !seqs[i].done) {
+            forced_tokens.clear();
+            forced_tokens.push_back(TOKEN_IM_END);
+        }
+
         seqs[i].gen_tokens.push_back(tok);
     }
     seqs[i].last_token = tok;
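
Editor's note: for readers outside this codebase, the guard the patch adds can
be illustrated in isolation. Below is a minimal, self-contained C++ sketch of
the same idea for a single sequence rather than a batch; it is not the actual
ace-qwen3.cpp code. The `Sequence` struct, the stub `sample_next_token()`, and
the concrete token id for TOKEN_IM_END are assumptions made for the example.

```cpp
// Minimal sketch of the termination guard from this patch (hypothetical
// types, sampler, and token ids; not the real ace-qwen3.cpp definitions).
#include <cstdio>
#include <vector>

static const int TOKEN_IM_END   = 151645; // assumed <|im_end|> stop-token id
static const int MAX_NEW_TOKENS = 8192;   // same budget the patch guards

struct Sequence {
    std::vector<int> gen_tokens;
    bool done = false;
};

// Stub sampler standing in for the real model: it never emits the stop
// token, reproducing the failure mode the patch fixes.
static int sample_next_token(const Sequence &) {
    return 42; // arbitrary non-stop token id
}

static void generate(Sequence & seq) {
    while (!seq.done) {
        int tok = sample_next_token(seq);

        // The guard: one token before the cap, override whatever was
        // sampled with the stop token so decoding always terminates
        // before the KV cache can overflow.
        if ((int)seq.gen_tokens.size() >= MAX_NEW_TOKENS - 1) {
            tok = TOKEN_IM_END;
        }

        seq.gen_tokens.push_back(tok);
        if (tok == TOKEN_IM_END) {
            seq.done = true;
        }
    }
}

int main() {
    Sequence seq;
    generate(seq);
    printf("generated %zu tokens, last = %d\n",
           seq.gen_tokens.size(), seq.gen_tokens.back());
    return 0; // prints: generated 8192 tokens, last = 151645
}
```

Forcing the override at `max_new_tokens - 1` rather than at the limit itself
leaves room for the stop token within the budget, so the sequence ends at
exactly the cap instead of exceeding it and triggering the kv_len > max_seq
fatal error the patch comment mentions.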