mirror of https://github.com/LostRuins/koboldcpp.git, synced 2026-05-07 09:02:04 +00:00
Fix music generation token stopping (#2057)
* Fix music generation token stopping for quantized models

  In Phase 1 lyrics mode, the FSM transitions to the CODES state after TOKEN_THINK_END and disables itself. The quantized Q4_K_M model was not reliably generating TOKEN_IM_END to stop generation, so it continued until it hit the 8192 token limit.

  This fix forces TOKEN_IM_END to be generated immediately after TOKEN_THINK_END in lyrics mode, so the planning phase completes cleanly without excessive token generation.

  Testing shows generation now completes in ~500ms instead of 80+ seconds with timeout errors.

* Clarify comment - fix applies to all models, not just quantized

* Improve fix: only force TOKEN_IM_END at token limit

  Instead of forcing TOKEN_IM_END immediately after TOKEN_THINK_END, only force it once the token limit has been reached. This lets the model generate lyrics after the thinking block while still preventing KV cache exhaustion.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
parent 993925ba96
commit 5ff6cefce0
1 changed file with 8 additions and 0 deletions
@@ -1007,6 +1007,14 @@ static std::vector<std::string> generate_phase1_batch(
                     continue;
                 }
             }
+
+            // Safety check: if we've reached the token limit, force TOKEN_IM_END
+            // to prevent KV cache exhaustion (FATAL: kv_len > max_seq)
+            if ((int)seqs[i].gen_tokens.size() >= max_new_tokens - 1 && !seqs[i].done) {
+                forced_tokens.clear();
+                forced_tokens.push_back(TOKEN_IM_END);
+            }
+
             seqs[i].gen_tokens.push_back(tok);
         }
         seqs[i].last_token = tok;
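
The idea behind the safety check in this hunk can be sketched in isolation. The following is a simplified, hypothetical illustration and not the project's code. It assumes a single sequence, a stand-in sampler callback (`sample_next`), a made-up numeric value for `TOKEN_IM_END`, and a helper name (`generate_with_forced_stop`) invented for this sketch. It also overwrites the final token directly rather than going through the `forced_tokens` queue used in the real function.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical token id for this sketch only; the real value is defined in the project.
constexpr int TOKEN_IM_END = 2;

// Generates at most max_new_tokens tokens from sample_next (a stand-in for the
// model's sampler). If the model has not produced TOKEN_IM_END by the time only
// one slot is left, that last slot is forced to TOKEN_IM_END, so the sequence
// always ends with the stop token and never exceeds the budget.
std::vector<int> generate_with_forced_stop(const std::function<int()>& sample_next,
                                           int max_new_tokens) {
    assert(max_new_tokens >= 1);
    std::vector<int> gen_tokens;
    while (static_cast<int>(gen_tokens.size()) < max_new_tokens) {
        int tok = sample_next();
        // Safety check: on the final slot, replace whatever was sampled
        // with the end-of-message token.
        if (static_cast<int>(gen_tokens.size()) >= max_new_tokens - 1) {
            tok = TOKEN_IM_END;
        }
        gen_tokens.push_back(tok);
        if (tok == TOKEN_IM_END) {
            break;  // normal or forced stop
        }
    }
    return gen_tokens;
}
```

With a sampler that never emits the stop token, the returned sequence has exactly `max_new_tokens` entries and ends in `TOKEN_IM_END`. With a sampler that stops on its own, the check never fires.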