mirror of
https://github.com/LostRuins/koboldcpp.git
synced 2026-05-19 08:00:25 +00:00
* Fix music generation token stopping for quantized models

  In Phase 1 lyrics mode, the FSM transitions to CODES state after TOKEN_THINK_END and disables itself. The quantized Q4_K_M model was not efficiently generating TOKEN_IM_END to stop the generation, causing it to continue until hitting the 8192 token limit.

  This fix forces TOKEN_IM_END to be generated immediately after TOKEN_THINK_END in lyrics mode, ensuring clean completion of the planning phase without excessive token generation.

  Testing shows generation now completes in ~500ms instead of 80+ seconds with timeout errors.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <noreply@anthropic.com>

* Clarify comment - fix applies to all models, not just quantized

* Improve fix: only force TOKEN_IM_END at token limit

  Instead of forcing TOKEN_IM_END immediately after TOKEN_THINK_END, only force it when we've reached the token limit. This allows the model to generate lyrics after the thinking block while still preventing KV cache exhaustion.

---------

Co-authored-by: Claude <noreply@anthropic.com>

| | | |
|---|---|---|
| mp3 | ||
| ace-qwen3.cpp | ||
| acestep_convert.py | ||
| ACESTEP_LICENSE | ||
| backend.h | ||
| bpe.h | ||
| cond.h | ||
| debug.h | ||
| dit-vae.cpp | ||
| dit.h | ||
| gguf_weights.h | ||
| music_adapter.cpp | ||
| quantize-acestep.cpp | ||
| qwen3-lm.h | ||
| qwen3.h | ||
| request.cpp | ||
| request.h | ||
| tokenizer.h | ||
| vae-enc.h | ||
| vae.h | ||
| weight_ctx.h | ||
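
The final behaviour described in the commit message above (let the model keep generating after TOKEN_THINK_END, and force TOKEN_IM_END only once the token limit is reached) can be sketched roughly as follows. This is a minimal illustration, not the code from `ace-qwen3.cpp`: the token id values, the function names `next_lyrics_token` and `generate_lyrics`, and the `sample` callback are all hypothetical stand-ins, and the real FSM and sampler are not modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical token ids; the real values come from the model's tokenizer.
constexpr int32_t TOKEN_THINK_END = 1001;
constexpr int32_t TOKEN_IM_END    = 1002;

// Chooses the token to emit at the current step of the lyrics loop.
// `sampled` is what the (unconstrained) sampler proposed, `n_generated` is the
// number of tokens already emitted. The proposal is overridden with
// TOKEN_IM_END only when this token would be the last one the budget allows.
int32_t next_lyrics_token(int32_t sampled, int n_generated, int max_tokens) {
    if (n_generated + 1 >= max_tokens) {
        return TOKEN_IM_END;  // budget exhausted: force a clean stop
    }
    return sampled;           // otherwise leave the model's choice untouched
}

// Runs the loop until the model emits TOKEN_IM_END itself or the budget forces it.
// `sample` stands in for one decode-and-sample step of the language model.
std::vector<int32_t> generate_lyrics(const std::function<int32_t()>& sample,
                                     int max_tokens) {
    assert(max_tokens >= 0);
    std::vector<int32_t> out;
    while (static_cast<int>(out.size()) < max_tokens) {
        const int32_t tok = next_lyrics_token(
            sample(), static_cast<int>(out.size()), max_tokens);
        out.push_back(tok);
        if (tok == TOKEN_IM_END) {
            break;  // either the model stopped or the limit forced the stop
        }
    }
    return out;
}
```

Under these assumptions, a sampler that never produces TOKEN_IM_END still yields a sequence of exactly `max_tokens` tokens ending in TOKEN_IM_END, while a sampler that stops on its own after TOKEN_THINK_END and some lyrics is left unmodified.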