Fix the penultimate token sometimes being lost with SSE streaming (#1031)

When SSE streaming was enabled, the token generated immediately before an eot
token was lost if that token's text was contained entirely within a stop sequence.
As an example of when this could happen, consider this prompt:
  Type the phrase 'pleas' once.
In a Llama 3-derived model, 'pleas' tokenizes as 'ple' followed by 'as'. The
token 'as' is contained within this instruct-mode stop sequence:
  <|eot_id|><|start_header_id|>assistant<|end_header_id|>
due to the word 'assistant'. Since `string_contains_sequence_substring`
returns True for 'as', the token is added to `tokenReserve` instead of
being streamed immediately. If the '<|eot_id|>' token is generated next,
the text in `tokenReserve` is discarded, so the 'as' is never sent.
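
As a minimal sketch of this flow (not the actual server code: the helper
`token_could_start_stop_sequence`, the token list, and the trailing empty
poll are stand-ins assumed for illustration of the old and new conditions):

  STOP_SEQUENCE = "<|eot_id|><|start_header_id|>assistant<|end_header_id|>"

  def token_could_start_stop_sequence(token: str) -> bool:
      # Stand-in for string_contains_sequence_substring: True if the token's
      # text occurs anywhere inside the stop sequence.
      return token != "" and token in STOP_SEQUENCE

  def stream_tokens(tokens, with_fix: bool) -> str:
      sent = []
      token_reserve = ""
      # The trailing "" models the final poll after generation has stopped on
      # '<|eot_id|>' (the stop token itself is never streamed as text).
      for token_str in list(tokens) + [""]:
          if token_could_start_stop_sequence(token_str):
              # Hold the token back: a stop sequence may be forming.
              token_reserve += token_str
          else:
              # Old condition: token_str != ""
              # New condition: token_str != "" or token_reserve != ""
              if token_str != "" or (with_fix and token_reserve != ""):
                  sent.append(token_reserve + token_str)
                  token_reserve = ""
      return "".join(sent)

  tokens = ["ple", "as"]  # 'pleas' as tokenized by a Llama 3-derived model
  print(stream_tokens(tokens, with_fix=False))  # 'ple'   -> 'as' is lost
  print(stream_tokens(tokens, with_fix=True))   # 'pleas' -> reserve is flushed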
commit 26f1df5e5f (parent 948646ff7a)
Llama authored 2024-07-29 05:16:47 -07:00, committed by GitHub


@@ -1447,7 +1447,7 @@ class ServerRequestHandler(http.server.SimpleHTTPRequestHandler):
                     tokenReserve += tokenStr
                     await asyncio.sleep(async_sleep_short) #if a stop sequence could trigger soon, do not send output
                 else:
-                    if tokenStr!="":
+                    if tokenStr!="" or tokenReserve!="":
                         tokenStr = tokenReserve + tokenStr
                         tokenReserve = ""