Fix the penultimate token sometimes being lost with SSE streaming (#1031)

When SSE streaming was enabled, the token generated immediately before an eot
token was lost if that token's text was contained entirely within a stop sequence.
As an example of when this could happen, consider this prompt:
  Type the phrase 'pleas' once.
In a Llama 3-derived model, 'pleas' tokenizes as 'ple' followed by 'as'. The
token 'as' is contained within this instruct-mode stop sequence:
  <|eot_id|><|start_header_id|>assistant<|end_header_id|>
due to the word 'assistant'. Since `string_contains_sequence_substring`
returns True for 'as', the token is added to `tokenReserve` instead of
being streamed immediately. If the '<|eot_id|>' token is generated next,
the text in `tokenReserve` is discarded, so the 'as' is never sent.
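
As a minimal sketch of this flow (not the actual server code: the helper
`token_could_start_stop_sequence`, the token list, and the trailing empty
poll are stand-ins assumed for illustration of the old and new conditions):

  STOP_SEQUENCE = "<|eot_id|><|start_header_id|>assistant<|end_header_id|>"

  def token_could_start_stop_sequence(token: str) -> bool:
      # Stand-in for string_contains_sequence_substring: True if the token's
      # text occurs anywhere inside the stop sequence.
      return token != "" and token in STOP_SEQUENCE

  def stream_tokens(tokens, with_fix: bool) -> str:
      sent = []
      token_reserve = ""
      # The trailing "" models the final poll after generation has stopped on
      # '<|eot_id|>' (the stop token itself is never streamed as text).
      for token_str in list(tokens) + [""]:
          if token_could_start_stop_sequence(token_str):
              # Hold the token back: a stop sequence may be forming.
              token_reserve += token_str
          else:
              # Old condition: token_str != ""
              # New condition: token_str != "" or token_reserve != ""
              if token_str != "" or (with_fix and token_reserve != ""):
                  sent.append(token_reserve + token_str)
                  token_reserve = ""
      return "".join(sent)

  tokens = ["ple", "as"]  # 'pleas' as tokenized by a Llama 3-derived model
  print(stream_tokens(tokens, with_fix=False))  # 'ple'   -> 'as' is lost
  print(stream_tokens(tokens, with_fix=True))   # 'pleas' -> reserve is flushed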
commit 26f1df5e5f (parent 948646ff7a)
Llama authored 2024-07-29 05:16:47 -07:00, committed by GitHub


@@ -1447,7 +1447,7 @@ class ServerRequestHandler(http.server.SimpleHTTPRequestHandler):
                     tokenReserve += tokenStr
                     await asyncio.sleep(async_sleep_short) #if a stop sequence could trigger soon, do not send output
                 else:
-                    if tokenStr!="":
+                    if tokenStr!="" or tokenReserve!="":
                         tokenStr = tokenReserve + tokenStr
                         tokenReserve = ""