mirror of
https://github.com/kvcache-ai/ktransformers.git
synced 2025-09-09 13:55:27 +00:00
prevent rpc process from crashing on long prompt
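When the prompt exceeds cache_len, the rpc process crashes and the whole service becomes unavailable. This adds a check so that overlong prompts are rejected early in request handling.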
parent
797dac7e31
commit
4538bdae97
1 changed file with 4 additions and 0 deletions
@@ -374,6 +374,10 @@ class BalanceServeInterface(BackendInterfaceBase):
             top_p = 0.0001
         query_add.sample_options.top_p = top_p
         query_add.estimated_length = min(self.args.cache_lens, query_length+self.args.max_new_tokens)
+
+        if query_add.estimated_length < query_add.query_length:
+            raise Exception(f'query too long: estimated_length={query_add.estimated_length} < query_length={query_add.query_length}')
+
         query_id = self.sched_client.add_query(query_add)
         queue = asyncio.Queue(maxsize=self.args.max_new_tokens)
         self.queue_map[query_id] = queue
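
The check added above can be exercised in isolation. The following is a minimal sketch, not code from the repository: `estimate_length` and `QueryTooLongError` are hypothetical names introduced for illustration (the commit raises a bare `Exception` inline), and the three parameters stand in for `query_length`, `self.args.max_new_tokens`, and `self.args.cache_lens`.

```python
class QueryTooLongError(ValueError):
    """Hypothetical exception type; the commit itself raises a bare Exception."""


def estimate_length(query_length: int, max_new_tokens: int, cache_lens: int) -> int:
    """Cap the length estimate at cache_lens and reject prompts above the cap.

    Follows the logic in the diff: the estimate is the smaller of the cache
    size and prompt length plus generation budget. If the capped estimate is
    shorter than the prompt, the prompt alone does not fit in the cache.
    """
    estimated_length = min(cache_lens, query_length + max_new_tokens)
    if estimated_length < query_length:
        raise QueryTooLongError(
            f"query too long: estimated_length={estimated_length} "
            f"< query_length={query_length}"
        )
    return estimated_length


if __name__ == "__main__":
    print(estimate_length(10, 5, 100))    # prompt and budget both fit
    print(estimate_length(90, 20, 100))   # estimate is capped at cache_lens
    try:
        estimate_length(101, 20, 100)     # prompt alone exceeds the cache
    except QueryTooLongError as exc:
        print(exc)
```

With a non-negative `max_new_tokens`, the condition fires only when `query_length` is greater than `cache_lens`. A prompt exactly as long as `cache_lens` still passes in this sketch, which leaves no room for generated tokens; whether that case needs separate handling depends on the scheduler and is not addressed by the commit.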