Commit graph

10 commits

Author SHA1 Message Date
djw
3f9bbf1181 support qwen3, dont speak human language 2025-04-28 08:44:47 +00:00
Atream
46493789eb
fix chat template encoding 2025-04-24 12:44:16 +08:00
qiyuxinlin
4f9950e30c kill serve lead to kill sched and engine 2025-04-22 09:25:44 +00:00
qiyuxinlin
03a65d6bea roll back ktransformers backend, add max_tokens, max_completion_tokens param 2025-04-21 12:55:37 +00:00
qiyuxinlin
38e841900d Move KV cache creation to balance_serve 2025-04-18 10:10:07 +00:00
sean.su
8699109129 Refactor the chat interface to support tool calling and parameter processing
Defined new data structures in chat.py to replace OpenAI's original implementation, adding support for tool calling.

Implemented logic for extracting and processing tool calls, enabling dynamic function invocation during conversations.

Added methods in balance_serve.py to retrieve sampling parameters, handling default values and edge cases.

Updated ktransformers.py and transformers.py to support the passing of tool parameters.

Modified the default value of top_p in config.py to 1.0 to increase generation diversity.

Extended the message model in chat.py to support the transmission of tool call information.

These changes enhance the system's flexibility and functionality, enabling more complex interaction patterns.
2025-04-14 15:23:37 +08:00
wangkuigang-yewu-cmss
4538bdae97 prevent rpc process from crashing on long prompt
当prompt超过cache_len的时候,rpc进程会crash掉,导致整体不可用。
这里增加一个检查,让过长的prompt在请求早期就被提前过滤掉
2025-04-13 16:13:16 +08:00
dongjw
ec03bcbd7f fix temperature=0, flashinfer sample error 2025-04-07 12:30:47 +08:00
dongjw
5c7ed7b579 fix top_p = 0 bug 2025-04-01 20:38:33 +08:00
Atream
25cee5810e add balance-serve, support concurrence 2025-03-31 22:55:32 +08:00