Commit graph

17 commits

Author SHA1 Message Date
djw
48bc6185b5 support smt and glm4 2025-07-25 12:48:51 +00:00
qiyuxinlin
712ad1fa3c smallthinker right 2025-07-25 12:46:14 +00:00
djw
b66d96db97 support smt and glm4 2025-07-24 08:40:58 +00:00
wang jiahao
a2e95e467a Update balance_serve.py 2025-07-12 13:14:35 +08:00
qiyuxinlin
e8e83308a9 fix flashinfer float_workspace_buffer small 2025-05-14 09:33:52 +00:00
qiyuxinlin
c6aa379de2 support safetensor load, delete architectures argument 2025-05-09 10:38:29 +00:00
djw
33cbd47086 support qwen3 2025-04-28 18:15:35 +00:00
djw
3f9bbf1181 support qwen3, dont speak human language 2025-04-28 08:44:47 +00:00
Atream
46493789eb fix chat template encoding 2025-04-24 12:44:16 +08:00
qiyuxinlin
4f9950e30c kill serve lead to kill sched and engine 2025-04-22 09:25:44 +00:00
qiyuxinlin
03a65d6bea roll back ktransformers backend, add max_tokens, max_completion_tokens param 2025-04-21 12:55:37 +00:00
qiyuxinlin
38e841900d Move KV cache creation to balance_serve 2025-04-18 10:10:07 +00:00
sean.su
8699109129 Refactor the chat interface to support tool calling and parameter processing
Defined new data structures in chat.py to replace OpenAI's original implementation, adding support for tool calling.

Implemented logic for extracting and processing tool calls, enabling dynamic function invocation during conversations.

Added methods in balance_serve.py to retrieve sampling parameters, handling default values and edge cases.

Updated ktransformers.py and transformers.py to support the passing of tool parameters.

Modified the default value of top_p in config.py to 1.0 to increase generation diversity.

Extended the message model in chat.py to support the transmission of tool call information.

These changes enhance the system's flexibility and functionality, enabling more complex interaction patterns.
2025-04-14 15:23:37 +08:00
wangkuigang-yewu-cmss
4538bdae97 prevent rpc process from crashing on long prompt
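When the prompt exceeds cache_len, the rpc process crashes, making the whole service unavailable.
This adds a check so that overlong prompts are filtered out early in the request.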
2025-04-13 16:13:16 +08:00
dongjw
ec03bcbd7f fix temperature=0, flashinfer sample error 2025-04-07 12:30:47 +08:00
dongjw
5c7ed7b579 fix top_p = 0 bug 2025-04-01 20:38:33 +08:00
Atream
25cee5810e add balance-serve, support concurrency 2025-03-31 22:55:32 +08:00
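
Note on commit 4538bdae97: the message describes rejecting prompts longer than cache_len early in the request so the rpc process never sees them. The log does not show the actual change, so the following is only a minimal sketch of that kind of guard. The names check_prompt_length and PromptTooLongError are hypothetical and are not taken from the repository; only cache_len appears in the commit message.

```python
from typing import Sequence


class PromptTooLongError(ValueError):
    """Hypothetical error type for prompts that cannot fit in the KV cache."""


def check_prompt_length(token_ids: Sequence[int], cache_len: int) -> int:
    """Validate a tokenized prompt before it is handed to the backend.

    Returns the prompt length in tokens, or raises PromptTooLongError when
    the prompt exceeds cache_len (the condition named in the commit message).
    """
    prompt_len = len(token_ids)
    if prompt_len > cache_len:
        # Fail in the request handler, where the error can be reported to
        # the client, instead of letting a worker process crash later.
        raise PromptTooLongError(
            f"prompt has {prompt_len} tokens, but cache_len is {cache_len}"
        )
    return prompt_len
```

A request handler could call this right after tokenization and translate the exception into a client error response, which keeps the rest of the service available when one request is oversized.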