vrr/kvcache-ai-ktransformers

mirror of https://github.com/kvcache-ai/ktransformers.git synced 2025-09-05 20:19:51 +00:00

Author	SHA1	Message	Date
djw	3f9bbf1181	support qwen3, dont speak human language	2025-04-28 08:44:47 +00:00
Atream	46493789eb	fix chat template encoding	2025-04-24 12:44:16 +08:00
qiyuxinlin	4f9950e30c	kill serve lead to kill sched and engine	2025-04-22 09:25:44 +00:00
qiyuxinlin	03a65d6bea	roll back ktransformers backend, add max_tokens, max_completion_tokens param	2025-04-21 12:55:37 +00:00
qiyuxinlin	38e841900d	Move KV cache creation to balance_serve	2025-04-18 10:10:07 +00:00
sean.su	8699109129	Refactor the chat interface to support tool calling and parameter processing. Defined new data structures in chat.py to replace OpenAI's original implementation, adding support for tool calling. Implemented logic for extracting and processing tool calls, enabling dynamic function invocation during conversations. Added methods in balance_serve.py to retrieve sampling parameters, handling default values and edge cases. Updated ktransformers.py and transformers.py to support the passing of tool parameters. Modified the default value of top_p in config.py to 1.0 to increase generation diversity. Extended the message model in chat.py to support the transmission of tool call information. These changes enhance the system's flexibility and functionality, enabling more complex interaction patterns.	2025-04-14 15:23:37 +08:00
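wangkuigang-yewu-cmss	4538bdae97	prevent rpc process from crashing on long prompt. When the prompt exceeds cache_len, the rpc process crashes and the whole service becomes unavailable. This adds a check so that overly long prompts are filtered out early in the request.	2025-04-13 16:13:16 +08:00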
dongjw	ec03bcbd7f	fix temperature=0, flashinfer sample error	2025-04-07 12:30:47 +08:00
dongjw	5c7ed7b579	fix top_p = 0 bug	2025-04-01 20:38:33 +08:00
Atream	25cee5810e	add balance-serve, support concurrence	2025-03-31 22:55:32 +08:00

10 commits