vrr/kvcache-ai-ktransformers

mirror of https://github.com/kvcache-ai/ktransformers.git synced 2025-09-10 14:51:06 +00:00

Author	SHA1	Message	Date
Jesse	e204a0bb6b	Merge `8c8cb207aa` into `ee2ede0412`	2025-08-05 15:24:17 +08:00
qiyuxinlin	9e1560bb82	GLM4 and SmallThinker	2025-07-25 16:56:36 +00:00
djw	17246bf84f	support smt and glm4	2025-07-25 15:03:27 +00:00
djw	48bc6185b5	support smt and glm4	2025-07-25 12:48:51 +00:00
qiyuxinlin	712ad1fa3c	smallthinker right	2025-07-25 12:46:14 +00:00
qiyuxinlin	71c1d4eed7	smallthink run	2025-07-24 15:08:29 +00:00
djw	590fcb41cd	support smt and glm4	2025-07-24 12:31:01 +00:00
djw	b66d96db97	support smt and glm4	2025-07-24 08:40:58 +00:00
wang jiahao	a2e95e467a	Update balance_serve.py	2025-07-12 13:14:35 +08:00
Jesse CreateThis	8c8cb207aa	Apply magikRUKKOLA's patch from issue #1417	2025-07-06 19:45:06 +00:00
ouqingliang	90cff820cf	update kvc disk path config.	2025-06-30 15:09:35 +00:00
Ye Zhou	00949d5e8d	Mirror #1247 in server mode	2025-05-29 15:30:40 +08:00
qiyuxinlin	b40f13abeb	fix deduplicate_and_sort cudagraphs	2025-05-15 04:09:34 +00:00
qiyuxinlin	e8e83308a9	fix flashinfer float_workspace_buffer small	2025-05-14 09:33:52 +00:00
qiyuxinlin	c6aa379de2	support safetensor load, delete architectures argument	2025-05-09 10:38:29 +00:00
Atream	7adb7281f4	fix-cache-lens	2025-04-30 03:37:43 +00:00
qiyuxinlin	27990dc6fb	fix load bug	2025-04-28 21:08:13 +00:00
djw	33cbd47086	support qwen3	2025-04-28 18:15:35 +00:00
djw	0da3792b27	support qwen3	2025-04-28 14:05:24 +00:00
djw	3f9bbf1181	support qwen3, dont speak human language	2025-04-28 08:44:47 +00:00
qiyuxinlin	7af83f9efb	fix load default max_new_tokens	2025-04-25 04:20:12 +00:00
Atream	46493789eb	fix chat template encoding	2025-04-24 12:44:16 +08:00
Alisehen	f7d939313b	Merge remote-tracking branch 'origin/main' into check-para	2025-04-23 02:40:14 +00:00
Alisehen	99540ad01f	add check parameters	2025-04-23 02:38:43 +00:00
Alisehen	c995bdbbfa	add check-para	2025-04-22 09:30:08 +00:00
qiyuxinlin	4f9950e30c	kill serve lead to kill sched and engine	2025-04-22 09:25:44 +00:00
qiyuxinlin	b17ab8653c	update speed test	2025-04-22 07:38:05 +00:00
qiyuxinlin	03a65d6bea	roll back ktransformers backend, add max_tokens, max_completion_tokens param	2025-04-21 12:55:37 +00:00
wang jiahao	a1162eea01	Merge pull request #1158 from Creeper-MZ/function_call Update Function call	2025-04-19 16:31:37 +08:00
Creeper-MZ	133ba746e9	Optimize the prompt to resolve some Deepseek r1 compatibility issues; fix non stream	2025-04-19 01:20:27 -04:00
wang jiahao	0892d37d2d	Merge pull request #1172 from kvcache-ai/move_create_sched Move KV cache creation to balance_serve	2025-04-18 18:19:29 +08:00
qiyuxinlin	38e841900d	Move KV cache creation to balance_serve	2025-04-18 10:10:07 +00:00
Atream	e44c45e782	Merge pull request #1163 from cyhasuka/main Enh: Make Ollama perf data more accurate, consistent with OpenAI's implementation	2025-04-18 00:50:58 -06:00
Atream	e6fb4d5a58	remove hard code max_length	2025-04-18 12:11:18 +08:00
Creeper-MZ	62c4023160	Fixed #1155	2025-04-17 10:21:51 -04:00
Creeper-MZ	4fb19bfcae	Update chat.py	2025-04-17 09:19:14 -04:00
Yuhao Tsui	8ce34b3b5c	Modify the performance calculation module: switch performance data calculation from estimation to retrieving from `raw_usage`.	2025-04-17 16:57:53 +08:00
wang jiahao	6e4da83d4b	Merge pull request #978 from cyhasuka/main Feat: Support Non-streaming chat in Ollama backend	2025-04-17 14:34:35 +08:00
Creeper-MZ	cb266c98d4	Fix a bug	2025-04-16 23:31:33 -04:00
Creeper-MZ	6bc2e85343	Update chat.py	2025-04-16 15:54:23 -04:00
Creeper-MZ	88f688e2c8	Change the token injection logic to reduce the number of injected tokens and prevent forgetting. Update chat.py	2025-04-16 15:52:24 -04:00
kevin	c8db24d5eb	Update config.py	2025-04-16 17:32:08 +08:00
sean.su	8699109129	Refactor the chat interface to support tool calling and parameter processing. Defined new data structures in chat.py to replace OpenAI's original implementation, adding support for tool calling. Implemented logic for extracting and processing tool calls, enabling dynamic function invocation during conversations. Added methods in balance_serve.py to retrieve sampling parameters, handling default values and edge cases. Updated ktransformers.py and transformers.py to support the passing of tool parameters. Modified the default value of top_p in config.py to 1.0 to increase generation diversity. Extended the message model in chat.py to support the transmission of tool call information. These changes enhance the system's flexibility and functionality, enabling more complex interaction patterns.	2025-04-14 15:23:37 +08:00
Creeper-MZ	a7e8d7c1af	update function_call	2025-04-13 23:48:51 -04:00
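wangkuigang-yewu-cmss	4538bdae97	prevent rpc process from crashing on long prompt. When the prompt exceeds cache_len, the rpc process crashes and the whole service becomes unavailable. This adds a check so that overly long prompts are filtered out early in the request.	2025-04-13 16:13:16 +08:00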
Yuhao Tsui	877aec858e	Merge branch 'kvcache-ai:main' into main	2025-04-09 11:46:39 +08:00
qiyuxinlin	64de784328	format kvc2, delete quant_configs, move model_configs to ~/.ktransformers	2025-04-08 10:06:07 +00:00
dongjw	ec03bcbd7f	fix temperature=0, flashinfer sample error	2025-04-07 12:30:47 +08:00
Qin's repo	2c3a3a1e1c	solve [Bug] #1023: only modified the mixed single and double quotes in server/config/config.py	2025-04-03 14:37:32 +08:00
dongjw	5c7ed7b579	fix top_p = 0 bug	2025-04-01 20:38:33 +08:00

126 commits