vrr/kvcache-ai-ktransformers

mirror of https://github.com/kvcache-ai/ktransformers.git synced 2025-09-06 12:40:02 +00:00

Author	SHA1	Message	Date
qiyuxinlin	b17ab8653c	update speed test	2025-04-22 07:38:05 +00:00
qiyuxinlin	03a65d6bea	roll back ktransformers backend, add max_tokens, max_completion_tokens param	2025-04-21 12:55:37 +00:00
wang jiahao	a1162eea01	Merge pull request #1158 from Creeper-MZ/function_call Update Function call	2025-04-19 16:31:37 +08:00
Creeper-MZ	133ba746e9	Optimize prompts to resolve some Deepseek r1 compatibility issues; fix non stream	2025-04-19 01:20:27 -04:00
wang jiahao	0892d37d2d	Merge pull request #1172 from kvcache-ai/move_create_sched Move KV cache creation to balance_serve	2025-04-18 18:19:29 +08:00
qiyuxinlin	38e841900d	Move KV cache creation to balance_serve	2025-04-18 10:10:07 +00:00
Atream	e44c45e782	Merge pull request #1163 from cyhasuka/main Enh: Make Ollama perf data more accurate, consistent with OpenAI's implementation	2025-04-18 00:50:58 -06:00
Atream	e6fb4d5a58	remove hard code max_length	2025-04-18 12:11:18 +08:00
Creeper-MZ	62c4023160	Fixed #1155	2025-04-17 10:21:51 -04:00
Creeper-MZ	4fb19bfcae	Update chat.py	2025-04-17 09:19:14 -04:00
Yuhao Tsui	8ce34b3b5c	Modify the performance calculation module Modify the performance data calculation module from estimation to retrieving from `raw_usage`.	2025-04-17 16:57:53 +08:00
wang jiahao	6e4da83d4b	Merge pull request #978 from cyhasuka/main Feat: Support Non-streaming chat in Ollama backend	2025-04-17 14:34:35 +08:00
Creeper-MZ	cb266c98d4	Fix a bug	2025-04-16 23:31:33 -04:00
Creeper-MZ	6bc2e85343	Update chat.py	2025-04-16 15:54:23 -04:00
Creeper-MZ	88f688e2c8	Change the token injection logic to inject fewer tokens and prevent forgetting Update chat.py Update chat.py Update chat.py	2025-04-16 15:52:24 -04:00
kevin	c8db24d5eb	Update config.py Update config.py	2025-04-16 17:32:08 +08:00
sean.su	8699109129	Refactor the chat interface to support tool calling and parameter processing Defined new data structures in chat.py to replace OpenAI's original implementation, adding support for tool calling. Implemented logic for extracting and processing tool calls, enabling dynamic function invocation during conversations. Added methods in balance_serve.py to retrieve sampling parameters, handling default values and edge cases. Updated ktransformers.py and transformers.py to support the passing of tool parameters. Modified the default value of top_p in config.py to 1.0 to increase generation diversity. Extended the message model in chat.py to support the transmission of tool call information. These changes enhance the system's flexibility and functionality, enabling more complex interaction patterns.	2025-04-14 15:23:37 +08:00
Creeper-MZ	a7e8d7c1af	updata function_call	2025-04-13 23:48:51 -04:00
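wangkuigang-yewu-cmss	4538bdae97	prevent rpc process from crashing on long prompt When the prompt exceeds cache_len, the RPC process crashes and makes the whole service unavailable. This adds a check so that overly long prompts are filtered out early in the request.	2025-04-13 16:13:16 +08:00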
Yuhao Tsui	877aec858e	Merge branch 'kvcache-ai:main' into main	2025-04-09 11:46:39 +08:00
qiyuxinlin	64de784328	format kvc2, delete quant_configs, move model_configs to ~/.ktransformers	2025-04-08 10:06:07 +00:00
dongjw	ec03bcbd7f	fix temperature=0, flashinfer sample error	2025-04-07 12:30:47 +08:00
Qin's repo	2c3a3a1e1c	slove [Bug] #1023 Only modified the mixed single and double quotes in server/config/config.py	2025-04-03 14:37:32 +08:00
dongjw	5c7ed7b579	fix top_p = 0 bug	2025-04-01 20:38:33 +08:00
Azure-Tang	31677181c3	Fix ktransformers-server flashinfer wrapper position arg issue; Fix db position issue	2025-04-01 07:30:23 +00:00
Atream	25cee5810e	add balance-serve, support concurrence	2025-03-31 22:55:32 +08:00
Yuhao Tsui	84164f584c	Update completions.py	2025-03-26 15:39:46 +08:00
Yuhao Tsui	e5694f91c0	Merge branch 'kvcache-ai:main' into main	2025-03-10 09:10:28 +08:00
Atream	09c043d8a6	Merge pull request #842 from BITcyman/fix-openai_chat_completion [fix] thread context bug	2025-03-07 22:56:19 +08:00
BITcyman	08a8b553d6	[fix] thread context bug	2025-03-07 14:52:16 +00:00
Atream	d453c320f1	fix flashinfer precision	2025-03-07 14:07:00 +00:00
BITcyman	299c4dca64	[update] support openai chat completion api	2025-03-07 08:51:09 +00:00
Yuhao Tsui	d050d8655f	Update completions.py	2025-03-06 11:16:33 +08:00
chenmz00	b2ba795cfd	fix: list models API Fix the list models API to match the corresponding OpenAI API format.	2025-03-05 21:49:27 +08:00
Azure	662c1e4c14	small fix about max new token	2025-03-05 09:25:41 +00:00
wang jiahao	48b9800790	Merge pull request #759 from 3wweiweiwu/fix_top_p_typo fix typo for top_p	2025-03-02 13:58:11 +08:00
1668068727@qq.com	7cdf8139f0	fix ollama api temperature bug	2025-03-02 13:55:26 +08:00
Wix Woo	3aa0cfc29d	fix typo for top_p	2025-03-01 20:15:36 +00:00
Atream	ca1dc1e7d1	Merge branch 'main' into main	2025-03-01 23:24:10 +08:00
Atream	fa03ea48dd	Merge branch 'main' into feat-chunk-prefill-flashinfer	2025-03-01 11:35:09 +00:00
Atream	f35e8d41d8	support chunk prefill, support 139K context for 24G VRAM	2025-03-01 11:28:25 +00:00
liam	80e0536fb0	Merge branch 'main' of https://github.com/KMSorSMS/ktransformers into main	2025-03-01 00:12:21 +08:00
liam	8ddc990668	⚡ fix server cache lens	2025-03-01 00:09:57 +08:00
qiyuxinlin	22df52e94e	fix temperature	2025-02-27 21:00:44 +08:00
lazymio	b121ca4df8	Fix according to upstream changes	2025-02-27 18:11:35 +08:00
wang jiahao	26f7b4af11	Merge branch 'main' into temperature_top_p_from_request	2025-02-27 18:08:55 +08:00
Atream	798e1d0cfa	Merge pull request #532 from xv44586/fix-sse-formatting fix: fix SSE formatting	2025-02-27 12:19:23 +08:00
Atream	f403cde6d4	Merge pull request #650 from ceerRep/main feat: basic api key support	2025-02-27 12:16:53 +08:00
swu-hyk	ec7e912fee	modify	2025-02-26 19:21:30 +08:00
swu-hyk	68e7df3a25	implementation of chat routing for Ollama	2025-02-26 17:05:00 +08:00
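Commit 4538bdae97 describes rejecting a prompt longer than cache_len early in the request, before it can reach and crash the RPC process. The sketch below illustrates that kind of up-front guard, assuming the prompt has already been tokenized and the cache length is known. The names `check_prompt_length` and `PromptTooLongError` are hypothetical and are not taken from the ktransformers code.

```python
from typing import Sequence


class PromptTooLongError(ValueError):
    """Hypothetical error type: the prompt cannot fit in the KV cache."""


def check_prompt_length(prompt_tokens: Sequence[int], cache_len: int) -> None:
    """Hypothetical guard: reject an over-long prompt before scheduling it.

    Failing fast here returns an error to this one request instead of
    letting a backend process crash and take the whole service down.
    """
    if len(prompt_tokens) > cache_len:
        raise PromptTooLongError(
            f"prompt has {len(prompt_tokens)} tokens but cache_len is {cache_len}"
        )


if __name__ == "__main__":
    check_prompt_length([1, 2, 3], cache_len=4)  # fits, no error
    try:
        check_prompt_length(list(range(10)), cache_len=4)
    except PromptTooLongError as err:
        print(err)  # prompt has 10 tokens but cache_len is 4
```

Raising a `ValueError` subclass lets a request handler map it to a client error (for example an HTTP 400) without special-casing other failures.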


100 commits