vrr/kvcache-ai-ktransformers

mirror of https://github.com/kvcache-ai/ktransformers.git synced 2025-09-10 06:14:58 +00:00

Author	SHA1	Message	Date
sean.su	8699109129	Refactor the chat interface to support tool calling and parameter processing Defined new data structures in chat.py to replace OpenAI's original implementation, adding support for tool calling. Implemented logic for extracting and processing tool calls, enabling dynamic function invocation during conversations. Added methods in balance_serve.py to retrieve sampling parameters, handling default values and edge cases. Updated ktransformers.py and transformers.py to support the passing of tool parameters. Modified the default value of top_p in config.py to 1.0 to increase generation diversity. Extended the message model in chat.py to support the transmission of tool call information. These changes enhance the system's flexibility and functionality, enabling more complex interaction patterns.	2025-04-14 15:23:37 +08:00
Atream	25cee5810e	add balance-serve, support concurrency	2025-03-31 22:55:32 +08:00
BITcyman	299c4dca64	[update] support openai chat completion api	2025-03-07 08:51:09 +00:00
wang jiahao	48b9800790	Merge pull request #759 from 3wweiweiwu/fix_top_p_typo fix typo for top_p	2025-03-02 13:58:11 +08:00
1668068727@qq.com	7cdf8139f0	fix ollama api temperature bug	2025-03-02 13:55:26 +08:00
Wix Woo	3aa0cfc29d	fix typo for top_p	2025-03-01 20:15:36 +00:00
Atream	fa03ea48dd	Merge branch 'main' into feat-chunk-prefill-flashinfer	2025-03-01 11:35:09 +00:00
Atream	f35e8d41d8	support chunk prefill, support 139K context for 24G VRAM	2025-03-01 11:28:25 +00:00
liam	80e0536fb0	Merge branch 'main' of https://github.com/KMSorSMS/ktransformers into main	2025-03-01 00:12:21 +08:00
liam	8ddc990668	⚡ fix server cache lens	2025-03-01 00:09:57 +08:00
qiyuxinlin	22df52e94e	fix temperature	2025-02-27 21:00:44 +08:00
lazymio	b121ca4df8	Fix according to upstream changes	2025-02-27 18:11:35 +08:00
wang jiahao	26f7b4af11	Merge branch 'main' into temperature_top_p_from_request	2025-02-27 18:08:55 +08:00
Atream	b443c7dfa2	Merge pull request #657 from kvcache-ai/feat-absorb-for-long-prefill Feat absorb for long prefill	2025-02-25 16:53:21 +08:00
Atream	f4c198bd42	support absorb for prefill long context	2025-02-25 08:52:02 +00:00
Azure	36fbeee341	Update doc	2025-02-25 08:21:18 +00:00
lazymio	07eb712a73	Left out	2025-02-24 21:51:14 +08:00
lazymio	8704c09192	Allow temperature and top_p from requests	2025-02-24 21:01:33 +08:00
Atream	7e1fe256c8	optimize GPU	2025-02-21 05:06:57 +00:00
ceerrep	73d072f609	Merge branch 'fix_precision_MLA' of https://github.com/kvcache-ai/ktransformers into server-prefix-cache	2025-02-18 11:44:28 +08:00
Xie Weiyu	f029588b61	fix server warmup	2025-02-18 11:39:45 +08:00
ceerrep	c70b6f4d5b	fix: use 'cuda:0' by default if torch_device is 'cuda'	2025-02-18 11:15:17 +08:00
Xie Weiyu	c176e516b5	server mix mla	2025-02-17 20:40:28 +08:00
ceerrep	ee24eb8dc3	fix: fix server for triton kernel	2025-02-17 18:08:45 +08:00
ceerrep	bb0ccc7b1a	feat: add prefix cache for server	2025-02-17 00:10:55 +08:00
hrz6976	2c3dcd9774	Add a lock to server inference()	2025-02-13 10:05:22 +00:00
Azure	c4d9bc6670	support KExpertsMarlin backend	2025-02-07 05:57:40 +00:00
Azure	907251c743	done support deepseekv3	2025-02-04 15:53:38 +00:00
Azure	476b1d8dc6	support deepseekv3; runnable but has precision problem	2025-01-31 08:27:24 +00:00
liam	dd1d8667f3	✨: refactor local_chat and fix message slice bug in server	2024-11-04 14:02:19 +08:00
chenxl	b9f0819a86	None for load config	2024-08-22 15:52:25 +00:00
TangJingqi	170b7a6001	fix server not accepting yaml path as param; fix server static cache device problem	2024-08-21 14:19:43 +08:00
Atream	412055d450	[feature] experts can be injected using CPUInfer [fix] fix ktransformers interface when use new CUDAGraphRunner [fix] fix YAML and optimize logic, the top rule has the highest priority	2024-08-14 16:10:54 +08:00
chenxl	18c42e67df	Initial commit	2024-07-27 16:06:58 +08:00

34 commits