vrr/kvcache-ai-ktransformers

mirror of https://github.com/kvcache-ai/ktransformers.git synced 2025-09-06 12:40:02 +00:00

Author	SHA1	Message	Date
rnwang04	142fb7ce6c	Enable support for Intel XPU devices, add support for DeepSeek V2/V3 first	2025-05-14 19:37:27 +00:00
Atream	25cee5810e	add balance-serve, support concurrence	2025-03-31 22:55:32 +08:00
Azure-Tang	ed8437413b	merge main; Add torch q8 linear	2025-03-14 05:52:07 -04:00
Atream	6f43bbe55f	fix-singleton	2025-03-14 04:16:53 +00:00
宁鹏涛	71286ec1c0	Update local_chat.py: fix config.architectures[0] == "DeepseekV2ForCausalLM" or "DeepseekV3ForCausalLM" always evaluating to true	2025-03-01 21:52:48 +08:00
Atream	f35e8d41d8	support chunk prefill, support 139K context for 24G VRAM	2025-03-01 11:28:25 +00:00
Atream	e645d84794	use generation config from json file in official repo	2025-02-27 11:48:34 +00:00
Atream	b443c7dfa2	Merge pull request #657 from kvcache-ai/feat-absorb-for-long-prefill Feat absorb for long prefill	2025-02-25 16:53:21 +08:00
Atream	f4c198bd42	support absorb for prefill long context	2025-02-25 08:52:02 +00:00
Azure	36fbeee341	Update doc	2025-02-25 08:21:18 +00:00
Atream	e8e02e5ccc	support Moonlight	2025-02-23 14:21:18 +00:00
DDong Jianwei	95d937c51d	tmp	2025-02-23 18:51:42 +08:00
Atream	7e1fe256c8	optimize GPU	2025-02-21 05:06:57 +00:00
Atream	038bc30888	fix precision bug imported by position_ids in 0.2.0	2025-02-17 09:23:14 +00:00
liam	4385e85096	⚡ support force thinking	2025-02-12 12:43:53 +08:00
liam	d07087a7e2	⚡ support R1 force thinking	2025-02-11 15:43:41 +08:00
liam	098602b08f	⚡ v0.2 ongoing	2025-02-09 22:41:14 +08:00
liam	c18ecd7b7f	⚡ add flush print in local_chat output and change default optimize yaml of deepseekv3 to single gpu	2025-02-08 13:15:52 +08:00
liam	0262f954c7	Merge branch 'feat-DeepSeekV3' of github.com:kvcache-ai/ktransformers into feat-DeepSeekV3	2025-02-06 22:41:25 +08:00
liam	3dca28d23b	⚡ fix moe.cpp int overflow problem	2025-02-06 22:39:16 +08:00
Azure	027b11266c	modify moeinfer param	2025-02-06 14:07:38 +00:00
Azure	ee24a27001	update v3 single gpu rule yaml;	2025-02-04 16:14:35 +00:00
Azure	f873558a89	update rope calculation; update modeling.py; update gate for moe	2025-02-01 07:32:21 +00:00
Azure	5a50b34627	fix hard coding caused by rope dim calculation, load from config now	2025-01-31 15:25:50 +00:00
Azure	476b1d8dc6	support deepseekv3; runable but have precition problem	2025-01-31 08:27:24 +00:00
anyanqilin	a72dc6ed15	wjh change	2024-11-04 14:02:19 +08:00
liam	7c94df4bcf	🚑️: back transformer.py bugs version, and fix typo error in local_chat.py	2024-11-04 14:02:19 +08:00
liam	dd1d8667f3	✨: refactor local_chat and fix message slice bug in server	2024-11-04 14:02:19 +08:00
TangJingqi	6735beb5b6	Fix cannot offload whole layer in cpu	2024-08-29 19:10:14 +08:00
chenxl	4d1d561d28	[feature] release 0.1.3	2024-08-28 16:11:43 +00:00
chenxl	f5f79f5c0e	[ADD] support multi-gpu qlen>1 q5_k	2024-08-12 11:41:26 +00:00
chenxl	18c42e67df	Initial commit	2024-07-27 16:06:58 +08:00

32 commits
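
Note on commit 71286ec1c0: its message describes a condition of the form `x == "A" or "B"` that is always true. The diff itself is not shown in this listing, so the following is only a minimal sketch of that class of bug and one common way to fix it. The function names are hypothetical and are not taken from local_chat.py.

```python
def is_deepseek_buggy(arch: str) -> bool:
    # Hypothetical helper illustrating the bug pattern named in the commit message.
    # Python parses this as (arch == "DeepseekV2ForCausalLM") or "DeepseekV3ForCausalLM".
    # The second operand is a non-empty string, which is truthy, so the whole
    # expression is truthy no matter what arch is.
    return bool(arch == "DeepseekV2ForCausalLM" or "DeepseekV3ForCausalLM")


def is_deepseek_fixed(arch: str) -> bool:
    # One common fix (an assumption, not necessarily what the commit did):
    # test membership so arch is compared against each candidate name.
    return arch in ("DeepseekV2ForCausalLM", "DeepseekV3ForCausalLM")


if __name__ == "__main__":
    for name in ("DeepseekV2ForCausalLM", "DeepseekV3ForCausalLM", "LlamaForCausalLM"):
        print(name, is_deepseek_buggy(name), is_deepseek_fixed(name))
```

With the buggy form, any architecture name passes the check, so model-specific code paths would run for models they were not meant for; the membership test only matches the two listed names.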