vrr/kvcache-ai-ktransformers

mirror of https://github.com/kvcache-ai/ktransformers.git synced 2025-09-05 12:09:48 +00:00

Author	SHA1	Message	Date
Atream	7e1fe256c8	optimize GPU	2025-02-21 05:06:57 +00:00
Atream	038bc30888	fix precision bug imported by position_ids in 0.2.0	2025-02-17 09:23:14 +00:00
liam	4385e85096	⚡ support force thinking	2025-02-12 12:43:53 +08:00
liam	d07087a7e2	⚡ support R1 force thinking	2025-02-11 15:43:41 +08:00
liam	098602b08f	⚡ v0.2 ongoing	2025-02-09 22:41:14 +08:00
liam	c18ecd7b7f	⚡ add flush print in local_chat output and change default optimize yaml of deepseekv3 to single gpu	2025-02-08 13:15:52 +08:00
liam	0262f954c7	Merge branch 'feat-DeepSeekV3' of github.com:kvcache-ai/ktransformers into feat-DeepSeekV3	2025-02-06 22:41:25 +08:00
liam	3dca28d23b	⚡ fix moe.cpp int overflow problem	2025-02-06 22:39:16 +08:00
Azure	027b11266c	modify moeinfer param	2025-02-06 14:07:38 +00:00
Azure	ee24a27001	update v3 single gpu rule yaml;	2025-02-04 16:14:35 +00:00
Azure	f873558a89	update rope calculation; update modeling.py; update gate for moe	2025-02-01 07:32:21 +00:00
Azure	5a50b34627	fix hard coding caused by rope dim calculation, load from config now	2025-01-31 15:25:50 +00:00
Azure	476b1d8dc6	support deepseekv3; runable but have precition problem	2025-01-31 08:27:24 +00:00
anyanqilin	a72dc6ed15	wjh change	2024-11-04 14:02:19 +08:00
liam	7c94df4bcf	🚑️: back transformer.py bugs version, and fix typo error in local_chat.py	2024-11-04 14:02:19 +08:00
liam	dd1d8667f3	✨: refactor local_chat and fix message slice bug in server	2024-11-04 14:02:19 +08:00
TangJingqi	6735beb5b6	Fix cannot offload whole layer in cpu	2024-08-29 19:10:14 +08:00
chenxl	4d1d561d28	[feature] release 0.1.3	2024-08-28 16:11:43 +00:00
chenxl	f5f79f5c0e	[ADD] support multi-gpu qlen>1 q5_k	2024-08-12 11:41:26 +00:00
chenxl	18c42e67df	Initial commit	2024-07-27 16:06:58 +08:00

20 commits