Commit graph

26 commits

Author SHA1 Message Date
qiyuxinlin
c6aa379de2 support safetensor load, delete architectures argument 2025-05-09 10:38:29 +00:00
qiyuxinlin
27990dc6fb fix load bug 2025-04-28 21:08:13 +00:00
djw
3f9bbf1181 support qwen3, don't speak human language 2025-04-28 08:44:47 +00:00
chenht2022
f3d842a0ca support AMX 2025-04-25 14:47:16 +00:00
root
921061666c fix some bugs 2025-04-17 00:48:09 +08:00
Atream
25cee5810e add balance-serve, support concurrency 2025-03-31 22:55:32 +08:00
Atream
6f43bbe55f fix-singleton 2025-03-14 04:16:53 +00:00
liam
ffb86c66e3 fix experts torch 2025-02-26 15:04:40 +08:00
Atream
b443c7dfa2 Merge pull request #657 from kvcache-ai/feat-absorb-for-long-prefill (Feat absorb for long prefill) 2025-02-25 16:53:21 +08:00
Azure
581a524f65 Add data loader to read special weights for fp8; Add special weight processing script 2025-02-24 11:34:17 +00:00
Atream
e8e02e5ccc support Moonlight 2025-02-23 14:21:18 +00:00
Dong Jianwei
95d937c51d tmp 2025-02-23 18:51:42 +08:00
Atream
038bc30888 fix precision bug introduced by position_ids in 0.2.0 2025-02-17 09:23:14 +00:00
Atream
c189d55bd1 toy support for experts on GPU, no CUDA Graph 2025-02-15 15:16:00 +00:00
Azure
c4d9bc6670 support KExpertsMarlin backend 2025-02-07 05:57:40 +00:00
Azure
907251c743 finish support for deepseekv3 2025-02-04 15:53:38 +00:00
Azure
f873558a89 update rope calculation; update modeling.py; update gate for moe 2025-02-01 07:32:21 +00:00
Azure
476b1d8dc6 support deepseekv3; runnable but has a precision problem 2025-01-31 08:27:24 +00:00
xhedit
234faf7987 typo fix: KMisrtal -> KMistral 2024-09-12 15:58:01 +00:00
TangJingqi
6735beb5b6 Fix: cannot offload whole layer in CPU 2024-08-29 19:10:14 +08:00
chenxl
4d1d561d28 [feature] release 0.1.3 2024-08-28 16:11:43 +00:00
TangJingqi
67043b4b5c [fix] format class and file names 2024-08-15 10:44:59 +08:00
Atream
412055d450 [feature] experts can be injected using CPUInfer
[fix] fix ktransformers interface when using new CUDAGraphRunner
[fix] fix YAML and optimize logic; the top rule has the highest priority
2024-08-14 16:10:54 +08:00
chenxl
f5f79f5c0e [ADD] support multi-gpu qlen>1 q5_k 2024-08-12 11:41:26 +00:00
chenht2022
c1cc7d2cd2 1) Linear and MLP operators support qlen>1; 2) All operators now share a single memory buffer; 3) Refactor CPUInfer submit/sync logic. 2024-08-08 09:04:36 +00:00
chenxl
18c42e67df Initial commit 2024-07-27 16:06:58 +08:00