vrr/kvcache-ai-ktransformers

mirror of https://github.com/kvcache-ai/ktransformers.git synced 2025-09-16 18:19:42 +00:00

Author	SHA1	Message	Date
Atream	3b9e16cec7	Update attention.py	2025-04-09 10:54:00 +08:00
qiyuxinlin	64de784328	format kvc2, delete quant_configs, move model_configs to ~/.ktransformers	2025-04-08 10:06:07 +00:00
Azure	77c6cc82ac	Merge pull request #1063 from aubreyli/KLinearCPUInfer.forward-fix Fix TypeError when invoke KLinearCPUInfer.forward()	2025-04-07 15:10:46 +08:00
dongjw	ec03bcbd7f	fix temperature=0, flashinfer sample error	2025-04-07 12:30:47 +08:00
Aubrey Li	12a4c631df	Fix TypeError when invoke KLinearCPUInfer.forward(). Fix the following error: File "/home/aubrey/work/ktransformers/ktransformers/operators/linear.py", line 825, in forward: y = self.generate_linear.forward(x, bsz_tensor): TypeError: KLinearCPUInfer.forward() takes 2 positional arguments but 3 were given	2025-04-07 12:03:35 +08:00
ZiWei Yuan	a5608dcb80	🔖 release v0.2.4post1	2025-04-04 16:01:25 +08:00
dongjw	be84d04253	Fix bug with non-base-multiple chunk_size, update test examples, and resolve issue with writing model_config. Hugging Face URL input is still unsupported.	2025-04-04 15:41:07 +08:00
liam	b151a98cab	🔧 update config.yaml setting default config	2025-04-03 11:55:50 +00:00
Atream	e36ddc36a8	Update modeling_deepseek_v3.py	2025-04-03 17:13:06 +08:00
Qin's repo	2c3a3a1e1c	solve [Bug] #1023 Only modified the mixed single and double quotes in server/config/config.py	2025-04-03 14:37:32 +08:00
dongjw	1b7672937b	update install doc and fix local_chat bug	2025-04-03 12:42:41 +08:00
dongjw	56a18ad02c	change tag v0.2.4	2025-04-01 21:07:13 +08:00
dongjw	5c7ed7b579	fix top_p = 0 bug	2025-04-01 20:38:33 +08:00
Azure-Tang	31677181c3	Fix ktransformers-server flashinfer wrapper position arg issue; Fix db position issue	2025-04-01 07:30:23 +00:00
Azure-Tang	203b853c75	rm KMoEGateDeepSeekV3, fall back to KMoEGate	2025-04-01 07:13:05 +00:00
Azure-Tang	3a5330b215	Merge branch 'main' into work-concurrent	2025-04-01 06:48:19 +00:00
Atream	25cee5810e	add balance-serve, support concurrence	2025-03-31 22:55:32 +08:00
Atream	8d0292aa44	refactor folders	2025-03-31 22:45:37 +08:00
Jiaqi Liao	05f6cede37	Merge pull request #943 from SkqLiao/main fix benchmark params for human eval benchmark	2025-03-20 18:49:34 +08:00
SkqLiao	6d4626a5d9	fix params	2025-03-20 18:48:51 +08:00
Atream	633af5d235	Update gate.py	2025-03-20 14:54:01 +08:00
SkqLiao	8cc4df980e	use DeepSeek V3 instead of R1 for benchmarking	2025-03-20 11:59:03 +08:00
Jiaqi Liao	32a91c78c1	Merge pull request #935 from SkqLiao/main Fix benchmarking slow issue on self-hosted actions	2025-03-20 10:14:37 +08:00
SkqLiao	19c824f9d0	change cpu-infer due to actual cpu cores on self-hosted server.	2025-03-20 10:10:52 +08:00
Jiaqi Liao	649489dc67	Merge pull request #931 from SkqLiao/main Add Human Eval Benchmark Test for CI/CD	2025-03-19 21:35:24 +08:00
SkqLiao	bc369b256c	add CI/CD for human eval score benchmarking	2025-03-19 21:25:21 +08:00
Atream	b453333f60	Update gate.py	2025-03-19 16:14:54 +08:00
Atream	44599229cd	Update gate.py	2025-03-19 12:16:48 +08:00
Atream	114995355b	fix-gate-compile	2025-03-19 11:27:18 +08:00
Atream	167506b779	Update DeepSeek-V3-Chat-multi-gpu-marlin.yaml	2025-03-17 17:05:01 +08:00
Atream	c9a0c44213	Update DeepSeek-V3-Chat-multi-gpu-fp8-linear-ggml-experts.yaml	2025-03-17 17:03:52 +08:00
liam	19f058ec9e	🔧 update multi-gpu-fp8-linear and multi-gpu marlin yaml	2025-03-17 15:08:12 +08:00
Azure-Tang	85c32fdd10	Fix rocm example yaml	2025-03-15 22:27:02 -04:00
Azure-Tang	4a31237346	fix rocm compilation	2025-03-15 12:34:03 -04:00
Atream	3934b9dfc1	rollback-triton-prefill	2025-03-15 14:21:21 +00:00
ZiWei Yuan	9b76cab1a5	Merge pull request #898 from kvcache-ai/develop-0.2.3post2 Release 0.2.3post2	2025-03-15 18:11:42 +08:00
liam	b5ef7c26dc	🔖 release v0.2.3post2	2025-03-15 18:04:10 +08:00
Azure	117a8d2f2a	fix compilation	2025-03-14 19:49:20 +00:00
SkqLiao	0f1684c28d	local chat for cicd test	2025-03-15 02:31:19 +08:00
Azure	3986e2d2cf	Merge pull request #178 from fxzjshm/hip [Feat] Port to ROCm/HIP	2025-03-15 02:31:07 +08:00
Azure-Tang	e5b001d76f	Update readme; Format code; Add example yaml.	2025-03-14 14:25:52 -04:00
Atream	a889288fc1	use compile for gate, slight performance improvement	2025-03-14 12:43:28 +00:00
Azure-Tang	ed8437413b	merge main; Add torch q8 linear	2025-03-14 05:52:07 -04:00
Atream	6f43bbe55f	fix-singleton	2025-03-14 04:16:53 +00:00
Lander-Hatsune	d166fb9f6e	cpuinfer: filter repeated backend instantiation	2025-03-10 22:03:04 +08:00
Atream	09c043d8a6	Merge pull request #842 from BITcyman/fix-openai_chat_completion [fix] thread context bug	2025-03-07 22:56:19 +08:00
BITcyman	08a8b553d6	[fix] thread context bug	2025-03-07 14:52:16 +00:00
Atream	f8c1821f1d	Update __init__.py	2025-03-07 22:08:48 +08:00
Atream	d453c320f1	fix flashinfer precision	2025-03-07 14:07:00 +00:00
BITcyman	299c4dca64	[update] support openai chat completion api	2025-03-07 08:51:09 +00:00


248 commits