vrr/kvcache-ai-ktransformers

mirror of https://github.com/kvcache-ai/ktransformers.git synced 2025-09-06 20:49:55 +00:00

Author	SHA1	Message	Date
qiyuxinlin	b17ab8653c	update speed test	2025-04-22 07:38:05 +00:00
qiyuxinlin	03a65d6bea	roll back ktransformers backend, add max_tokens, max_completion_tokens param	2025-04-21 12:55:37 +00:00
dongjw	be84d04253	Fix bug with non-base-multiple chunk_size, update test examples, and resolve issue with writing model_config. Hugging Face URL input is still unsupported.	2025-04-04 15:41:07 +08:00
Azure-Tang	3a5330b215	Merge branch 'main' into work-concurrent	2025-04-01 06:48:19 +00:00
Atream	25cee5810e	add balance-serve, support concurrence	2025-03-31 22:55:32 +08:00
Jiaqi Liao	05f6cede37	Merge pull request #943 from SkqLiao/main: fix benchmark params for human eval benchmark	2025-03-20 18:49:34 +08:00
SkqLiao	6d4626a5d9	fix params	2025-03-20 18:48:51 +08:00
SkqLiao	8cc4df980e	use DeepSeek V3 instead of R1 for benchmarking	2025-03-20 11:59:03 +08:00
Jiaqi Liao	32a91c78c1	Merge pull request #935 from SkqLiao/main: Fix benchmarking slow issue on self-hosted actions	2025-03-20 10:14:37 +08:00
SkqLiao	19c824f9d0	change cpu-infer due to actual cpu cores on self-hosted server.	2025-03-20 10:10:52 +08:00
Jiaqi Liao	649489dc67	Merge pull request #931 from SkqLiao/main: Add Human Eval Benchmark Test for CI/CD	2025-03-19 21:35:24 +08:00
SkqLiao	bc369b256c	add CI/CD for human eval score benchmarking	2025-03-19 21:25:21 +08:00
Azure-Tang	ed8437413b	merge main; Add torch q8 linear	2025-03-14 05:52:07 -04:00
Atream	d453c320f1	fix flashinfer precision	2025-03-07 14:07:00 +00:00
liam	8eeb6dd432	⚡ update compile option for avx512vpopcntdq	2025-03-06 12:18:04 +08:00
liam	848fe8ab97	⚡ release v0.2.3	2025-03-05 20:21:04 +08:00
liam	dc10480ef6	⚡ add humaneval support	2025-03-04 20:54:49 +08:00
liam	0ca0b99fab	⚡ update git ignore add docker dev container	2025-02-25 17:22:11 +08:00
Azure	ca7366d2db	Merge remote-tracking branch 'upstream/develop-0.2.2' into support-fp8	2025-02-24 11:58:10 +00:00
Azure	581a524f65	Add data loader to read special weights for fp8; Add special weight process script	2025-02-24 11:34:17 +00:00
Azure	7b7c6a657d	Add fp8 linear kernel; Add empty cache to fit in 16G VRAM; By 'wkGCaSS - 知乎 https://zhuanlan.zhihu.com/p/25491611225 '	2025-02-22 13:05:08 +00:00
liam	97e1dc97f6	⚡ fix .so bug	2025-02-20 21:24:46 +08:00
liam	592e13d453	⚡ add mmlu_pro test	2025-02-18 14:43:38 +08:00
liam	07a0555016	⚡ fix device and add test	2025-02-18 12:52:17 +08:00
TangJingqi	67043b4b5c	[fix] format classes and files name	2024-08-15 10:44:59 +08:00
chenxl	f5f79f5c0e	[ADD] support multi-gpu qlen>1 q5_k	2024-08-12 11:41:26 +00:00
chenxl	18c42e67df	Initial commit	2024-07-27 16:06:58 +08:00

27 commits