Commit graph

230 commits

Author SHA1 Message Date
Jiaqi Liao
05f6cede37
Merge pull request #943 from SkqLiao/main
fix benchmark params for human eval benchmark
2025-03-20 18:49:34 +08:00
SkqLiao
6d4626a5d9 fix params 2025-03-20 18:48:51 +08:00
Atream
633af5d235
Update gate.py 2025-03-20 14:54:01 +08:00
SkqLiao
8cc4df980e use DeepSeek V3 instead of R1 for benchmarking 2025-03-20 11:59:03 +08:00
Jiaqi Liao
32a91c78c1
Merge pull request #935 from SkqLiao/main
Fix benchmarking slow issue on self-hosted actions
2025-03-20 10:14:37 +08:00
SkqLiao
19c824f9d0 change cpu-infer due to actual cpu cores on self-hosted server. 2025-03-20 10:10:52 +08:00
Jiaqi Liao
649489dc67
Merge pull request #931 from SkqLiao/main
Add Human Eval Benchmark Test for CI/CD
2025-03-19 21:35:24 +08:00
SkqLiao
bc369b256c add CI/CD for human eval score benchmarking 2025-03-19 21:25:21 +08:00
Atream
b453333f60
Update gate.py 2025-03-19 16:14:54 +08:00
Atream
44599229cd
Update gate.py 2025-03-19 12:16:48 +08:00
Atream
114995355b
fix-gate-compile 2025-03-19 11:27:18 +08:00
Atream
167506b779
Update DeepSeek-V3-Chat-multi-gpu-marlin.yaml 2025-03-17 17:05:01 +08:00
Atream
c9a0c44213
Update DeepSeek-V3-Chat-multi-gpu-fp8-linear-ggml-experts.yaml 2025-03-17 17:03:52 +08:00
liam
19f058ec9e 🔧 update multi-gpu-fp8-linear and multi-gpu marlin yaml 2025-03-17 15:08:12 +08:00
Azure-Tang
85c32fdd10 Fix rocm example yaml 2025-03-15 22:27:02 -04:00
Azure-Tang
4a31237346 fix rocm compilation 2025-03-15 12:34:03 -04:00
Atream
3934b9dfc1 rollback-triton-prefill 2025-03-15 14:21:21 +00:00
ZiWei Yuan
9b76cab1a5
Merge pull request #898 from kvcache-ai/develop-0.2.3post2
Release 0.2.3post2
2025-03-15 18:11:42 +08:00
liam
b5ef7c26dc 🔖 release v0.2.3post2 2025-03-15 18:04:10 +08:00
Azure
117a8d2f2a fix compilation 2025-03-14 19:49:20 +00:00
SkqLiao
0f1684c28d local chat for cicd test 2025-03-15 02:31:19 +08:00
Azure
3986e2d2cf
Merge pull request #178 from fxzjshm/hip
[Feat] Port to ROCm/HIP
2025-03-15 02:31:07 +08:00
Azure-Tang
e5b001d76f Update readme; Format code; Add example yaml. 2025-03-14 14:25:52 -04:00
Atream
a889288fc1 use compile for gate, slight performance improvement 2025-03-14 12:43:28 +00:00
Azure-Tang
ed8437413b merge main; Add torch q8 linear 2025-03-14 05:52:07 -04:00
Atream
6f43bbe55f fix-singleton 2025-03-14 04:16:53 +00:00
Lander-Hatsune
d166fb9f6e cpuinfer: filter repeated backend instantiation 2025-03-10 22:03:04 +08:00
Atream
09c043d8a6
Merge pull request #842 from BITcyman/fix-openai_chat_completion
[fix] thread context bug
2025-03-07 22:56:19 +08:00
BITcyman
08a8b553d6 [fix] thread context bug 2025-03-07 14:52:16 +00:00
Atream
f8c1821f1d
Update __init__.py 2025-03-07 22:08:48 +08:00
Atream
d453c320f1 fix flashinfer precision 2025-03-07 14:07:00 +00:00
BITcyman
299c4dca64 [update] support openai chat completion api 2025-03-07 08:51:09 +00:00
ZiWei Yuan
63b1c8525b
Merge pull request #820 from kvcache-ai/develop-0.2.3
Develop 0.2.3 ready to release
2025-03-06 14:46:09 +08:00
liam
8eeb6dd432 update compile option for avx512vpopcntdq 2025-03-06 12:18:04 +08:00
chenmz00
b2ba795cfd
fix: list models API
Fix the list models API to match the corresponding OpenAI API format.
2025-03-05 21:49:27 +08:00
liam
9c343b4f71 🔖 release v0.2.3 2025-03-05 20:24:11 +08:00
liam
848fe8ab97 release v0.2.3 2025-03-05 20:21:04 +08:00
Azure
d7becadcf7 Merge branch 'develop-0.2.3' of https://github.com/kvcache-ai/ktransformers into develop-0.2.3 2025-03-05 09:26:23 +00:00
Azure
662c1e4c14 small fix about max new token 2025-03-05 09:25:41 +00:00
liam
dc10480ef6 add humaneval support 2025-03-04 20:54:49 +08:00
Yi Pan
01755a60c0
fix: wrong shape in KLinearMarlin. 2025-03-03 17:34:45 +08:00
Atream
8963ae7817
Update __init__.py 2025-03-03 16:49:50 +08:00
wang jiahao
48b9800790
Merge pull request #759 from 3wweiweiwu/fix_top_p_typo
fix typo for top_p
2025-03-02 13:58:11 +08:00
1668068727@qq.com
7cdf8139f0 fix ollama api temperature bug 2025-03-02 13:55:26 +08:00
Wix Woo
3aa0cfc29d fix typo for top_p 2025-03-01 20:15:36 +00:00
Atream
ca1dc1e7d1
Merge branch 'main' into main 2025-03-01 23:24:10 +08:00
宁鹏涛
71286ec1c0
Update local_chat.py
Fix config.architectures[0] == "DeepseekV2ForCausalLM" or "DeepseekV3ForCausalLM" always being true
2025-03-01 21:52:48 +08:00
Atream
fa03ea48dd Merge branch 'main' into feat-chunk-prefill-flashinfer 2025-03-01 11:35:09 +00:00
Atream
f35e8d41d8 support chunk prefill, support 139K context for 24G VRAM 2025-03-01 11:28:25 +00:00
liam
80e0536fb0 Merge branch 'main' of https://github.com/KMSorSMS/ktransformers into main 2025-03-01 00:12:21 +08:00
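
Note on commit 71286ec1c0 (Update local_chat.py): the condition quoted in that message is always true because Python parses it as (config.architectures[0] == "DeepseekV2ForCausalLM") or "DeepseekV3ForCausalLM", and a non-empty string is truthy. The sketch below reproduces the pitfall in isolation. The function names and the sample architecture string are hypothetical, and the membership test is only one common correction; the change actually made in local_chat.py is not shown in this log.

```python
# Illustrative only: these helpers are hypothetical and are not taken from local_chat.py.

DEEPSEEK_ARCHS = ("DeepseekV2ForCausalLM", "DeepseekV3ForCausalLM")


def is_deepseek_buggy(arch: str) -> bool:
    # Parsed as (arch == "DeepseekV2ForCausalLM") or "DeepseekV3ForCausalLM".
    # The right operand is a non-empty string, so the result is always truthy.
    return bool(arch == "DeepseekV2ForCausalLM" or "DeepseekV3ForCausalLM")


def is_deepseek(arch: str) -> bool:
    # One common correction: test membership in the set of accepted names.
    return arch in DEEPSEEK_ARCHS


if __name__ == "__main__":
    sample = "LlamaForCausalLM"  # assumed non-DeepSeek architecture name
    print(is_deepseek_buggy(sample))  # True, although the model is not DeepSeek
    print(is_deepseek(sample))        # False
```

Writing the comparison out twice (arch == "DeepseekV2ForCausalLM" or arch == "DeepseekV3ForCausalLM") behaves the same as the membership test; the tuple form just keeps the accepted names in one place.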