323 commits

Author SHA1 Message Date
Atream
44599229cd Update gate.py 2025-03-19 12:16:48 +08:00
Atream
114995355b fix-gate-compile 2025-03-19 11:27:18 +08:00
Atream
167506b779 Update DeepSeek-V3-Chat-multi-gpu-marlin.yaml 2025-03-17 17:05:01 +08:00
Atream
c9a0c44213 Update DeepSeek-V3-Chat-multi-gpu-fp8-linear-ggml-experts.yaml 2025-03-17 17:03:52 +08:00
liam
19f058ec9e 🔧 update multi-gpu-fp8-linear and multi-gpu marlin yaml 2025-03-17 15:08:12 +08:00
Azure-Tang
85c32fdd10 Fix rocm example yaml 2025-03-15 22:27:02 -04:00
Azure-Tang
4a31237346 fix rocm compilation 2025-03-15 12:34:03 -04:00
Atream
3934b9dfc1 rollback-triton-prefill 2025-03-15 14:21:21 +00:00
ZiWei Yuan
9b76cab1a5 Merge pull request #898 from kvcache-ai/develop-0.2.3post2
Release 0.2.3post2
2025-03-15 18:11:42 +08:00
liam
b5ef7c26dc 🔖 release v0.2.3post2 2025-03-15 18:04:10 +08:00
Azure
117a8d2f2a fix compilation 2025-03-14 19:49:20 +00:00
SkqLiao
0f1684c28d local chat for cicd test 2025-03-15 02:31:19 +08:00
Azure
3986e2d2cf Merge pull request #178 from fxzjshm/hip
[Feat] Port to ROCm/HIP
2025-03-15 02:31:07 +08:00
Azure-Tang
e5b001d76f Update readme; Format code; Add example yaml. 2025-03-14 14:25:52 -04:00
Atream
a889288fc1 use compile for gate, slight performance improvement 2025-03-14 12:43:28 +00:00
Azure-Tang
ed8437413b merge main; Add torch q8 linear 2025-03-14 05:52:07 -04:00
Atream
6f43bbe55f fix-singleton 2025-03-14 04:16:53 +00:00
Lander-Hatsune
d166fb9f6e cpuinfer: filter repeated backend instantiation 2025-03-10 22:03:04 +08:00
Yuhao Tsui
e5694f91c0 Merge branch 'kvcache-ai:main' into main 2025-03-10 09:10:28 +08:00
Atream
09c043d8a6 Merge pull request #842 from BITcyman/fix-openai_chat_completion
[fix] thread context bug
2025-03-07 22:56:19 +08:00
BITcyman
08a8b553d6 [fix] thread context bug 2025-03-07 14:52:16 +00:00
Atream
f8c1821f1d Update __init__.py 2025-03-07 22:08:48 +08:00
Atream
d453c320f1 fix flashinfer precision 2025-03-07 14:07:00 +00:00
BITcyman
299c4dca64 [update] support openai chat completion api 2025-03-07 08:51:09 +00:00
ZiWei Yuan
63b1c8525b Merge pull request #820 from kvcache-ai/develop-0.2.3
Develop 0.2.3 ready to release
2025-03-06 14:46:09 +08:00
liam
8eeb6dd432 update compile option for avx512vpopcntdq 2025-03-06 12:18:04 +08:00
Yuhao Tsui
d050d8655f Update completions.py 2025-03-06 11:16:33 +08:00
chenmz00
b2ba795cfd fix: list models API
Fix the list models API to match the corresponding OpenAI API format.
2025-03-05 21:49:27 +08:00
liam
9c343b4f71 🔖 release v0.2.3 2025-03-05 20:24:11 +08:00
liam
848fe8ab97 release v0.2.3 2025-03-05 20:21:04 +08:00
Azure
d7becadcf7 Merge branch 'develop-0.2.3' of https://github.com/kvcache-ai/ktransformers into develop-0.2.3 2025-03-05 09:26:23 +00:00
Azure
662c1e4c14 small fix about max new token 2025-03-05 09:25:41 +00:00
liam
dc10480ef6 add humaneval support 2025-03-04 20:54:49 +08:00
Yi Pan
01755a60c0 fix: wrong shape in KLinearMarlin. 2025-03-03 17:34:45 +08:00
Atream
8963ae7817 Update __init__.py 2025-03-03 16:49:50 +08:00
wang jiahao
48b9800790 Merge pull request #759 from 3wweiweiwu/fix_top_p_typo
fix typo for top_p
2025-03-02 13:58:11 +08:00
1668068727@qq.com
7cdf8139f0 fix ollama api temperature bug 2025-03-02 13:55:26 +08:00
Wix Woo
3aa0cfc29d fix typo for top_p 2025-03-01 20:15:36 +00:00
Atream
ca1dc1e7d1 Merge branch 'main' into main 2025-03-01 23:24:10 +08:00
宁鹏涛
71286ec1c0 Update local_chat.py
Fix: config.architectures[0] == "DeepseekV2ForCausalLM" or "DeepseekV3ForCausalLM" always evaluates to true
2025-03-01 21:52:48 +08:00
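The bug fixed in 71286ec1c0 is a Python precedence pitfall. `a == "X" or "Y"` is parsed as `(a == "X") or "Y"`. A non-empty string is truthy, so the whole condition never evaluates to false.

The sketch below reproduces the problem and shows one way to fix it. It is a minimal example, and its names are assumptions:

- `Config` is a hypothetical stand-in for the loaded model config.
- The membership-test fix is illustrative. The actual patch to `local_chat.py` may be written differently.

```python
class Config:
    """Hypothetical stand-in for a model config; only `architectures` is used."""

    def __init__(self, architectures):
        self.architectures = architectures


def is_deepseek_buggy(config):
    # Parsed as (config.architectures[0] == "DeepseekV2ForCausalLM") or "DeepseekV3ForCausalLM".
    # When the comparison is False, the expression returns the non-empty (truthy) string.
    return config.architectures[0] == "DeepseekV2ForCausalLM" or "DeepseekV3ForCausalLM"


def is_deepseek_fixed(config):
    # Compare the architecture name against each candidate explicitly.
    return config.architectures[0] in ("DeepseekV2ForCausalLM", "DeepseekV3ForCausalLM")


cfg = Config(["LlamaForCausalLM"])
print(bool(is_deepseek_buggy(cfg)))  # True, even though this is not a DeepSeek model
print(is_deepseek_fixed(cfg))        # False
```

An equivalent fix spells out both comparisons: `arch == "DeepseekV2ForCausalLM" or arch == "DeepseekV3ForCausalLM"`.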
Atream
fa03ea48dd Merge branch 'main' into feat-chunk-prefill-flashinfer 2025-03-01 11:35:09 +00:00
Atream
f35e8d41d8 support chunk prefill, support 139K context for 24G VRAM 2025-03-01 11:28:25 +00:00
liam
80e0536fb0 Merge branch 'main' of https://github.com/KMSorSMS/ktransformers into main 2025-03-01 00:12:21 +08:00
liam
8ddc990668 fix server cache lens 2025-03-01 00:09:57 +08:00
Shuaiyi
a34a25d5cc Delete unused code 2025-02-27 13:18:19 +00:00
wang jiahao
7a19f3b781 Merge pull request #721 from kvcache-ai/fix_temperature
fix temperature
2025-02-27 21:01:21 +08:00
qiyuxinlin
22df52e94e fix temperature 2025-02-27 21:00:44 +08:00
Atream
85e2cc7bf4 Merge pull request #719 from kvcache-ai/fix-use-generation-json
use generation config from json file in official repo
2025-02-27 19:49:41 +08:00
Atream
e645d84794 use generation config from json file in official repo 2025-02-27 11:48:34 +00:00
lazymio
b121ca4df8 Fix according to upstream changes 2025-02-27 18:11:35 +08:00