Commit graph

877 commits

Author SHA1 Message Date
swu-hyk
ec7e912fee modify 2025-02-26 19:21:30 +08:00
swu-hyk
68e7df3a25 implementation of chat routing for Ollama 2025-02-26 17:05:00 +08:00
Chen Hongtao
9660b2cc1e
Merge pull request #685 from vproxy-tools/main
fix numa cpu distribution
2025-02-26 15:35:19 +08:00
ZiWei Yuan
e7ebb26370
Merge pull request #684 from KMSorSMS/main
fix dockerfile in devcontainer and fix expert torch
2025-02-26 15:06:51 +08:00
liam
ffb86c66e3 fix experts torch 2025-02-26 15:04:40 +08:00
liam
de082f141c fix cd error 2025-02-26 14:54:47 +08:00
wkgcass
b2bff17775 fix numa cpu distribution
The numa node location would be calculated based on the total number
of worker threads.
So we should always use the actual number of threads instead of using a min() op.
2025-02-26 14:49:57 +08:00
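The commit body above describes the NUMA bug only in words. The Python sketch below illustrates it under an assumed mapping in which each worker thread's NUMA node is derived proportionally from its index and the total thread count. The helper `node_of`, the proportional formula and the cap of 4 are all hypothetical and are not taken from the project source. They only show why clamping the total with `min()` while all threads still run can place threads on nodes that do not exist.

```python
def node_of(thread_id: int, total_threads: int, numa_nodes: int) -> int:
    # Hypothetical mapping: threads are split into contiguous, equal-sized
    # blocks, one block per NUMA node.
    return thread_id * numa_nodes // total_threads

actual_threads = 8
numa_nodes = 2
cap = 4  # hypothetical limit that a min() op might have applied

# Correct: base the mapping on the real number of worker threads.
correct = [node_of(t, actual_threads, numa_nodes) for t in range(actual_threads)]

# Buggy: the total is clamped with min(), but all 8 threads are still started.
clamped_total = min(actual_threads, cap)
buggy = [node_of(t, clamped_total, numa_nodes) for t in range(actual_threads)]

print(correct)  # [0, 0, 0, 0, 1, 1, 1, 1]
print(buggy)    # [0, 0, 1, 1, 2, 2, 3, 3] -> nodes 2 and 3 do not exist
```

With the real thread count, every index stays in `[0, numa_nodes)`. With the clamped count, half of the threads map past the last node.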
akemimadoka
8817777e11 Fix RuntimeError on Windows caused by integer overflow in np.prod 2025-02-26 03:50:12 +08:00
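On Windows, NumPy versions before 2.0 use a 32-bit default integer. As a result, `np.prod` over a tensor shape can silently wrap around once the element count exceeds 2**31 - 1. The sketch below shows one common way to avoid this: pass an explicit 64-bit `dtype`. The shape is made up for illustration, and this is not necessarily how the commit itself fixed the problem.

```python
import numpy as np

# Hypothetical tensor shape whose element count exceeds the int32 range.
shape = (65536, 65536)  # 4,294,967,296 elements

# Forcing a 64-bit accumulator gives the right count on every platform.
n_elements = int(np.prod(shape, dtype=np.int64))
print(n_elements)  # 4294967296

# Reproduce the 32-bit behaviour explicitly: the product wraps around
# instead of raising, which is how a bad size can reach later code.
wrapped = int(np.prod(np.array(shape, dtype=np.int32), dtype=np.int32))
print(wrapped)
```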
Azure
99f6e42113
Merge pull request #668 from KMSorSMS/main
📝 update benchmark.md
2025-02-26 00:21:09 +08:00
liam
3ad12751cf 📝 update more detail and fix typo 2025-02-26 00:17:02 +08:00
Azure
31bc990677
Merge pull request #667 from Azure-Tang/update-readme
[update] Update doc.
2025-02-26 00:01:46 +08:00
liam
05339ad0ef 📝 update benchmark.md 2025-02-25 23:57:58 +08:00
Azure
bb6920ed72 update doc 2025-02-25 15:46:15 +00:00
ZiWei Yuan
9c71bcb0bb
Merge pull request #665 from KMSorSMS/v0.2.2rc1
release v0.2.2rc1
2025-02-25 22:07:19 +08:00
liam
ddf3339339 release v0.2.2rc1 2025-02-25 22:06:36 +08:00
Azure
8333a4d874
Merge pull request #663 from kvcache-ai/develop-0.2.2
[release]  Release 0.2.2rc.
2025-02-25 21:47:36 +08:00
Azure
c6e4e1c3c5
Merge pull request #662 from Azure-Tang/support-fp8
[update] Update readme.
2025-02-25 21:45:19 +08:00
Azure
91c1619296 Merge branch 'develop-0.2.2' into support-fp8
Update README.md
2025-02-25 13:43:26 +00:00
Atream
13974eb264 Update DeepseekR1_V3_tutorial.md 2025-02-25 21:36:52 +08:00
Atream
03f8bc9f79 Update DeepseekR1_V3_tutorial.md add long context 2025-02-25 21:35:31 +08:00
Azure
2c0cce90d0 add fp8 multi gpu yaml example 2025-02-25 13:32:09 +00:00
Atream
d9b2895bd3 Merge branch 'fix-update-flashinfer_wrapper_local_chat' into develop-0.2.2 2025-02-25 12:47:48 +00:00
Atream
477ac28a9c fix-update-flashinfer_wrapper_local_chat 2025-02-25 12:47:31 +00:00
Azure
7e5962af3d fix fp8 multi gpu; update FQA 2025-02-25 10:52:29 +00:00
ZiWei Yuan
89b55052b8
Merge pull request #659 from KMSorSMS/develop-0.2.2
📝 add benchmark.md
2025-02-25 17:47:05 +08:00
liam
1b5ac67fca 📝 add benchmark.md 2025-02-25 17:45:17 +08:00
ZiWei Yuan
1aa10e93b3
Merge pull request #658 from KMSorSMS/develop-0.2.2
update git ignore add docker dev container
2025-02-25 17:22:34 +08:00
liam
0ca0b99fab update git ignore add docker dev container 2025-02-25 17:22:11 +08:00
Azure
5474be5299 Merge branch 'main' into develop-0.2.2 2025-02-25 09:04:22 +00:00
Azure
021822dd01 update FAQ 2025-02-25 09:02:32 +00:00
Atream
b443c7dfa2
Merge pull request #657 from kvcache-ai/feat-absorb-for-long-prefill
Feat absorb for long prefill
2025-02-25 16:53:21 +08:00
Atream
f4c198bd42 support absorb for prefill long context 2025-02-25 08:52:02 +00:00
Azure
050b745a6e
Merge pull request #643 from Azure-Tang/support-fp8
[feat] Support fp8 linear kernel;
2025-02-25 16:22:12 +08:00
Azure
36fbeee341 Update doc 2025-02-25 08:21:18 +00:00
ceerrep
f639fbc19e feat: basic api key support 2025-02-25 14:11:39 +08:00
Azure
4dc5518e4d update fp8 kernel tutorial 2025-02-24 15:37:01 +00:00
Atream
7b2a6690ab
Merge pull request #608 from makllama/fix_musa_ext
musa: support bf16
2025-02-24 23:12:54 +08:00
Atream
6f9ea689a9
Merge pull request #645 from makllama/torch2.2
Ensure backward compatibility with PyTorch 2.2
2025-02-24 23:12:33 +08:00
Xiaodong Ye
f88c05a6f1 Ensure backward compatibility with Torch 2.2
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-02-24 21:55:30 +08:00
lazymio
07eb712a73 Left out 2025-02-24 21:51:14 +08:00
lazymio
91062a834f Default values 2025-02-24 21:38:01 +08:00
lazymio
76487c4dcb Revert repetition_penalty as it is not in API spec 2025-02-24 21:30:03 +08:00
lazymio
05ad288453 Also /chat/completions 2025-02-24 21:08:36 +08:00
lazymio
bf36547f98 Also allow repetition_penalty 2025-02-24 21:07:35 +08:00
lazymio
8704c09192 Allow temperature and top_p from requests 2025-02-24 21:01:33 +08:00
Azure
ca7366d2db Merge remote-tracking branch 'upstream/develop-0.2.2' into support-fp8 2025-02-24 11:58:10 +00:00
Azure
581a524f65 Add data loader to read special weights for fp8; Add special weight process script 2025-02-24 11:34:17 +00:00
Atream
e9b1216a9a Merge branch 'main' into feat-absorb-for-long-prefill 2025-02-24 09:44:17 +00:00
Atream
4b5991e77e
Merge pull request #638 from kvcache-ai/feat-moonlight
fix KExpertsMarlin on GPU with out CUDA Graph
2025-02-24 17:32:05 +08:00
Atream
f327695079 fix KExpertsMarlin on GPU with out CUDA Graph 2025-02-24 09:30:54 +00:00