Atream
1d5d5faef6
Merge pull request #626 from cyhasuka/main
...
Feat: Clear cache during weight loading to prevent OOM on GPUs with <=8GB VRAM
2025-02-27 12:13:10 +08:00
Atream
8db6a4d402
Merge branch 'main' into main
2025-02-27 12:12:32 +08:00
wang jiahao
3c8c580580
Merge pull request #691 from swu-hyk/ollama_api_chat
...
feat:implementation of chat routing for Ollama
2025-02-27 11:17:48 +08:00
Azure
ca93cf7548
Merge pull request #702 from Azure-Tang/update-readme
...
[UPDATE] Update documents.
2025-02-26 23:45:24 +08:00
Azure
c05ebb74b1
Update fp8 doc; Update install.md broken link
2025-02-26 15:43:08 +00:00
Atream
3ebe17eb63
Merge pull request #699 from kvcache-ai/Atream-patch-1
...
Update DeepseekR1_V3_tutorial.md
2025-02-26 22:04:45 +08:00
Atream
369f4d917d
Update DeepseekR1_V3_tutorial.md
2025-02-26 22:04:29 +08:00
Atream
9650893adc
Merge pull request #697 from kvcache-ai/fix-yaml
...
Update DeepSeek-V3-Chat-multi-gpu-marlin.yaml
2025-02-26 21:54:01 +08:00
Atream
90eb87b3fc
Update DeepSeek-V3-Chat-multi-gpu-marlin.yaml
2025-02-26 21:53:50 +08:00
swu-hyk
ec7e912fee
modify
2025-02-26 19:21:30 +08:00
swu-hyk
68e7df3a25
implementation of chat routing for Ollama
2025-02-26 17:05:00 +08:00
Chen Hongtao
9660b2cc1e
Merge pull request #685 from vproxy-tools/main
...
fix numa cpu distribution
2025-02-26 15:35:19 +08:00
ZiWei Yuan
e7ebb26370
Merge pull request #684 from KMSorSMS/main
...
fix dockerfile in devcontainer and fix expert torch
2025-02-26 15:06:51 +08:00
liam
ffb86c66e3
⚡ fix experts torch
2025-02-26 15:04:40 +08:00
liam
de082f141c
⚡ fix cd error
2025-02-26 14:54:47 +08:00
wkgcass
b2bff17775
fix numa cpu distribution
...
The NUMA node location is calculated from the total number
of worker threads.
So we should always use the actual number of threads instead of applying a min() op.
2025-02-26 14:49:57 +08:00
Azure
99f6e42113
Merge pull request #668 from KMSorSMS/main
...
📝 update benchmark.md
2025-02-26 00:21:09 +08:00
liam
3ad12751cf
📝 update more detail and fix typo
2025-02-26 00:17:02 +08:00
Azure
31bc990677
Merge pull request #667 from Azure-Tang/update-readme
...
[update] Update doc.
2025-02-26 00:01:46 +08:00
liam
05339ad0ef
📝 update benchmark.md
2025-02-25 23:57:58 +08:00
Azure
bb6920ed72
update doc
2025-02-25 15:46:15 +00:00
ZiWei Yuan
9c71bcb0bb
Merge pull request #665 from KMSorSMS/v0.2.2rc1
...
⚡ release v0.2.2rc1
2025-02-25 22:07:19 +08:00
liam
ddf3339339
⚡ release v0.2.2rc1
2025-02-25 22:06:36 +08:00
Azure
8333a4d874
Merge pull request #663 from kvcache-ai/develop-0.2.2
...
[release] Release 0.2.2rc.
2025-02-25 21:47:36 +08:00
Azure
c6e4e1c3c5
Merge pull request #662 from Azure-Tang/support-fp8
...
[update] Update readme.
2025-02-25 21:45:19 +08:00
Azure
91c1619296
Merge branch 'develop-0.2.2' into support-fp8
...
Update README.md
2025-02-25 13:43:26 +00:00
Atream
13974eb264
Update DeepseekR1_V3_tutorial.md
2025-02-25 21:36:52 +08:00
Atream
03f8bc9f79
Update DeepseekR1_V3_tutorial.md add long context
2025-02-25 21:35:31 +08:00
Azure
2c0cce90d0
add fp8 multi gpu yaml example
2025-02-25 13:32:09 +00:00
Atream
d9b2895bd3
Merge branch 'fix-update-flashinfer_wrapper_local_chat' into develop-0.2.2
2025-02-25 12:47:48 +00:00
Atream
477ac28a9c
fix-update-flashinfer_wrapper_local_chat
2025-02-25 12:47:31 +00:00
Azure
7e5962af3d
fix fp8 multi gpu; update FAQ
2025-02-25 10:52:29 +00:00
ZiWei Yuan
89b55052b8
Merge pull request #659 from KMSorSMS/develop-0.2.2
...
📝 add benchmark.md
2025-02-25 17:47:05 +08:00
liam
1b5ac67fca
📝 add benchmark.md
2025-02-25 17:45:17 +08:00
ZiWei Yuan
1aa10e93b3
Merge pull request #658 from KMSorSMS/develop-0.2.2
...
⚡ update git ignore add docker dev container
2025-02-25 17:22:34 +08:00
liam
0ca0b99fab
⚡ update git ignore add docker dev container
2025-02-25 17:22:11 +08:00
Azure
5474be5299
Merge branch 'main' into develop-0.2.2
2025-02-25 09:04:22 +00:00
Azure
021822dd01
update FAQ
2025-02-25 09:02:32 +00:00
Atream
b443c7dfa2
Merge pull request #657 from kvcache-ai/feat-absorb-for-long-prefill
...
Feat absorb for long prefill
2025-02-25 16:53:21 +08:00
Atream
f4c198bd42
support absorb for prefill long context
2025-02-25 08:52:02 +00:00
Azure
050b745a6e
Merge pull request #643 from Azure-Tang/support-fp8
...
[feat] Support fp8 linear kernel;
2025-02-25 16:22:12 +08:00
Azure
36fbeee341
Update doc
2025-02-25 08:21:18 +00:00
Azure
4dc5518e4d
update fp8 kernel tutorial
2025-02-24 15:37:01 +00:00
Atream
7b2a6690ab
Merge pull request #608 from makllama/fix_musa_ext
...
musa: support bf16
2025-02-24 23:12:54 +08:00
Atream
6f9ea689a9
Merge pull request #645 from makllama/torch2.2
...
Ensure backward compatibility with PyTorch 2.2
2025-02-24 23:12:33 +08:00
Xiaodong Ye
f88c05a6f1
Ensure backward compatibility with Torch 2.2
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-02-24 21:55:30 +08:00
Azure
ca7366d2db
Merge remote-tracking branch 'upstream/develop-0.2.2' into support-fp8
2025-02-24 11:58:10 +00:00
Azure
581a524f65
Add data loader to read special weights for fp8; Add special weight process script
2025-02-24 11:34:17 +00:00
Atream
e9b1216a9a
Merge branch 'main' into feat-absorb-for-long-prefill
2025-02-24 09:44:17 +00:00
Atream
4b5991e77e
Merge pull request #638 from kvcache-ai/feat-moonlight
...
fix KExpertsMarlin on GPU without CUDA Graph
2025-02-24 17:32:05 +08:00