Atream
|
477ac28a9c
|
fix-update-flashinfer_wrapper_local_chat
|
2025-02-25 12:47:31 +00:00 |
|
Azure
|
7e5962af3d
|
fix fp8 multi gpu; update FAQ
|
2025-02-25 10:52:29 +00:00 |
|
ZiWei Yuan
|
89b55052b8
|
Merge pull request #659 from KMSorSMS/develop-0.2.2
📝 add benchmark.md
|
2025-02-25 17:47:05 +08:00 |
|
liam
|
1b5ac67fca
|
📝 add benchmark.md
|
2025-02-25 17:45:17 +08:00 |
|
ZiWei Yuan
|
1aa10e93b3
|
Merge pull request #658 from KMSorSMS/develop-0.2.2
⚡ update git ignore add docker dev container
|
2025-02-25 17:22:34 +08:00 |
|
liam
|
0ca0b99fab
|
⚡ update git ignore add docker dev container
|
2025-02-25 17:22:11 +08:00 |
|
Azure
|
5474be5299
|
Merge branch 'main' into develop-0.2.2
|
2025-02-25 09:04:22 +00:00 |
|
Azure
|
021822dd01
|
update FAQ
|
2025-02-25 09:02:32 +00:00 |
|
Atream
|
b443c7dfa2
|
Merge pull request #657 from kvcache-ai/feat-absorb-for-long-prefill
Feat absorb for long prefill
|
2025-02-25 16:53:21 +08:00 |
|
Atream
|
f4c198bd42
|
support absorb for prefill long context
|
2025-02-25 08:52:02 +00:00 |
|
Azure
|
050b745a6e
|
Merge pull request #643 from Azure-Tang/support-fp8
[feat] Support fp8 linear kernel;
|
2025-02-25 16:22:12 +08:00 |
|
Azure
|
36fbeee341
|
Update doc
|
2025-02-25 08:21:18 +00:00 |
|
ceerrep
|
f639fbc19e
|
feat: basic api key support
|
2025-02-25 14:11:39 +08:00 |
|
Azure
|
4dc5518e4d
|
update fp8 kernel tutorial
|
2025-02-24 15:37:01 +00:00 |
|
Atream
|
7b2a6690ab
|
Merge pull request #608 from makllama/fix_musa_ext
musa: support bf16
|
2025-02-24 23:12:54 +08:00 |
|
Atream
|
6f9ea689a9
|
Merge pull request #645 from makllama/torch2.2
Ensure backward compatibility with PyTorch 2.2
|
2025-02-24 23:12:33 +08:00 |
|
Xiaodong Ye
|
f88c05a6f1
|
Ensure backward compatibility with Torch 2.2
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
|
2025-02-24 21:55:30 +08:00 |
|
lazymio
|
07eb712a73
|
Left out
|
2025-02-24 21:51:14 +08:00 |
|
lazymio
|
91062a834f
|
Default values
|
2025-02-24 21:38:01 +08:00 |
|
lazymio
|
76487c4dcb
|
Revert repetition_penalty as it is not in API spec
|
2025-02-24 21:30:03 +08:00 |
|
lazymio
|
05ad288453
|
Also /chat/completions
|
2025-02-24 21:08:36 +08:00 |
|
lazymio
|
bf36547f98
|
Also allow repetition_penalty
|
2025-02-24 21:07:35 +08:00 |
|
lazymio
|
8704c09192
|
Allow temperature and top_p from requests
|
2025-02-24 21:01:33 +08:00 |
|
Azure
|
ca7366d2db
|
Merge remote-tracking branch 'upstream/develop-0.2.2' into support-fp8
|
2025-02-24 11:58:10 +00:00 |
|
Azure
|
581a524f65
|
Add data loader to read special weights for fp8; Add special weight process script
|
2025-02-24 11:34:17 +00:00 |
|
Atream
|
e9b1216a9a
|
Merge branch 'main' into feat-absorb-for-long-prefill
|
2025-02-24 09:44:17 +00:00 |
|
Atream
|
4b5991e77e
|
Merge pull request #638 from kvcache-ai/feat-moonlight
fix KExpertsMarlin on GPU without CUDA Graph
|
2025-02-24 17:32:05 +08:00 |
|
Atream
|
f327695079
|
fix KExpertsMarlin on GPU without CUDA Graph
|
2025-02-24 09:30:54 +00:00 |
|
Yuhao Tsui
|
cea07d1998
|
Feat: Clear cache during weight loading to prevent OOM on GPUs with <=8GB VRAM
This change explicitly clears CUDA cache during weight loading to mitigate memory fragmentation issues, particularly beneficial for low-VRAM GPUs.
|
2025-02-24 10:09:42 +08:00 |
|
akemimadoka
|
706e69f4fc
|
Fix missing macro definition for KTRANSFORMERS_USE_CUDA and <chrono> includes on MSVC
|
2025-02-24 01:37:50 +08:00 |
|
Atream
|
eb039b723d
|
Merge pull request #621 from kvcache-ai/feat-moonlight
support moonlight, use ktransformers/optimize/optimize_rules/Moonlight-16B-A3B.yaml
|
2025-02-23 22:39:08 +08:00 |
|
Atream
|
f5f6c6b95d
|
update yaml
|
2025-02-23 14:33:58 +00:00 |
|
Atream
|
e8e02e5ccc
|
support Moonlight
|
2025-02-23 14:21:18 +00:00 |
|
DDong Jianwei
|
95d937c51d
|
tmp
|
2025-02-23 18:51:42 +08:00 |
|
Atream
|
006e8c6abc
|
remove causal mask
|
2025-02-23 07:40:47 +00:00 |
|
Atream
|
cdb6f896bb
|
Merge pull request #612 from kvcache-ai/fix-bf16-load
fix bf16 load, TODO: refactor cpu dequant
|
2025-02-23 15:37:23 +08:00 |
|
Atream
|
036ae25a89
|
fix bf16 load, TODO: refactor cpu dequant
|
2025-02-23 15:37:09 +08:00 |
|
Xiaodong Ye
|
18b1d18367
|
musa: support bf16
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
|
2025-02-23 10:19:19 +08:00 |
|
Azure
|
7b7c6a657d
|
Add fp8 linear kernel; Add empty cache to fit in 16G VRAM; By 'wkGCaSS - 知乎 https://zhuanlan.zhihu.com/p/25491611225'
|
2025-02-22 13:05:08 +00:00 |
|
Atream
|
94ab2de3b9
|
Merge pull request #523 from miaooo0000OOOO/main
optimize CMake multi core parallel
|
2025-02-22 17:38:18 +08:00 |
|
Atream
|
72d09f3f6e
|
Merge pull request #597 from kvcache-ai/feat-more-context
Feat more context
|
2025-02-22 17:17:09 +08:00 |
|
Atream
|
f7f1059873
|
fix merge bug, this branch also padding Marlin
|
2025-02-22 09:00:09 +00:00 |
|
Atream
|
e90896314c
|
Merge pull request #577 from JiamingMai/dev
Fix the link address in the doc install.md
|
2025-02-22 16:45:41 +08:00 |
|
Atream
|
954796123c
|
Merge pull request #582 from twobob/patch-1
Adjust the installation link to the correct section of docs
|
2025-02-22 16:44:48 +08:00 |
|
Atream
|
024009675e
|
Merge branch 'main' into feat-more-context
|
2025-02-22 06:17:39 +00:00 |
|
Atream
|
5ec33d046d
|
optimize gguf dequant, save mem, support Q2_K
use marlin for lm_head, lm_head only calc last token for prefill
extend context window to 19K for DeepSeek-V3/R1 within 24GB VRAM
|
2025-02-22 06:13:01 +00:00 |
|
_
|
5ed441a0f5
|
Update README.md
|
2025-02-21 14:15:50 +00:00 |
|
JiamingMai
|
45faddf668
|
fix the link addresses
|
2025-02-21 17:53:20 +08:00 |
|
Atream
|
7e1fe256c8
|
optimize GPU
|
2025-02-21 05:06:57 +00:00 |
|
Azure
|
25c5bddd08
|
Merge pull request #506 from makllama/musa
feat: Support Moore Threads GPU
|
2025-02-20 22:50:31 +08:00 |
|