Author | Commit | Message | Date
djw | 48bc6185b5 | support smt and qlm4 | 2025-07-25 12:48:51 +00:00
djw | 590fcb41cd | support smt and glm4 | 2025-07-24 12:31:01 +00:00
djw | 613f0b7c37 | support smt and glm4 | 2025-07-24 09:39:19 +00:00
djw | b66d96db97 | support smt and glm4 | 2025-07-24 08:40:58 +00:00
rnwang04 | adc0906967 | add XPU support for qwen3moe local chat | 2025-05-22 21:01:41 +08:00
rnwang04 | 142fb7ce6c | Enable support for Intel XPU devices, add support for DeepSeek V2/V3 first | 2025-05-14 19:37:27 +00:00
wang jiahao | 8456222852 | Merge pull request #1276 from kvcache-ai/support_load_safetensor; support safetensor load, delete architectures argument | 2025-05-12 11:10:26 +08:00
qiyuxinlin | c6aa379de2 | support safetensor load, delete architectures argument | 2025-05-09 10:38:29 +00:00
Atream | 30eab48a75 | Merge pull request #799 from aubreyli/cpu_offloading; Restore CPU offloading capability | 2025-05-09 00:38:54 -06:00
qiyuxinlin | 48dfbc8f9f | change inject yaml | 2025-04-29 08:09:39 +00:00
djw | 33cbd47086 | support qwen3 | 2025-04-28 18:15:35 +00:00
djw | 68c2b2e6e6 | support qwen3 | 2025-04-28 18:02:07 +00:00
djw | 0da3792b27 | support qwen3 | 2025-04-28 14:05:24 +00:00
djw | 3f9bbf1181 | support qwen3, dont speak human language | 2025-04-28 08:44:47 +00:00
chenht2022 | f3d842a0ca | support AMX | 2025-04-25 14:47:16 +00:00
Azure-Tang | 203b853c75 | rm KMoEGateDeepSeekV3, fall back to KMoEGate | 2025-04-01 07:13:05 +00:00
Atream | 25cee5810e | add balance-serve, support concurrence | 2025-03-31 22:55:32 +08:00
Aubrey Li | f4d52d1f0c | Restore CPU offloading capability | 2025-03-21 10:04:31 +08:00
Atream | 167506b779 | Update DeepSeek-V3-Chat-multi-gpu-marlin.yaml | 2025-03-17 17:05:01 +08:00
Atream | c9a0c44213 | Update DeepSeek-V3-Chat-multi-gpu-fp8-linear-ggml-experts.yaml | 2025-03-17 17:03:52 +08:00
liam | 19f058ec9e | 🔧 update multi-gpu-fp8-linear and multi-gpu marlin yaml | 2025-03-17 15:08:12 +08:00
Azure-Tang | 85c32fdd10 | Fix rocm example yaml | 2025-03-15 22:27:02 -04:00
Azure | 3986e2d2cf | Merge pull request #178 from fxzjshm/hip; [Feat] Port to ROCm/HIP | 2025-03-15 02:31:07 +08:00
Azure-Tang | e5b001d76f | Update readme; Format code; Add example yaml. | 2025-03-14 14:25:52 -04:00
Atream | a889288fc1 | use compile for gate, slight performance improvement | 2025-03-14 12:43:28 +00:00
Azure-Tang | ed8437413b | merge main; Add torch q8 linear | 2025-03-14 05:52:07 -04:00
Atream | 90eb87b3fc | Update DeepSeek-V3-Chat-multi-gpu-marlin.yaml | 2025-02-26 21:53:50 +08:00
Azure | 91c1619296 | Merge branch 'develop-0.2.2' into support-fp8; Update README.md | 2025-02-25 13:43:26 +00:00
Azure | 2c0cce90d0 | add fp8 multi gpu yaml example | 2025-02-25 13:32:09 +00:00
Atream | 477ac28a9c | fix-update-flashinfer_wrapper_local_chat | 2025-02-25 12:47:31 +00:00
Atream | b443c7dfa2 | Merge pull request #657 from kvcache-ai/feat-absorb-for-long-prefill; Feat absorb for long prefill | 2025-02-25 16:53:21 +08:00
Atream | f4c198bd42 | support absorb for prefill long context | 2025-02-25 08:52:02 +00:00
Azure | ca7366d2db | Merge remote-tracking branch 'upstream/develop-0.2.2' into support-fp8 | 2025-02-24 11:58:10 +00:00
Azure | 581a524f65 | Add data loader to read special weights for fp8; Add special weight process script | 2025-02-24 11:34:17 +00:00
Atream | f327695079 | fix KExpertsMarlin on GPU with out CUDA Graph | 2025-02-24 09:30:54 +00:00
Atream | f5f6c6b95d | update yaml | 2025-02-23 14:33:58 +00:00
DDong Jianwei | 95d937c51d | tmp | 2025-02-23 18:51:42 +08:00
Atream | 5ec33d046d | optimize gguf dequant, save mem, support Q2_K; use marlin for lm_head, lm_head only calc last token for prefill; extend context window to 19K for DeepSeek-V3/R1 within 24GB VRAM | 2025-02-22 06:13:01 +00:00
Atream | 7e1fe256c8 | optimize GPU | 2025-02-21 05:06:57 +00:00
Atream | c189d55bd1 | toy support for experts on GPU, no CUDA Graph | 2025-02-15 15:16:00 +00:00
Azure | b7653b9c4f | add V3/R1 8 gpu yaml example | 2025-02-14 02:56:13 +00:00
MorphisZhang | aea4243712 | Add optimization config for Deepseek V3/R1 with 4 GPUs | 2025-02-13 16:32:28 +08:00
Azure | 0564ac8465 | update marlin expert example | 2025-02-12 04:11:00 +00:00
liam | 83401dbb3b | ⚡ ready to publish | 2025-02-10 12:29:23 +08:00
Azure | c4d9bc6670 | support KExpertsMarlin backend | 2025-02-07 05:57:40 +00:00
Azure | ee24a27001 | update v3 single gpu rule yaml; | 2025-02-04 16:14:35 +00:00
Azure | 907251c743 | done support deepseekv3 | 2025-02-04 15:53:38 +00:00
Azure | f748cd29f0 | fix rope; update moegate | 2025-02-01 18:05:45 +00:00
Azure | f873558a89 | update rope calculation; update modeling.py; update gate for moe | 2025-02-01 07:32:21 +00:00
Azure | 476b1d8dc6 | support deepseekv3; runable but have precition problem | 2025-01-31 08:27:24 +00:00