TOCEN      | 15dddc20c0 | Address review comments | 2025-10-23 11:28:42 +08:00
TOCEN      | bd0ae78d9d | fix: fix NPU runtime errors | 2025-10-21 21:54:59 +08:00
root       | a40ecaa64a | Merge "fix some bugs" | 2025-10-20 12:34:36 +00:00
TOCEN      | 7be6dfa1d6 | fix: fix balance_server error when tp=1 with graph sinking disabled | 2025-09-22 20:52:07 +08:00
无脸男     | 9299c25e43 | optimize.py | 2025-09-08 14:48:54 +08:00
rnwang04   | 142fb7ce6c | Enable support for Intel XPU devices; add support for DeepSeek V2/V3 first | 2025-05-14 19:37:27 +00:00
qiyuxinlin | c6aa379de2 | support safetensor load, delete architectures argument | 2025-05-09 10:38:29 +00:00
Atream     | 5ec33d046d | optimize gguf dequant, save mem, support Q2_K; use marlin for lm_head, lm_head only calc last token for prefill; extend context window to 19K for DeepSeek-V3/R1 within 24GB VRAM | 2025-02-22 06:13:01 +00:00
Atream     | 412055d450 | [feature] experts can be injected using CPUInfer; [fix] fix ktransformers interface when using the new CUDAGraphRunner; [fix] fix YAML and optimize logic, the top rule has the highest priority | 2024-08-14 16:10:54 +08:00
chenxl     | f5f79f5c0e | [ADD] support multi-gpu qlen>1 q5_k | 2024-08-12 11:41:26 +00:00
chenxl     | 18c42e67df | Initial commit | 2024-07-27 16:06:58 +08:00