| Author | Commit | Message | Date |
|---|---|---|---|
| rnwang04 | 142fb7ce6c | Enable support for Intel XPU devices, add support for DeepSeek V2/V3 first | 2025-05-14 19:37:27 +00:00 |
| qiyuxinlin | c6aa379de2 | support safetensor load, delete architectures argument | 2025-05-09 10:38:29 +00:00 |
| Atream | 5ec33d046d | optimize gguf dequant, save mem, support Q2_K; use marlin for lm_head, lm_head only calc last token for prefill; extend context window to 19K for DeepSeek-V3/R1 within 24GB VRAM | 2025-02-22 06:13:01 +00:00 |
| Atream | 412055d450 | [feature] experts can be injected using CPUInfer; [fix] fix ktransformers interface when use new CUDAGraphRunner; [fix] fix YAML and optimize logic, the top rule has the highest priority | 2024-08-14 16:10:54 +08:00 |
| chenxl | f5f79f5c0e | [ADD] support multi-gpu qlen>1 q5_k | 2024-08-12 11:41:26 +00:00 |
| chenxl | 18c42e67df | Initial commit | 2024-07-27 16:06:58 +08:00 |