Atream | 5ec33d046d | 2025-02-22 06:13:01 +00:00
optimize GGUF dequantization, save memory, support Q2_K
use Marlin for lm_head; during prefill, lm_head computes logits only for the last token
extend the context window to 19K tokens for DeepSeek-V3/R1 within 24 GB of VRAM

TangJingqi | 8747c099f2 | 2024-08-29 22:39:20 +08:00
update YAML example; update version index; update Dockerfile

chenxl | 4d1d561d28 | 2024-08-28 16:11:43 +00:00
[feature] release 0.1.3

TangJingqi | c47205dce9 | 2024-08-15 11:25:12 +08:00
fix name

TangJingqi | 67043b4b5c | 2024-08-15 10:44:59 +08:00
[fix] format class and file names

Atream | 412055d450 | 2024-08-14 16:10:54 +08:00
[feature] allow experts to be injected using CPUInfer
[fix] fix the ktransformers interface when using the new CUDAGraphRunner
[fix] fix YAML and optimize logic; the top rule has the highest priority

chenxl | f5f79f5c0e | 2024-08-12 11:41:26 +00:00
[ADD] support multi-GPU qlen>1 Q5_K