Update DeepseekR1_V3_tutorial.md add long context

Atream 2025-02-25 21:35:31 +08:00 committed by GitHub
parent d9b2895bd3
commit 03f8bc9f79


@@ -154,6 +154,18 @@ the output quality doesn't change. But the speed of decoding and prefill
is sped up, which is inspiring. So our showcase makes use of this finding*
## How to Run
### V0.2.2 Longer Context
If you want to use a long context (longer than 20K tokens) for prefill, enable matrix-absorption MLA during the prefill phase, which significantly reduces the size of the KV cache. Modify the YAML rule file like this:
```yaml
- match:
    name: "^model\\.layers\\..*\\.self_attn$"
  replace:
    class: ktransformers.operators.attention.KDeepseekV2Attention # optimized MLA implementation
    kwargs:
      generate_device: "cuda"
      prefill_device: "cuda"
      absorb_for_prefill: True # set to True to enable long context (prefill may be slower)
```
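Below is a usage sketch of launching `local_chat` with the modified rule file. The file path `./optimize/long_context.yaml` and the `--optimize_rule_path` flag name are assumptions for illustration; the exact argument name may differ between ktransformers versions, so check `local_chat.py --help`.
```bash
# Usage sketch (assumed paths and flag names; verify against your ktransformers version):
# save the rules above (with absorb_for_prefill: True) to ./optimize/long_context.yaml,
# then point local_chat at that file.
python ./ktransformers/local_chat.py \
  --model_path deepseek-ai/DeepSeek-R1 \
  --gguf_path <path_to_gguf_files> \
  --optimize_rule_path ./optimize/long_context.yaml \
  --cpu_infer 65 \
  --max_new_tokens 1000
```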
### V0.2 & V0.2.1 Showcase
#### Single socket version (32 cores)
Our local_chat test command is: