mirror of https://github.com/kvcache-ai/ktransformers.git (synced 2026-04-28 11:49:51 +00:00)
⚡ update v0.3 preview
This commit is contained in:
parent 6dd4fa0e87 · commit fd481af193
2 changed files with 7 additions and 1 deletion
@ -47,6 +47,12 @@ The main acceleration comes from
- Intel AMX instruction set and our specially designed cache-friendly memory layout
- Expert selection strategy that activates fewer experts, based on offline profiling results on out-of-domain data
*From our research on DeepSeekV2, DeepSeekV3, and DeepSeekR1: when we slightly decrease the number of activated experts at inference time, the output quality doesn't change, but both decoding and prefill speed up, which is inspiring. Our showcase makes use of this finding.*
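The idea above can be sketched as standard top-k MoE gating with a slightly smaller k. This is a hypothetical illustration, not ktransformers' actual routing code; `select_experts` and the expert counts (256 routed experts, 8 vs. 6 activated, in the spirit of DeepSeekV3-style MoE) are illustrative assumptions.

```python
import numpy as np

def select_experts(gate_logits, k):
    # Softmax over the gate logits, then route to the top-k experts.
    probs = np.exp(gate_logits - gate_logits.max())
    probs /= probs.sum()
    topk = np.argsort(probs)[::-1][:k]          # indices of the k largest gates
    weights = probs[topk] / probs[topk].sum()   # renormalize over selected experts
    return topk, weights

# Illustrative sizes: 256 routed experts, normally 8 activated per token.
# Dropping to 6 skips the lowest-gated experts, cutting expert FLOPs
# while keeping the highest-weight contributions.
rng = np.random.default_rng(0)
logits = rng.normal(size=256)
full_k, _ = select_experts(logits, 8)
reduced_k, _ = select_experts(logits, 6)
assert set(reduced_k).issubset(set(full_k))
```

Because the dropped experts carry the smallest gate weights, their contribution to the output is already small, which is consistent with the observation that quality is essentially unchanged.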
## how to run
### v0.2 showcase
#### single socket version (32 cores)