mirror of https://github.com/kvcache-ai/ktransformers.git (synced 2026-04-26 10:50:59 +00:00)
[docs]: fix and add MiniMax-M2 tutorial images. (#1752)
Some checks failed
Book-CI / test (push) Has been cancelled
Book-CI / test-1 (push) Has been cancelled
Book-CI / test-2 (push) Has been cancelled
Deploy / deploy (macos-latest) (push) Has been cancelled
Deploy / deploy (ubuntu-latest) (push) Has been cancelled
Deploy / deploy (windows-latest) (push) Has been cancelled
parent be668074de
commit 63796374c1
3 changed files with 2 additions and 0 deletions
Binary file not shown.
Before: Size: 73 KiB | After: Size: 75 KiB
BIN doc/assets/MiniMax-M2_speed.png (Normal file)
Binary file not shown.
After: Size: 146 KiB
@@ -168,6 +168,8 @@ The following benchmarks were measured with single concurrency (Prefill tps / De
| 1 x RTX 5090 (32 GB) | 2 x AMD EPYC 9355 | PCIe 5.0 | 408 / 32.1 | 1196 / 31.4 | 2540 / 27.6 |
| 2 x RTX 5090 (32 GB) | 2 x AMD EPYC 9355 | PCIe 5.0 | 414 / 35.9 | 1847 / 35.5 | 4007 / 33.1 |

### Comparison with llama.cpp

We benchmarked KT-Kernel + Sglang against llama.cpp to demonstrate the performance advantages of our CPU-GPU heterogeneous inference approach.