mirror of https://github.com/Lizonghang/prima.cpp.git, synced 2025-09-07 14:09:02 +00:00
Update README.md
parent 87eb1aa7ec
commit 35adc76337
1 changed file with 2 additions and 12 deletions
14
README.md
@@ -9,19 +9,9 @@ Worried about OOM or your device stucking? Never again! prima.cpp keeps its **me
 
 How about speed? prima.cpp is built on [llama.cpp](https://github.com/ggerganov/llama.cpp), but it’s **15x faster!** 🚀 On my poor devices, QwQ-32B generates 11 tokens per second, and Llama 3-70B generates 1.5 tokens per second. That's about the same speed as audiobook apps, from slow to fast speaking. We plan to power a **Home Siri** soon, then we can have private chats without privacy concerns.
 
-<figure>
-<video width="960" height="540" controls>
-<source src="https://github.com/Lizonghang/prima.cpp/raw/main/figures/qwq%2032b.mp4" type="video/mp4">
-</video>
-<figcaption>Prima.cpp vs llama.cpp on QwQ 32B.</figcaption>
-</figure>
+https://github.com/Lizonghang/prima.cpp/raw/main/figures/qwq%2032b.mp4
 
-<figure>
-<video width="960" height="540" controls>
-<source src="https://github.com/Lizonghang/prima.cpp/raw/main/figures/qwq%2032b.mp4" type="video/mp4">
-</video>
-<figcaption>Prima.cpp vs llama.cpp on DeepSeek R1 70B</figcaption>
-</figure>
+https://github.com/Lizonghang/prima.cpp/raw/main/figures/qwq%2032b.mp4
 
 And, if your devices are more powerful, you could unlock even more possibilities, like running LLM agents right in your home! If you do, we’d love to hear about it, just share your cluster setup and token throughput with us!
 