Update README.md

Li, Zonghang 2025-04-07 22:14:44 +08:00 committed by GitHub
parent 98d73778a6
commit a3a1f4499b


@@ -9,14 +9,14 @@ Worried about OOM or your device stucking? Never again! prima.cpp keeps its **me
How about speed? prima.cpp is built on [llama.cpp](https://github.com/ggerganov/llama.cpp), but it's **15x faster!** 🚀 On my poor devices, QwQ-32B generates 11 tokens per second, and Llama 3-70B generates 1.5 tokens per second. That's about the same speed as audiobook apps, from slow to fast speaking. We plan to power a **Home Siri** soon, then we can have private chats without privacy concerns.
prima.cpp vs llama.cpp on QwQ 32B:
https://github.com/user-attachments/assets/9549290e-a56f-46e1-9549-74250c1e0f7e
prima.cpp vs llama.cpp on DeepSeek R1 70B:
**prima.cpp vs llama.cpp on QwQ 32B:**
https://github.com/user-attachments/assets/c048f0b2-3cdf-4a3f-a5c5-a9705edfca1e
**prima.cpp vs llama.cpp on DeepSeek R1 70B:**
https://github.com/user-attachments/assets/9549290e-a56f-46e1-9549-74250c1e0f7e
And if your devices are more powerful, you could unlock even more possibilities, like running LLM agents right in your home! If you do, we'd love to hear about it. Just share your cluster setup and token throughput with us!
**Table 1:** Home cluster configurations.