From 4845abf25e3568d20fd9adc4b784c300522fb9b7 Mon Sep 17 00:00:00 2001
From: "Li, Zonghang" <870644199@qq.com>
Date: Fri, 11 Apr 2025 01:20:36 +0800
Subject: [PATCH] Update README.md

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index b702dcc4..3d5f584f 100644
--- a/README.md
+++ b/README.md
@@ -3,12 +3,12 @@
 ![prima](https://raw.githubusercontent.com/Lizonghang/prima.cpp/main/figures/prima-cpp-logo.png)
 [![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT)
 
-prima.cpp is a magic trick that lets you **run 70B-level LLMs on your everyday devices**—💻 laptops, 🖥️ desktops, 📱 phones, and tablets (GPU or no GPU, it's all good). With it, you can run **QwQ-32B, Qwen 2.5-72B, Llama 3-70B, or DeepSeek R1 70B** right from your local home cluster!
+prima.cpp is a **distributed implementation** of [llama.cpp](https://github.com/ggerganov/llama.cpp) that lets you **run 70B-level LLMs on your everyday devices**—💻 laptops, 🖥️ desktops, 📱 phones, and tablets (GPU or no GPU, it's all good). With it, you can run **QwQ-32B, Qwen 2.5-72B, Llama 3-70B, or DeepSeek R1 70B** right from your local home cluster!
 
 Worried about OOM or your device stucking? Never again! prima.cpp keeps its **memory pressure below 10%**, you can run very large models while enjoying Tiktok (if you don't mind the inference speed).
 
 ## 🚀 Performance
 
-How about speed? Built upon [llama.cpp](https://github.com/ggerganov/llama.cpp), but it's **15x faster!** 🚀 On my poor devices, QwQ-32B generates 11 tokens per second, and Llama 3-70B generates 1.5 tokens per second. That's about the same speed as audiobook apps, from slow to fast speaking. We plan to power a **Home Siri** soon, then we can have private chats without privacy concerns.
+How about speed? Built upon llama.cpp, but it's **15x faster!** 🚀 On my poor devices, QwQ-32B generates 11 tokens per second, and Llama 3-70B generates 1.5 tokens per second. That's about the same speed as audiobook apps, from slow to fast speaking. We plan to power a **Home Siri** soon, then we can have private chats without privacy concerns.
 
 **prima.cpp vs llama.cpp on QwQ 32B:**