Update README.md

Li, Zonghang 2025-04-11 01:41:43 +08:00 committed by GitHub
parent 4845abf25e
commit 6d13836c44

@@ -59,6 +59,8 @@ And, if your devices are more powerful, you could unlock even more possibilities
| DeepSeek-R1-Distill-Llama-70B | 10978 ms | OOM | - | **724 ms** |
| Qwen-2.5-72B | 12227 ms | OOM | - | **867 ms** |
> As video recording consumes some RAM, prima.cpp proactively reduces memory usage, resulting in slightly higher latency in the video compared to the table.
> In the current implementation, each device is assigned at least one model layer. For Llama 3-8B, for example, this results in a 1:1:29:1 split, which makes prima.cpp less efficient. In future updates, we will support a 0:0:32:0 split and remove idle devices; llama.cpp will then become a special case of prima.cpp when serving small models.
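The note above can be illustrated with a minimal sketch. This is **not** prima.cpp's actual scheduler: it assumes a hypothetical speed score per device and simply distributes layers proportionally while enforcing the current "at least one layer per device" constraint, which is what produces splits like 1:1:29:1 for a 32-layer model when one device dominates.

```python
# Hypothetical sketch (not prima.cpp's real algorithm): split model layers
# across devices proportionally to a speed score, with a per-device minimum.
def split_layers(total_layers, speeds, min_layers=1):
    # Reserve the minimum number of layers for every device first.
    alloc = [min_layers] * len(speeds)
    remaining = total_layers - sum(alloc)
    # Share the rest proportionally to each device's speed score.
    total_speed = sum(speeds)
    shares = [remaining * s / total_speed for s in speeds]
    extra = [int(s) for s in shares]  # floor each share
    # Hand any leftover layers to the fastest devices.
    leftover = remaining - sum(extra)
    order = sorted(range(len(speeds)), key=lambda i: speeds[i], reverse=True)
    for i in order[:leftover]:
        extra[i] += 1
    return [a + e for a, e in zip(alloc, extra)]

# Four devices, one much faster than the rest: 32 layers -> 1:1:29:1.
print(split_layers(32, [1, 1, 100, 1]))  # [1, 1, 29, 1]
```

With the planned `min_layers=0` behavior, the same inputs would give a 0:0:32:0 split, i.e. the slow devices become idle and the fast device runs the whole model, matching plain llama.cpp.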
## 🔑 Key Features