From 8ed8eb2a9e820b39c5d2d88110dd200bbd26ef00 Mon Sep 17 00:00:00 2001
From: Atream <80757050+Atream@users.noreply.github.com>
Date: Sat, 15 Feb 2025 23:27:35 +0800
Subject: [PATCH] Update FAQ.md

---
 doc/en/FAQ.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/doc/en/FAQ.md b/doc/en/FAQ.md
index 75e5e10..e738a29 100644
--- a/doc/en/FAQ.md
+++ b/doc/en/FAQ.md
@@ -25,7 +25,7 @@ from-https://github.com/kvcache-ai/ktransformers/issues/129#issue-2842799552
    1. local_chat.py: You can increase the context window size by setting `--max_new_tokens` to a larger value.
    2. server: Increase the `--cache_lens' to a larger value.
 2. Move more weights to the GPU.
-   Refer to the ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-multi-gpu-marlin.yaml
+   Refer to the ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-multi-gpu-4.yaml
 ```yaml
 - match:
     name: "^model\\.layers\\.([4-10])\\.mlp\\.experts$" # inject experts in layer 4~10 as marlin expert
@@ -39,6 +39,8 @@ from-https://github.com/kvcache-ai/ktransformers/issues/129#issue-2842799552
 
    You can modify layer as you want, eg. `name: "^model\\.layers\\.([4-10])\\.mlp\\.experts$"` to `name: "^model\\.layers\\.([4-12])\\.mlp\\.experts$"` to move more weights to the GPU.
 
    > Note: The first matched rule in yaml will be applied. For example, if you have two rules that match the same layer, only the first rule's replacement will be valid.
+   > Note: Currently, executing experts on the GPU conflicts with CUDA Graph, and running without CUDA Graph causes a significant slowdown. Therefore, unless you have a substantial amount of VRAM (placing a single layer of experts for DeepSeek-V3/R1 on the GPU requires at least 5.6 GB of VRAM), we do not recommend enabling this feature. We are actively working on optimization.
+   > Note: KExpertsTorch is untested.
 
 ### Q: If I don't have enough VRAM, but I have multiple GPUs, how can I utilize them?
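
For illustration, below is a minimal sketch of a rule that would move the experts of layers 4-12 onto the GPU, following the pattern shown in the hunk above. The class path, kwargs, and `generate_op` value are assumptions modeled on the other files under ktransformers/optimize/optimize_rules/ and may differ between versions; treat it as a starting point rather than a drop-in rule.

```yaml
# Sketch only (assumed field names): inject the experts of layers 4-12 as Marlin experts on the GPU.
# The alternation ([4-9]|1[0-2]) matches the one- and two-digit layer indices 4 through 12.
- match:
    name: "^model\\.layers\\.([4-9]|1[0-2])\\.mlp\\.experts$"
  replace:
    class: ktransformers.operators.experts.KTransformersExperts  # assumed class path
    kwargs:
      generate_device: "cuda"        # run these experts on the GPU during generation
      generate_op: "KExpertsMarlin"  # Marlin GPU kernel; KExpertsTorch is untested
  recursive: False                   # do not recurse into the experts' submodules
```

As the FAQ's own note explains, place such a rule above any broader expert rule in the same file, since only the first matching rule is applied.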