fix fp8 multi gpu; update FAQ

2026-04-28 03:39:48 +00:00 · 2025-02-25 10:52:29 +00:00 · 2025-02-25 10:52:29 +00:00 · 7e5962af3d
commit 7e5962af3d
parent 89b55052b8
2 changed files with 7 additions and 2 deletions
--- a/doc/en/FAQ.md
+++ b/doc/en/FAQ.md
@@ -92,4 +92,8 @@ Traceback (most recent call last):
    next_token = torch.multinomial(probs, num_samples=1).squeeze(1)
 RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
 ```
-**SOLUTION**: The issue of running ktransformers on Ubuntu 22.04 is caused by the current system's g++ version being too old, and the pre-defined macros do not include avx_bf16. We have tested and confirmed that it works on g++ 11.4 in Ubuntu 22.04.
+**SOLUTION**: The issue of running ktransformers on Ubuntu 22.04 is caused by the current system's g++ version being too old, and the pre-defined macros do not include avx_bf16. We have tested and confirmed that it works on g++ 11.4 in Ubuntu 22.04.
+
+### Q: Why is prefill very slow when using FP8?
+
+The FP8 kernel is built by JIT compilation, so the first run is slow while the kernel compiles. Subsequent runs reuse the compiled kernel and are faster.