mirror of
https://github.com/kvcache-ai/ktransformers.git
synced 2026-04-28 03:39:48 +00:00
fix fp8 multi gpu; update FAQ
This commit is contained in:
parent
89b55052b8
commit
7e5962af3d
2 changed files with 7 additions and 2 deletions
@@ -92,4 +92,8 @@ Traceback (most recent call last):
next_token = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
```
**SOLUTION**: This issue when running ktransformers on Ubuntu 22.04 is caused by the system's g++ being too old: its predefined macros do not include AVX BF16 support. We have tested and confirmed that it works with g++ 11.4 on Ubuntu 22.04.
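A quick way to confirm whether the installed g++ is new enough is to check its major version. This is a hedged sketch, not part of the project's tooling; the `g++-11` package name is the standard Ubuntu 22.04 one:

```shell
# Check whether the installed g++ major version is at least 11,
# the version confirmed to work on Ubuntu 22.04.
ver=$(g++ -dumpversion 2>/dev/null | cut -d. -f1)
if [ -n "$ver" ] && [ "$ver" -ge 11 ]; then
  echo "g++ major version $ver is new enough"
else
  echo "g++ too old or missing; try: sudo apt install g++-11"
fi
```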
### Q: FP8 prefill is very slow.
The FP8 kernel is built by JIT compilation, so the first run will be slow; subsequent runs will be faster.
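The effect of JIT-and-cache behavior can be illustrated with a small, self-contained Python sketch. `compile_kernel` here is a hypothetical stand-in for the real FP8 kernel build, not ktransformers' API; the point is only that the compilation cost is paid once and cached afterwards:

```python
import time
from functools import lru_cache

@lru_cache(maxsize=None)
def compile_kernel(shape):
    # Stand-in for the JIT build cost of the FP8 kernel (assumption:
    # the real build takes much longer than a cached lookup).
    time.sleep(0.2)
    return lambda x: x  # stand-in for the compiled kernel

def prefill(shape):
    kernel = compile_kernel(shape)  # first call compiles; later calls hit the cache
    return kernel(shape)

t0 = time.perf_counter(); prefill((128, 128)); first = time.perf_counter() - t0
t0 = time.perf_counter(); prefill((128, 128)); second = time.perf_counter() - t0
print(first > second)  # the warm run skips compilation
```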