mirror of
https://github.com/kvcache-ai/ktransformers.git
synced 2026-05-05 23:50:14 +00:00
update fp8 kernel tutorial
This commit is contained in:
parent
ca7366d2db
commit
4dc5518e4d
7 changed files with 46 additions and 5 deletions
@@ -55,7 +55,7 @@ You have to set `--cpu_infer` to the number of cores you want to use. The more c
### Q: My DeepSeek-R1 model is not thinking.
- According to DeepSeek, you need to force the model to initiate its response with "\<think>\n" at the beginning of every output by passing the arg `--force_think true`.
+ According to DeepSeek, you need to force the model to initiate its response with "\<think>\n" at the beginning of every output by passing the arg `--force_think True`.
### Q: Loading gguf error
@@ -63,9 +63,12 @@ Make sure you:
1. Have the `gguf` file in the `--gguf_path` directory.
2. The directory contains gguf files from only one model. If you have multiple models, separate them into different directories.
3. The folder name itself should not end with `.gguf`; e.g. `Deep-gguf` is correct, `Deep.gguf` is wrong.
4. The file itself is not corrupted; you can verify this by checking that its sha256sum matches the one from huggingface, modelscope, or hf-mirror.
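Point 4 can be checked from the command line. A minimal sketch — the digest below is just the sha256 of the literal string `hello`, standing in for the value published on the model page, and the file name is illustrative:

```shell
# Create a stand-in file; in practice this is your downloaded gguf shard.
printf 'hello' > demo.gguf
# EXPECTED is the digest published on huggingface/modelscope/hf-mirror.
EXPECTED=2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
ACTUAL=$(sha256sum demo.gguf | awk '{print $1}')
if [ "$ACTUAL" = "$EXPECTED" ]; then echo "checksum OK"; else echo "checksum MISMATCH"; fi
```

For multi-shard downloads, repeat the comparison for every shard; a single mismatched file is enough to break loading.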
### Q: Version `GLIBCXX_3.4.30' not found
The detailed error:
>ImportError: /mnt/data/miniconda3/envs/xxx/bin/../lib/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /home/xxx/xxx/ktransformers/./cpuinfer_ext.cpython-312-x86_64-linux-gnu.so)
Running `conda install -c conda-forge libstdcxx-ng` can solve the problem.
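To diagnose before (or after) installing, you can list the GLIBCXX symbol versions a given libstdc++ actually provides; if `GLIBCXX_3.4.30` is absent from the copy inside the conda env, the import fails. A sketch, with paths that are assumptions — point `LIB` at your env's copy:

```shell
# Prefer the conda env's libstdc++; fall back to any copy under /usr/lib.
LIB="${CONDA_PREFIX:-/usr}/lib/libstdc++.so.6"
[ -f "$LIB" ] || LIB=$(find /usr/lib -name 'libstdc++.so.6' 2>/dev/null | head -n1)
# grep -a scans the binary directly, so binutils' strings is not required.
grep -ao 'GLIBCXX_3\.4\.[0-9]*' "$LIB" | sort -uV | tail -n 3
```

After running the `conda install` above, rerun this against `$CONDA_PREFIX/lib/libstdc++.so.6`; `GLIBCXX_3.4.30` should now appear in the output.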
@@ -59,6 +59,7 @@ Supported operators and their corresponding classes are as follows:
| Linear | KTransformersLinear | KLinearMarlin | Marlin as backend |
| | | KLinearTorch | PyTorch as backend |
| | | KLinearCPUInfer | llamafile as backend |
| | | KLinearFP8 | Triton fp8_gemm kernel. Requires a GPU able to compute fp8 data |
| experts | KTransformersExperts | KExpertsTorch | PyTorch as backend |
| | | KExpertsMarlin | Marlin as backend |
| | | KExpertsCPU | llamafile as backend |
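The backend column is what you select per module in the optimize-rule YAML files. The fragment below follows the pattern of the rule files shipped in the repo, but treat the field names and the match regex as assumptions to check against your version:

```yaml
# Replace matched torch.nn.Linear modules with KTransformersLinear and pick
# a backend from the table for decode (generate) and prefill separately.
# Field names and the regex are assumptions based on the repo's rule files.
- match:
    name: "^model\\.layers\\..*$"
    class: torch.nn.Linear
  replace:
    class: ktransformers.operators.linear.KTransformersLinear
    kwargs:
      generate_device: "cuda"
      prefill_device: "cuda"
      generate_op: "KLinearFP8"     # Triton fp8_gemm backend
      prefill_op: "KLinearTorch"    # plain PyTorch backend
```

Note that the fp8 backend only loads on GPUs that can compute fp8 data; on older hardware, fall back to `KLinearMarlin` or `KLinearTorch`.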