Mirror of https://github.com/kvcache-ai/ktransformers.git
Synced 2025-09-04 19:50:04 +00:00
Update FP8 doc; update broken link in install.md
parent bb6920ed72
commit c05ebb74b1
2 changed files with 12 additions and 10 deletions
@@ -10,15 +10,17 @@ The DeepSeek-AI team provides FP8 safetensors for DeepSeek-R1/V3 models. We achi
 So those who are pursuing the best performance can use the FP8 linear kernel for DeepSeek-V3/R1.
 
 ## Key Features
 
-✅ Hybrid Precision Architecture (FP8 + GGML)
+✅ Hybrid Precision Architecture (FP8 + GGML)<br>
 ✅ Memory Optimization (~19GB VRAM usage)
 
 ## Quick Start
 
 ### Using Pre-Merged Weights
 
-Pre-merged weights are available on Hugging Face:
-[KVCache-ai/DeepSeek-V3-GGML-FP8-Hybrid](https://huggingface.co/KVCache-ai/DeepSeek-V3)
+Pre-merged weights are available on Hugging Face:<br>
+[KVCache-ai/DeepSeek-V3-GGML-FP8-Hybrid](https://huggingface.co/KVCache-ai/DeepSeek-V3)<br>
 [KVCache-ai/DeepSeek-R1-GGML-FP8-Hybrid](https://huggingface.co/KVCache-ai/DeepSeek-R1)
 
 > Please confirm the weights are fully uploaded before downloading. The large file size may extend Hugging Face upload time.
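The "~19GB VRAM usage" figure above reflects FP8 storing one byte per parameter for the GPU-resident layers, while expert weights stay in GGML quantization off-GPU. The arithmetic below is only an illustration: the 17B "GPU-resident parameters" figure is a hypothetical assumption chosen to make the numbers concrete, not a measured split.

```python
def weight_bytes(n_params: float, bits_per_param: float) -> float:
    """Bytes needed to store n_params at the given precision."""
    return n_params * bits_per_param / 8

# DeepSeek-V3/R1 has 671B total parameters (~37B activated); the
# GPU-resident, non-expert portion is assumed here to be ~17B params,
# a hypothetical figure used only for this back-of-envelope estimate.
gpu_resident_params = 17e9

fp8_gib = weight_bytes(gpu_resident_params, 8) / 2**30    # FP8: 1 byte/param
fp16_gib = weight_bytes(gpu_resident_params, 16) / 2**30  # FP16: 2 bytes/param

print(f"FP8:  {fp8_gib:.1f} GiB")   # ~15.8 GiB for weights alone
print(f"FP16: {fp16_gib:.1f} GiB")  # double the FP8 footprint
```

Activations and the KV cache come on top of the weight footprint, which is consistent with a ~19GB recommendation for an FP8-resident model of this size.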
|
@@ -32,12 +34,12 @@ pip install -U huggingface_hub
 huggingface-cli download --resume-download KVCache-ai/DeepSeek-V3-GGML-FP8-Hybrid --local-dir <local_dir>
 ```
 
 ### Using merge scripts
-If you have local DeepSeek-R1/V3 FP8 safetensors and q4km GGUF weights, you can merge them using the following script.
+If you have local DeepSeek-R1/V3 FP8 safetensors and GGUF weights (e.g., q4km), you can merge them using the following script.
 
 ```shell
-python convert_model.py \
+python merge_tensors/merge_safetensor_gguf.py \
 --safetensor_path <fp8_safetensor_path> \
---gguf_path <q4km_gguf_folder_path> \
+--gguf_path <gguf_folder_path> \
 --output_path <merged_output_path>
 ```
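Conceptually, the merge performs a per-tensor routing decision: tensors served by the GPU FP8 linear kernel are taken from the FP8 safetensors, while the rest (such as CPU-resident expert weights) keep their GGUF quantization. The name pattern below is a hypothetical stand-in for the actual rules in `merge_tensors/merge_safetensor_gguf.py`, shown only to illustrate the idea.

```python
def pick_source(tensor_name: str) -> str:
    """Decide whether the merged checkpoint should carry the FP8
    safetensor copy of a tensor or the GGML-quantized GGUF copy.

    The ".mlp.experts." pattern is an assumed, illustrative rule,
    not the merge script's real logic.
    """
    if ".mlp.experts." in tensor_name:
        return "gguf"        # expert weights run on CPU, stay quantized
    return "fp8_safetensor"  # GPU-resident linears use the FP8 kernel

print(pick_source("model.layers.3.mlp.experts.0.gate_proj.weight"))  # gguf
print(pick_source("model.layers.3.self_attn.q_proj.weight"))         # fp8_safetensor
```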
|
@@ -60,15 +62,15 @@ python ktransformers/local_chat.py \
 
 ## Notes
 
-⚠️ Hardware Requirements
+⚠️ Hardware Requirements<br>
 * Recommended minimum of 19GB available VRAM for the FP8 kernel.
 * Requires a GPU with FP8 support (e.g., 4090)
 
 ⏳ First-Run Optimization
 JIT compilation causes a longer initial execution (subsequent runs retain the optimized speed).
 
-🔄 Temporary Interface
+🔄 Temporary Interface<br>
 The current weight-loading implementation is provisional - it will be refined in future versions.
 
-📁 Path Specification
+📁 Path Specification<br>
 Despite hybrid quantization, merged weights are stored as .safetensors - pass the containing folder path to `--gguf_path`
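The path-specification note is easy to trip over: `--gguf_path` expects the folder containing the merged `.safetensors` files, not a file path. A small hedged helper like the one below (not part of ktransformers, just a sketch) normalizes either form before launching:

```python
from pathlib import Path

def containing_folder_for_gguf_path(weights: str) -> Path:
    """Return the directory to pass as --gguf_path.

    Merged hybrid weights are stored as .safetensors, but the loader
    takes the *folder* that contains them. Accepts either a file or a
    directory and checks that something loadable is present.
    """
    p = Path(weights)
    folder = p.parent if p.is_file() else p
    has_weights = any(folder.glob("*.safetensors")) or any(folder.glob("*.gguf"))
    if not has_weights:
        raise FileNotFoundError(f"no .safetensors or .gguf files in {folder}")
    return folder
```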
@@ -121,7 +121,7 @@ We provide a simple command-line local chat Python script that you can run for t
 mkdir DeepSeek-V2-Lite-Chat-GGUF
 cd DeepSeek-V2-Lite-Chat-GGUF
 
-wget https://huggingface.co/mzwing/DeepSeek-V2-Lite-Chat-GGUF/resolve/main/DeepSeek-V2-Lite-Chat.Q4_K_M.gguf -O DeepSeek-V2-Lite-Chat.Q4_K_M.gguf
+wget https://huggingface.co/mradermacher/DeepSeek-V2-Lite-GGUF/resolve/main/DeepSeek-V2-Lite.Q4_K_M.gguf -O DeepSeek-V2-Lite-Chat.Q4_K_M.gguf
 
 cd .. # Move to repo's root dir
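A broken link of the kind fixed in this commit often shows up as a silently downloaded HTML error page rather than a model file. Since GGUF files begin with the 4-byte magic `b"GGUF"`, a cheap sanity check after the `wget` catches that case; a minimal sketch:

```python
def looks_like_gguf(path: str) -> bool:
    """Cheap sanity check: GGUF files start with the magic bytes b'GGUF'.

    This does not validate the full file; it only detects the common
    failure where the download produced an HTML error page or an
    empty/truncated file instead of a model.
    """
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"
```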