From c05ebb74b1a04376cc4f7863a66efec1457bdede Mon Sep 17 00:00:00 2001
From: Azure
Date: Wed, 26 Feb 2025 15:43:08 +0000
Subject: [PATCH] Update fp8 doc; Update install.md broken link

---
 doc/en/fp8_kernel.md | 20 +++++++++++---------
 doc/en/install.md    |  2 +-
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/doc/en/fp8_kernel.md b/doc/en/fp8_kernel.md
index 5237a5c..e76bae5 100644
--- a/doc/en/fp8_kernel.md
+++ b/doc/en/fp8_kernel.md
@@ -10,15 +10,17 @@ The DeepSeek-AI team provides FP8 safetensors for DeepSeek-R1/V3 models. We achi
-So those who are persuing the best performance can use the FP8 linear kernel for DeepSeek-V3/R1.
+So those who are pursuing the best performance can use the FP8 linear kernel for DeepSeek-V3/R1.
 
 ## Key Features
-✅ Hybrid Precision Architecture (FP8 + GGML)
+
+✅ Hybrid Precision Architecture (FP8 + GGML)<br>
 ✅ Memory Optimization (~19GB VRAM usage)
 
 ## Quick Start
 
 ### Using Pre-Merged Weights
-Pre-merged weights are available on Hugging Face:
-[KVCache-ai/DeepSeek-V3-GGML-FP8-Hybrid](https://huggingface.co/KVCache-ai/DeepSeek-V3)
+Pre-merged weights are available on Hugging Face:<br>
+[KVCache-ai/DeepSeek-V3-GGML-FP8-Hybrid](https://huggingface.co/KVCache-ai/DeepSeek-V3)<br>
+[KVCache-ai/DeepSeek-R1-GGML-FP8-Hybrid](https://huggingface.co/KVCache-ai/DeepSeek-R1)
+
 > Please confirm the weights are fully uploaded before downloading. The large file size may extend Hugging Face upload time.
@@ -32,12 +34,12 @@ pip install -U huggingface_hub
 huggingface-cli download --resume-download KVCache-ai/DeepSeek-V3-GGML-FP8-Hybrid --local-dir <local_dir>
 ```
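+
+If you prefer to stay in Python, the same repository can be fetched with the `huggingface_hub` API. This is a minimal sketch, not part of the original workflow; the target folder name is illustrative:
+
+```python
+from huggingface_hub import snapshot_download
+
+# Downloads every file in the repo; files already complete on disk are
+# skipped when the call is re-run.
+snapshot_download(
+    repo_id="KVCache-ai/DeepSeek-V3-GGML-FP8-Hybrid",
+    local_dir="./DeepSeek-V3-GGML-FP8-Hybrid",  # illustrative local path
+)
+```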
 
 ### Using merge scripts
-If you got local DeepSeek-R1/V3 fp8 safetensors and q4km gguf weights, you can merge them using the following scripts.
+If you have local DeepSeek-R1/V3 FP8 safetensors and GGUF weights (e.g., Q4_K_M), you can merge them using the following script.
 
 ```shell
-python convert_model.py \
+python merge_tensors/merge_safetensor_gguf.py \
   --safetensor_path <fp8_safetensor_path> \
-  --gguf_path <q4km_gguf_folder_path> \
+  --gguf_path <gguf_folder_path> \
   --output_path <merged_weights_folder_path>
 ```
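+
+After merging, you can sanity-check the output before loading it. A minimal sketch, assuming the merged folder contains standard `.safetensors` shards (the exact tensor names and dtypes depend on the merge script):
+
+```python
+from pathlib import Path
+
+from safetensors import safe_open
+
+merged_dir = Path("merged_weights")  # illustrative --output_path folder
+
+for shard in sorted(merged_dir.glob("*.safetensors")):
+    with safe_open(str(shard), framework="pt", device="cpu") as f:
+        names = list(f.keys())
+        print(f"{shard.name}: {len(names)} tensors")
+        for name in names[:3]:  # peek at the first few entries per shard
+            t = f.get_tensor(name)
+            print(f"  {name}: dtype={t.dtype} shape={tuple(t.shape)}")
+```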
 
@@ -60,15 +62,15 @@ python ktransformers/local_chat.py \
 
 ## Notes
-⚠️ Hardware Requirements
-* Recommended minimum 19GB available VRAM for FP8 kernel.
-* Requires GPU with FP8 support (e.g., 4090)
+⚠️ Hardware Requirements<br>
+* A minimum of 19GB available VRAM is recommended for the FP8 kernel.
+* Requires a GPU with FP8 support (e.g., RTX 4090); a quick check is sketched below.
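+
+A minimal pre-flight check with PyTorch covers both requirements; FP8 tensor cores need compute capability 8.9 or newer (RTX 4090 is 8.9, H100 is 9.0):
+
+```python
+import torch
+
+assert torch.cuda.is_available(), "CUDA GPU required"
+major, minor = torch.cuda.get_device_capability(0)
+vram_gib = torch.cuda.get_device_properties(0).total_memory / 1024**3
+print(f"Compute capability {major}.{minor}, {vram_gib:.1f} GiB VRAM")
+if (major, minor) < (8, 9):
+    print("Warning: this GPU likely lacks native FP8 support.")
+if vram_gib < 19:
+    print("Warning: below the recommended 19GB of VRAM for the FP8 kernel.")
+```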
 
-⏳ First-Run Optimization
+⏳ First-Run Optimization<br>
 JIT compilation causes longer initial execution (subsequent runs retain optimized speed).
 
-🔄 Temporary Interface
-Current weight loading implementation is provisional - will be refined in future versions
+🔄 Temporary Interface<br>
+The current weight loading implementation is provisional and will be refined in future versions.
 
-📁 Path Specification
-Despite hybrid quantization, merged weights are stored as .safetensors - pass the containing folder path to `--gguf_path`
+📁 Path Specification<br>
+Despite hybrid quantization, merged weights are stored as `.safetensors`; pass the containing folder's path to `--gguf_path`.
\ No newline at end of file
diff --git a/doc/en/install.md b/doc/en/install.md
index 2a4a6af..3f0acf0 100644
--- a/doc/en/install.md
+++ b/doc/en/install.md
@@ -121,7 +121,7 @@ We provide a simple command-line local chat Python script that you can run for t
 mkdir DeepSeek-V2-Lite-Chat-GGUF
 cd DeepSeek-V2-Lite-Chat-GGUF
 
-wget https://huggingface.co/mzwing/DeepSeek-V2-Lite-Chat-GGUF/resolve/main/DeepSeek-V2-Lite-Chat.Q4_K_M.gguf -O DeepSeek-V2-Lite-Chat.Q4_K_M.gguf
+wget https://huggingface.co/mradermacher/DeepSeek-V2-Lite-GGUF/resolve/main/DeepSeek-V2-Lite.Q4_K_M.gguf -O DeepSeek-V2-Lite-Chat.Q4_K_M.gguf
 
 cd .. # Move to repo's root dir
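+
+If `wget` is unavailable, the same file can be fetched from Python. A minimal sketch with `huggingface_hub`; the rename matches the filename the rest of this guide expects:
+
+```python
+from pathlib import Path
+
+from huggingface_hub import hf_hub_download
+
+path = hf_hub_download(
+    repo_id="mradermacher/DeepSeek-V2-Lite-GGUF",
+    filename="DeepSeek-V2-Lite.Q4_K_M.gguf",
+    local_dir="DeepSeek-V2-Lite-Chat-GGUF",
+)
+# Rename to match the name used above (wget -O does the same).
+Path(path).rename(Path(path).with_name("DeepSeek-V2-Lite-Chat.Q4_K_M.gguf"))
+```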