Update Kimi-K2 Readme

chenxl 2025-07-12 12:51:00 +08:00
parent 4fb367542b
commit b5024f62a4


@@ -19,20 +19,13 @@ With a dual-socket CPU and sufficient system memory, enabling NUMA optimizations
 ### 1. Resource Requirements
-The model running with 384 Experts requires approximately 2 TB of memory and 14 GB of GPU memory.
+The model running with 384 Experts requires approximately 600 GB of memory and 14 GB of GPU memory.
 ### 2. Prepare Models
-You can convert the fp8 to bf16.
 ```bash
-# download fp8
-huggingface-cli download --resume-download xxx
-# convert fp8 to bf16
-git clone https://github.com/deepseek-ai/DeepSeek-V3.git
-cd inference
-python fp8_cast_bf16.py --input-fp8-hf-path <path_to_fp8> --output-bf16-hf-path <path_to_bf16>
+# download gguf
+huggingface-cli download --resume-download KVCache-ai/Kimi-K2-Instruct-GGUF
 ```
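
For reference, a quick sketch of the new download step as it would typically be run. The `--local-dir` flag and target directory are assumptions not shown in the diff; `--resume-download` and the repo ID come straight from the changed line.

```bash
# Sketch of the updated download step; --local-dir and the directory
# name are assumptions, not part of the diff.
huggingface-cli download --resume-download KVCache-ai/Kimi-K2-Instruct-GGUF \
  --local-dir ./Kimi-K2-Instruct-GGUF
```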
@@ -46,7 +39,7 @@ To install KTransformers, follow the official [Installation Guide](https://kvcac
 python ktransformers/server/main.py \
 --port 10002 \
 --model_path <path_to_safetensor_config> \
---gguf_path <path_to_bf16_files> \
+--gguf_path <path_to_gguf_files> \
 --optimize_config_path ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-serve.yaml \
 --max_new_tokens 1024 \
 --cache_lens 32768 \
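
For completeness, a minimal smoke test of the launched server. This is a sketch assuming KTransformers exposes an OpenAI-compatible `/v1/chat/completions` route on the port configured above; the endpoint path and the model name in the payload are assumptions, not taken from the diff.

```bash
# Sketch: query the server started above. The endpoint path and model
# name are assumptions; adjust to what your deployment actually serves.
curl http://localhost:10002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Kimi-K2-Instruct",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 64
      }'
```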