Mirror of https://github.com/kvcache-ai/ktransformers.git (synced 2025-09-07 21:19:51 +00:00)

Merge pull request #1431 from kvcache-ai/support-kimi-k2: Support Kimi-K2

Commit 2303889709: 2 changed files with 72 additions and 0 deletions
@@ -23,6 +23,8 @@ Our vision for KTransformers is to serve as a flexible platform for experimenting

<h2 id="Updates">🔥 Updates</h2>

* **July 11, 2025**: Support Kimi-K2. ([Tutorial](./doc/en/Kimi-K2.md))
* **June 30, 2025**: Support 3-layer (GPU-CPU-Disk) [prefix cache](./doc/en/prefix_cache.md) reuse.
* **May 14, 2025**: Support Intel Arc GPU ([Tutorial](./doc/en/xpu.md)).
doc/en/Kimi-K2.md (new file, 70 lines)

@@ -0,0 +1,70 @@

# Kimi-K2 Support for KTransformers

## Introduction

### Overview

We are very pleased to announce that KTransformers now supports Kimi-K2, Moonshot AI's large mixture-of-experts model.

### Model & Resource Links

- Official Kimi-K2 Release:
  - http://xxx.com
- GGUF Format (quantized models):
  - Coming soon

## Installation Guide

### 1. Resource Requirements

Running the model with all 384 experts requires approximately 2 TB of system memory and 14 GB of GPU memory.
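
Before launching, it is worth confirming that the host actually has this headroom. Below is a minimal pre-flight check, assuming a Linux host with the standard NVIDIA tooling installed (neither command is part of KTransformers):

```bash
# total system memory; Kimi-K2 with all 384 experts needs roughly 2 TB
free -h

# total GPU memory; the server needs roughly 14 GB on the GPU
nvidia-smi --query-gpu=name,memory.total --format=csv
```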
### 2. Prepare Models

You can convert the official FP8 weights to BF16 with the conversion script from the DeepSeek-V3 repository:

```bash
# download the official FP8 weights
huggingface-cli download --resume-download xxx

# convert FP8 to BF16 using DeepSeek-V3's conversion script
git clone https://github.com/deepseek-ai/DeepSeek-V3.git
cd DeepSeek-V3/inference
python fp8_cast_bf16.py --input-fp8-hf-path <path_to_fp8> --output-bf16-hf-path <path_to_bf16>
```
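
Note that casting to BF16 roughly doubles the on-disk footprint of the FP8 checkpoint (two bytes per parameter instead of one), so budget disk space accordingly before running the conversion.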

### 3. Install KTransformers

To install KTransformers, follow the official [Installation Guide](https://kvcache-ai.github.io/ktransformers/en/install.html).
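
For orientation only, a from-source install typically looks like the sketch below; the submodule step and the USE_BALANCE_SERVE flag (needed for the balance_serve backend used in step 4) are assumptions to be checked against the linked guide:

```bash
# rough sketch of a from-source install; follow the official guide for
# the authoritative steps -- the USE_BALANCE_SERVE flag is an assumption here
git clone https://github.com/kvcache-ai/ktransformers.git
cd ktransformers
git submodule update --init --recursive
USE_BALANCE_SERVE=1 bash install.sh
```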

### 4. Run Kimi-K2 Inference Server

```bash
python ktransformers/server/main.py \
  --port 10002 \
  --model_path <path_to_safetensor_config> \
  --gguf_path <path_to_bf16_files> \
  --optimize_config_path ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-serve.yaml \
  --max_new_tokens 1024 \
  --cache_lens 32768 \
  --chunk_size 256 \
  --max_batch_size 4 \
  --backend_type balance_serve
```
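
Here `--model_path` should point at the directory containing the model's config and tokenizer, and `--gguf_path` at the BF16 weights produced in step 2. Once the server reports it is listening, a quick liveness check (this assumes the server exposes an OpenAI-style /v1/models route, which is an assumption here):

```bash
# should return a JSON list of served models once the server is up
# (assumes an OpenAI-style /v1/models route)
curl http://localhost:10002/v1/models
```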

### 5. Access the Server

```bash
curl -X POST http://localhost:10002/v1/chat/completions \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "hello"}
    ],
    "model": "Kimi-K2",
    "temperature": 0.3,
    "top_p": 1.0,
    "stream": true
  }'
```
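
Since the endpoint follows the OpenAI chat-completions schema, any OpenAI-compatible client can also be pointed at `http://localhost:10002/v1`; with `"stream": true` the reply arrives as a sequence of `data:` chunks in the OpenAI streaming format rather than a single JSON body.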