diff --git a/README.md b/README.md
index 44f5310..4f630f3 100644
--- a/README.md
+++ b/README.md
@@ -23,6 +23,8 @@ Our vision for KTransformers is to serve as a flexible platform for experimentin
 
 🔥 Updates
 
+* **July 11, 2025**: Support Kimi-K2. ([Tutorial](./doc/en/Kimi-K2.md))
+
 * **June 30, 2025**: Support 3-layer (GPU-CPU-Disk) [prefix cache](./doc/en/prefix_cache.md) reuse.
 * **May 14, 2025**: Support Intel Arc GPU ([Tutorial](./doc/en/xpu.md)).
diff --git a/doc/en/Kimi-K2.md b/doc/en/Kimi-K2.md
new file mode 100644
index 0000000..fca6b5c
--- /dev/null
+++ b/doc/en/Kimi-K2.md
@@ -0,0 +1,70 @@
+# Kimi-K2 Support for KTransformers
+
+## Introduction
+
+### Overview
+We are very pleased to announce that KTransformers now supports Kimi-K2.
+
+### Model & Resource Links
+
+- Official Kimi-K2 Release:
+  - http://xxx.com
+- GGUF Format (quantized models):
+  - Coming soon
+
+## Installation Guide
+
+### 1. Resource Requirements
+
+Running the model with all 384 experts requires approximately 2 TB of system memory and 14 GB of GPU memory. (Kimi-K2 has roughly 1T total parameters, so the BF16 weights alone occupy about 2 bytes × 1T ≈ 2 TB.)
+
+### 2. Prepare Models
+
+The official weights are distributed in FP8; convert them to BF16 with the conversion script from the DeepSeek-V3 repository. Here and in step 4, replace the `<...>` placeholders with your local paths.
+
+```bash
+# download fp8 weights
+huggingface-cli download --resume-download xxx
+
+# convert fp8 to bf16
+git clone https://github.com/deepseek-ai/DeepSeek-V3.git
+cd DeepSeek-V3/inference
+python fp8_cast_bf16.py --input-fp8-hf-path <fp8_model_dir> --output-bf16-hf-path <bf16_output_dir>
+
+```
+
+### 3. Install KTransformers
+
+To install KTransformers, follow the official [Installation Guide](https://kvcache-ai.github.io/ktransformers/en/install.html).
+
+### 4. Run Kimi-K2 Inference Server
+
+```bash
+python ktransformers/server/main.py \
+  --port 10002 \
+  --model_path <path_to_model_config> \
+  --gguf_path <path_to_gguf_weights> \
+  --optimize_config_path ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-serve.yaml \
+  --max_new_tokens 1024 \
+  --cache_lens 32768 \
+  --chunk_size 256 \
+  --max_batch_size 4 \
+  --backend_type balance_serve
+```
+
+### 5. Access Server
+
+```bash
+curl -X POST http://localhost:10002/v1/chat/completions \
+  -H "accept: application/json" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "messages": [
+      {"role": "user", "content": "hello"}
+    ],
+    "model": "Kimi-K2",
+    "temperature": 0.3,
+    "top_p": 1.0,
+    "stream": true
+  }'
+```
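+
+### 6. Python Client Example
+
+Because the endpoint above is OpenAI-compatible, you can also call it with the official `openai` Python client instead of raw `curl`. The snippet below is a minimal sketch that mirrors the curl example (same port, model name, and sampling parameters); the `api_key` value is a placeholder, on the assumption that a locally hosted server does not validate keys.
+
+```python
+# Minimal streaming client for the Kimi-K2 server started in step 4.
+# Assumes the OpenAI-compatible endpoint at http://localhost:10002/v1.
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="http://localhost:10002/v1",
+    api_key="not-needed",  # placeholder; assumed unchecked by the local server
+)
+
+stream = client.chat.completions.create(
+    model="Kimi-K2",
+    messages=[{"role": "user", "content": "hello"}],
+    temperature=0.3,
+    top_p=1.0,
+    stream=True,
+)
+
+# Print tokens as they arrive on the stream.
+for chunk in stream:
+    if chunk.choices and chunk.choices[0].delta.content:
+        print(chunk.choices[0].delta.content, end="", flush=True)
+print()
+```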
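+
+With `"stream": true` the server replies with server-sent events (`data: {...}` lines), which the client above consumes incrementally; pass `stream=False` (and drop the `for` loop) to receive a single JSON response instead.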