Mirror of https://github.com/kvcache-ai/ktransformers.git (synced 2025-09-07 21:19:51 +00:00)

Merge pull request #1431 from kvcache-ai/support-kimi-k2: Support Kimi-K2

Commit 2303889709: 2 changed files with 72 additions and 0 deletions
@@ -23,6 +23,8 @@ Our vision for KTransformers is to serve as a flexible platform for experimenting

<h2 id="Updates">🔥 Updates</h2>

* **July 11, 2025**: Support Kimi-K2. ([Tutorial](./doc/en/Kimi-K2.md))
* **June 30, 2025**: Support 3-layer (GPU-CPU-Disk) [prefix cache](./doc/en/prefix_cache.md) reuse.
* **May 14, 2025**: Support Intel Arc GPU ([Tutorial](./doc/en/xpu.md)).
doc/en/Kimi-K2.md (new file, 70 lines)

@@ -0,0 +1,70 @@

# Kimi-K2 Support for KTransformers

## Introduction

### Overview

We are very pleased to announce that KTransformers now supports Kimi-K2, Moonshot AI's large mixture-of-experts model.

### Model & Resource Links

- Official Kimi-K2 Release:
  - http://xxx.com
- GGUF Format (quantized models):
  - Coming soon

## Installation Guide

### 1. Resource Requirements

Running the model with all 384 experts requires approximately 2 TB of system memory and 14 GB of GPU memory.
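
Before launching, it is worth confirming that the host actually has this headroom. Below is a minimal pre-flight check, assuming a Linux host with the standard NVIDIA tooling installed (neither command is part of KTransformers):

```bash
# total system memory; Kimi-K2 with all 384 experts needs roughly 2 TB
free -h

# total GPU memory; the server needs roughly 14 GB on the GPU
nvidia-smi --query-gpu=name,memory.total --format=csv
```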
### 2. Prepare Models

You can convert the official FP8 weights to BF16 with the conversion script from the DeepSeek-V3 repository:

```bash
# download the official FP8 weights
huggingface-cli download --resume-download xxx

# convert FP8 to BF16 using DeepSeek-V3's conversion script
git clone https://github.com/deepseek-ai/DeepSeek-V3.git
cd DeepSeek-V3/inference
python fp8_cast_bf16.py --input-fp8-hf-path <path_to_fp8> --output-bf16-hf-path <path_to_bf16>
```
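
Note that casting to BF16 roughly doubles the on-disk footprint of the FP8 checkpoint (two bytes per parameter instead of one), so budget disk space accordingly before running the conversion.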

### 3. Install KTransformers

To install KTransformers, follow the official [Installation Guide](https://kvcache-ai.github.io/ktransformers/en/install.html).
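
For orientation only, a from-source install typically looks like the sketch below; the submodule step and the USE_BALANCE_SERVE flag (needed for the balance_serve backend used in step 4) are assumptions to be checked against the linked guide:

```bash
# rough sketch of a from-source install; follow the official guide for
# the authoritative steps -- the USE_BALANCE_SERVE flag is an assumption here
git clone https://github.com/kvcache-ai/ktransformers.git
cd ktransformers
git submodule update --init --recursive
USE_BALANCE_SERVE=1 bash install.sh
```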

### 4. Run Kimi-K2 Inference Server

```bash
python ktransformers/server/main.py \
  --port 10002 \
  --model_path <path_to_safetensor_config> \
  --gguf_path <path_to_bf16_files> \
  --optimize_config_path ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-serve.yaml \
  --max_new_tokens 1024 \
  --cache_lens 32768 \
  --chunk_size 256 \
  --max_batch_size 4 \
  --backend_type balance_serve
```
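
Here `--model_path` should point at the directory containing the model's config and tokenizer, and `--gguf_path` at the BF16 weights produced in step 2. Once the server reports it is listening, a quick liveness check (this assumes the server exposes an OpenAI-style /v1/models route, which is an assumption here):

```bash
# should return a JSON list of served models once the server is up
# (assumes an OpenAI-style /v1/models route)
curl http://localhost:10002/v1/models
```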

### 5. Access the Server

```bash
curl -X POST http://localhost:10002/v1/chat/completions \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "hello"}
    ],
    "model": "Kimi-K2",
    "temperature": 0.3,
    "top_p": 1.0,
    "stream": true
  }'
```
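
Since the endpoint follows the OpenAI chat-completions schema, any OpenAI-compatible client can also be pointed at `http://localhost:10002/v1`; with `"stream": true` the reply arrives as a sequence of `data:` chunks in the OpenAI streaming format rather than a single JSON body.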