# Kimi-K2 Support for KTransformers
## Introduction
### Overview
We are very pleased to announce that KTransformers now supports Kimi-K2 and Kimi-K2-0905.

On a single-socket CPU with one consumer-grade GPU, running the Q4_K_M quantized model yields roughly 10 TPS and requires about 600 GB of DRAM.

With a dual-socket CPU and sufficient system memory, enabling NUMA optimizations increases throughput to about 14 TPS.
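On dual-socket machines, NUMA support is enabled when KTransformers is built; a minimal sketch, assuming the `USE_NUMA=1` build-time flag described in the KTransformers installation guide (treat the guide for your release as authoritative):

```bash
# Hedged sketch: enable NUMA-aware placement before building KTransformers.
# USE_NUMA is a build-time flag referenced in the KTransformers install docs;
# it needs the libnuma development headers.
sudo apt-get install -y libnuma-dev   # Debian/Ubuntu package name
export USE_NUMA=1                     # build with NUMA optimizations enabled
# ...then install KTransformers as described in Section 3 below.
```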
### Model & Resource Links
- Official Kimi-K2 Release:
  - https://huggingface.co/collections/moonshotai/kimi-k2-6871243b990f2af5ba60617d
- GGUF Format (quantized Kimi-K2):
  - https://huggingface.co/KVCache-ai/Kimi-K2-Instruct-GGUF
- Official Kimi-K2-0905 Release:
  - https://huggingface.co/moonshotai/Kimi-K2-Instruct-0905
- GGUF Format (quantized Kimi-K2-0905):
  - https://huggingface.co/KVCache-ai/Kimi-K2-Instruct-0905-GGUF
## Installation Guide
### 1. Resource Requirements
Running the model with all 384 experts loaded requires approximately 600 GB of system memory (DRAM) and about 14 GB of GPU memory.
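As a rough sanity check (assuming roughly one trillion total parameters for Kimi-K2 and ~4.8 bits per weight for Q4_K_M), 10^12 parameters x ~4.8 bits / 8 bits per byte ≈ 600 GB, which matches the DRAM figure above; the ~14 GB of GPU memory covers the layers and KV cache that KTransformers keeps on the GPU.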
### 2. Prepare Models
```bash
# download gguf
huggingface-cli download --resume-download KVCache-ai/Kimi-K2-Instruct-GGUF
```
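The same command works for the Kimi-K2-0905 weights; a sketch, assuming you want the files in a local directory that you will later pass as `--gguf_path` (`--local-dir` is a standard `huggingface-cli download` option):

```bash
# Hedged sketch: download the 0905 GGUF weights into a chosen directory.
huggingface-cli download --resume-download KVCache-ai/Kimi-K2-Instruct-0905-GGUF \
  --local-dir ./Kimi-K2-Instruct-0905-GGUF
```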
### 3. Install ktransformers
To install KTransformers, follow the official [Installation Guide](https://kvcache-ai.github.io/ktransformers/en/install.html).
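The typical flow is a source build; a minimal sketch, assuming the source-install path from that guide (script names and steps may differ between releases, so treat the guide as authoritative):

```bash
# Hedged sketch of a source install; see the official guide for the exact steps.
git clone https://github.com/kvcache-ai/ktransformers.git
cd ktransformers
git submodule update --init --recursive   # pull in third-party kernels
bash install.sh                           # build and install (set USE_NUMA=1 first on dual-socket machines)
```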
### 4. Run Kimi-K2 Inference Server
```bash
python ktransformers/server/main.py \
  --port 10002 \
  --model_path <path_to_safetensor_config> \
  --gguf_path <path_to_gguf_files> \
  --optimize_config_path ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-serve.yaml \
  --max_new_tokens 1024 \
  --cache_lens 32768 \
  --chunk_size 256 \
  --max_batch_size 4 \
  --backend_type balance_serve
```
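Note that the command reuses the DeepSeek-V3 `balance_serve` optimize rules, since Kimi-K2 follows the DeepSeek-V3 architecture and therefore needs no separate rule file. `--model_path` points at the directory holding the model's config and tokenizer files, `--gguf_path` at the downloaded GGUF weights, `--cache_lens` sets the total KV-cache length shared across requests, and `--max_batch_size` caps the number of concurrently served requests.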
### 5. Access server
```bash
curl -X POST http://localhost:10002/v1/chat/completions \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "hello"}
    ],
    "model": "Kimi-K2",
    "temperature": 0.3,
    "top_p": 1.0,
    "stream": true
  }'
```
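The endpoint follows the OpenAI chat-completions format, so OpenAI-compatible clients should also work by pointing their base URL at `http://localhost:10002/v1`; with `"stream": true` the response arrives as a stream of chunks rather than a single JSON object.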