kvcache-ai-ktransformers/doc/en/prefix_cache.md
2025-06-30 15:09:35 +00:00

## Enabling Prefix Cache Mode in KTransformers
Balance serve now supports prefix cache reuse! To enable **Prefix Cache Mode** in KTransformers, you need to modify the configuration file and recompile the project.
### Step 1: Modify the Configuration File
Edit the `./ktransformers/configs/config.yaml` file with the following content (you can adjust the values according to your needs):
```yaml
attn:
  page_size: 16 # Size of a page in KV Cache.
  chunk_size: 256
kvc2:
  gpu_only: false # Set to false to enable prefix cache mode (Disk + CPU + GPU KV storage)
  utilization_percentage: 1.0
  cpu_memory_size_GB: 500 # Amount of CPU memory allocated for KV Cache
  disk_path: /mnt/data/kvc # Path to store KV Cache on disk
```
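Before recompiling, it can help to confirm which `disk_path` the configuration actually points at. The following is a minimal sketch, not part of KTransformers: `kvc_disk_path` is a hypothetical helper that does a simple line-based lookup (it is not a real YAML parser) and assumes the file contains a single `disk_path:` entry like the one above.

```bash
# Hypothetical helper (not shipped with KTransformers).
# Prints the value of the first `disk_path:` entry in the given config file.
# Assumption: the default path is the config file named in Step 1.
kvc_disk_path() {
  local config="${1:-./ktransformers/configs/config.yaml}"
  # Keep only the text after "disk_path:", take the first match,
  # then strip any trailing "# comment" and trailing whitespace.
  sed -n 's/^[[:space:]]*disk_path:[[:space:]]*//p' "$config" \
    | head -n 1 \
    | sed 's/[[:space:]]*#.*$//; s/[[:space:]]*$//'
}
```

For example, `df -h "$(kvc_disk_path)"` would show the free space on the volume holding the cache, provided the directory already exists. Whether the directory must be created in advance is not stated here, so treat that as something to verify on your setup.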
### Step 2: Update Submodules and Recompile
If this is your first time using prefix cache mode, please update the submodules first:
```bash
git submodule update --init --recursive # Update PhotonLibOS submodule
```
Then recompile the project:
```bash
# Install single NUMA dependencies
USE_BALANCE_SERVE=1 bash ./install.sh
# For machines with two CPUs and 1 TB of RAM (dual NUMA):
USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh
```
## Note
Balance serve uses a three-tier (GPU, CPU, disk) scheme to store and reuse the KV cache. Deleting KV cache entries is not supported yet. If the cache grows too large, you can free space by removing the KV cache files manually.
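
Manual cleanup can be scripted. The sketch below rests on two assumptions that this page does not confirm: that the cache files live under the `disk_path` set in `config.yaml`, and that the server is stopped before files are removed. `kvc_usage` and `kvc_clear` are hypothetical helper names, not KTransformers commands.

```bash
# Hypothetical helpers (not shipped with KTransformers).

# Report how much disk space the cache directory currently uses.
kvc_usage() {
  du -sh "$1"
}

# Remove everything inside the cache directory but keep the directory itself.
# Assumption: the server is not running while this executes.
kvc_clear() {
  local dir="$1"
  # Refuse empty, root, or non-existent paths to avoid accidental damage.
  if [ -z "$dir" ] || [ "$dir" = "/" ] || [ ! -d "$dir" ]; then
    echo "refusing to clear '$dir'" >&2
    return 1
  fi
  # -mindepth 1 skips the directory itself; -delete removes children first.
  find "$dir" -mindepth 1 -delete
}
```

With the example configuration above, `kvc_usage /mnt/data/kvc` followed by `kvc_clear /mnt/data/kvc` would report the size and then empty the directory.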