kvcache-ai-ktransformers/doc/en/prefix_cache.md at update-readme

mirror of https://github.com/kvcache-ai/ktransformers.git synced 2025-09-07 04:59:55 +00:00

ouqingliang 90cff820cf update kvc disk path config.

2025-06-30 15:09:35 +00:00


Enabling Prefix Cache Mode in KTransformers

Balance serve now supports prefix cache reuse! To enable Prefix Cache Mode in KTransformers, you need to modify the configuration file and recompile the project.

Step 1: Modify the Configuration File

Edit the ./ktransformers/configs/config.yaml file with the following content (you can adjust the values according to your needs):

attn:
  page_size: 16 # Size of a page in KV Cache.
  chunk_size: 256
kvc2:
  gpu_only: false # Set to false to enable prefix cache mode (Disk + CPU + GPU KV storage)
  utilization_percentage: 1.0
  cpu_memory_size_GB: 500 # Amount of CPU memory allocated for KV Cache
  disk_path: /mnt/data/kvc # Path to store KV Cache on disk
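
Before starting the server, it can help to confirm that the directory named in disk_path exists and is writable. The following is a minimal sketch and not part of KTransformers: the function name check_kvc_dir is hypothetical, and /mnt/data/kvc is just the example value from the config above.

```shell
# Hypothetical helper (not shipped with KTransformers): make sure the
# directory configured as kvc2.disk_path exists and is writable.
check_kvc_dir() {
    dir="$1"
    if [ -z "$dir" ]; then
        echo "usage: check_kvc_dir <disk_path>" >&2
        return 2
    fi
    # Create the directory if it is missing.
    mkdir -p "$dir" || return 1
    # Fail if the current user cannot write to it.
    if [ ! -w "$dir" ]; then
        echo "not writable: $dir" >&2
        return 1
    fi
    # Show free space on the filesystem that holds the directory.
    df -h "$dir"
}

# Example: check_kvc_dir /mnt/data/kvc
```

The check only covers the disk tier. It does not verify that the host actually has the amount of RAM set in cpu_memory_size_GB.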

Step 2: Update Submodules and Recompile

If this is your first time using prefix cache mode, please update the submodules first:

git submodule update --init --recursive # Update PhotonLibOS submodule

Then recompile the project:

# Install dependencies for a single-NUMA machine
USE_BALANCE_SERVE=1 bash ./install.sh
# For machines with two CPUs and 1 TB of RAM (dual NUMA):
USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh
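
If you are unsure which of the two commands applies, the NUMA node count reported by lscpu is one way to decide. This is a rough sketch under stated assumptions: the helper names numa_nodes and suggest_install_cmd are hypothetical, the parsing assumes English lscpu output, and the amount of RAM is not checked. It only prints a suggestion and does not run the installer.

```shell
# Hypothetical helper: read `lscpu` output on stdin and print the
# value of its "NUMA node(s):" line (prints nothing if the line is absent).
numa_nodes() {
    awk -F: '/^NUMA node\(s\)/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Print (do not run) the install command that matches the node count.
# A missing or empty count is treated as a single node.
suggest_install_cmd() {
    n="${1:-1}"
    if [ "$n" -ge 2 ]; then
        echo "USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh"
    else
        echo "USE_BALANCE_SERVE=1 bash ./install.sh"
    fi
}

# Example: suggest_install_cmd "$(LC_ALL=C lscpu | numa_nodes)"
```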

Note

Balance serve uses a three-tier (GPU, CPU, disk) scheme to store and reuse KVCache. Deleting KVCache entries is not currently supported. If the cache grows too large, you can free space by removing the KVCache files from disk manually.
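
The manual cleanup described above can be sketched as two small shell helpers. Both names (kvc_usage_kib, clear_kvc) are hypothetical, and the sketch assumes that every on-disk cache file lives under the directory configured as disk_path. Stopping the server before clearing the directory is probably the safest approach, though that is an assumption rather than something stated here.

```shell
# Hypothetical helpers; assume all on-disk cache files live under the
# directory configured as kvc2.disk_path.

# Print the size of the cache directory in KiB.
kvc_usage_kib() {
    du -sk "$1" | awk '{ print $1 }'
}

# Remove everything inside the cache directory but keep the directory itself.
# Refuses an empty argument, "/", or a path that is not a directory.
clear_kvc() {
    dir="$1"
    if [ -z "$dir" ] || [ "$dir" = "/" ] || [ ! -d "$dir" ]; then
        echo "refusing to clear: '$dir'" >&2
        return 1
    fi
    # -mindepth 1 skips the directory itself; -delete removes children first.
    find "$dir" -mindepth 1 -delete
}

# Example:
#   kvc_usage_kib /mnt/data/kvc
#   clear_kvc /mnt/data/kvc
```

Note that -mindepth and -delete are extensions provided by GNU and BSD find, not by POSIX.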
