add prefix cache documentation

2025-09-08 05:29:29 +00:00 · 2025-06-28 07:13:33 +00:00 · 2025-06-28 07:13:33 +00:00 · cc822df65d
commit cc822df65d
parent 4d51831316
1 changed file with 34 additions and 0 deletions
--- a/doc/en/prefix_cache.md
+++ b/doc/en/prefix_cache.md
@@ -0,0 +1,34 @@
 ## Enabling Prefix Cache Mode in KTransformers
 To enable **Prefix Cache Mode** in KTransformers, you need to modify the configuration file and recompile the project.
 ### Step 1: Modify the Configuration File
 Edit `./ktransformers/configs/config.yaml` so that it contains the following settings (adjust the values to suit your hardware):
 ```yaml
 attn:
  page_size: 16 # Size of a page in KV Cache.
  chunk_size: 256
 kvc2:
  gpu_only: false # Set to false to enable prefix cache mode (Disk + CPU + GPU KV storage)
  utilization_percentage: 1.0
  cpu_memory_size_GB: 500 # Amount of CPU memory allocated for KV Cache
 ```
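Since prefix cache mode depends on `gpu_only` being `false`, a quick check before rebuilding can save a wasted compile. The following is a minimal sketch, assuming the config path from Step 1. The function name `check_prefix_cache_config` is hypothetical and not part of KTransformers, and the check is a plain text match rather than a real YAML parse.

```bash
# Hypothetical helper (not part of KTransformers): succeeds only if the
# given file contains a line that sets gpu_only to false.
# -E: extended regex, -q: quiet, -s: suppress errors for missing files.
check_prefix_cache_config() {
  grep -Eqs '^[[:space:]]*gpu_only:[[:space:]]*false[[:space:]]*(#.*)?$' "$1"
}

# Path taken from Step 1; adjust if your checkout lives elsewhere.
config=./ktransformers/configs/config.yaml
if check_prefix_cache_config "$config"; then
  echo "gpu_only is false in $config"
else
  echo "gpu_only is not set to false in $config" >&2
fi
```

The pattern tolerates leading indentation and a trailing `#` comment, so it matches the `gpu_only` line exactly as written in the example above.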
 ### Step 2: Update Submodules and Recompile
 If this is your first time using prefix cache mode, please update the submodules first:
 ```bash
 git submodule update --init --recursive # Update PhotonLibOS submodule
 ```
 Then recompile the project:
 ```bash
 # Install for a single-NUMA machine
 USE_BALANCE_SERVE=1 bash ./install.sh
 # For machines with two CPUs and 1 TB of RAM (dual NUMA):
 USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh
 ```
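If you are unsure which of the two commands applies, the number of NUMA nodes reported by the Linux kernel is a reasonable first hint. The sketch below assumes a Linux host that exposes `/sys/devices/system/node`. Both function names are hypothetical. It only prints the suggested command, and it does not check the RAM requirement mentioned above.

```bash
# Hypothetical helper: count NUMA nodes via sysfs, falling back to 1
# when the directory is missing (for example on non-Linux hosts).
count_numa_nodes() {
  n=$(ls -d /sys/devices/system/node/node[0-9]* 2>/dev/null | wc -l | tr -d ' ')
  if [ "$n" -ge 1 ]; then echo "$n"; else echo 1; fi
}

# Hypothetical helper: print the install command from Step 2 that
# corresponds to the given node count.
install_cmd_for_nodes() {
  if [ "$1" -ge 2 ]; then
    echo "USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh"
  else
    echo "USE_BALANCE_SERVE=1 bash ./install.sh"
  fi
}

# Print the suggestion; review it, then run it yourself.
install_cmd_for_nodes "$(count_numa_nodes)"
```

Printing the command instead of executing it keeps the final decision with you, which matters because the dual-NUMA build also assumes a large amount of RAM.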