add prefix cache documentation

2025-09-07 04:59:55 +00:00 · 2025-06-28 07:13:33 +00:00 · cc822df65d
commit cc822df65d
parent 4d51831316
1 changed file with 34 additions and 0 deletions
--- a/doc/en/prefix_cache.md
+++ b/doc/en/prefix_cache.md
@@ -0,0 +1,34 @@
+## Enabling Prefix Cache Mode in KTransformers
+
+To enable **Prefix Cache Mode** in KTransformers, you need to modify the configuration file and recompile the project.
+
+### Step 1: Modify the Configuration File
+
+Edit `./ktransformers/configs/config.yaml` so that it contains the following settings (adjust the values to suit your hardware):
+
+```yaml
+attn:
+  page_size: 16 # Size of a page in KV Cache.
+  chunk_size: 256
+kvc2:
+  gpu_only: false # Set to false to enable prefix cache mode (Disk + CPU + GPU KV storage)
+  utilization_percentage: 1.0
+  cpu_memory_size_GB: 500 # Amount of CPU memory allocated for KV Cache
+```
+
+### Step 2: Update Submodules and Recompile
+
+If this is your first time using prefix cache mode, please update the submodules first:
+
+```bash
+git submodule update --init --recursive # Update PhotonLibOS submodule
+```
+
+Then recompile the project:
+
+```bash
+# Install for a single-NUMA machine
+USE_BALANCE_SERVE=1 bash ./install.sh
+# For machines with two CPUs and 1 TB of RAM (dual NUMA):
+USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh
+```
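
Before recompiling, it can help to confirm that the Step 1 edit was actually saved. The following is a minimal sketch, not part of KTransformers: `prefix_cache_enabled` is a hypothetical helper that only greps for a `gpu_only: false` line and, as a simplification, does not verify that the key sits under `kvc2`.

```shell
# Hypothetical helper (not shipped with KTransformers): succeeds if the
# given YAML file contains a line setting gpu_only to false.
# Simplification: this is a plain text match, not a YAML parse, so it
# does not check that the key is nested under kvc2.
prefix_cache_enabled() {
  grep -Eq '^[[:space:]]*gpu_only:[[:space:]]*false([[:space:]]|#|$)' "$1"
}

config="./ktransformers/configs/config.yaml"
if [ -f "$config" ] && prefix_cache_enabled "$config"; then
  echo "prefix cache mode is enabled in $config"
else
  echo "prefix cache mode is not enabled (or $config is missing)"
fi
```

Run it from the repository root so that the relative path to `config.yaml` resolves; the check is read-only and does not modify the configuration.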