Update prefix_cache.md

ErvinXie 2025-06-30 15:04:37 +08:00 committed by GitHub
parent a9a72e52c3
commit 5a73aaf652


@@ -1,6 +1,6 @@
## Enabling Prefix Cache Mode in KTransformers
To enable **Prefix Cache Mode** in KTransformers, you need to modify the configuration file and recompile the project.
Balance serve now supports prefix cache reuse! To enable **Prefix Cache Mode** in KTransformers, you need to modify the configuration file and recompile the project.
### Step 1: Modify the Configuration File
@@ -31,4 +31,8 @@ Then recompile the project:
USE_BALANCE_SERVE=1 bash ./install.sh
# For machines with two CPUs and 1T RAM (dual NUMA):
USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh
```
```
## Note
Balance serve uses a three-tier (GPU, CPU, disk) scheme to store and reuse KVCache. Deleting KVCache from within balance serve is not supported yet. If the stored KVCache grows too large, you can free the space by removing the kvcache files manually.
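
Manual cleanup could look like the sketch below. The helper name `clear_kvcache_files` and the directory you pass to it are hypothetical; this document does not say where the kvcache files live, so substitute the disk cache path from your own configuration. It is probably safest to stop the server before running it.

```bash
# Hypothetical helper (not part of KTransformers): remove every regular file
# under a KVCache directory while leaving the directory itself in place.
clear_kvcache_files() {
    local dir="$1"
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
        echo "usage: clear_kvcache_files <kvcache-dir>" >&2
        return 1
    fi
    # Show how much disk space the directory uses before deleting anything.
    du -sh "$dir"
    # Delete regular files only; subdirectories are kept but emptied of files.
    find "$dir" -type f -delete
}
```

For example, `clear_kvcache_files /path/to/kvcache`, where `/path/to/kvcache` is a placeholder for your actual cache location. The function refuses to run without an existing directory argument, which guards against accidentally deleting from the wrong place.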