diff --git a/doc/en/prefix_cache.md b/doc/en/prefix_cache.md
index bdbac02..f854b44 100644
--- a/doc/en/prefix_cache.md
+++ b/doc/en/prefix_cache.md
@@ -1,6 +1,6 @@
 ## Enabling Prefix Cache Mode in KTransformers
 
-To enable **Prefix Cache Mode** in KTransformers, you need to modify the configuration file and recompile the project.
+Balance serve now supports prefix cache reuse! To enable **Prefix Cache Mode** in KTransformers, you need to modify the configuration file and recompile the project.
 
 ### Step 1: Modify the Configuration File
 
@@ -31,4 +31,8 @@ Then recompile the project:
 USE_BALANCE_SERVE=1 bash ./install.sh
 # For those who have two cpu and 1T RAM(Dual NUMA):
 USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh
-```
\ No newline at end of file
+```
+
+## Note
+Balance serve uses a three-layer (GPU-CPU-Disk) scheme to store and reuse KVCache. Balance serve does not yet support deleting KVCache itself. If too much KVCache accumulates, you can remove the kvcache files manually.
+
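
The note in this change says that surplus KVCache can only be cleared by removing the kvcache files by hand. A minimal sketch of that manual cleanup follows, under stated assumptions: `prune_kvcache` is a hypothetical helper, not part of KTransformers, and the directory argument is whatever path your deployment writes kvcache files to (the diff does not name it). The source also does not say whether the server must be stopped first; stopping it beforehand is the cautious choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of KTransformers.
# Usage: prune_kvcache DIR [DAYS]
#   DIR  - directory holding kvcache files (an assumption; set it to your own path)
#   DAYS - files whose age in whole days exceeds this value are removed (default 7)
prune_kvcache() {
    local dir="$1"
    local days="${2:-7}"

    # Refuse to run against a path that is not an existing directory.
    if [ ! -d "$dir" ]; then
        echo "prune_kvcache: not a directory: $dir" >&2
        return 1
    fi

    # Remove regular files only; directories are left in place.
    find "$dir" -type f -mtime +"$days" -delete
}
```

For example, `prune_kvcache /path/to/kvcache 30` would keep anything modified within roughly the last month. Filtering by age rather than deleting everything is a design choice of this sketch, so that recently written prefixes remain reusable.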