mirror of https://github.com/kvcache-ai/ktransformers.git
synced 2025-09-07 04:59:55 +00:00
Update prefix_cache.md
parent a9a72e52c3
commit 5a73aaf652
1 changed file with 6 additions and 2 deletions
@@ -1,6 +1,6 @@
 ## Enabling Prefix Cache Mode in KTransformers
 
-To enable **Prefix Cache Mode** in KTransformers, you need to modify the configuration file and recompile the project.
+Balance serve now supports prefix cache reuse! To enable **Prefix Cache Mode** in KTransformers, you need to modify the configuration file and recompile the project.
 
 ### Step 1: Modify the Configuration File
 
@@ -31,4 +31,8 @@ Then recompile the project:
 USE_BALANCE_SERVE=1 bash ./install.sh
 # For machines with two CPUs and 1 TB of RAM (dual NUMA):
 USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh
-```
+```
+
+## Note
+Balance serve uses a 3-layer (GPU-CPU-Disk) scheme to store and reuse KVCache. Deleting KVCache is not supported yet. If too much KVCache accumulates, you can clear it by deleting the kvcache files.
+
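The two install commands in the diff differ only in `USE_NUMA=1`, which the doc recommends for dual-NUMA machines. The following is a minimal sketch, not part of the commit, of one way to pick between them. It assumes a Linux host that exposes NUMA nodes as `/sys/devices/system/node/node<N>`, and it treats "two or more nodes" as the trigger, which is a heuristic: it ignores the RAM size the doc mentions, and some single-socket machines also report several nodes. The helper names `count_numa_nodes` and `install_command` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: suggest an install command based on the NUMA node count.
# Assumption: NUMA nodes show up as /sys/devices/system/node/node<N> (Linux sysfs).

# count_numa_nodes [SYSFS_DIR]: print how many node<N> directories exist.
count_numa_nodes() {
    local dir="${1:-/sys/devices/system/node}"
    local count=0
    local entry
    for entry in "$dir"/node[0-9]*; do
        # If the glob matches nothing it stays literal and fails the -d test.
        if [ -d "$entry" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# install_command NODE_COUNT: print the matching command from the doc.
install_command() {
    if [ "$1" -ge 2 ]; then
        echo "USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh"
    else
        echo "USE_BALANCE_SERVE=1 bash ./install.sh"
    fi
}

# Print the suggestion only; nothing is installed by this script.
install_command "$(count_numa_nodes)"
```

The script only prints a suggestion, so the actual build still has to be started by hand from the repository root after the configuration change in Step 1.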