mirror of
https://github.com/kvcache-ai/ktransformers.git
synced 2025-09-08 05:29:29 +00:00
add prefix cache documentation
parent
4d51831316
commit
cc822df65d
1 changed files with 34 additions and 0 deletions
34
doc/en/prefix_cache.md
Normal file
@@ -0,0 +1,34 @@
## Enabling Prefix Cache Mode in KTransformers
To enable **Prefix Cache Mode** in KTransformers, you need to modify the configuration file and recompile the project.
### Step 1: Modify the Configuration File
Edit the `./ktransformers/configs/config.yaml` file with the following content (you can adjust the values according to your needs):
```yaml
attn:
  page_size: 16 # Size of a page in KV Cache.
  chunk_size: 256
kvc2:
  gpu_only: false # Set to false to enable prefix cache mode (Disk + CPU + GPU KV storage)
  utilization_percentage: 1.0
  cpu_memory_size_GB: 500 # Amount of CPU memory allocated for KV Cache
```
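
Because `cpu_memory_size_GB` reserves host memory for the KV cache, it can help to compare the requested value with the RAM the machine reports before recompiling. The following is a minimal sketch, not part of KTransformers: `kv_cache_fits` is a hypothetical helper, the check is Linux-only (it reads `/proc/meminfo`), and it assumes the configured value can be treated as GiB. Passing the check is necessary but not sufficient, since model weights and other processes also need memory.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of KTransformers): succeed when total_kb,
# in the kB units reported by /proc/meminfo, covers requested_gb.
# Assumption: cpu_memory_size_GB is treated as GiB here.
kv_cache_fits() {
  local requested_gb=$1 total_kb=$2
  [ "$total_kb" -ge $(( requested_gb * 1024 * 1024 )) ]
}

# Read MemTotal on Linux; fall back to 0 if the file is unavailable.
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null || true)
total_kb=${total_kb:-0}

# 500 mirrors the example value of cpu_memory_size_GB in config.yaml.
if kv_cache_fits 500 "$total_kb"; then
  echo "Reported RAM covers cpu_memory_size_GB: 500"
else
  echo "Host reports ${total_kb} kB of RAM; consider lowering cpu_memory_size_GB"
fi
```

If the second message appears, lowering `cpu_memory_size_GB` to a value the host can actually provide is the likely fix.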
### Step 2: Update Submodules and Recompile
If this is your first time using prefix cache mode, please update the submodules first:
```bash
git submodule update --init --recursive # Update PhotonLibOS submodule
```
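
To confirm that the submodule update took effect, one option is to inspect `git submodule status`, which prefixes each uninitialized submodule with `-`. The sketch below counts those entries; `count_uninitialized` is a hypothetical helper name introduced only for illustration, and the snippet assumes it is run from the repository root.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count lines on stdin that start with "-",
# the marker `git submodule status` uses for uninitialized submodules.
count_uninitialized() {
  # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
  grep -c '^-' || true
}

missing=$(git submodule status --recursive 2>/dev/null | count_uninitialized)
echo "uninitialized submodules: ${missing}"
```

A count of 0 suggests every submodule, including PhotonLibOS, has been checked out; any other value means the update command should be run again.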
Then recompile the project:
```bash
# Single NUMA node:
USE_BALANCE_SERVE=1 bash ./install.sh
# Dual NUMA (two CPUs and 1 TB of RAM):
USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh
```
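
If it is unclear which of the two commands applies, the number of NUMA nodes the kernel exposes can serve as a rough guide. The sketch below is an illustration rather than an official installer step: `install_command` is a hypothetical helper, the node count comes from Linux sysfs, and treating "two or more nodes" as the dual-NUMA case is a simplifying assumption that ignores the RAM requirement mentioned above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the install command from this section
# that matches the given NUMA node count.
install_command() {
  if [ "$1" -ge 2 ]; then
    echo "USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh"
  else
    echo "USE_BALANCE_SERVE=1 bash ./install.sh"
  fi
}

# Count NUMA nodes listed in sysfs (Linux only); assume 1 if none are listed.
nodes=$(( $(ls -d /sys/devices/system/node/node[0-9]* 2>/dev/null | wc -l) ))
[ "$nodes" -ge 1 ] || nodes=1

# Print the suggested command instead of running it, so it can be reviewed first.
install_command "$nodes"
```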