Mirror of https://github.com/kvcache-ai/ktransformers.git, synced 2025-09-05 20:19:51 +00:00

update readme

This commit is contained in:
parent ee524b0f41
commit c3d0ac80c6

4 changed files with 9 additions and 1 deletion
@@ -100,8 +100,10 @@ git submodule update --init --recursive
# Install single NUMA dependencies
USE_BALANCE_SERVE=1 bash ./install.sh
pip install third_party/custom_flashinfer/
# For those who have two CPUs and 1T RAM (Dual NUMA):
USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh
pip install third_party/custom_flashinfer/
```
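The hunk above adds two install paths: a plain build for single-NUMA machines and a NUMA-aware build for dual-socket boxes. As a minimal sketch (assuming a Linux host where `lscpu` reports a `NUMA node(s)` field), the right command can be picked from the reported topology:

```shell
# pick_install NODES -> print the install command matching the NUMA topology.
pick_install() {
    if [ "$1" -gt 1 ]; then
        # Dual (or more) NUMA nodes: use the NUMA-aware build.
        echo "USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh"
    else
        # Single NUMA node: the plain balance_serve build is enough.
        echo "USE_BALANCE_SERVE=1 bash ./install.sh"
    fi
}

# Count NUMA nodes via lscpu (Linux); default to 1 when unavailable.
nodes=$(lscpu 2>/dev/null | awk -F: '/^NUMA node\(s\)/ {gsub(/ /, "", $2); print $2}')
pick_install "${nodes:-1}"
```

This only prints the suggested command rather than running it, so it is safe to try before committing to a build.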

## Running DeepSeek-R1-Q4KM Models
@@ -117,11 +117,13 @@ Download source code and compile:

```shell
USE_BALANCE_SERVE=1 bash ./install.sh
pip install third_party/custom_flashinfer/
```

- For multi-concurrency with two CPUs and 1T RAM:

```shell
USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh
pip install third_party/custom_flashinfer/
```
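Both install paths above finish with `pip install third_party/custom_flashinfer/`. A quick sanity check that the wheel actually landed can look like the following sketch; note that `flashinfer` as the installed module name is an assumption and should be adjusted to whatever `third_party/custom_flashinfer` actually provides:

```shell
# has_module NAME -> succeed if the Python module NAME is importable.
has_module() {
    python3 -c "import importlib.util, sys; sys.exit(0 if importlib.util.find_spec('$1') else 1)"
}

# 'flashinfer' is an assumed module name, not confirmed by the README.
if has_module flashinfer; then
    echo "custom_flashinfer is importable"
else
    echo "custom_flashinfer missing: rerun pip install third_party/custom_flashinfer/"
fi
```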
- For Windows (Windows native is temporarily deprecated; please try WSL)
@@ -67,9 +67,11 @@ pip3 install torch torchvision torchaudio --index-url https://download.pytorch.o

```bash
# Install single NUMA dependencies
USE_BALANCE_SERVE=1 bash ./install.sh
pip install third_party/custom_flashinfer/
# For those who have two CPUs and 1T RAM (Dual NUMA):
USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh
pip install third_party/custom_flashinfer/
```

### 4. Use our custom config.json
@@ -127,8 +127,10 @@ cd ktransformers
git submodule update --init --recursive
# If using the dual-NUMA version
USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh
pip install third_party/custom_flashinfer/
# If using the single-NUMA version
USE_BALANCE_SERVE=1 bash ./install.sh
pip install third_party/custom_flashinfer/
# Launch command
python ktransformers/server/main.py --model_path <your model path> --gguf_path <your gguf path> --cpu_infer 62 --optimize_config_path <inject rule path> --port 10002 --chunk_size 256 --max_new_tokens 1024 --max_batch_size 4 --cache_lens 32768 --backend_type balance_serve
```
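Once the launch command above brings the server up, it can be exercised over HTTP. The sketch below assumes the `balance_serve` backend exposes an OpenAI-style `/v1/chat/completions` endpoint on the configured port (10002 here); the endpoint path is an assumption borrowed from the OpenAI API convention and should be confirmed against the server docs:

```shell
# build_payload MODEL PROMPT -> JSON body in the OpenAI chat-completions shape.
build_payload() {
    printf '{"model":"%s","messages":[{"role":"user","content":"%s"}],"max_tokens":64}' "$1" "$2"
}

# Port 10002 matches --port in the launch command; the path is assumed.
curl -s http://localhost:10002/v1/chat/completions \
     -H "Content-Type: application/json" \
     -d "$(build_payload DeepSeek-R1 Hello)" \
  || echo "server not reachable on port 10002"
```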