Update balance-serve.md

wang jiahao 2025-04-05 11:49:05 +08:00 committed by GitHub
parent 8a1313ca4e
commit 6cbe044aae

@@ -128,14 +128,17 @@ It features the following arguments:
- `--max_new_tokens`: Maximum number of tokens generated per request.
- `--cache_lens`: Total length of kvcache allocated by the scheduler. All requests share one kvcache space, corresponding to 32768 tokens, and the occupied space is released after the requests are completed.
- `--max_batch_size`: Maximum number of requests (prefill + decode) processed in a single run by the engine. (Supported only by `balance_serve`)
- `--chunk_size`: Maximum number of tokens processed in a single run by the engine.
- `--backend_type`: `balance_serve` is the multi-concurrency backend engine introduced in version v0.2.4. The original single-concurrency engine is `ktransformers`.
- `--model_path`: Path to the safetensor config directory (only the config is required, not the model safetensors).
  Please note that, since `ver 0.2.4`, the last segment of the `${model_path}` directory name **MUST** be one of the model names defined in `ktransformers/configs/model_configs.json`.
- `--force_think`: Force responding with the reasoning tag of `DeepSeek R1`.

The relationship between `max_batch_size`, `cache_lens`, and `max_new_tokens` should satisfy
`cache_lens > max_batch_size * max_new_tokens`; otherwise the achievable concurrency decreases, because each concurrent request may occupy up to `max_new_tokens` of the shared kvcache. A flag combination that satisfies this constraint is sketched below.
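To make the constraint concrete, here is a minimal sketch of a launch command combining these flags. The paths are placeholders and the values are illustrative only; adapt them to the actual start command shown earlier in this document.

```bash
# Sketch only: paths are placeholders, values chosen so that
# cache_lens > max_batch_size * max_new_tokens (32768 > 4 * 1024).
# The last segment of --model_path (here "DeepSeek-V3") must match a model name
# defined in ktransformers/configs/model_configs.json.
python ktransformers/server/main.py \
  --model_path /models/DeepSeek-V3 \
  --gguf_path /models/DeepSeek-V3-GGUF \
  --backend_type balance_serve \
  --max_batch_size 4 \
  --max_new_tokens 1024 \
  --cache_lens 32768 \
  --chunk_size 256
```

With these values the scheduler can keep up to 4 requests in flight, since 4 × 1024 = 4096 tokens of generated kvcache fit comfortably within the 32768-token shared cache.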
### 2. access server
```