Mirror of https://github.com/LostRuins/koboldcpp.git (synced 2026-04-30 20:50:16 +00:00)
server: add router multi-model tests (#17704) (#17722)
Some checks are pending
Python Type-Check / pyright type-check (push) Waiting to run
* llama-server: add router multi-model tests (#17704)

  Add 4 test cases for the model router:
  - test_router_unload_model: explicit model unloading
  - test_router_models_max_evicts_lru: LRU eviction with --models-max
  - test_router_no_models_autoload: --no-models-autoload flag behavior
  - test_router_api_key_required: API key authentication

  Tests use async model loading with polling, and skip gracefully when
  too few models are available for eviction testing.

  utils.py changes:
  - Add models_max, models_dir, no_models_autoload attributes to ServerProcess
  - Handle JSONDecodeError for non-JSON error responses (fall back to text)

* llama-server: update test models to new HF repos

* add offline

* llama-server: fix router LRU eviction test and add preloading

  Fix the eviction test: load 2 models first, verify the state, then load a
  3rd to trigger eviction. The previous logic loaded all 3 at once, so the
  first model was evicted before it could be verified. Also add a module
  fixture to preload models via ServerPreset.load_all(), and mark the test
  presets as offline so they use cached models.

* llama-server: fix split model download on Windows

---------

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
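For context, here is a minimal sketch of the load-verify-evict flow the eviction fix describes. The endpoint paths, JSON shapes, and helper names below are assumptions for illustration, not the actual llama-server router API or the real utils.py helpers:

```python
# Sketch only: BASE, the /models response shape, and the load-by-request
# behavior are assumptions, not the real llama-server router API.
import time
import requests

BASE = "http://127.0.0.1:8080"  # assumed router address

def loaded_models() -> set[str]:
    # Assumed: the router reports per-model load status under /models.
    data = requests.get(f"{BASE}/models").json()
    return {m["id"] for m in data.get("data", []) if m.get("status") == "loaded"}

def request_model(name: str) -> None:
    # Assumed: referencing a model in a request triggers an async load.
    requests.post(f"{BASE}/v1/completions",
                  json={"model": name, "prompt": "hi", "max_tokens": 1})

def wait_loaded(names: set[str], timeout: float = 30.0) -> None:
    # Poll until every name reports "loaded", mirroring the async model
    # loading with polling mentioned in the commit message.
    deadline = time.time() + timeout
    while time.time() < deadline:
        if names <= loaded_models():
            return
        time.sleep(0.5)
    raise TimeoutError(f"models never loaded: {names}")

def test_router_models_max_evicts_lru():
    # Server assumed started with --models-max 2.
    request_model("model-a"); wait_loaded({"model-a"})
    request_model("model-b"); wait_loaded({"model-b"})
    assert loaded_models() == {"model-a", "model-b"}  # verify state first

    # Loading a third model must evict the least recently used (model-a).
    request_model("model-c"); wait_loaded({"model-c"})
    assert "model-a" not in loaded_models()
```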
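The preloading fixture could look roughly like this; ServerPreset.load_all() is named in the commit message, but the pytest wiring and the import path are assumptions:

```python
import pytest

# Assumed import path; the commit only says the fixture lives alongside
# the server tests and calls ServerPreset.load_all().
from utils import ServerPreset

@pytest.fixture(scope="module", autouse=True)
def preload_models():
    # Download/cache every preset model once per test module, so presets
    # marked offline can then run against the cached files.
    ServerPreset.load_all()
```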
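And a sketch of the utils.py error-handling change: when an error body is not valid JSON (for example, a plain-text 401 from the API-key check), fall back to the raw text instead of raising. The function name is illustrative, not the actual utils.py code:

```python
import requests

def parse_error_body(response: requests.Response):
    # Illustrative helper, not the actual utils.py implementation.
    try:
        return response.json()  # normal case: JSON error object
    except ValueError:
        # Covers json.JSONDecodeError and requests' own JSONDecodeError,
        # both of which subclass ValueError.
        return response.text  # non-JSON error: keep the raw text
```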
This commit is contained in:
parent 1257491047
commit e7c2cf1356

3 changed files with 169 additions and 6 deletions
@@ -65,6 +65,7 @@ def test_server_slots():

 def test_load_split_model():
     global server
+    server.offline = False
     server.model_hf_repo = "ggml-org/models"
     server.model_hf_file = "tinyllamas/split/stories15M-q8_0-00001-of-00003.gguf"
     server.model_alias = "tinyllama-split"