server: add router multi-model tests (#17704) (#17722) · e7c2cf1356 - vrr/koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-04-30 20:50:16 +00:00

server: add router multi-model tests (#17704) (#17722)

* llama-server: add router multi-model tests (#17704)

Add 4 test cases for model router:
- test_router_unload_model: explicit model unloading
- test_router_models_max_evicts_lru: LRU eviction with --models-max
- test_router_no_models_autoload: --no-models-autoload flag behavior
- test_router_api_key_required: API key authentication
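Two of the router behaviors under test, explicit unloading and disabled autoload, can be modeled with a toy, in-memory stand-in. This is a hypothetical sketch. `ToyRouter`, `ModelNotLoadedError` and `handle_request` are invented names, not llama-server's implementation or HTTP API. It assumes that disabling autoload means a request for an unloaded model fails instead of triggering a load.

```python
class ModelNotLoadedError(Exception):
    """Raised when a request names an unloaded model and autoload is off."""


class ToyRouter:
    """Hypothetical stand-in for a multi-model router (not llama-server code)."""

    def __init__(self, available, autoload=True):
        self.available = set(available)  # models the router knows how to load
        self.loaded = set()              # models currently loaded
        self.autoload = autoload         # assumed meaning of the autoload switch

    def load(self, name):
        if name not in self.available:
            raise KeyError(f"unknown model: {name}")
        self.loaded.add(name)

    def unload(self, name):
        # Explicit unloading; discard() is a no-op if the model is not loaded.
        self.loaded.discard(name)

    def handle_request(self, name):
        if name not in self.loaded:
            if not self.autoload:
                # Autoload disabled: reject instead of loading on demand.
                raise ModelNotLoadedError(name)
            self.load(name)
        return f"response from {name}"


router = ToyRouter(["a", "b"])
print(router.handle_request("a"))  # loads "a" on demand
router.unload("a")
print(sorted(router.loaded))       # nothing loaded after the explicit unload

strict = ToyRouter(["a", "b"], autoload=False)
try:
    strict.handle_request("a")
except ModelNotLoadedError as exc:
    print("rejected:", exc)
```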

Tests use asynchronous model loading with polling, and skip gracefully when
too few models are available for eviction testing.
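Polling with a timeout, plus a guard for the skip condition, can be sketched as follows. The sketch assumes a predicate that reports whether a model has finished loading. `wait_until` and `enough_models_for_eviction` are hypothetical helpers, not functions from the test suite's `utils.py`. In a pytest test, the graceful skip would be a call to `pytest.skip(reason)` when the guard returns False.

```python
import threading
import time


def wait_until(predicate, timeout=10.0, interval=0.1):
    """Poll predicate() until it returns True or timeout seconds elapse.

    Returns True on success, False on timeout (hypothetical helper).
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def enough_models_for_eviction(available, models_max):
    """Eviction can only be observed with more than models_max distinct models."""
    return len(available) > models_max


# Simulate an asynchronous load that completes after 0.2 s in a background thread.
state = {"status": "loading"}
threading.Timer(0.2, lambda: state.update(status="loaded")).start()
ok = wait_until(lambda: state["status"] == "loaded", timeout=2.0, interval=0.05)
print(ok)
```

Using `time.monotonic()` for the deadline keeps the timeout immune to wall-clock adjustments during the test run.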

utils.py changes:
- Add models_max, models_dir, no_models_autoload attributes to ServerProcess
- Handle JSONDecodeError for non-JSON error responses (fallback to text)
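The JSON-or-text fallback can be sketched with the standard `json` module. `parse_body` is a hypothetical name. How `utils.py` actually reads HTTP responses is not shown in this diff.

```python
import json


def parse_body(raw: bytes):
    """Decode a response body as JSON, falling back to plain text.

    Error responses are not always JSON, so json.JSONDecodeError is caught
    and the decoded text is returned instead (hypothetical helper).
    """
    text = raw.decode("utf-8", errors="replace")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return text


print(parse_body(b'{"error": "model not found"}'))  # parsed into a dict
print(parse_body(b"Unauthorized"))                 # returned as a plain string
```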

* llama-server: update test models to new HF repos

* add offline

* llama-server: fix router LRU eviction test and add preloading

Fix eviction test: load 2 models first, verify their state, then load
a 3rd to trigger eviction. The previous logic loaded all 3 at once,
causing the first model to be evicted before verification could occur.
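The corrected ordering can be illustrated with a toy LRU cache built on `collections.OrderedDict`. `LRUModelCache` is hypothetical. It only mimics the least-recently-used eviction that `--models-max` is described as triggering, and is not the router's implementation.

```python
from collections import OrderedDict


class LRUModelCache:
    """Toy LRU set of loaded models, capped at models_max (hypothetical)."""

    def __init__(self, models_max):
        self.models_max = models_max
        self.loaded = OrderedDict()  # ordered from least to most recently used

    def load(self, name):
        """Load a model and return the name of the evicted model, if any."""
        if name in self.loaded:
            self.loaded.move_to_end(name)  # mark as most recently used
            return None
        evicted = None
        if len(self.loaded) >= self.models_max:
            evicted, _ = self.loaded.popitem(last=False)  # drop the LRU entry
        self.loaded[name] = True
        return evicted


cache = LRUModelCache(models_max=2)
cache.load("m1")
cache.load("m2")
# Verify the two-model state before triggering eviction.
assert list(cache.loaded) == ["m1", "m2"]
evicted = cache.load("m3")
print(evicted, list(cache.loaded))
```

If m1, m2 and m3 were loaded back to back before the first check, only m2 and m3 would remain, so a check expecting m1 to be loaded would fail. Checking after two loads avoids that.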

Add a module fixture to preload models via ServerPreset.load_all(),
and mark test presets as offline so they use cached models.

* llama-server: fix split model download on Windows

---------

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

Author: Pascal, 2025-12-03 15:10:37 +01:00 (committed by GitHub)
Parent: 1257491047
Commit: e7c2cf1356
Signature: no known key found for this signature in database (GPG key ID: B5690EEEBB952194)

3 changed files with 169 additions and 6 deletions

									
										1

tools/server/tests/unit/test_basic.py
									
										View file
										
@@ -65,6 +65,7 @@ def test_server_slots():

```python
def test_load_split_model():
    global server
    server.offline = False
    server.model_hf_repo = "ggml-org/models"
    server.model_hf_file = "tinyllamas/split/stories15M-q8_0-00001-of-00003.gguf"
    server.model_alias = "tinyllama-split"
```
