All C++ handling code currently:
- build a comma-separated list from the info_vulkan array
- if GGML_VK_VISIBLE_DEVICES isn't set
- set GGML_VK_VISIBLE_DEVICES to the list
Once set, GGML_VK_VISIBLE_DEVICES affects the whole process. So this
can be done in the same way at the Python level, before all loading
functions.
Caveat: load_model had the default `inputs.vulkan_info = "0"`,
so the default GPU would be "0" only when loading a text model.
Closes the hardcoded 600s timeout in the router-mode reverse proxy: long
generations through --routermode would be cut off at the upstream
HTTPConnection timeout regardless of how long the model actually takes,
because http.client.HTTPConnection('localhost', upstream_port, timeout=600)
was wired with a literal 600.
Adds a new --routermodetimeout (default 600) under the admin group, and
threads it through the three HTTPConnection sites in the router handler:
the model-swap reload, the autoswap reload, and the main upstream proxy
forward. Behavior is unchanged at the default; users with long generations
can now pass e.g. --routermodetimeout 3600.
Reported in https://github.com/LostRuins/koboldcpp/issues/2168
* debug: allow loading backend libraries without normal arg parsing
This is just to be able to test backend functions directly, with e.g.:
>> import koboldcpp
>> koboldcpp.init_libraries()
>> koboldcpp.sd_get_info()
* sd: report all sampler aliases and centralize name mapping
* Pass img_min_params and img_max_params to ctx_clip_params
These values determine the minimum and maximum size (in
tokens) of vision embeddings. The default value of -1
uses a model-dependent default size, for example for
Gemma 4 the default is a 280 token embedding. For higher
quality results (at the cost of using more memory and
slower speed) you can increase the size of the embedding
to 1120 tokens.
* Change dict to mydict to match change to method
* new baseconfig setting that aworks in router mode
* re-added fix that prevents unneccessary model reload
* fixed the fix
* swapped order of baseconfig <-> override
* fix indent
* simplify baseconfig, if specified AND restart_override_config_target is NOT, it simply replaces the field (+1 squashed commits)
Squashed commits:
[95e816b16] simplify baseconfig, if specified AND restart_override_config_target is NOT, it simply replaces the field
---------
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>