Commit graph

1464 commits

Author SHA1 Message Date
Wagner Bruna
25fab4113e
refactor: handle GGML_VK_VISIBLE_DEVICES at the Python level (#2179)
All the C++ handling code currently does is:
- build a comma-separated list from the info_vulkan array
- if GGML_VK_VISIBLE_DEVICES isn't set, set GGML_VK_VISIBLE_DEVICES to that list

Once set, GGML_VK_VISIBLE_DEVICES affects the whole process, so the same
thing can be done at the Python level, before any of the loading
functions run.

Caveat: load_model had the default `inputs.vulkan_info = "0"`,
so the default GPU would be "0" only when loading a text model.
2026-05-02 23:10:29 +08:00
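The behavior described in the commit above can be sketched at the Python level roughly like this (a minimal sketch with hypothetical names; `apply_vulkan_visible_devices` and its argument are illustrative, not the actual koboldcpp function):

```python
import os

def apply_vulkan_visible_devices(device_indices):
    # Mirror the old C++ behavior in Python: only set the variable
    # if the user has not already set it in their environment.
    if "GGML_VK_VISIBLE_DEVICES" not in os.environ:
        # Build a comma-separated list, e.g. [0, 1] -> "0,1"
        os.environ["GGML_VK_VISIBLE_DEVICES"] = ",".join(
            str(i) for i in device_indices)
    return os.environ["GGML_VK_VISIBLE_DEVICES"]
```

Because the variable affects the whole process, calling this once before any loading function runs gives the same effect as the old per-library C++ handling.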
Concedo
42ce63fd3b allow customizing multiuser queue in gui 2026-05-02 18:25:50 +08:00
Concedo
8b62e7b667 allow splitmode to be set independently, enable tensor parallelism 2026-05-02 16:41:28 +08:00
Concedo
7e98e06075 improved lora dir selection via gui 2026-05-02 10:51:35 +08:00
Concedo
b18a250205 handle raw args for model_param 2026-05-02 00:21:45 +08:00
Concedo
ef79904628 added a fix to make description optional in rosie's tool repack 2026-04-30 17:32:34 +08:00
Concedo
029cc3ad99 don't save deprecated args 2026-04-30 16:27:02 +08:00
Concedo
7bd95eb505 routermodetimeout -> reqtimeout, add to gui 2026-04-29 21:55:07 +08:00
Tai An
dfd87c4fb6
feat(router): add --routermodetimeout to make reverse-proxy timeout configurable (#2169)
Removes the hardcoded 600s timeout in the router-mode reverse proxy: long
generations through --routermode would be cut off at the upstream
HTTPConnection timeout regardless of how long the model actually takes,
because http.client.HTTPConnection('localhost', upstream_port, timeout=600)
was wired with a literal 600.

Adds a new --routermodetimeout (default 600) under the admin group, and
threads it through the three HTTPConnection sites in the router handler:
the model-swap reload, the autoswap reload, and the main upstream proxy
forward. Behavior is unchanged at the default; users with long generations
can now pass e.g. --routermodetimeout 3600.

Reported in https://github.com/LostRuins/koboldcpp/issues/2168
2026-04-29 20:20:42 +08:00
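The change above amounts to replacing the literal 600 with a parsed flag; a minimal sketch (hypothetical helper name `open_upstream`; the flag name and default come from the commit description):

```python
import argparse
import http.client

parser = argparse.ArgumentParser()
# Default matches the previously hardcoded 600-second timeout,
# so behavior is unchanged unless the user overrides it.
parser.add_argument("--routermodetimeout", type=int, default=600)

def open_upstream(upstream_port, args):
    # The timeout now comes from the parsed args instead of a literal 600.
    return http.client.HTTPConnection("localhost", upstream_port,
                                      timeout=args.routermodetimeout)

# Users with long generations can now pass e.g. --routermodetimeout 3600:
args = parser.parse_args(["--routermodetimeout", "3600"])
conn = open_upstream(5001, args)
```

Constructing an HTTPConnection does not open a socket, so the timeout is simply stored until the proxy forwards a request.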
Concedo
9eaed2ec32 make musicui accessible to screen readers 2026-04-27 19:43:51 +08:00
Concedo
f679e3fec5 fix missing ipv4 support 2026-04-26 14:44:26 +08:00
Concedo
929f214bf6 updated docs, handle seed oss thinking 2026-04-25 22:44:40 +08:00
Wagner Bruna
c04832bb2b
sd: add eta support (#2164) 2026-04-25 19:04:13 +08:00
Concedo
18a3bedf63 fixed a deadlock 2026-04-25 19:03:03 +08:00
Concedo
4090400dff improved gemma toolcall handling 2026-04-25 09:51:29 +08:00
Concedo
cfb14bd844 fixed more args 2026-04-23 11:11:24 +08:00
Concedo
68e238857f fixed args 2026-04-23 11:00:42 +08:00
Concedo
c818716f57 router mode fixed for parallel requests 2026-04-21 22:33:46 +08:00
Concedo
96ec87127a updated colab, handle connection dropping during prompt processing 2026-04-21 21:46:13 +08:00
Concedo
1feba4e4ea fixed koboldcpp.sh, fixed vision max/min when one param is missing, fixed processing count wrong, updated lite 2026-04-21 18:36:47 +08:00
Concedo
c17ba99812 change time.sleep to asyncio 2026-04-20 23:25:35 +08:00
Concedo
fe4c1b80a1 fix unwanted error print 2026-04-20 13:48:57 +08:00
Concedo
a8290a072f more robust json field handling 2026-04-19 23:27:19 +08:00
Concedo
707bb67b30 minimal uses 10% of budget 2026-04-19 20:19:45 +08:00
Concedo
71b4107bb6 fixed terminal logs 2026-04-19 11:31:12 +08:00
Concedo
8886e48a4a cache sd info 2026-04-19 02:19:11 +08:00
Wagner Bruna
1be08b9d15
sd: report all sampler aliases and centralize name mapping (#2149)
* debug: allow loading backend libraries without normal arg parsing

This is just to be able to test backend functions directly, with e.g.:

>>> import koboldcpp
>>> koboldcpp.init_libraries()
>>> koboldcpp.sd_get_info()

* sd: report all sampler aliases and centralize name mapping
2026-04-19 01:51:42 +08:00
Concedo
e5eab545f3 handle override jinja template 2026-04-19 00:30:28 +08:00
Concedo
17c754a5fc improved reasoning budget 2026-04-18 17:19:09 +08:00
Concedo
0b37cb9a57 added preliminary support for reasoning budget 2026-04-18 11:56:33 +08:00
Concedo
9a38091207 support q5_1 kv 2026-04-17 17:06:15 +08:00
Concedo
e074939c17 compact context GUI page (+1 squashed commits)
Squashed commits:

[136f073ce] compact context GUI page
2026-04-17 14:40:53 +08:00
Concedo
aed18cc901 swa padding default to 0 2026-04-17 10:54:14 +08:00
Concedo
ae292c496e handle SWA conflicting with rewind, increased default SWA padding. 2026-04-16 17:00:26 +08:00
Concedo
0251c6dbde added swa padding controls 2026-04-16 16:21:48 +08:00
Concedo
a9e817fb4c smartcache off when fastforward off 2026-04-16 15:29:23 +08:00
Concedo
535df844dd touchup for min/max tokens ui 2026-04-16 14:56:22 +08:00
Llama
c592bd01da
Pass img_min_params and img_max_params to ctx_clip_params (#2133)
* Pass img_min_params and img_max_params to ctx_clip_params

These values determine the minimum and maximum size (in
tokens) of vision embeddings. The default value of -1
uses a model-dependent default size; for Gemma 4, for
example, the default is a 280-token embedding. For
higher-quality results (at the cost of more memory and
slower speed) you can increase the embedding size to
1120 tokens.

* Change dict to mydict to match change to method
2026-04-16 12:27:06 +08:00
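The parameter plumbing described above can be sketched like this (hypothetical function and dict shape; the real koboldcpp `ctx_clip_params` structure lives in C++ and may differ):

```python
def make_ctx_clip_params(img_min_params=-1, img_max_params=-1):
    # -1 means "use the model-dependent default embedding size"
    # (e.g. a 280-token embedding, per the commit description).
    if img_min_params != -1 and img_max_params != -1 \
            and img_min_params > img_max_params:
        raise ValueError("minimum embedding size exceeds maximum")
    return {"img_min_params": img_min_params,
            "img_max_params": img_max_params}
```

Passing explicit values, e.g. `make_ctx_clip_params(280, 1120)`, trades memory and speed for higher-quality vision results.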
Concedo
a9f9e9a38b rename the filepaths for clarity (+1 squashed commits)
Squashed commits:

[fa8fc6914] rename the filepaths for clarity
2026-04-16 12:17:23 +08:00
Concedo
45737effd3 refactor for clarity 2026-04-16 10:53:35 +08:00
Rose
2f67e9f096
new baseconfig setting that works in router mode (#2130)
* new baseconfig setting that works in router mode

* re-added fix that prevents unnecessary model reload

* fixed the fix

* swapped order of baseconfig <-> override

* fix indent

* simplify baseconfig, if specified AND restart_override_config_target is NOT, it simply replaces the field (+1 squashed commits)

Squashed commits:

[95e816b16] simplify baseconfig, if specified AND restart_override_config_target is NOT, it simply replaces the field

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2026-04-15 22:50:47 +08:00
Concedo
c6b59fc2c7 autoswap some edge conditions 2026-04-14 23:02:29 +08:00
Concedo
3f810dc8c7 fixed preload story import for large stories 2026-04-13 23:27:55 +08:00
Concedo
c984147c84 fix quotes 2026-04-13 22:50:08 +08:00
Concedo
5a3369fd2a support for gpt oss jinja 2026-04-12 16:13:51 +08:00
Concedo
4084917cab fixed token counting limit (+1 squashed commits)
Squashed commits:

[314528eb2] fixed token counting limit, set to max supported ctx of 256k
2026-04-12 15:36:03 +08:00
Concedo
f07dcbf7af allow tokencount to handle messages 2026-04-12 11:46:37 +08:00
Concedo
6556161804 jinja tool streaming is now finally working 2026-04-12 02:05:39 +08:00
Concedo
c4abba8868 almost working 2026-04-12 01:44:41 +08:00
Concedo
3175da0873 cleanup - do not use tool calls from kai api, only 2026-04-11 12:19:48 +08:00