koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-17 04:09:19 +00:00

Author	SHA1	Message	Date
Concedo	0d320f60a6	fix multiuser regression	2026-05-17 00:17:12 +08:00
Concedo	47d5772fbe	add batching failure spam logs	2026-05-16 23:21:01 +08:00
Concedo	9203b6a051	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/labeler.yml # .github/workflows/build-self-hosted.yml # .github/workflows/release.yml # .github/workflows/server-sanitize.yml # .github/workflows/server-self-hosted.yml # .github/workflows/server.yml # .github/workflows/ui-build.yml # .github/workflows/ui-ci.yml # .github/workflows/ui-publish.yml # .gitignore # CMakeLists.txt # CODEOWNERS # scripts/ui-download.cmake # scripts/xxd.cmake # tests/test-backend-ops.cpp # tests/test-reasoning-budget.cpp # tools/CMakeLists.txt # tools/server/CMakeLists.txt # tools/server/README.md	2026-05-16 22:56:33 +08:00
Concedo	3095da076a	only fetch new popped horde requests if model is not blocked queue	2026-05-16 22:27:12 +08:00
Concedo	80ce8a50b3	allow token bans and eos handling in	2026-05-16 15:20:46 +08:00
Wagner Bruna	f273fd35b9	sd: sync to master-601-eeac950 (#2206) * sd: sync to master-601-eeac950 * sd: add mmap support	2026-05-16 11:23:10 +08:00
Concedo	77fa2cd348	batching horde worker adjustments	2026-05-16 00:30:23 +08:00
Concedo	35f524d3e2	horde advertise more threads when batching is enabled	2026-05-15 17:36:53 +08:00
Reithan	5962bca463	Fix jinja error on case-insensitive roles and 0-len messages result (#2201) * fix jinja error on case-insensitive roles and 0-len messages result * check length in correct place	2026-05-15 16:48:42 +08:00
Concedo	1fe1a083cd	run multiple horde workers if used with batching.	2026-05-14 23:36:42 +08:00
Concedo	286e62267e	adjust batching eligibility	2026-05-11 21:54:32 +08:00
Concedo	bfaddd7a3b	added support for added memory and gemma and glm prompt fixes for batching mode	2026-05-10 23:39:03 +08:00
Concedo	33ca75d56f	ci for tools upload, minor function reordering	2026-05-10 23:10:43 +08:00
AlpinDale	c03302b670	feat: add a primitive form of continuous batching (#2167) * feat: add a primitive form of continuous batching * fix: deadlock in batching fallback * fix: windows build * chore: suppress the contbatch arg from --help * feat: batch-aware rep_pen_slope * fix: automatically disable shifting when batching is enabled * fix: mixed-path state corruption * fix: attempt to fully separate the two pipelines * added a semaphore to prevent non-batchable requests from starting while batched requests are running --------- Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>	2026-05-10 17:50:31 +08:00
Concedo	7a2f653451	falsy value handling on load config	2026-05-07 23:42:44 +08:00
Concedo	15e86c4f9b	hard coded reasoning_effort field from the api payload and force it into the jinja kwargs (request by @henk717). field name also hardcoded.	2026-05-06 17:35:26 +08:00
Tai An	24495f6c48	docs(args): clarify --debugmode level semantics in help text (#2181) Closes #2178 The --debugmode help string previously read "Shows additional debug info in the terminal" with no indication of what numeric values it accepts or what each does, making the recommended troubleshooting flag opaque (per #2178). Document the three values actually checked in the source: -1: Horde-quiet (suppresses non-essential prints; auto-applied when --horde* args are set, see configure_horde_settings) 0: default 1: verbose (extra slot/cache info; larger utfprint buffer; retains 'debug-' horde model prefix; etc.) Also note that bare --debugmode (no value) implies 1, which is the existing argparse behavior (nargs='?', const=1) but easy to miss.	2026-05-03 16:06:13 +08:00
Concedo	676e716ce3	try to handle duplicate think tags by swallowing them	2026-05-03 16:02:38 +08:00
Concedo	9be810628e	setenv return int	2026-05-03 13:32:05 +08:00
Concedo	2fb97d9c2c	explicitly set env var internally.	2026-05-03 13:18:50 +08:00
Wagner Bruna	25fab4113e	refactor: handle GGML_VK_VISIBLE_DEVICES at the Python level (#2179) All C++ handling code currently: - build a comma-separated list from the info_vulkan array - if GGML_VK_VISIBLE_DEVICES isn't set - set GGML_VK_VISIBLE_DEVICES to the list Once set, GGML_VK_VISIBLE_DEVICES affects the whole process. So this can be done in the same way at the Python level, before all loading functions. Caveat: load_model had the default `inputs.vulkan_info = "0"`, so the default GPU would be "0" only when loading a text model.	2026-05-02 23:10:29 +08:00
Concedo	42ce63fd3b	allow customizing multiuser queue in gui	2026-05-02 18:25:50 +08:00
Concedo	8b62e7b667	allow splitmode to be set independently, enable tensor parallelism	2026-05-02 16:41:28 +08:00
Concedo	7e98e06075	improved lora dir selection via gui	2026-05-02 10:51:35 +08:00
Concedo	b18a250205	handle raw args for model_param	2026-05-02 00:21:45 +08:00
Concedo	ef79904628	added a fix to make description optional in rosie's tool repack	2026-04-30 17:32:34 +08:00
Concedo	029cc3ad99	don't save deprecated args	2026-04-30 16:27:02 +08:00
Concedo	7bd95eb505	routermodetimeout -> reqtimeout, add to gui	2026-04-29 21:55:07 +08:00
Tai An	dfd87c4fb6	feat(router): add --routermodetimeout to make reverse-proxy timeout configurable (#2169) Closes the hardcoded 600s timeout in the router-mode reverse proxy: long generations through --routermode would be cut off at the upstream HTTPConnection timeout regardless of how long the model actually takes, because http.client.HTTPConnection('localhost', upstream_port, timeout=600) was wired with a literal 600. Adds a new --routermodetimeout (default 600) under the admin group, and threads it through the three HTTPConnection sites in the router handler: the model-swap reload, the autoswap reload, and the main upstream proxy forward. Behavior is unchanged at the default; users with long generations can now pass e.g. --routermodetimeout 3600. Reported in https://github.com/LostRuins/koboldcpp/issues/2168	2026-04-29 20:20:42 +08:00
Concedo	9eaed2ec32	make musicui accessible to screen readers	2026-04-27 19:43:51 +08:00
Concedo	f679e3fec5	fix missing ipv4 support	2026-04-26 14:44:26 +08:00
Concedo	929f214bf6	updated docs, handle seed oss thinking	2026-04-25 22:44:40 +08:00
Wagner Bruna	c04832bb2b	sd: add eta support (#2164)	2026-04-25 19:04:13 +08:00
Concedo	18a3bedf63	fixed a deadlock	2026-04-25 19:03:03 +08:00
Concedo	4090400dff	improved gemma toolcall handling	2026-04-25 09:51:29 +08:00
Concedo	cfb14bd844	fixed more args	2026-04-23 11:11:24 +08:00
Concedo	68e238857f	fixed args	2026-04-23 11:00:42 +08:00
Concedo	c818716f57	router mode fixed for parallel requests	2026-04-21 22:33:46 +08:00
Concedo	96ec87127a	updated colab, handle connection dropping during prompt processing	2026-04-21 21:46:13 +08:00
Concedo	1feba4e4ea	fixed koboldcpp.sh, fixed vision max/min when one param is missing, fixed processing count wrong, updated lite	2026-04-21 18:36:47 +08:00
Concedo	c17ba99812	change time.sleep to asyncio	2026-04-20 23:25:35 +08:00
Concedo	fe4c1b80a1	fix unwanted error print	2026-04-20 13:48:57 +08:00
Concedo	a8290a072f	more robust json field handling	2026-04-19 23:27:19 +08:00
Concedo	707bb67b30	minimal uses 10% of budget	2026-04-19 20:19:45 +08:00
Concedo	71b4107bb6	fixed terminal logs	2026-04-19 11:31:12 +08:00
Concedo	8886e48a4a	cache sd info	2026-04-19 02:19:11 +08:00
Wagner Bruna	1be08b9d15	sd: report all sampler aliases and centralize name mapping (#2149) * debug: allow loading backend libraries without normal arg parsing This is just to be able to test backend functions directly, with e.g.: >> import koboldcpp >> koboldcpp.init_libraries() >> koboldcpp.sd_get_info() * sd: report all sampler aliases and centralize name mapping	2026-04-19 01:51:42 +08:00
Concedo	e5eab545f3	handle override jinja template	2026-04-19 00:30:28 +08:00
Concedo	17c754a5fc	improved reasoning budget	2026-04-18 17:19:09 +08:00
Concedo	0b37cb9a57	added preliminary support for reasoning budget	2026-04-18 11:56:33 +08:00
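
Note on commit 24495f6c48 (--debugmode semantics): the message says a bare --debugmode implies 1 because the flag is declared with nargs='?' and const=1. The sketch below reproduces that argparse behavior in isolation. It is not the koboldcpp source: the function names build_parser and parse_debugmode are hypothetical, and type=int and default=0 are assumptions inferred from the "0: default" wording in the commit message.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-alone parser; only --debugmode is modeled."""
    parser = argparse.ArgumentParser(prog="debugmode-sketch")
    parser.add_argument(
        "--debugmode",
        type=int,      # assumption: levels are parsed as integers
        nargs="?",     # the value is optional
        const=1,       # used when the flag appears with no value
        default=0,     # assumption: used when the flag is absent
        help="-1: horde-quiet, 0: default, 1: verbose",
    )
    return parser


def parse_debugmode(argv: list[str]) -> int:
    """Return the effective debug level for a given argument list."""
    return build_parser().parse_args(argv).debugmode


if __name__ == "__main__":
    for argv in ([], ["--debugmode"], ["--debugmode", "-1"]):
        print(argv, "->", parse_debugmode(argv))
```

With nargs='?', argparse picks default when the flag is missing, const when the flag is given without a value, and the parsed value otherwise, which is why the three levels listed in the commit message are all reachable from the command line.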

1484 commits