The C++ handling code currently:
- builds a comma-separated list from the info_vulkan array
- if GGML_VK_VISIBLE_DEVICES isn't set, sets it to that list
Once set, GGML_VK_VISIBLE_DEVICES affects the whole process, so the
same thing can be done at the Python level, before any of the loading
functions run.
Caveat: load_model had the default `inputs.vulkan_info = "0"`,
so the default GPU was "0" only when loading a text model.
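The Python-level equivalent can be sketched as follows (a minimal sketch, assuming `info_vulkan` is a list of device indices; the helper name is illustrative):

```python
import os

def set_vulkan_visible_devices(info_vulkan):
    """Mirror the C++ behavior: build a comma-separated device list and
    export it, but only when the user hasn't already set the variable."""
    if "GGML_VK_VISIBLE_DEVICES" not in os.environ:
        os.environ["GGML_VK_VISIBLE_DEVICES"] = ",".join(str(d) for d in info_vulkan)
```

Because environment variables are process-wide, calling this once before any model loading matches the existing C++ behavior.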
* Pass img_min_params and img_max_params to ctx_clip_params
These values determine the minimum and maximum size (in
tokens) of vision embeddings. The default value of -1
uses a model-dependent default size, for example for
Gemma 4 the default is a 280 token embedding. For higher
quality results (at the cost of using more memory and
slower speed) you can increase the size of the embedding
to 1120 tokens.
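As an illustrative sketch of forwarding these bounds: the field names `img_min_params` / `img_max_params` come from the change above, but the surrounding structure here is an assumption.

```python
def make_ctx_clip_params(img_min_params=-1, img_max_params=-1):
    """Build the CLIP context parameters (illustrative structure).
    -1 means "use the model-dependent default embedding size"."""
    return {
        "img_min_params": img_min_params,
        "img_max_params": img_max_params,
    }
```

For example, `make_ctx_clip_params(img_max_params=1120)` would allow vision embeddings up to 1120 tokens, trading memory and speed for quality.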
* Change dict to mydict to match change to method
* backend support for controlling LoRA cache and fixed multipliers
The generation LoRA multipliers are now added to the initial
multipliers, so e.g. a merged LCM model will behave the same as
a normal model with a preloaded LCM LoRA.
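The additive rule can be sketched as (function name is illustrative, not from the codebase):

```python
def combined_multiplier(initial, generation):
    # The generation-time multiplier is added on top of the initial
    # (preload-time) multiplier rather than replacing it, so a merged
    # LCM model and a normal model with a preloaded LCM LoRA behave
    # the same.
    return initial + generation
```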
* frontend support
* sd: sync to master-525-d6dd6d7
* sd: add support for cache modes for inference acceleration
* keep gendefaults as a JSON object inside the config file
* cover more invalid cases in gendefaults parsing
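A minimal sketch of such parsing, assuming gendefaults is stored as a JSON object and invalid input falls back to an empty default (the function name and fallback policy are assumptions):

```python
import json

def parse_gendefaults(raw):
    """Parse gendefaults from the config; reject anything that is not a
    JSON object (lists, scalars, malformed JSON) by returning {}."""
    try:
        obj = json.loads(raw) if isinstance(raw, str) else raw
    except json.JSONDecodeError:
        return {}
    if not isinstance(obj, dict):
        return {}  # gendefaults must be a JSON object
    return obj
```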
* fix corner case in sd_oai_transform_params
Also fix typo in the function name.
* support for customizing loaded LoRA multipliers
The `sdloramult` flag now accepts a list of multipliers, one for each
LoRA. If all multipliers are non-zero, LoRAs load as before, with no extra
VRAM usage or performance impact.
If any LoRA has a multiplier of 0, we switch to `at_runtime` mode; these
LoRAs can then have their multipliers changed via the `lora` sdapi field
and show up in the `sdapi/v1/loras` endpoint. All LoRAs are still
preloaded on startup and cached to avoid file reloads.
If the list of multipliers is shorter than the list of LoRAs, the multiplier
list is extended with the first multiplier (1.0 by default), to keep it
compatible with the previous behavior.
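The padding rule above can be sketched as follows (a minimal sketch; the real sanitize_lora_multipliers may differ in signature and edge-case handling):

```python
def sanitize_lora_multipliers(mults, n_loras, default=1.0):
    """Extend a short multiplier list to n_loras entries using its first
    entry (or `default` when the list is empty), and truncate a long one."""
    mults = list(mults)
    pad = mults[0] if mults else default
    mults += [pad] * (n_loras - len(mults))
    return mults[:n_loras]
```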
* support for `<lora:name:multiplier>` prompt syntax and metadata
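A sketch of extracting such tags from a prompt (the exact grammar accepted by the project may differ; this regex covers the basic `<lora:name:multiplier>` form):

```python
import re

LORA_TAG = re.compile(r"<lora:([^:>]+):([0-9.]+)>")

def extract_lora_tags(prompt):
    """Return the prompt with <lora:name:multiplier> tags removed, plus
    the list of (name, multiplier) pairs that were found."""
    tags = [(name, float(mult)) for name, mult in LORA_TAG.findall(prompt)]
    cleaned = LORA_TAG.sub("", prompt).strip()
    return cleaned, tags
```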
* add a few tests for sanitize_lora_multipliers
Previously, logprobs contained only the token string
and byte data, along with the log probability itself.
For workflows that require the token id, translating
from the token bytes back to the token id is potentially
costly and unreliable. It is simple and inexpensive
to expose the numeric token ids directly instead.
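An illustrative shape of a logprobs entry with the numeric id included (field names follow the common OpenAI-style logprobs layout; the values and the `id` field name here are illustrative):

```python
entry = {
    "token": " the",                 # token string
    "bytes": [32, 116, 104, 101],    # raw UTF-8 bytes of the token
    "logprob": -0.12,                # log probability
    "id": 278,                       # numeric token id, now exposed directly
}
```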