* sd: remove C++ support for enforcing fixed LoRA multipliers
The logic at the Python level is enough.
* sd: support changing preloaded LoRA multipliers
We keep the same rules as before:
- Any LoRA with a multiplier of 0 can be changed
- If all LoRAs have non-zero multipliers, they are fixed and optimized
but tweak the corner case of a LoRA specified more than once: its
multiplier becomes adjustable if the same LoRA is also specified with
a zero multiplier, as if the two entries were different LoRAs (see the
sketch after the examples below).
So the following cases keep working as before:
- --sdlora /loras/lcm.gguf --sdloramult 1 : fixed as 1
- --sdlora /loras/lcm.gguf --sdloramult 0 : dynamic, default 0
- --sdlora /loras/ : dynamic, default 0
- --sdlora /loras/lcm.gguf /loras/lcm.gguf --sdloramult 1 1 : fixed as 2
But now we have:
- --sdlora /loras/lcm.gguf /loras/lcm.gguf --sdloramult 1 0 : dynamic, default 1
- --sdlora /loras/lcm.gguf /loras/ --sdloramult 1 : dynamic, default 1
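A minimal sketch of the classification rule above, with illustrative
names (not the actual koboldcpp implementation):

```python
def classify_loras(paths, multipliers):
    """Return {path: (mode, value)}, mode being "fixed" or "dynamic"."""
    totals = {}
    dynamic = set()
    for path, mult in zip(paths, multipliers):
        if mult == 0:
            dynamic.add(path)  # any zero entry makes this LoRA adjustable
        totals[path] = totals.get(path, 0) + mult
    return {p: ("dynamic" if p in dynamic else "fixed", t)
            for p, t in totals.items()}

# --sdlora lcm.gguf lcm.gguf --sdloramult 1 1 : fixed as 2
assert classify_loras(["lcm.gguf"] * 2, [1, 1]) == {"lcm.gguf": ("fixed", 2)}
# --sdlora lcm.gguf lcm.gguf --sdloramult 1 0 : dynamic, default 1
assert classify_loras(["lcm.gguf"] * 2, [1, 0]) == {"lcm.gguf": ("dynamic", 1)}
```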
* backend support for controlling LoRA cache and fixed multipliers
The generation LoRA multipliers are now added to the initial
multipliers, so e.g. a merged LCM model will behave the same as
a normal model with a preloaded LCM LoRA.
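A toy illustration of the additive rule (assumed names, not the
backend code):

```python
def effective_multipliers(preload, request):
    """Per-request values are deltas on top of the preload values."""
    names = set(preload) | set(request)
    return {n: preload.get(n, 0.0) + request.get(n, 0.0) for n in names}

# LCM preloaded at 1.0 with no per-request change stays at 1.0,
# matching a merged LCM model:
assert effective_multipliers({"lcm": 1.0}, {}) == {"lcm": 1.0}
```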
* frontend support
* sd: sync to master-525-d6dd6d7
* sd: add support for cache modes for inference acceleration
* keep gendefaults as a JSON object inside the config file
* cover more invalid cases in gendefaults parsing
`0. in inputs.lora_multipliers` didn't work because the C array has
variable length.
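A hedged ctypes sketch of the failure mode, with assumed field names:
when the C side exposes a pointer rather than a fixed-size array,
Python cannot bound the iteration, so the check has to use an
explicit count:

```python
import ctypes

class Inputs(ctypes.Structure):
    # assumed layout, for illustration only
    _fields_ = [("lora_count", ctypes.c_int),
                ("lora_multipliers", ctypes.POINTER(ctypes.c_float))]

def has_zero_multiplier(inputs):
    # `0. in inputs.lora_multipliers` reads out of bounds on a bare
    # pointer; iterate up to the known count instead
    return any(inputs.lora_multipliers[i] == 0.0
               for i in range(inputs.lora_count))
```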
Also fixed a few corner cases related to the default multipliers
(mainly to ensure robustness against future changes, since in most
cases the multiplier list is already sanitized by a previous
function).
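As a hypothetical illustration of the gendefaults hardening above
(not the actual koboldcpp parsing):

```python
def parse_gendefaults(config):
    raw = config.get("gendefaults")
    if not isinstance(raw, dict):  # missing, null, or wrong type: ignore
        return {}
    # keep only scalar overrides; drop anything malformed
    return {k: v for k, v in raw.items()
            if isinstance(v, (int, float, str, bool))}

assert parse_gendefaults({"gendefaults": None}) == {}
assert parse_gendefaults({"gendefaults": {"temp": 0.7}}) == {"temp": 0.7}
```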
* fix corner case in sd_oai_transform_params
Also fix typo in the function name.
* support for customizing loaded LoRA multipliers
The `sdloramult` flag now accepts a list of multipliers, one for each
LoRA. If all multipliers are non-zero, LoRAs load as before, with no extra
VRAM usage or performance impact.
If any LoRA has a multiplier of 0, we switch to `at_runtime` mode; these
LoRAs become available for multiplier changes via the `lora` sdapi field
and show up in the `sdapi/v1/loras` endpoint. All LoRAs are still
preloaded on startup and cached to avoid file reloads.
If the list of multipliers is shorter than the list of LoRAs, the multiplier
list is extended with the first multiplier (1.0 by default), to keep it
compatible with the previous behavior.
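A minimal sketch of that padding rule, in the spirit of
`sanitize_lora_multipliers` (illustrative, not the actual function):

```python
def pad_multipliers(loras, multipliers):
    if not multipliers:
        multipliers = [1.0]  # default when the flag is omitted
    fill = multipliers[0]    # extend with the first multiplier
    return multipliers[:len(loras)] + [fill] * (len(loras) - len(multipliers))

assert pad_multipliers(["a", "b", "c"], [0.8]) == [0.8, 0.8, 0.8]
assert pad_multipliers(["a", "b", "c"], []) == [1.0, 1.0, 1.0]
assert pad_multipliers(["a", "b"], [1.0, 0.5]) == [1.0, 0.5]
```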
* support for `<lora:name:multiplier>` prompt syntax and metadata
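A hedged sketch of parsing that syntax, assuming a default multiplier
of 1.0 when omitted (the real parser may accept more forms):

```python
import re

LORA_TAG = re.compile(r"<lora:([^:>]+)(?::([-0-9.]+))?>")

def extract_lora_tags(prompt):
    """Strip `<lora:name:multiplier>` tags; return (clean_prompt, tags)."""
    tags = [(name, float(mult) if mult else 1.0)
            for name, mult in LORA_TAG.findall(prompt)]
    return LORA_TAG.sub("", prompt).strip(), tags

assert extract_lora_tags("a cat <lora:lcm:0.8>") == ("a cat", [("lcm", 0.8)])
```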
* add a few tests for sanitize_lora_multipliers
* fix: token usage for mistral-vibe
* fix: generate unique request IDs for OAI-compatible responses
* fix: prompt_tokens reporting KV cache size instead of actual count during streaming
* fixes for PR #2015
For (1), this is not a good idea. If it returned 0 (e.g. during an error), this value may not have been updated and would report the value of a previous or different request. It's better to return 0 in those cases.
For (2), this is a good idea, but we don't need that level of randomness. I'll probably swap it for a 6-digit random number instead.
For (3), the official OpenAI spec gates it behind `stream_options.include_usage = true`, so I'll do that too.
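Illustrative sketches of (2) and (3) as described above (names and
formats are assumptions, not the merged code):

```python
import random

def make_request_id(prefix="kcpp"):
    # (2): a short random suffix is enough to disambiguate requests
    return f"{prefix}-{random.randint(0, 999999):06d}"

def should_include_usage(body):
    # (3): only attach usage to streamed chunks when the client opts in
    opts = body.get("stream_options") or {}
    return bool(opts.get("include_usage"))
```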
* missed 1 item
---------
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
* process --exportconfig and --exporttemplate after --config
This allows using `--config oldfile.kcpps --exportconfig newfile.kcpps`
to update old config items, copy a config file with changed parameters,
download and save a remote config, etc.
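A sketch of the implied ordering, with assumed helper names:

```python
def process_args(args):
    if args.config:
        load_config(args, args.config)  # may also fetch a remote file
    # export flags run last, so they see the merged settings
    if args.exportconfig:
        save_config(args, args.exportconfig)
    if args.exporttemplate:
        save_template(args, args.exporttemplate)
```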
* filter out command flags from the saved config files
Also indent files saved from the command line.
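A minimal sketch of that save path, assuming a hypothetical set of
one-shot command flags:

```python
import json

COMMAND_FLAGS = {"config", "exportconfig", "exporttemplate"}  # assumed set

def save_config(args_dict, path):
    cleaned = {k: v for k, v in args_dict.items() if k not in COMMAND_FLAGS}
    with open(path, "w") as f:
        json.dump(cleaned, f, indent=2)  # indent files saved from the CLI
```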