koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-13 15:39:11 +00:00

Author	SHA1	Message	Date
Wagner Bruna	9223f41320	sd: call SetCircularAxesAll directly (#2078)	2026-03-29 01:17:48 +08:00
Concedo	2cdf02102e	preserve previous filename	2026-03-28 01:13:03 +08:00
Wagner Bruna	e3c6227d46	sd: report back image generation parameters and metadata (#2062) * sd: refactor image generation result handling * sd: report back image generation metadata	2026-03-28 00:49:03 +08:00
Concedo	4a5c903718	sd model model replacement logic: adjusted approach for easy merge	2026-03-26 21:57:42 +08:00
Concedo	efdc52fe8b	q3tts custom voice support	2026-03-24 23:38:18 +08:00
Concedo	0d50cafd8b	added CustomVoice support	2026-03-23 18:50:08 +08:00
Wagner Bruna	abe55fa424	sd: fix metadata for generated images (#2061) * sd: fix metadata for generated images * sd: refactor output image conversion	2026-03-23 17:04:32 +08:00
Alistair Stewart	5ff6cefce0	Fix music generation token stopping (#2057) * Fix music generation token stopping for quantized models In Phase 1 lyrics mode, the FSM transitions to CODES state after TOKEN_THINK_END and disables itself. The quantized Q4_K_M model was not efficiently generating TOKEN_IM_END to stop the generation, causing it to continue until hitting the 8192 token limit. This fix forces TOKEN_IM_END to be generated immediately after TOKEN_THINK_END in lyrics mode, ensuring clean completion of the planning phase without excessive token generation. Testing shows generation now completes in ~500ms instead of 80+ seconds with timeout errors. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Clarify comment - fix applies to all models, not just quantized 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Improve fix: only force TOKEN_IM_END at token limit Instead of forcing TOKEN_IM_END immediately after TOKEN_THINK_END, only force it when we've reached the token limit. This allows the model to generate lyrics after the thinking block while still preventing KV cache exhaustion. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>	2026-03-23 17:02:14 +08:00
Wagner Bruna	592dedee28	sd: ensure previous generation results are cleaned up on all code paths (#2060)	2026-03-22 23:18:09 +08:00
Concedo	3bda0bf102	passthrough mode without any gens	2026-03-22 23:09:08 +08:00
Concedo	f846c83a7a	pre-seed the tts so it can be shown	2026-03-22 10:36:42 +08:00
Concedo	fdfb713d91	added `--sdmaingpu` allowing image models to be independently placed on any gpu	2026-03-21 17:34:12 +08:00
Concedo	a3d3800f3e	added passthrough mode for esrgan upscale, triggered by img2img denoise 0.0 with 1 step	2026-03-21 16:19:10 +08:00
Wagner Bruna	51187d5362	sd: support changing preloaded LoRA multipliers (#2041) * sd: remove C++ support for enforcing fixed LoRA multipliers The logic at the Python level is enough. * sd: support changing preloaded LoRA multipliers We keep the same rules as before: - Any LoRA with multiplier 0 can be changed - If all LoRAs have multiplier != 0, they are fixed and optimized but tweak the corner case of LoRAs specified more than once to allow adjusting the multiplier if the same LoRA is also specified with a zero multiplier, as if they were two different LoRAs. So the following keeps working as before: - --sdlora /loras/lcm.gguf --sdloramult 1 : fixed as 1 - --sdlora /loras/lcm.gguf --sdloramult 0 : dynamic, default 0 - --sdlora /loras/ : dynamic, default 0 - --sdlora /loras/lcm.gguf /loras/lcm.gguf --sdloramult 1 1 : fixed as 2 But now we have: - --sdlora /loras/lcm.gguf /loras/lcm.gguf --sdloramult 1 0 : dynamic, default 1 - --sdlora /loras/lcm.gguf /loras/ --sdloramult 1 : dynamic, default 1	2026-03-17 10:09:55 +08:00
Wagner Bruna	6e7b9a1549	sd: sync to master-529-630ee03 (#2040)	2026-03-17 00:23:28 +08:00
Wagner Bruna	feea014774	sd: support for dynamic LoRA loading from a directory (#2036) * backend support for controlling LoRA cache and fixed multipliers The generation LoRA multipliers are now added to the initial multipliers, so e.g. a merged LCM model will behave the same as a normal model with a preloaded LCM LoRA. * frontend support	2026-03-16 20:39:21 +08:00
Concedo	b88fc44d0e	add some debug prints	2026-03-16 16:27:49 +08:00
Concedo	2093ca4c73	ace step optimizations	2026-03-15 20:58:45 +08:00
Wagner Bruna	b437d18319	add support for cache modes to accelerate image generation (#2021) * sd: sync to master-525-d6dd6d7 * sd: add support for cache modes for inference acceleration * keep gendefaults as a JSON object inside the config file * covered more invalid cases on gendefaults parsing	2026-03-15 15:27:14 +08:00
Concedo	b1c500ae2b	Merge commit '2948e6049a' into concedo_experimental # Conflicts: # .github/workflows/build.yml # CONTRIBUTING.md # docs/backend/VirtGPU/development.md # docs/ops.md # docs/ops/WebGPU.csv # embd_res/templates/GigaChat3-10B-A1.8B.jinja # embd_res/templates/GigaChat3.1-10B-A1.8B.jinja # ggml/src/ggml-hip/CMakeLists.txt # ggml/src/ggml-opencl/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp # ggml/src/ggml-webgpu/ggml-webgpu.cpp # scripts/sync_vendor.py # tests/CMakeLists.txt # tests/test-backend-ops.cpp # tests/test-chat.cpp # tests/test-grammar-integration.cpp # tests/test-quantize-fns.cpp	2026-03-15 11:21:24 +08:00
Concedo	22c78f6c82	fix q3tts compile, update docs and lite	2026-03-14 23:33:18 +08:00
Concedo	1d067933f0	claude fixes for ace step, idk man who am i to argue with an agi	2026-03-14 12:27:26 +08:00
Concedo	349fc744e9	cleanup, fixed a regression in music gen with codes due to instruct prompt change	2026-03-14 11:32:47 +08:00
Concedo	4189508ef3	qwen3tts support 1.7b model	2026-03-13 21:15:24 +08:00
Concedo	a13641c00c	tts loader fixes	2026-03-13 18:33:10 +08:00
Concedo	0a38237ff5	original qwen3tts files	2026-03-13 15:24:18 +08:00
Concedo	4427bab37e	cover mode is now working	2026-03-13 14:55:39 +08:00
Concedo	84734eb409	better audio runtime reload	2026-03-13 14:02:56 +08:00
Concedo	8f23b8d81e	wip on ref audio, but it compiles	2026-03-12 23:46:10 +08:00
Concedo	d5a4c17e14	mp3 not default	2026-03-12 21:42:59 +08:00
Concedo	3fd9648726	added mp3 support	2026-03-12 21:00:50 +08:00
Concedo	3092694d2e	better resampler	2026-03-12 16:49:53 +08:00
Concedo	318a5486ce	duration	2026-03-12 15:33:51 +08:00
Concedo	3cc6e2ea17	make stereo default	2026-03-12 00:10:25 +08:00
Concedo	211d4fe632	lots of tweaks for ace step	2026-03-11 23:57:52 +08:00
Concedo	ecc4865244	improves code output quality	2026-03-10 23:07:52 +08:00
Concedo	c8800ed16c	gcc path fix	2026-03-10 21:40:32 +08:00
Wagner Bruna	3f42ed1af7	support for customizing LoRA multipliers through the sdapi (#1982) * fix corner case in sd_oai_transform_params Also fix typo in the function name. * support for customizing loaded LoRA multipliers The `sdloramult` flag now accepts a list of multipliers, one for each LoRA. If all multipliers are non-zero, LoRAs load as before, with no extra VRAM usage or performance impact. If any LoRA has a multiplier of 0, we switch to `at_runtime` mode, and these LoRAs will be available to multiplier changes via the `lora` sdapi field and show up in the `sdapi/v1/loras` endpoint. All LoRAs are still preloaded on startup, and cached to avoid file reloads. If the list of multipliers is shorter than the list of LoRAs, the multiplier list is extended with the first multiplier (1.0 by default), to keep it compatible with the previous behavior. * support for `<lora:name:multiplier>` prompt syntax and metadata * add a few tests for sanitize_lora_multipliers	2026-03-10 21:29:39 +08:00
Concedo	ee96e71bae	don't resample audio	2026-03-09 22:53:55 +08:00
Concedo	45c74da08b	adjust ace step, still wip on caption rework	2026-03-09 00:11:48 +08:00
Wagner Bruna	9158bd8b4d	sd: sync to master-520-d950627 (#2006) * sd: sync to master-509-4cdfff5 * sd: Anima support * sd: sync to master-514-5792c66 * sd: additional workaround for Anima .safetensors model * sd: sync to master-517-ba35dd7 * sd: sync to master-520-d950627	2026-03-08 01:23:03 +08:00
Concedo	ebe44e7819	modify q3tts loader	2026-03-08 00:53:33 +08:00
JustCommitRandomness	2fbc3b2ae5	Adjust int types in format strings (#2009) * tweak format sting types This may not be all of them, but it's the ones which warn on OpenBSD * complete the changes needed to fix the format string specifers * avoid using inttypes, directly cast to size_t (u64 usually) instead --------- Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>	2026-03-06 19:06:18 +08:00
JustCommitRandomness	389773070f	OpenBSD also needs alloca.h (#2012)	2026-03-05 12:32:31 +08:00
Concedo	8658af1018	qwen3tts default to cpu unless gpu selected	2026-03-05 11:11:46 +08:00
Concedo	4f1b22c415	kv snapshots save and load last logits for correctness. added some text for musicui, updated docs	2026-03-04 21:57:28 +08:00
Concedo	707f7b37bf	optimize pp	2026-03-03 21:02:51 +08:00
Concedo	ae67caa2f7	ace qwen rep pen for codes	2026-03-02 21:18:06 +08:00
Concedo	de9840afac	qwen image max ref image size fix from 512x512 to 1024x1024	2026-03-02 21:08:52 +08:00
Concedo	b632d2ce1c	print timestamp when image generated	2026-03-02 18:38:21 +08:00

634 commits