* sd: sync to master-593-3d6064b
* sd: use the same sdtype_adapter object for all builds
Since master-592-b8079e2, no sd.cpp source depends on the ggml
backend build anymore.
* sd: fix main_gpu selection
* sd: report backend devices to the Python layer
The C++ handling code currently:
- builds a comma-separated list from the info_vulkan array
- sets GGML_VK_VISIBLE_DEVICES to that list, if GGML_VK_VISIBLE_DEVICES
  isn't already set
Once set, GGML_VK_VISIBLE_DEVICES affects the whole process, so the
same can be done at the Python level, before any of the loading
functions run.
Caveat: load_model had the default `inputs.vulkan_info = "0"`,
so the default GPU would be "0" only when loading a text model.
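The device-selection step described above can be sketched at the Python level as follows; `select_vulkan_devices` is a hypothetical helper name, and `info_vulkan` stands for the device list reported by the backend:

```python
import os

def select_vulkan_devices(info_vulkan: list) -> None:
    """Mirror the former C++ behaviour at the Python level.

    If GGML_VK_VISIBLE_DEVICES is not already set, build a
    comma-separated device list and export it. Once set, the
    variable affects the whole process, so this must run before
    any model-loading function.
    """
    if "GGML_VK_VISIBLE_DEVICES" not in os.environ:
        os.environ["GGML_VK_VISIBLE_DEVICES"] = ",".join(info_vulkan)
```

Because the variable is only exported when absent, an explicit user setting always wins over the generated list.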
* sd: remove sampler alias handling from the C++ layer
It's already handled at the Python layer.
* sd: sync to master-580-7d33d4b
* sd: sync to master-582-7023fc4
* debug: allow loading backend libraries without normal arg parsing
This is just to be able to test backend functions directly, with e.g.:
>>> import koboldcpp
>>> koboldcpp.init_libraries()
>>> koboldcpp.sd_get_info()
* sd: report all sampler aliases and centralize name mapping
* Pass img_min_params and img_max_params to ctx_clip_params
These values determine the minimum and maximum size (in
tokens) of vision embeddings. The default value of -1
uses a model-dependent default size; for Gemma 4, for
example, the default is a 280-token embedding. For
higher-quality results (at the cost of more memory and
slower speed) you can increase the embedding size to
1120 tokens.
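The -1 sentinel behaviour can be sketched as follows; `resolve_embedding_size` is a hypothetical helper, and `default_size` stands for the model-dependent default (e.g. 280 tokens):

```python
def resolve_embedding_size(requested: int, default_size: int) -> int:
    """Return the vision-embedding size in tokens.

    A requested value of -1 selects the model-dependent default;
    any positive value overrides it (e.g. 1120 for higher quality
    at the cost of more memory and slower speed).
    """
    if requested == -1:
        return default_size
    return requested
```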
* Rename dict to mydict to match the change to the method
* sd: sync to master-540-f16a110
* tae post-merge fixes
* build fixes
* restore image mask for non-inpainting models
* sd: sync to master-551-99c1de3
* avoid nlohmann/json.hpp include diffs
* Euler A now works on Flux
* sd: sync to master-555-7397dda
avi_writer.h got removed upstream, but I've simply kept the local
copy for now.
* sd: sync to master-558-8afbeb6
* sd: sync to master-560-e8323ca
* Fix music generation token stopping for quantized models
In Phase 1 lyrics mode, the FSM transitions to the CODES state after
TOKEN_THINK_END and disables itself. The quantized Q4_K_M model was
not reliably generating TOKEN_IM_END to stop generation, so it
continued until hitting the 8192-token limit.
This fix forces TOKEN_IM_END to be generated immediately after
TOKEN_THINK_END in lyrics mode, ensuring clean completion of the
planning phase without excessive token generation.
Testing shows generation now completes in ~500ms instead of 80+
seconds with timeout errors.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Clarify comment - fix applies to all models, not just quantized
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Improve fix: only force TOKEN_IM_END at token limit
Instead of forcing TOKEN_IM_END immediately after TOKEN_THINK_END,
only force it when we've reached the token limit. This allows the model
to generate lyrics after the thinking block while still preventing KV
cache exhaustion.
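The limit-based forcing described above can be sketched as a small sampling filter; `filter_token` and the token id are hypothetical names for illustration:

```python
TOKEN_IM_END = 2  # hypothetical token id for the end-of-message token

def filter_token(sampled: int, n_generated: int, token_limit: int) -> int:
    """Force TOKEN_IM_END only once the token limit is reached.

    This lets the model keep generating lyrics after the thinking
    block, while still guaranteeing a clean stop before the KV
    cache is exhausted.
    """
    if n_generated >= token_limit:
        return TOKEN_IM_END
    return sampled
```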
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: Claude <noreply@anthropic.com>
* sd: remove C++ support for enforcing fixed LoRA multipliers
The logic at the Python level is enough.
* sd: support changing preloaded LoRA multipliers
We keep the same rules as before:
- Any LoRA with multiplier 0 can be changed
- If all LoRAs have a multiplier != 0, they are fixed and optimized
but tweak the corner case of LoRAs specified more than once: the
multiplier becomes adjustable if the same LoRA is also specified
with a zero multiplier, as if they were two different LoRAs.
So the following keeps working as before:
- --sdlora /loras/lcm.gguf --sdloramult 1 : fixed as 1
- --sdlora /loras/lcm.gguf --sdloramult 0 : dynamic, default 0
- --sdlora /loras/ : dynamic, default 0
- --sdlora /loras/lcm.gguf /loras/lcm.gguf --sdloramult 1 1 : fixed as 2
But now we have:
- --sdlora /loras/lcm.gguf /loras/lcm.gguf --sdloramult 1 0 : dynamic, default 1
- --sdlora /loras/lcm.gguf /loras/ --sdloramult 1 : dynamic, default 1
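The table of cases above can be sketched as a small resolver; `resolve_loras` is a hypothetical helper, and the assumptions are: repeated paths have their multipliers summed, a missing multiplier defaults to 0, and a path listed with a zero multiplier anywhere stays dynamic:

```python
from collections import defaultdict

def resolve_loras(paths, mults):
    """Map each LoRA path to (default multiplier, is_dynamic).

    - Multipliers of repeated paths are summed (fixed as 2 for
      the same LoRA listed twice with multiplier 1).
    - A path also listed with multiplier 0 (explicitly, or via a
      directory entry) remains dynamic, with the nonzero sum as
      its default.
    """
    totals = defaultdict(float)
    dynamic = set()
    for i, path in enumerate(paths):
        m = mults[i] if i < len(mults) else 0.0
        totals[path] += m
        if m == 0.0:
            dynamic.add(path)
    return {p: (totals[p], p in dynamic) for p in totals}
```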
* backend support for controlling LoRA cache and fixed multipliers
The generation LoRA multipliers are now added to the initial
multipliers, so e.g. a merged LCM model will behave the same as
a normal model with a preloaded LCM LoRA.
* frontend support
* sd: sync to master-525-d6dd6d7
* sd: add support for cache modes for inference acceleration
* keep gendefaults as a JSON object inside the config file
* covered more invalid cases on gendefaults parsing