koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-22 19:47:49 +00:00

Author	SHA1	Message	Date
Wagner Bruna	f85a747dc0	sd: add backend support for max_vram (#2221)	2026-05-21 11:51:00 +08:00
Wagner Bruna	592d12d0a3	sd: support for CLIP and VAE on different devices (#2184) * sd: generalize internal interfaces to place generation on CPU * sd: backend support for multi-device selection * sd: frontend support for multi-device selection * add deprecated flags to avoid breaking old cli args --------- Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>	2026-05-19 21:51:23 +08:00
Concedo	712ee6be64	try fix recent segfault on SIGINT https://github.com/LostRuins/koboldcpp/issues/2215	2026-05-18 22:37:14 +08:00
Wagner Bruna	90326f8585	sd: sync to master-612-d7ecbe1 (#2213)	2026-05-18 21:19:12 +08:00
Wagner Bruna	1ae9a79ecc	sd: sync to master-607-fd1a279 (#2212)	2026-05-17 11:37:16 +08:00
Wagner Bruna	f273fd35b9	sd: sync to master-601-eeac950 (#2206) * sd: sync to master-601-eeac950 * sd: add mmap support	2026-05-16 11:23:10 +08:00
Concedo	79666e5764	revert sdcpp build steps to use makefile and cmake without external txt files	2026-05-16 00:53:56 +08:00
Wagner Bruna	bfe9548fd5	sd: sync to master-596-90e87bc (#2204) * sd: reuse source lists between make and cmake * sd: sync to master-596-90e87bc * Update source file path for sdtype_adapter.cpp --------- Co-authored-by: LostRuins Concedo <39025047+LostRuins@users.noreply.github.com>	2026-05-14 23:14:33 +08:00
Wagner Bruna	243b03586b	sd: build each source file separately (#2188) * sd: build source files separately * sd: decouple stable-diffusion.cpp and sdtype_adapter.cpp * sd: remove include util.h from sdtype_adapter.cpp * sd: update source file lists and review dependencies	2026-05-07 22:50:10 +08:00
Concedo	4a8a51a3a7	updated sdui, increase ace step music vae chunk size	2026-05-04 15:30:45 +08:00
Concedo	950676fdb7	split utils.cpp into 2 files to support sd.cpp	2026-05-04 15:04:12 +08:00
Wagner Bruna	276c651a12	sd: sync to master-593-3d6064b (#2175) * sd: sync to master-593-3d6064b * sd: use the same sdtype_adapter object for all builds Since master-592-b8079e2, no sd.cpp source depends on the ggml backend build anymore. * sd: fix main_gpu selection * sd: report backend devices to the Python layer	2026-05-04 14:05:34 +08:00
Wagner Bruna	25fab4113e	refactor: handle GGML_VK_VISIBLE_DEVICES at the Python level (#2179) All C++ handling code currently: - build a comma-separated list from the info_vulkan array - if GGML_VK_VISIBLE_DEVICES isn't set - set GGML_VK_VISIBLE_DEVICES to the list Once set, GGML_VK_VISIBLE_DEVICES affects the whole process. So this can be done in the same way at the Python level, before all loading functions. Caveat: load_model had the default `inputs.vulkan_info = "0"`, so the default GPU would be "0" only when loading a text model.	2026-05-02 23:10:29 +08:00
Wagner Bruna	e2bdd6d7aa	sd: sync to master-591-331cfa5 (#2155) * sd: sync to master-585-44cca3d * sd: sync to master-587-b8bdffc * sd: sync to master-591-331cfa5	2026-05-01 16:33:28 +08:00
Concedo	eca9f4c1df	use original precision for q3tts	2026-04-30 17:28:11 +08:00
Concedo	2741d7e7bd	switch back from ulaw to wav16	2026-04-30 17:27:55 +08:00
Wagner Bruna	c04832bb2b	sd: add eta support (#2164)	2026-04-25 19:04:13 +08:00
Concedo	2cde0bffd2	minor text edit	2026-04-23 20:06:04 +08:00
Wagner Bruna	bad9b61064	sd: sync to master-582-7023fc4 (#2150) * sd: remove sampler alias handling from the C++ layer It's already handled at the Python layer. * sd: sync to master-580-7d33d4b * sd: sync to master-582-7023fc4	2026-04-21 23:01:33 +08:00
Concedo	271c4c332c	hack to allow kokoro to remain functional even with much higher GGML_SCHED_MAX_SPLIT_INPUTS	2026-04-19 20:40:07 +08:00
Concedo	afaf3b960e	try to make kokoro take less graph size	2026-04-19 19:00:35 +08:00
Wagner Bruna	1be08b9d15	sd: report all sampler aliases and centralize name mapping (#2149) * debug: allow loading backend libraries without normal arg parsing This is just to be able to test backend functions directly, with e.g.: >> import koboldcpp >> koboldcpp.init_libraries() >> koboldcpp.sd_get_info() * sd: report all sampler aliases and centralize name mapping	2026-04-19 01:51:42 +08:00
Concedo	0b37cb9a57	added preliminary support for reasoning budget	2026-04-18 11:56:33 +08:00
Concedo	ae292c496e	handle SWA conflicting with rewind, increased default SWA padding.	2026-04-16 17:00:26 +08:00
Concedo	535df844dd	touchup for min/max tokens ui	2026-04-16 14:56:22 +08:00
Llama	c592bd01da	Pass img_min_params and img_max_params to ctx_clip_params (#2133) * Pass img_min_params and img_max_params to ctx_clip_params These values determine the minimum and maximum size (in tokens) of vision embeddings. The default value of -1 uses a model-dependent default size, for example for Gemma 4 the default is a 280 token embedding. For higher quality results (at the cost of using more memory and slower speed) you can increase the size of the embedding to 1120 tokens. * Change dict to mydict to match change to method	2026-04-16 12:27:06 +08:00
Concedo	d9724a4caa	kcpp musicgen - disable flash attention as it's not stable on vulkan. due to optimizations, it should still fit in 6gb in lowvram.	2026-04-12 18:28:30 +08:00
Concedo	7bf7b0aefc	optimize lowvram for music	2026-04-12 18:17:08 +08:00
Concedo	ad6eaffd3c	updated docs, adjusted acestep threads	2026-04-09 22:33:30 +08:00
Concedo	5529748a01	Merge commit 'de1aa6fa73' into concedo_experimental # Conflicts: # docs/build.md # docs/ops.md # docs/ops/WebGPU.csv # ggml/src/ggml-sycl/dequantize.hpp # ggml/src/ggml-sycl/dmmv.cpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-sycl/mmvq.cpp # ggml/src/ggml-sycl/quants.hpp # ggml/src/ggml-sycl/vecdotq.hpp # ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp # ggml/src/ggml-webgpu/ggml-webgpu.cpp # ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_decls.tmpl # tests/test-backend-ops.cpp # tests/test-quantize-fns.cpp	2026-04-09 17:16:33 +08:00
Wagner Bruna	f371bb14d4	sd: sync to master-560-e8323ca (#2082) * sd: sync to master-540-f16a110 * tae post-merge fixes * build fixes * restore image mask for non-inpainting models * sd: sync to master-551-99c1de3 * avoid nlohmann/json.hpp include diffs * Euler A now works on Flux * sd: sync to master-555-7397dda avi_writer.h got removed upstream, but I've simply kept the local copy for now. * sd: sync to master-558-8afbeb6 * sd: sync to master-560-e8323ca	2026-04-09 14:44:59 +08:00
Concedo	6aa49b91b1	fixed acestep bad on vulkan	2026-04-08 22:22:07 +08:00
Concedo	9b02806191	updated acestep convert	2026-04-08 18:39:28 +08:00
Concedo	355f75769e	acestep xl now loads and works!	2026-04-08 18:36:18 +08:00
Concedo	4b478b70fa	ace step xl tentative changes (not yet working)	2026-04-08 18:00:39 +08:00
Concedo	a8841063b5	adjust q3tts chunking	2026-04-07 22:55:02 +08:00
Concedo	a1fc912452	try don't trigger the magic if input is following the jinja template exactly for thinking models (+1 squashed commits) Squashed commits: [5542e81dc] try don't trigger the magic if input is following the jinja template.	2026-04-07 21:47:38 +08:00
Concedo	c8d6546a14	experimental q3tts batch test	2026-04-07 20:49:46 +08:00
Wagner Bruna	9223f41320	sd: call SetCircularAxesAll directly (#2078)	2026-03-29 01:17:48 +08:00
Concedo	2cdf02102e	preserve previous filename	2026-03-28 01:13:03 +08:00
Wagner Bruna	e3c6227d46	sd: report back image generation parameters and metadata (#2062) * sd: refactor image generation result handling * sd: report back image generation metadata	2026-03-28 00:49:03 +08:00
Concedo	4a5c903718	sd model replacement logic: adjusted approach for easy merge	2026-03-26 21:57:42 +08:00
Concedo	efdc52fe8b	q3tts custom voice support	2026-03-24 23:38:18 +08:00
Concedo	0d50cafd8b	added CustomVoice support	2026-03-23 18:50:08 +08:00
Wagner Bruna	abe55fa424	sd: fix metadata for generated images (#2061) * sd: fix metadata for generated images * sd: refactor output image conversion	2026-03-23 17:04:32 +08:00
Alistair Stewart	5ff6cefce0	Fix music generation token stopping (#2057) * Fix music generation token stopping for quantized models In Phase 1 lyrics mode, the FSM transitions to CODES state after TOKEN_THINK_END and disables itself. The quantized Q4_K_M model was not efficiently generating TOKEN_IM_END to stop the generation, causing it to continue until hitting the 8192 token limit. This fix forces TOKEN_IM_END to be generated immediately after TOKEN_THINK_END in lyrics mode, ensuring clean completion of the planning phase without excessive token generation. Testing shows generation now completes in ~500ms instead of 80+ seconds with timeout errors. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Clarify comment - fix applies to all models, not just quantized 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Improve fix: only force TOKEN_IM_END at token limit Instead of forcing TOKEN_IM_END immediately after TOKEN_THINK_END, only force it when we've reached the token limit. This allows the model to generate lyrics after the thinking block while still preventing KV cache exhaustion. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>	2026-03-23 17:02:14 +08:00
Wagner Bruna	592dedee28	sd: ensure previous generation results are cleaned up on all code paths (#2060)	2026-03-22 23:18:09 +08:00
Concedo	3bda0bf102	passthrough mode without any gens	2026-03-22 23:09:08 +08:00
Concedo	f846c83a7a	pre-seed the tts so it can be shown	2026-03-22 10:36:42 +08:00
Concedo	fdfb713d91	added `--sdmaingpu` allowing image models to be independently placed on any gpu	2026-03-21 17:34:12 +08:00

672 commits
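
Commit 25fab4113e (#2179) describes moving the GGML_VK_VISIBLE_DEVICES handling to the Python level: build a comma-separated list from the info_vulkan array and set the variable only if it isn't already set. A minimal sketch of that logic follows. The function name `set_vulkan_visible_devices`, the `environ` parameter, and the choice to skip an empty device list are assumptions made for illustration; they are not taken from the koboldcpp source.

```python
import os

ENV_VAR = "GGML_VK_VISIBLE_DEVICES"


def set_vulkan_visible_devices(info_vulkan, environ=None):
    """Set GGML_VK_VISIBLE_DEVICES from a list of device ids, if unset.

    Hypothetical helper: the name and signature are illustrative only.
    `environ` defaults to os.environ; a plain dict can be passed for testing.
    Returns the value in effect afterwards, or None if nothing was set.
    """
    env = os.environ if environ is None else environ

    # A value supplied by the user takes precedence and is left untouched.
    if ENV_VAR in env:
        return env[ENV_VAR]

    # Build the comma-separated list, e.g. [0, 2] -> "0,2".
    device_list = ",".join(str(device) for device in info_vulkan)

    # Assumption: an empty selection leaves the variable unset.
    if device_list:
        env[ENV_VAR] = device_list
    return env.get(ENV_VAR)
```

Because the variable affects the whole process once set, a helper like this would need to run before any model loading function is called, as the commit message notes.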