koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-19 16:31:59 +00:00

Author	SHA1	Message	Date
Wagner Bruna	25fab4113e	refactor: handle GGML_VK_VISIBLE_DEVICES at the Python level (#2179). All C++ handling code currently: (1) builds a comma-separated list from the info_vulkan array; (2) if GGML_VK_VISIBLE_DEVICES isn't set, sets GGML_VK_VISIBLE_DEVICES to the list. Once set, GGML_VK_VISIBLE_DEVICES affects the whole process, so this can be done in the same way at the Python level, before all loading functions. Caveat: load_model had the default `inputs.vulkan_info = "0"`, so the default GPU would be "0" only when loading a text model.	2026-05-02 23:10:29 +08:00
Concedo	d9724a4caa	kcpp musicgen - disable flash attention as its not stable on vulkan. due to optimizations should still fit in 6gb in lowvram.	2026-04-12 18:28:30 +08:00
Concedo	7bf7b0aefc	optimize lowvram for music	2026-04-12 18:17:08 +08:00
Concedo	ad6eaffd3c	updated docs, adjusted acestep threads	2026-04-09 22:33:30 +08:00
Concedo	6aa49b91b1	fixed acestep bad on vulkan	2026-04-08 22:22:07 +08:00
Concedo	9b02806191	updated acestep convert	2026-04-08 18:39:28 +08:00
Concedo	355f75769e	acestep xl now loads and works!	2026-04-08 18:36:18 +08:00
Concedo	4b478b70fa	ace step xl tentative changes (not yet working)	2026-04-08 18:00:39 +08:00
Alistair Stewart	5ff6cefce0	Fix music generation token stopping (#2057). (1) Fix music generation token stopping for quantized models: In Phase 1 lyrics mode, the FSM transitions to CODES state after TOKEN_THINK_END and disables itself. The quantized Q4_K_M model was not efficiently generating TOKEN_IM_END to stop the generation, causing it to continue until hitting the 8192 token limit. This fix forces TOKEN_IM_END to be generated immediately after TOKEN_THINK_END in lyrics mode, ensuring clean completion of the planning phase without excessive token generation. Testing shows generation now completes in ~500ms instead of 80+ seconds with timeout errors. (2) Clarify comment: fix applies to all models, not just quantized. (3) Improve fix: only force TOKEN_IM_END at token limit. Instead of forcing TOKEN_IM_END immediately after TOKEN_THINK_END, only force it when we've reached the token limit. This allows the model to generate lyrics after the thinking block while still preventing KV cache exhaustion. 🤖 Generated with [Claude Code](https://claude.com/claude-code). Co-authored-by: Claude <noreply@anthropic.com>	2026-03-23 17:02:14 +08:00
Concedo	b88fc44d0e	add some debug prints	2026-03-16 16:27:49 +08:00
Concedo	2093ca4c73	ace step optimizations	2026-03-15 20:58:45 +08:00
Concedo	1d067933f0	claude fixes for ace step, idk man who am i to argue with an agi	2026-03-14 12:27:26 +08:00
Concedo	349fc744e9	cleanup, fixed a regression in music gen with codes due to instruct prompt change	2026-03-14 11:32:47 +08:00
Concedo	4427bab37e	cover mode is now working	2026-03-13 14:55:39 +08:00
Concedo	84734eb409	better audio runtime reload	2026-03-13 14:02:56 +08:00
Concedo	8f23b8d81e	wip on ref audio, but it compiles	2026-03-12 23:46:10 +08:00
Concedo	d5a4c17e14	mp3 not default	2026-03-12 21:42:59 +08:00
Concedo	3fd9648726	added mp3 support	2026-03-12 21:00:50 +08:00
Concedo	3092694d2e	better resampler	2026-03-12 16:49:53 +08:00
Concedo	318a5486ce	duration	2026-03-12 15:33:51 +08:00
Concedo	3cc6e2ea17	make stereo default	2026-03-12 00:10:25 +08:00
Concedo	211d4fe632	lots of tweaks for ace step	2026-03-11 23:57:52 +08:00
Concedo	ecc4865244	improves code output quality	2026-03-10 23:07:52 +08:00
Concedo	ee96e71bae	don't resample audio	2026-03-09 22:53:55 +08:00
Concedo	45c74da08b	adjust ace step, still wip on caption rework	2026-03-09 00:11:48 +08:00
Concedo	ae67caa2f7	ace qwen rep pen for codes	2026-03-02 21:18:06 +08:00
Concedo	42134db6b4	finally fixed smartcache for qwen	2026-03-02 00:47:38 +08:00
Concedo	6c5a7a27af	clamp music duration	2026-03-01 01:15:26 +08:00
Concedo	d643d945f5	clamp music inference steps to 100 max	2026-02-28 12:12:50 +08:00
Concedo	14d82bb38e	allow music llm and diffusion gen models to be loaded independently	2026-02-27 21:56:48 +08:00
Concedo	19eb78844c	audio codes working	2026-02-27 21:23:00 +08:00
Concedo	ba42f22fc8	stereo is working	2026-02-27 20:36:44 +08:00
Concedo	5a57ed8ca4	revert to 8 step	2026-02-26 22:07:01 +08:00
Concedo	173702d1a4	music lowvram indicator	2026-02-26 21:30:47 +08:00
Concedo	adebf63877	ace converter	2026-02-26 19:53:02 +08:00
Concedo	ac8f12f259	still a bit wonky	2026-02-26 17:50:49 +08:00
Concedo	81fb4d773c	swap resampling function	2026-02-26 17:37:53 +08:00
Concedo	fb3f7d92bc	reenable cfg	2026-02-26 14:51:15 +08:00
Concedo	b7d2fe68e7	adjust	2026-02-26 14:46:41 +08:00
Concedo	edbc4fe592	music lm finally working	2026-02-26 14:00:58 +08:00
Concedo	cf042af701	Revert "still not working" This reverts commit `a1305ffff9`.	2026-02-26 10:55:55 +08:00
Concedo	a1305ffff9	still not working	2026-02-26 10:48:21 +08:00
Concedo	d8746a851f	still bugged	2026-02-26 00:07:04 +08:00
Concedo	8a3ccfcba5	some fixes but some issues	2026-02-25 23:41:32 +08:00
Concedo	0eafc3cf2d	ace step lowvram mode done, improved	2026-02-24 23:12:26 +08:00
Concedo	11a85d62fc	lowvram for music lm	2026-02-24 22:21:17 +08:00
Concedo	aa58d1ed3b	all working, but needs to optimize vram	2026-02-24 21:55:57 +08:00
Concedo	488c431331	not yet working	2026-02-24 17:47:50 +08:00
Concedo	0fd7d2c0e5	ace step diffusion loading	2026-02-24 15:24:15 +08:00
Concedo	5311997581	updated ace step cpp	2026-02-23 23:01:10 +08:00
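
The logic described in commit 25fab4113e (build a comma-separated list, then set GGML_VK_VISIBLE_DEVICES only if it is not already set) can be sketched roughly as below. This is an illustration under assumptions, not the code from the commit: the function name `apply_vulkan_visible_devices`, the shape of `info_vulkan` (a list of device indices), and the skip-when-empty behavior are all hypothetical.

```python
import os

ENV_NAME = "GGML_VK_VISIBLE_DEVICES"

def apply_vulkan_visible_devices(info_vulkan, env=None):
    """Hypothetical helper: set ENV_NAME from a list of device indices.

    An existing value is never overwritten. `env` defaults to os.environ,
    so the setting affects the whole process once applied.
    """
    if env is None:
        env = os.environ
    # Assumption: an empty selection leaves the environment untouched.
    if info_vulkan and ENV_NAME not in env:
        env[ENV_NAME] = ",".join(str(d) for d in info_vulkan)
    return env.get(ENV_NAME)

if __name__ == "__main__":
    # Use a plain dict here so the demo does not modify the real environment.
    demo_env = {}
    print(apply_vulkan_visible_devices([0, 2], demo_env))
```

Because the variable is process-wide, a helper like this would need to run once before any model-loading call, which is the reason the commit gives for moving the handling to the Python level.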

56 commits
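
The final behavior described in commit 5ff6cefce0 (force the end token only once the token limit is reached, otherwise let the model keep sampling) can be illustrated with a simplified sketch. The token ids, the function names, and the sampling callback are hypothetical, and the lyrics-mode and TOKEN_THINK_END state handling mentioned in the commit is omitted; the actual fix lives in the project's own generation code.

```python
# Hypothetical token id, for illustration only.
TOKEN_IM_END = 1002

def choose_token(sampled_token, n_generated, token_limit):
    """Return the sampled token, or TOKEN_IM_END on the last allowed slot."""
    if n_generated >= token_limit - 1:
        return TOKEN_IM_END
    return sampled_token

def generate(sample_fn, token_limit):
    """Run a toy generation loop; sample_fn(step) returns the next token id."""
    out = []
    while len(out) < token_limit:
        tok = choose_token(sample_fn(len(out)), len(out), token_limit)
        out.append(tok)
        if tok == TOKEN_IM_END:
            break  # normal stop, whether sampled or forced
    return out

if __name__ == "__main__":
    # A "model" that never emits the end token is cut off at the limit.
    print(generate(lambda step: 7, token_limit=5))
```

Forcing the end token only at the limit keeps the output well-terminated without cutting short a model that would have stopped on its own.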