koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-08 18:30:50 +00:00

Author	SHA1	Message	Date
Concedo	a1305ffff9	still not working	2026-02-26 10:48:21 +08:00
Concedo	5c5fe55f7d	bump kv overrides max (+1 squashed commits) Squashed commits: [9bc8212a0] bump kv overrides max	2026-02-26 00:24:53 +08:00
Concedo	d8746a851f	still bugged	2026-02-26 00:07:04 +08:00
Concedo	8a3ccfcba5	some fixes but some issues	2026-02-25 23:41:32 +08:00
Concedo	0eafc3cf2d	ace step lowvram mode done, improved	2026-02-24 23:12:26 +08:00
Concedo	11a85d62fc	lowvram for music lm	2026-02-24 22:21:17 +08:00
Concedo	aa58d1ed3b	all working, but needs to optimize vram	2026-02-24 21:55:57 +08:00
Concedo	488c431331	not yet working	2026-02-24 17:47:50 +08:00
Concedo	0fd7d2c0e5	ace step diffusion loading	2026-02-24 15:24:15 +08:00
Concedo	749536f464	fixed wav header wrong size	2026-02-24 01:13:44 +08:00
askmyteapot	062e361968	Update ace-qwen3.cpp to build on MSVC (#1992 ) need to include <sstream> otherwise build fails with lots of the below errors: ``` C:\koboldcpp\otherarch\acestep\ace-qwen3.cpp(1278,9): error C2297: '<<': not valid as right operand has type 'const char [26]' [C:\koboldcpp\build\music_adapter.vcxproj] (compiling source file '../otherarch/acestep/music_adapter.cpp') C:\koboldcpp\otherarch\acestep\ace-qwen3.cpp(1278,9): error C2679: binary '<<': no operator found which takes a right-hand operand of type 'std::string' (or there is no acceptable conversion) [C:\koboldcpp\build\music_adapter.vcxproj] (compiling source file '../otherarch/acestep/music_adapter.cpp') C:\Program Files (x86)\Microsoft Visual Studio\18\BuildTools\VC\Tools\MSVC\14.50.35717\include\__msvc_int128.hpp(753,46): could be 'std::_Unsigned128 std::operator <<(const std::_Unsigned128 &,const std::_Base128 &) noexcept' [found using argument-dependent lookup] C:\koboldcpp\otherarch\acestep\ace-qwen3.cpp(1278,9): 'std::_Unsigned128 std::operator <<(const std::_Unsigned128 &,const std::_Base128 &) noexcept': cannot convert argument 2 from 'std::string' to 'const std::_Base128 &' C:\koboldcpp\otherarch\acestep\ace-qwen3.cpp(1278,57): Reason: cannot convert from 'std::string' to 'const std::_Base128' C:\koboldcpp\otherarch\acestep\ace-qwen3.cpp(1278,57): No user-defined-conversion operator available that can perform this conversion, or the operator cannot be called ```	2026-02-23 23:03:07 +08:00
Concedo	5311997581	updated ace step cpp	2026-02-23 23:01:10 +08:00
Concedo	2e713cfff5	fixed compile issue, trying out 8bit pcm	2026-02-23 21:19:03 +08:00
Wagner Bruna	a6c0a224b2	sd: sync to master-506-c9cd497 (#1991 )	2026-02-23 17:35:59 +08:00
Concedo	06c0ffaead	with am17an fix for henk to test	2026-02-23 17:30:19 +08:00
Concedo	c2b0cb26a8	ace step codes api	2026-02-23 14:04:45 +08:00
Concedo	d100c8660e	added Tlacuilo	2026-02-23 10:48:56 +08:00
Concedo	4be93db21c	ace step codes generation now working	2026-02-23 00:27:26 +08:00
Concedo	71d42fae85	Revert "Revert "Revert "cuda : enable CUDA graphs for MMID 1 <= BS <= 4 (#19645 )""" This reverts commit `edc04f3f7d`.	2026-02-22 23:18:53 +08:00
Concedo	13db5aee9e	stub files for loading ace step	2026-02-22 23:15:08 +08:00
Concedo	37ae068dee	set default to GPU test	2026-02-22 17:03:43 +08:00
Concedo	fdf868f397	add ace step cpp license info	2026-02-22 13:24:28 +08:00
Concedo	5cd6e50eab	initial files for ace step	2026-02-22 13:22:24 +08:00
Concedo	ac70ca35dd	preliminary patches for acestep.cpp	2026-02-22 12:50:08 +08:00
Wagner Bruna	19588f18ea	sd: relax size restrictions for DiT models (#1986 ) Round image dimensions to the specific multiple required by each DiT model, which range from 32 (certain Wan models) to 1 (Chroma Radiance), with most requiring multiples of 8 or 16. Unet models keep being rounded to multiples of 64. Current sd.cpp rounds the sizes internally; but it always rounds up, so we still need to round on our side to apply image size restrictions, and to trigger VAE tiling correctly. Also, remove a legacy test that could abort a generation with unsupported image sizes: it'd never run, because it was applied after the image side adjustments.	2026-02-22 11:00:10 +08:00
Concedo	0a87f5501e	updated sdui, fix img imports	2026-02-22 10:49:55 +08:00
Concedo	73f3ffaeb7	fix followup tool call check with assistant prefills	2026-02-22 10:33:00 +08:00
Concedo	edc04f3f7d	Revert "Revert "cuda : enable CUDA graphs for MMID 1 <= BS <= 4 (#19645 )"" This reverts commit `131e3cb17a`.	2026-02-22 09:33:25 +08:00
Concedo	d06700687f	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/rocm.Dockerfile # .github/workflows/release.yml # CMakeLists.txt # ggml/src/ggml-cuda/common.cuh # scripts/sync_vendor.py # tests/test-chat.cpp	2026-02-22 09:33:13 +08:00
Mario Limonciello	35715657cb	Update ROCm docker container to 7.2 release (#19418 ) Also update architectures	2026-02-21 21:53:39 +01:00
Mario Limonciello	f75c4e8bf5	Add a build target to generate ROCm artifacts using ROCm 7.2 (#19433 ) This builds the following targets: * gfx1151 * gfx1150 * gfx1200 * gfx1201 * gfx1100 * gfx1101 * gfx1030 * gfx908 * gfx90a * gfx942	2026-02-21 19:56:26 +01:00
Concedo	78b4b87e54	fixed compile issue for tts on ci (+1 squashed commits) Squashed commits: [d6f778499] fixed compile issue for tts on ci	2026-02-22 02:28:11 +08:00
Adrien Gallouët	99156f3a5f	vendor : update cpp-httplib to 0.33.1 (#19778 ) Signed-off-by: Adrien Gallouët <adrien@gallouet.fr>	2026-02-21 19:12:31 +01:00
Concedo	7068a74998	tts upstream bugfix	2026-02-22 00:46:03 +08:00
Concedo	313d37a602	cache used voices	2026-02-22 00:43:57 +08:00
Concedo	5536fb29f2	add some default voices for qwen3tts	2026-02-21 23:45:15 +08:00
Gaurav Garg	a0c91e8f9f	Improve CUDA graph capture (#19754 ) * Improve CUDA graph capture Currently, CUDA graphs are eagerly enabled on the first call to ggml_backend_cuda_graph_compute. If the graph properties keep changing (4+ consecutive updates), the graph is permanently disabled. This is suboptimal because: - The first call always incurs CUDA graph capture overhead even if the graph is unstable - Once permanently disabled, CUDA graphs never re-enable even after the graph stabilizes (e.g., switching from prompt processing to decode) The new approach delays CUDA graph activation until warmup completes: the same cgraph must be called at least twice with matching properties before CUDA graph capture begins. This avoids wasted capture overhead on volatile graphs and allows graphs to become eligible once they stabilize. This also fixes issues such as https://github.com/ggml-org/llama.cpp/discussions/19708 * Update ggml/src/ggml-cuda/ggml-cuda.cu Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * Remove EM dashes * Update ggml/src/ggml-cuda/ggml-cuda.cu Co-authored-by: Aman Gupta <amangupta052@gmail.com> --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de> Co-authored-by: Aman Gupta <amangupta052@gmail.com>	2026-02-21 15:09:36 +05:30
Concedo	2db018a1d7	qwen3tts support reference audio	2026-02-21 17:30:21 +08:00
crsawyer	07968d53e4	fix: UI single model selection in router mode (#19767 )	2026-02-21 09:28:39 +01:00
Concedo	72219fdbf5	basic qwen3 tts working	2026-02-21 12:03:53 +08:00
Concedo	1af7095cb5	add qwen3 tts repo files	2026-02-21 10:54:55 +08:00
Concedo	ad0618e351	bump defaults, updated lite, fixed glm4.7 autoguess template	2026-02-21 08:51:53 +08:00
Mengsheng Wu	ba3b9c8844	hexagon : fix build release (#19444 ) (#19587 )	2026-02-20 16:40:00 -08:00
Aldehir Rojas	94b0200a01	common : merge qwen3-coder and nemotron nano 3 parsers (#19765 ) * common : migrate qwen3-coder to PEG parsing variant * cont : add JSON parameter test	2026-02-20 23:22:22 +01:00
Concedo	131e3cb17a	Revert "cuda : enable CUDA graphs for MMID 1 <= BS <= 4 (#19645 )" This reverts commit `ad8207af77`.	2026-02-20 21:34:17 +08:00
Concedo	81065fd801	fix ci build error	2026-02-20 21:32:07 +08:00
Taimur Ahmad	b908baf182	ggml-cpu: add RVV vec dot kernels for quantization types (#18784 ) * ggml-cpu: add rvv vec_dot for iq2_s Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai> * ggml-cpu: add rvv vec_dot for iq3_s Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai> * ggml-cpu: add rvv vec_dot for tq1_0, tq2_0 Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai> ggml-cpu: add rvv vec_dot for tq1_0, tq2_0 * ggml-cpu: add rvv vec_dot for iq1_s, iq1_m Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai> * ggml-cpu: add vlen switch for rvv vec_dot --------- Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>	2026-02-20 13:30:07 +02:00
ddh0	492bc31978	quantize : add --dry-run option (#19526 ) * clean slate for branch * use 6 characters for tensor dims * add --dry-run to llama-quantize * use 6 characters for tensor dims (cont.) * no need to re-calculate ggml_nbytes for tensor * fix indent * show model and quant BPW when quant completes * add example to --help * new function `tensor_requires_imatrix`, add courtesy warning about imatrix * missing __func__, move imatrix flag set * logic error * fixup tensor_requires_imatrix * add missing `GGML_TYPE`s * simplify and rename `tensor_type_requires_imatrix` * simplify for style * add back Q2_K edge case for imatrix * guard ftype imatrix warning * comment ref #12557 * remove per @compilade * remove unused `params` parameter * move `bool dry_run` per GG * move `bool dry_run` per GG * Update src/llama-quant.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update src/llama-quant.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update src/llama-quant.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-02-20 09:20:16 +01:00
Concedo	e626de2430	Merge branch 'upstream' into concedo_experimental # Conflicts: # docs/ops.md # docs/ops/WebGPU.csv # embd_res/templates/stepfun-ai-Step-3.5-Flash.jinja # ggml/src/ggml-webgpu/ggml-webgpu.cpp # ggml/src/ggml-webgpu/wgsl-shaders/unary.wgsl # src/CMakeLists.txt # tests/test-backend-ops.cpp # tests/test-chat.cpp # tools/mtmd/CMakeLists.txt	2026-02-20 15:16:26 +08:00
Concedo	07c45ced56	Merge commit '`c78e682245`' into concedo_experimental # Conflicts: # src/models/qwen35.cpp # src/models/qwen35moe.cpp	2026-02-20 14:41:32 +08:00
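
The size rounding described in commit 19588f18ea can be illustrated with a small sketch. The helper names below are hypothetical and are not taken from koboldcpp or sd.cpp, and the commit message does not say which direction koboldcpp rounds in, so both directions are shown. The multiples used in the usage note (64, 16, 1) are the ones the commit message mentions.

```cpp
// Hypothetical helpers for illustration only; not the koboldcpp implementation.

// Round `value` down to the nearest multiple of `multiple` (assumes multiple >= 1, value >= 0).
static int round_down_to_multiple(int value, int multiple) {
    return (value / multiple) * multiple;
}

// Round `value` up to the nearest multiple of `multiple` (assumes multiple >= 1, value >= 0).
// The commit message states that sd.cpp's internal rounding always goes up.
static int round_up_to_multiple(int value, int multiple) {
    return ((value + multiple - 1) / multiple) * multiple;
}
```

For a requested side of 1000 pixels, a multiple of 64 gives 960 when rounding down and 1024 when rounding up, a multiple of 16 gives 992 and 1008, and a multiple of 1 leaves the size unchanged.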

11821 commits
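
The warmup rule described in commit a0c91e8f9f (capture starts only after the same graph has been seen at least twice with matching properties) can be sketched roughly as follows. This is a minimal illustration and not the ggml-cuda code: `GraphProps`, `GraphWarmupTracker`, and the fields compared are hypothetical stand-ins. Treating the second matching call as the point of eligibility is also an assumption about how "at least twice" is counted.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for whatever properties identify a compute graph.
struct GraphProps {
    std::size_t n_nodes = 0;
    std::vector<int> node_shapes;

    bool operator==(const GraphProps & other) const {
        return n_nodes == other.n_nodes && node_shapes == other.node_shapes;
    }
};

// Tracks consecutive calls with matching properties.
// Unlike a permanent disable, the counter simply restarts when the graph changes,
// so a graph can become eligible again once it stabilizes.
class GraphWarmupTracker {
public:
    // Records one call and returns true once the graph is considered stable.
    bool observe(const GraphProps & props) {
        if (has_last_ && props == last_) {
            ++matching_calls_;
        } else {
            last_ = props;
            has_last_ = true;
            matching_calls_ = 1;  // first sighting of this set of properties
        }
        return matching_calls_ >= kWarmupCalls;
    }

private:
    static constexpr int kWarmupCalls = 2;  // assumed threshold
    GraphProps last_;
    bool has_last_ = false;
    int matching_calls_ = 0;
};
```

In this sketch a volatile graph never pays capture overhead, because each change in properties resets the count, while a graph that settles (for example after moving from prompt processing to decode) becomes eligible on its second matching call.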