koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-10 04:00:53 +00:00

Author	SHA1	Message	Date
Concedo	22c78f6c82	fix q3tts compile, update docs and lite	2026-03-14 23:33:18 +08:00
Concedo	1802b09e6f	Merge branch 'upstream' into concedo_experimental # Conflicts: # docs/build.md # docs/ops.md # docs/ops/CPU.csv # ggml/src/ggml-cpu/kleidiai/kernels.cpp # ggml/src/ggml-cpu/kleidiai/kleidiai.cpp # ggml/src/ggml-cpu/repack.cpp # ggml/src/ggml-cpu/repack.h # src/llama-quant.cpp # tests/test-json-schema-to-grammar.cpp	2026-03-14 17:56:16 +08:00
Concedo	ff3f8533d3	Merge commit '`c96f608d98`' into concedo_experimental # Conflicts: # CONTRIBUTING.md # docs/ops.md # docs/ops/Vulkan.csv # models/templates/LFM2-8B-A1B.jinja # tests/peg-parser/test-python-dict-parser.cpp # tests/peg-parser/test-unicode.cpp # tests/test-chat-peg-parser.cpp # tests/test-chat.cpp # tools/llama-bench/llama-bench.cpp	2026-03-14 17:14:34 +08:00
Concedo	8b9594b6ea	wip router mode	2026-03-14 17:07:05 +08:00
Concedo	1d067933f0	claude fixes for ace step, idk man who am i to argue with an agi	2026-03-14 12:27:26 +08:00
Concedo	349fc744e9	cleanup, fixed a regression in music gen with codes due to instruct prompt change	2026-03-14 11:32:47 +08:00
Concedo	6143a75426	improve autofit padding heuristics	2026-03-14 00:36:52 +08:00
Concedo	04915d99ee	Merge commit '`451ef08432`' into concedo_experimental # Conflicts: # .github/workflows/build.yml # README.md # docs/ops.md # docs/ops/Vulkan.csv # src/llama-model-loader.cpp # src/llama-model.cpp # src/llama.cpp # tests/CMakeLists.txt # tests/peg-parser/test-basic.cpp # tests/peg-parser/test-json-parser.cpp # tests/peg-parser/test-python-dict-parser.cpp # tests/peg-parser/test-unicode.cpp # tests/test-chat-auto-parser.cpp # tests/test-chat-peg-parser.cpp # tests/test-chat.cpp # tools/CMakeLists.txt	2026-03-13 23:33:37 +08:00
Concedo	d2c911884d	Merge commit '`213c4a0b81`' into concedo_experimental # Conflicts: # CODEOWNERS # common/CMakeLists.txt # common/chat-peg-parser.cpp # common/chat.cpp # docs/backend/SYCL.md # docs/development/parsing.md # docs/ops.md # docs/ops/SYCL.csv # embd_res/templates/Apriel-1.6-15b-Thinker-fixed.jinja # embd_res/templates/Bielik-11B-v3.0-Instruct.jinja # embd_res/templates/GLM-4.7-Flash.jinja # embd_res/templates/LFM2-8B-A1B.jinja # embd_res/templates/StepFun3.5-Flash.jinja # ggml/src/ggml-opencl/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-sycl/CMakeLists.txt # ggml/src/ggml-sycl/backend.hpp # ggml/src/ggml-sycl/common.hpp # ggml/src/ggml-sycl/convert.cpp # ggml/src/ggml-sycl/convert.hpp # ggml/src/ggml-sycl/count-equal.cpp # ggml/src/ggml-sycl/dpct/helper.hpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-sycl/presets.hpp # ggml/src/ggml-sycl/softmax.cpp # ggml/src/ggml-sycl/vecdotq.hpp # models/templates/Apertus-8B-Instruct.jinja # models/templates/CohereForAI-c4ai-command-r7b-12-2024-tool_use.jinja # models/templates/Qwen-QwQ-32B.jinja # models/templates/Qwen3-Coder.jinja # models/templates/deepseek-ai-DeepSeek-R1-Distill-Llama-8B.jinja # models/templates/deepseek-ai-DeepSeek-R1-Distill-Qwen-32B.jinja # models/templates/deepseek-ai-DeepSeek-V3.1.jinja # models/templates/fireworks-ai-llama-3-firefunction-v2.jinja # models/templates/moonshotai-Kimi-K2.jinja # models/templates/unsloth-Apriel-1.5.jinja # tests/CMakeLists.txt # tests/peg-parser/test-basic.cpp # tests/peg-parser/tests.h # tests/test-backend-ops.cpp # tests/test-chat-peg-parser.cpp # tests/test-chat-template.cpp # tests/test-chat.cpp # tests/test-json-schema-to-grammar.cpp # tests/test-peg-parser.cpp # tools/CMakeLists.txt # tools/cli/cli.cpp	2026-03-13 21:35:56 +08:00
Concedo	4189508ef3	qwen3tts support 1.7b model	2026-03-13 21:15:24 +08:00
Concedo	a13641c00c	tts loader fixes	2026-03-13 18:33:10 +08:00
Concedo	0a38237ff5	original qwen3tts files	2026-03-13 15:24:18 +08:00
Concedo	4427bab37e	cover mode is now working	2026-03-13 14:55:39 +08:00
Concedo	84734eb409	better audio runtime reload	2026-03-13 14:02:56 +08:00
Concedo	8f23b8d81e	wip on ref audio, but it compiles	2026-03-12 23:46:10 +08:00
Concedo	d5a4c17e14	mp3 not default	2026-03-12 21:42:59 +08:00
Concedo	3fd9648726	added mp3 support	2026-03-12 21:00:50 +08:00
Concedo	3092694d2e	better resampler	2026-03-12 16:49:53 +08:00
Wagner Bruna	796f7bdeff	sd: fix LoRA multiplier logic to switch to at_runtime mode (#2029 ) `0. in inputs.lora_multipliers` didn't work because the C array has variable length. Also fixed a few corner cases related to the default multipliers (mainly to ensure robustness against future changes, since in most cases the multiplier list is already sanitized by a previous function).	2026-03-12 15:36:51 +08:00
Concedo	318a5486ce	duration	2026-03-12 15:33:51 +08:00
Concedo	5b22858dbd	updated docs	2026-03-12 00:20:20 +08:00
Concedo	3cc6e2ea17	make stereo default	2026-03-12 00:10:25 +08:00
Concedo	211d4fe632	lots of tweaks for ace step	2026-03-11 23:57:52 +08:00
Concedo	ecc4865244	improves code output quality	2026-03-10 23:07:52 +08:00
Concedo	8095bf9807	include overhead from music models	2026-03-10 22:52:20 +08:00
Concedo	6adcd0b5db	Merge commit '`34df42f7be`' into concedo_experimental # Conflicts: # README.md # ggml/src/ggml-hexagon/ggml-hexagon.cpp # ggml/src/ggml-hexagon/htp/CMakeLists.txt # ggml/src/ggml-hexagon/htp/act-ops.c # ggml/src/ggml-hexagon/htp/binary-ops.c # ggml/src/ggml-hexagon/htp/cpy-ops.c # ggml/src/ggml-hexagon/htp/get-rows-ops.c # ggml/src/ggml-hexagon/htp/htp-msg.h # ggml/src/ggml-hexagon/htp/htp-ops.h # ggml/src/ggml-hexagon/htp/hvx-arith.h # ggml/src/ggml-hexagon/htp/hvx-base.h # ggml/src/ggml-hexagon/htp/hvx-inverse.h # ggml/src/ggml-hexagon/htp/hvx-utils.h # ggml/src/ggml-hexagon/htp/main.c # ggml/src/ggml-hexagon/htp/rope-ops.c # ggml/src/ggml-hexagon/htp/set-rows-ops.c # ggml/src/ggml-hexagon/htp/softmax-ops.c # ggml/src/ggml-hexagon/htp/unary-ops.c # ggml/src/ggml-opencl/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # tests/test-backend-ops.cpp # tools/cli/cli.cpp # tools/server/webui/src/lib/components/app/chat/ChatScreen/ChatScreen.svelte	2026-03-10 22:20:04 +08:00
Concedo	746664fde6	Merge commit '`2cd20b72ed`' into concedo_experimental # Conflicts: # CONTRIBUTING.md # docs/backend/CANN.md # docs/backend/SYCL.md # docs/backend/snapdragon/README.md # docs/backend/snapdragon/windows.md # docs/build.md # docs/multimodal/MobileVLM.md # docs/ops.md # docs/ops/WebGPU.csv # examples/debug/README.md # examples/llama.vim # examples/model-conversion/README.md # examples/sycl/README.md # ggml/src/ggml-cpu/amx/mmq.cpp # ggml/src/ggml-cpu/arch/x86/repack.cpp # ggml/src/ggml-hexagon/ggml-hexagon.cpp # ggml/src/ggml-hexagon/htp-drv.cpp # ggml/src/ggml-hexagon/htp/flash-attn-ops.c # ggml/src/ggml-hexagon/htp/hvx-base.h # ggml/src/ggml-hexagon/htp/hvx-copy.h # ggml/src/ggml-hexagon/htp/hvx-inverse.h # ggml/src/ggml-hexagon/htp/hvx-reduce.h # ggml/src/ggml-hexagon/htp/matmul-ops.c # ggml/src/ggml-hexagon/htp/rope-ops.c # ggml/src/ggml-hexagon/htp/worker-pool.c # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-opencl/kernels/cpy.cl # ggml/src/ggml-sycl/common.hpp # ggml/src/ggml-sycl/quants.hpp # ggml/src/ggml-sycl/softmax.cpp # ggml/src/ggml-vulkan/CMakeLists.txt # ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp # ggml/src/ggml-webgpu/ggml-webgpu.cpp # scripts/pr2wt.sh # scripts/server-bench.py # scripts/snapdragon/windows/run-cli.ps1 # tests/test-alloc.cpp # tests/test-backend-ops.cpp # tests/test-chat.cpp # tools/cli/cli.cpp # tools/completion/README.md # tools/cvector-generator/cvector-generator.cpp # tools/imatrix/README.md # tools/perplexity/README.md # tools/server/public_simplechat/readme.md # tools/server/tests/README.md	2026-03-10 22:11:08 +08:00
Concedo	c8800ed16c	gcc path fix	2026-03-10 21:40:32 +08:00
Ray Xu	8d880ac012	examples : fix empty items in json_schema_to_grammar.py [no ci] (#19968 ) * Fix logic for retrieving schema items in `json_schema_to_grammar.py` If `schema['items']` is `{}` and `prefixItems` is not in `schema`, then, because `{}` is falsy, the original code here will raise an error. I think if `schema['items']` is `{}`, then items should just be `{}` * Apply suggestion from @CISC Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Add tests for arrays with empty items Add two unit tests to `tests/test-json-schema-to-grammar.cpp` that validate handling of arrays when 'items' is an empty schema and when 'prefixItems' is present alongside an empty 'items'. Both tests expect the same generated grammar, ensuring the JSON Schema->grammar conversion treats an empty 'items' schema (and the presence of 'prefixItems') correctly and covering this edge case. --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-03-10 14:38:18 +01:00
Concedo	b06dd2606e	ruff: linting	2026-03-10 21:32:36 +08:00
a3894281	0f1e9d14cc	docs: update CPU backend ops to mark POOL_1D as supported (#20304 )	2026-03-10 21:31:24 +08:00
Wagner Bruna	3f42ed1af7	support for customizing LoRA multipliers through the sdapi (#1982 ) * fix corner case in sd_oai_transform_params Also fix typo in the function name. * support for customizing loaded LoRA multipliers The `sdloramult` flag now accepts a list of multipliers, one for each LoRA. If all multipliers are non-zero, LoRAs load as before, with no extra VRAM usage or performance impact. If any LoRA has a multiplier of 0, we switch to `at_runtime` mode, and these LoRAs will be available to multiplier changes via the `lora` sdapi field and show up in the `sdapi/v1/loras` endpoint. All LoRAs are still preloaded on startup, and cached to avoid file reloads. If the list of multipliers is shorter than the list of LoRAs, the multiplier list is extended with the first multiplier (1.0 by default), to keep it compatible with the previous behavior. * support for `<lora:name:multiplier>` prompt syntax and metadata * add a few tests for sanitize_lora_multipliers	2026-03-10 21:29:39 +08:00
Concedo	eafb5ff4c5	autofit improvement e.g. for strix (+1 squashed commits) Squashed commits: [`6f6fd59c3`] autofit improvement e.g. for strix	2026-03-10 21:20:02 +08:00
Georgi Gerganov	1274fbee9e	models : fix assert in mamba2 (cont) (#20335 ) * models : fix assert in mamba2 (cont) * cont : add n_group mod Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-03-10 15:00:08 +02:00
Georgi Gerganov	a7b3dee7a5	server : make 2 checkpoints near the end of the prompt (#20288 ) * server : make 2 checkpoints near the end of the prompt * cont : adjust checkpoints	2026-03-10 14:28:23 +02:00
Sigbjørn Skjæret	ec947d2b16	common : fix incorrect uses of stoul (#20313 )	2026-03-10 11:40:26 +01:00
Charles Xu	0cd4f4720b	kleidiai : support for concurrent sme and neon kernel execution (#20070 )	2026-03-10 09:25:25 +02:00
Taimur Ahmad	af237f3026	ggml-cpu: add RVV repack GEMM and GEMV for quantization types (#19121 ) * ggml-cpu: add rvv ggml_quantize_mat_4x8 for q8_0 Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai> * ggml-cpu: add rvv repacking for iq4_nl * ggml-cpu: add generic impl for iq4_nl gemm/gemv * ggml-cpu: add rvv repacking for q8_0 * ggml-cpu: refactor; add rvv repacking for q4_0, q4_K * ggml-cpu: refactor; add rvv repacking for q2_K Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai> * ggml-cpu: refactor rvv repack --------- Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>	2026-03-10 08:49:52 +02:00
Julian Pscheid	1a5631beaa	metal: handle command buffer failures gracefully in synchronize (#20306 ) Replace GGML_ABORT("fatal error") in ggml_metal_synchronize() with error flag + return. This aligns synchronize error handling with graph_compute, which already returns GGML_STATUS_FAILED for the same condition. When a command buffer fails (e.g., iOS GPU access revocation during backgrounding, macOS eGPU disconnect, OOM), the backend enters an error state instead of killing the host process. Subsequent graph_compute calls return GGML_STATUS_FAILED immediately. Recovery requires recreating the backend. Failed extra command buffers are properly released on the error path to avoid Metal object leaks.	2026-03-10 08:32:24 +02:00
ddh0	1dab5f5a44	llama-quant : fail early on missing imatrix, refactor type selection, code cleanup (#19770 ) * quantize : imatrix-fail early + code cleanup * fix manual override printing it's in the preliminary loop now, so needs to be on its own line * revert header changes per ggerganov * remove old #includes * clarify naming rename `tensor_quantization` to `tensor_typo_option` to describe its functionality * fix per barto	2026-03-10 08:16:05 +02:00
Concedo	500a1ab466	disable smartcache if slots is zero	2026-03-10 08:57:31 +08:00
Aldehir Rojas	c96f608d98	common: consolidate PEG string parsers (#20263 ) * common : consolidate PEG string parsers * cont : fix json_string_content()	2026-03-10 00:29:21 +01:00
Xuan-Son Nguyen	0842b9b465	model: fix step3.5 n_rot (#20318 )	2026-03-09 23:42:24 +01:00
Xuan-Son Nguyen	59db9a357d	llama: dynamic head_dim and n_rot for SWA (#20301 ) * llama: dynamic head_dim and n_rot for SWA * also add gguf_writer wrappers * fix build * build_rope_shift arg reorder	2026-03-09 22:22:39 +01:00
Evan Huus	23fbfcb1ad	server: Parse port numbers from MCP server URLs in CORS proxy (#20208 ) * Parse port numbers from MCP server URLs * Pass scheme to http proxy for determining whether to use SSL * Fix download on non-standard port and re-add port to logging * add test --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2026-03-09 17:47:54 +01:00
Concedo	2bd6b87d5b	remove a file	2026-03-09 23:08:53 +08:00
Concedo	ee96e71bae	don't resample audio	2026-03-09 22:53:55 +08:00
Paul Flynn	e22cd0aa15	metal : extend mul_mv_ext to BF16, Q2_K, Q3_K (#20250 ) Enable mul_mv_ext small-batch kernels (BS 2-8) for BF16, Q2_K, and Q3_K quantization types. These types previously fell through to the slower single-row mul_mv path. BF16 uses the float4 dequantize path (like F16). Q2_K and Q3_K use the float4x4 K-quant path (like Q4_K/Q5_K/Q6_K). Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-09 16:48:12 +02:00
Georgi Gerganov	96cfc4992c	server : fix checkpoints n_tokens calculation (#20287 )	2026-03-09 16:47:06 +02:00
Georgi Gerganov	ed0007aa32	metal : add upscale (#20284 )	2026-03-09 16:45:11 +02:00
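
The LoRA multiplier rules described in commit 3f42ed1af7 (pad a short multiplier list with its first entry, default 1.0, and switch to `at_runtime` mode when any multiplier is 0) can be sketched roughly as follows. This is an illustrative sketch, not the repository's code: the helper name `pad_lora_multipliers`, its signature, and the choice to truncate an over-long list are assumptions made for the example.

```python
from typing import List, Tuple


def pad_lora_multipliers(
    multipliers: List[float], n_loras: int, default: float = 1.0
) -> Tuple[List[float], bool]:
    """Hypothetical helper: return (one multiplier per LoRA, at_runtime flag)."""
    # The fill value is the first multiplier given, or the default if none was given.
    fill = multipliers[0] if multipliers else default
    # Assumption: extra multipliers beyond the number of LoRAs are dropped.
    padded = list(multipliers[:n_loras])
    # Extend a short list with the first multiplier, as the commit message describes.
    padded += [fill] * (n_loras - len(padded))
    # A zero multiplier means that LoRA must stay adjustable at runtime.
    at_runtime = any(m == 0.0 for m in padded)
    return padded, at_runtime


if __name__ == "__main__":
    print(pad_lora_multipliers([0.5], 3))       # ([0.5, 0.5, 0.5], False)
    print(pad_lora_multipliers([], 2))          # ([1.0, 1.0], False)
    print(pad_lora_multipliers([1.0, 0.0], 2))  # ([1.0, 0.0], True)
```

Returning the flag alongside the padded list keeps the mode decision in one place, so callers do not have to scan the list for zeros again.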


12060 commits
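
The pitfall behind commit 8d880ac012 is that an empty dict is falsy in Python, so an `or` fallback skips a present-but-empty `items` schema. The sketch below illustrates it. Both function names are hypothetical, and neither body is claimed to match the actual code in `json_schema_to_grammar.py`.

```python
def items_schema_buggy(schema: dict):
    """Hypothetical illustration: `or` treats an empty `items` schema as missing."""
    # {} is falsy, so this falls through to schema["prefixItems"],
    # which raises KeyError when that key is absent.
    return schema.get("items") or schema["prefixItems"]


def items_schema_fixed(schema: dict):
    """Hypothetical illustration: test for key presence instead of truthiness."""
    if "items" in schema:
        return schema["items"]  # may legitimately be {}
    return schema["prefixItems"]


if __name__ == "__main__":
    schema = {"type": "array", "items": {}}
    try:
        items_schema_buggy(schema)
    except KeyError as exc:
        print("buggy version raised KeyError:", exc)
    print(items_schema_fixed(schema))  # {}
```

Checking `"items" in schema` separates "key absent" from "key present with an empty value", which is the distinction the commit message asks for.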