koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-10 04:00:53 +00:00

Author	SHA1	Message	Date
Concedo	088c01e2a7	add jaxxks mutex lock for proxy during request	2026-03-31 17:03:38 +08:00
Concedo	2acf209972	minor gui cleanup	2026-03-31 16:59:10 +08:00
Eso	79209a14a8	feat: Autoswap functionality (#2080 ) * feat: Autoswap mode (cherry-picked from remoteManagement) Co-authored-by: esolithe <65901558+esolithe@users.noreply.github.com> * fix: Remove modelOverride, add triggered_sleeping to autoswap unload timeout branch Agent-Logs-Url: https://github.com/esolithe/esobold/sessions/1ddb3f88-43b4-4234-aa41-0fe6c9976db4 Co-authored-by: esolithe <65901558+esolithe@users.noreply.github.com> * fix: Remove esobold-specific GUI elements from admin tab, renumber remaining rows Agent-Logs-Url: https://github.com/esolithe/esobold/sessions/6a2e4ec3-cb19-4f98-b00f-bdb13749ead3 Co-authored-by: esolithe <65901558+esolithe@users.noreply.github.com> * fix: Removed unneeded changes --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>	2026-03-31 16:49:59 +08:00
Concedo	56c21bac04	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/cpu.Dockerfile # .devops/cuda-new.Dockerfile # .devops/cuda.Dockerfile # .devops/intel.Dockerfile # .devops/musa.Dockerfile # .devops/rocm.Dockerfile # .devops/vulkan.Dockerfile # .github/workflows/docker.yml # docs/docker.md # ggml/src/ggml-opencl/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-opencl/kernels/cvt.cl # ggml/src/ggml-rpc/ggml-rpc.cpp # tests/test-backend-ops.cpp # tests/test-jinja.cpp	2026-03-31 15:47:40 +08:00
Concedo	9fe8027ed3	try fix actions	2026-03-31 15:46:46 +08:00
Concedo	ab1c5b3fdc	try actions build again (+1 squashed commits) Squashed commits: [e7e54ec3b] try actions build again	2026-03-31 14:40:19 +08:00
Concedo	5440ca4794	rolling builds	2026-03-31 13:12:11 +08:00
shaofeiqi	08f21453ae	opencl: add q4_K gemm and gemv kernels for Adreno (#20919 ) * opencl: add q4_K gemm and gemv kernels for Adreno * opencl: fix whitespace * opencl: add workarounds for compiler bugs on older devices * opencl: handle fp16 denorm on X Elite * opencl: fix kernel build error * opencl: fix whitespace * opencl: make q4_K cvt kernels signature consistent --------- Co-authored-by: Li He <lih@qti.qualcomm.com>	2026-03-30 12:19:16 -07:00
Seungmin Kim	84ae8434d0	CI : Enable CUDA and Vulkan ARM64 runners and fix CI/CD (#21122 ) * CI: Enable CUDA and Vulkan ARM64 runners and fix CI/CD Co-authored-by: Ts-sound <44093942+Ts-sound@users.noreply.github.com> * Obtain source tag name from git tag Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Ts-sound <44093942+Ts-sound@users.noreply.github.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-03-30 20:24:37 +02:00
Zhihao "Zephyr" Yao	ead417f01c	jinja : handle empty expressions correctly (#20913 ) * Reject empty computed member expressions before returning slices[0] from parse_member_expression_arguments(). * Treat empty computed member expressions with Jinja2 undefined semantics Treat empty computed member expressions like `a[]` as undefined instead of raising a parser error, to match Jinja2 behavior. - return a noop expression for empty computed member arguments - return undefined when a computed member key evaluates to undefined - add Jinja tests covering `a[]|default('fallback')` and `a[] is undefined` * Handle undefined computed member properties Move undefined-property handling to the common member access path, and add a test covering `a[undefined] is undefined`. * Use default undefined value in member access Initialize val and then return it when property is undefined. Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * empty statement parses to blank_expression instead of noop_statement --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-03-30 20:08:46 +02:00
Oliver Simons	64ac9ab66a	CUDA : Fix CUB's argsort when nrows % block_size == 0 CCCL < 3.1 (#21181 ) * CUDA: Fix CUB's argsort when nrows % block_size == 0 CCCL < 3.1 We wrongly calculated offset_grid as `ceildiv(nrows, block_size)`, while it must be `ceildiv(nrows + 1, block_size)`. As a consequence, we had uninitialized values in `offset_iterator[nrows]` for the case when `nrows % block_size == 0`. Fixes #21162 * Reduce nrows in test case to 256, don't need 768	2026-03-30 16:20:00 +02:00
Radoslav Gerganov	cad2d3884c	rpc : fix misleading error log (#21184 ) When RPC is running with a remote backend which doesn't have init_tensor function (like CPU and Metal), the server log gets full with error messages saying that init_tensor is being called with null buffer which is incorrect. This patch fixes this.	2026-03-30 17:05:11 +03:00
Concedo	0afcf4bc6d	up jimver cuda toolkit version	2026-03-30 21:43:29 +08:00
Concedo	894591da7c	increase ctx size slider	2026-03-30 21:41:31 +08:00
Concedo	a3a5897d93	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/intel.Dockerfile # .github/workflows/python-type-check.yml # embd_res/templates/Qwen3.5-4B.jinja # examples/model-conversion/scripts/causal/compare-logits.py # examples/model-conversion/scripts/utils/check-nmse.py # examples/model-conversion/scripts/utils/compare_tokens.py # examples/model-conversion/scripts/utils/semantic_check.py # examples/sycl/build.sh # examples/sycl/run-llama2.sh # ggml/src/ggml-hexagon/htp/flash-attn-ops.c # ggml/src/ggml-hexagon/htp/hex-dma.h # ggml/src/ggml-hexagon/htp/rope-ops.c # scripts/gen-unicode-data.py # tests/test-chat.cpp	2026-03-30 21:41:19 +08:00
Concedo	9864d46389	add password for musicui	2026-03-30 21:03:12 +08:00
Concedo	42ad89cd86	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/cann.Dockerfile # .devops/cpu.Dockerfile # .devops/llama-cli-cann.Dockerfile # .devops/nix/package.nix # .github/workflows/build-android.yml # .github/workflows/build-cann.yml # .github/workflows/build-msys.yml # .github/workflows/docker.yml # .github/workflows/editorconfig.yml # .github/workflows/gguf-publish.yml # .github/workflows/python-lint.yml # .github/workflows/release.yml # CMakeLists.txt # docs/backend/CANN.md # ggml/src/ggml-hexagon/ggml-hexagon.cpp # ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c # ggml/src/ggml-hexagon/htp/htp-ctx.h # ggml/src/ggml-hexagon/htp/main.c # ggml/src/ggml-hexagon/htp/matmul-ops.c # ggml/src/ggml-rpc/ggml-rpc.cpp # scripts/sync_vendor.py # tests/test-chat-auto-parser.cpp # tests/test-chat.cpp # tests/test-json-schema-to-grammar.cpp # tests/test-reasoning-budget.cpp # tools/cli/cli.cpp # tools/server/CMakeLists.txt # tools/server/README.md	2026-03-30 20:45:38 +08:00
Aleksander Grygier	389c7d4955	webui: Fix branching logic on edit message (#21175 ) * fix: Branching logic + small refactor * chore: update webui build output	2026-03-30 14:40:50 +02:00
Concedo	923d5fc5d0	warning: clip_image_preprocess has been moved, now you must manually copy init_vision from mtmd into clip.cpp's setup_init_vision_shim_kcpp	2026-03-30 20:39:55 +08:00
Aman Gupta	278521c33a	llama-model-loader: print warning when using overrides with mmap (#20978 ) * llama-model-loader: use pinned memory for tensor overrides * change to warning	2026-03-30 17:40:17 +08:00
Sigbjørn Skjæret	e2eb39e81c	ci : bump ty to 0.0.26 (#21156 ) * fix incorrect type ignore comments * bump ty to 0.0.26	2026-03-30 09:29:15 +02:00
Xuan-Son Nguyen	abf9a62161	server: wrap headers for mcp proxy (#21072 ) * server: wrap headers for mcp proxy * Update tools/server/server-cors-proxy.h Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * fix build * chore: update webui build output * chore: update webui build output --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>	2026-03-30 08:59:16 +02:00
Sigbjørn Skjæret	7c203670f8	add missing ROPE_FACTORS_LONG/SHORT for MiniCPM (#21150 )	2026-03-29 19:45:40 +02:00
Gaurav Garg	ec16a072f0	Optimize MOE GEMV kernel for BS > 1. (#20905 ) * Optimize MOE GEMV kernel for BS > 1. The previous MOE kernel for BS > 1 had too many thread blocks (nrows_x, nchannels_dst, ncols_dst), with very little work per block. block of (32, 4) was doing inner dot product for a single row. New mul_mat_vec_q_moe kernel is dedicated for MoE multi-token kernel with grid (ceil(nrows_x/rpb), nchannels_dst), block (warp_size, ncols_dst). Each warp handles two rows independently with warp-level reduction only (no shared memory sync). This change doesn't increase any compilation time as a single template instance is needed per type. This also simplifies the original GEMV kernel and gets rid of `is_multi_token_id` specialization. * Remove em-dashes * Cherry-pick changes from @am17an PR https://github.com/ggml-org/llama.cpp/pull/20885 to enable small_k optimization only for cases where it benefits Increase max batch size for MMVQ kernels for MUL_MAT_ID to 8 * Make the max batch size for MOE GEMV kernel configurable based on GPU arch and datatype --------- Co-authored-by: Aman Gupta <amangupta052@gmail.com>	2026-03-29 18:35:18 +02:00
Concedo	4fc3c28f1a	reasoning output parsing improvements	2026-03-29 23:35:24 +08:00
Max Krasnyansky	f5d1c4179f	hexagon: dma optimizations (mostly fixing regressions) (#21137 ) * hex-fa: add simple dma cache for Mask I noticed that we were refetching the mask rows over and over. This simple cache avoids that. * hex-dma: unset in-order desc bit which caused significant perf regression We don't rely on true in order processing of the DMA descriptors anywhere. Turns out this mode caused significant regression of around 3-4 TPS during token gen. * hex-rope: update comment to clarify that we don't need in-order DMA completions	2026-03-29 06:40:13 -07:00
Concedo	4a09f3805b	prepare for breaking merge	2026-03-29 14:09:29 +08:00
Davi Henrique Linhares	2405d59cb6	devops: including compute-runtime for intel.Dockerfile (#21076 )	2026-03-29 13:34:03 +08:00
Neo Zhang	afe65aa282	[SYCL] Enhance build script to use half cores to build, avoid OS hang (#21093 ) * use half cores to build, avoid OS hang * reduce the output text num to short test time * avoid to return 0	2026-03-29 09:02:45 +08:00
Sigbjørn Skjæret	65097181e4	fix **/x glob matching (#21129 )	2026-03-28 22:27:38 +01:00
Piotr Wilkin (ilintar)	98ae0a0d36	common/parser: fix handling of tool definition with missing properties key (#21128 )	2026-03-28 20:41:32 +01:00
Sigbjørn Skjæret	3a14a542f5	common : add character class support to glob_match (#21111 ) * add character class support to glob_match * remove pointless reference	2026-03-28 19:57:37 +01:00
Concedo	df6b7b5fdb	Merge branch 'concedo_experimental' of https://github.com/LostRuins/koboldcpp into concedo_experimental	2026-03-29 01:25:07 +08:00
Concedo	3eedde8ab5	Merge commit 'ded446b34c' into concedo_experimental # Conflicts: # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # tests/test-backend-ops.cpp	2026-03-29 01:24:31 +08:00
Wagner Bruna	9223f41320	sd: call SetCircularAxesAll directly (#2078 )	2026-03-29 01:17:48 +08:00
Concedo	8760d22a84	switch back to newly updated jimver github cuda toolkit	2026-03-29 01:17:11 +08:00
Concedo	aac220f7e3	Merge commit '0fac87b157' into concedo_experimental # Conflicts: # .github/workflows/build-android.yml # .github/workflows/hip-quality-check.yml # docs/multimodal.md # scripts/hip/gcn-cdna-vgpr-check.py # scripts/snapdragon/windows/run-bench.ps1 # scripts/snapdragon/windows/run-cli.ps1 # scripts/snapdragon/windows/run-tool.ps1 # tests/test-backend-ops.cpp # tests/test-llama-archs.cpp # tools/imatrix/imatrix.cpp # tools/mtmd/CMakeLists.txt	2026-03-29 01:14:33 +08:00
BlueMöhre	968189729f	WebUI: Replace illegal nested button elements (#21026 ) * remove/replace nested button elements * map rest props to outer element * solve TODO * chore: update webui build output	2026-03-28 17:57:59 +01:00
Concedo	674b7f5eee	indicate support for claude messages api	2026-03-29 00:57:58 +08:00
Adrien	e397d3885c	common/json-schema: fix: handle non-capturing groups (?:...) in JSON schema pattern converter (#21124 ) The regex-to-grammar converter in _visit_pattern() crashes with SIGSEGV when a JSON schema "pattern" field contains a non-capturing group (?:...). Root cause: when the parser sees '(' followed by '?', it pushes a warning but does not advance past '?:'. The recursive transform() call then interprets '?' as a quantifier and calls seq.back() on an empty vector, causing undefined behavior. This commonly occurs when serving OpenAI-compatible tool calls from clients that include complex regex patterns in their JSON schemas (e.g., date validation patterns like ^(?:(?:\d\d[2468][048]|...)-02-29|...)$). The fix: - Skip '?:' after '(' to treat non-capturing groups as regular groups - For unsupported syntax (?=, ?!, etc.), skip to matching ')' safely, handling escaped characters to avoid miscounting parenthesis depth - Adjust the ')' unbalanced-parentheses check using direct char comparisons instead of substr - Add test cases for non-capturing groups (C++ only, as the JS/Python implementations do not yet support this syntax)	2026-03-28 17:55:38 +01:00
Concedo	e3b7905e1c	added anthropic messages api support	2026-03-29 00:55:32 +08:00
Concedo	5ad9e3ee31	crude openai responses streaming	2026-03-29 00:16:30 +08:00
Aldehir Rojas	e6f2ec01ff	common : add reasoning_format = none support to gpt-oss (#21094 )	2026-03-28 09:33:39 -05:00
Georgi Gerganov	edfb440a2f	server : fix processing of multiple back-to-back mtmd chunks (#21107 )	2026-03-28 16:27:36 +02:00
Adrien Gallouët	3d66da1809	ci : gracefully shut down the server (#21110 ) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-03-28 14:49:57 +01:00
Woof Dog	82b703f8bc	Document custom default webui preferences in server README (#19771 )	2026-03-28 14:19:16 +01:00
Concedo	94b266a6b0	musicui fix reset defaults	2026-03-28 21:09:40 +08:00
Aleksander Grygier	51a84efc53	webui: Conversation forking + branching improvements (#21021 ) * refactor: Make `DialogConfirmation` extensible with children slot * feat: Add conversation forking logic * feat: Conversation forking UI * feat: Update delete/edit dialogs and logic for forks * refactor: Improve Chat Sidebar UX and add MCP Servers entry * refactor: Cleanup * feat: Update message in place when editing leaf nodes * chore: Cleanup * chore: Cleanup * chore: Cleanup * chore: Cleanup * chore: Cleanup * chore: Cleanup * refactor: Post-review improvements * chore: update webui build output * test: Update Storybook test * chore: update webui build output * chore: update webui build output	2026-03-28 13:38:15 +01:00
Concedo	1e787cd03a	improve responses api	2026-03-28 18:42:15 +08:00
Concedo	f768b2a4bd	whatever, i tried	2026-03-28 17:32:07 +08:00


12499 commits