koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-08 09:59:50 +00:00

Author	SHA1	Message	Date
Concedo	7e53bfd28d	Merge commit '`2b6dfe824d`' into concedo_experimental # Conflicts: # .github/workflows/release.yml # examples/save-load-state/save-load-state.cpp # src/llama-context.cpp # tools/cli/cli.cpp	2026-02-26 15:07:23 +08:00
Xuan-Son Nguyen	5452d736f8	jinja: correct stats for tojson and string filters (#19785)	2026-02-22 21:08:23 +01:00
Aldehir Rojas	94b0200a01	common : merge qwen3-coder and nemotron nano 3 parsers (#19765) * common : migrate qwen3-coder to PEG parsing variant * cont : add JSON parameter test	2026-02-20 23:22:22 +01:00
Concedo	9eb9e4eb83	Merge commit '`8a70973557`' into concedo_experimental # Conflicts: # docs/backend/CANN.md # docs/backend/SYCL.md # examples/model-conversion/scripts/utils/tensor-info.py # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-opencl/kernels/expm1.cl # ggml/src/ggml-opencl/kernels/mean.cl # ggml/src/ggml-opencl/kernels/softplus.cl # ggml/src/ggml-opencl/kernels/sum_rows.cl # ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp # ggml/src/ggml-webgpu/ggml-webgpu.cpp # ggml/src/ggml-webgpu/wgsl-shaders/common_decls.tmpl # ggml/src/ggml-webgpu/wgsl-shaders/embed_wgsl.py # ggml/src/ggml-webgpu/wgsl-shaders/get_rows.wgsl # ggml/src/ggml-webgpu/wgsl-shaders/mul_mat.wgsl # ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_decls.tmpl # ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_reg_tile.wgsl # ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_subgroup_matrix.wgsl # ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_vec.wgsl # ggml/src/ggml-webgpu/wgsl-shaders/scale.wgsl # tools/server/webui/src/lib/components/app/chat/ChatScreen/ChatScreen.svelte	2026-02-20 14:36:49 +08:00
Jeff Bolz	77d6ae4ac8	test: mul_mat tests with huge batch size (#19519)	2026-02-19 20:08:25 -06:00
Jesse Posner	3dadc88b58	common : fix Step-3.5-Flash format detection and thinking support (#19635) * common : fix Step-3.5-Flash format detection and thinking support Step-3.5-Flash uses the same XML-style tool call format as Qwen3-Coder (<tool_call><function=...><parameter=...>) but its Jinja template lacks the bare <function> and plural <parameters> markers that the detection logic previously required. This caused it to fall through to Hermes 2 Pro, which doesn't call func_args_not_string(), so arguments stayed as JSON strings and templates using arguments|items crashed. Additionally, the Qwen3-Coder-XML format handler had no thinking support. Models like Step-3.5-Flash that unconditionally emit <think> in their generation prompt need the same thinking_forced_open handling that Nemotron v3 and Hermes 2 Pro already have, otherwise reasoning_content is never separated from content in API responses. Changes: - Relax Qwen3-Coder XML detection to only require the 3 shared markers - Tighten Nemotron v3 branch to also require bare <function> and plural <parameters>, preventing Step-3.5-Flash from being misrouted via <think> - Add thinking_forced_open support to Qwen3-Coder-XML init function - Add <think>/</think> to preserved tokens - Fix build_grammar_xml_tool_call to handle thinking_forced_open in the grammar root rule, allowing </think> before tool calls - Add Step-3.5-Flash chat template and format detection test Builds on: https://github.com/ggml-org/llama.cpp/pull/19283 * chat : route Step-3.5-Flash to Nemotron v3 PEG parser, add tests Step-3.5-Flash uses the same XML tool call format as Qwen3-Coder and Nemotron 3 Nano (<tool_call>/<function=...>/<parameter=...>) but with unconditional <think> output. Route it to the Nemotron v3 PEG parser for streaming and schema-aware parameter parsing. Detection: templates with <think> + XML tool tags use Nemotron v3 PEG parser; templates without <think> (Qwen3-Coder) use GBNF grammar. Tests cover: basic messages, tool calls with/without thinking content, parallel tool calls, code string parameters, optional </parameter> closing tags, and JSON schema response format. * chat : remove dead thinking code from qwen3_coder_xml Remove thinking handling code that became unreachable after routing Step-3.5-Flash to the Nemotron v3 PEG parser. Qwen3-Coder has no <think> in its template, so the thinking_forced_open logic, preserved tokens, and grammar prefix were dead paths.	2026-02-19 22:40:52 +01:00
Piotr Wilkin (ilintar)	8a70973557	Add Jinja support for "indent" string filter (#19529) * Add partial Jinja support for "indent" string filter * Fully implement indent * Add tests for all width variants. * Update tests/test-jinja.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Fix getline ignoring trailing newlines * Update common/jinja/value.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * fix first indent condition --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-02-19 00:25:52 +01:00
Georgi Gerganov	08e6d914b8	ggml : avoid UB in gemm ukernel (#19642)	2026-02-15 14:56:35 +02:00
Jeff Bolz	dbb023336b	vulkan: support L2_NORM with contiguous rows (#19604)	2026-02-14 06:42:04 +01:00
ymcki	0e21991472	fix vulkan ggml_acc only works in 3d but not 4d (#19426) * fix vulkan ggml_acc only works in 3d but not 4d * removed clamp in test_acc_block * use the correct stride and its test case * cuda : fix "supports op" condition * change src0 to src1 in ggml_vk_acc. Update acc.comp with jeffbolznv's suggestion except to keep the boundary check * version without boundary check * revert back to boundary check version --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2026-02-13 13:31:37 +01:00
Georgi Gerganov	490eb96b88	metal : support GGML_OP_SET (#19548)	2026-02-13 07:34:52 +02:00
Georgi Gerganov	3b3a948134	metal : update sum_rows kernel to support float4 (#19524)	2026-02-12 11:35:28 +02:00
Georgi Gerganov	914dde72ba	ggml : unary ops support non-cont src0 + metal F16 unary ops (#19511) * ggml : unary ops support non-cont src0 * metal : support F16 unary ops + fix ELU	2026-02-11 18:58:43 +02:00
Georgi Gerganov	89181c0b6d	ggml : extend bin bcast for permuted src1 (#19484) * tests : extend bin bcast for permuted src1 * cont : extend bin support * cont : s0 is always 1 * tests : simplify	2026-02-11 07:52:00 +02:00
Georgi Gerganov	ceaa89b786	metal : consolidate unary ops (#19490)	2026-02-11 07:51:12 +02:00
Xuan-Son Nguyen	9a96352729	test: fix IMROPE perf test case (#19465)	2026-02-10 14:37:50 +01:00
Georgi Gerganov	a0d585537c	cuda : extend GGML_OP_PAD to work with non-cont src0 (#19429) * cuda : extend GGML_OP_PAD to work with non-cont src0 * tests : add permuted pad	2026-02-10 08:07:16 +02:00
Hugo	1e8924fd65	cmake : add variable to skip installing tests (#19370) When packaging downstream, there's usually little point in installing test. The default behaviour remains the same.	2026-02-09 07:12:02 +01:00
Jeff Bolz	db6adb3c88	tests: reduce number of FA test permutations (#19381) Only test non-F16 for head size 64 and 72 (one a multiple of QK, one not).	2026-02-06 08:50:30 -06:00
Jeff Bolz	449ec2ab07	vulkan: Preprocess FA mask to detect all-neg-inf and all-zero. (#19281) Write out a 2-bit code per block and avoid loading the mask when it matches these two common cases. Apply this optimization when the mask is relatively large (i.e. prompt processing).	2026-02-05 09:26:38 -06:00
Georgi Gerganov	eaba92c3dc	tests : add non-cont, inplace rope tests (#19296) * tests : add non-cont, inplace rope tests * cont : exercise dim 3 Co-authored-by: Jeff Bolz <jbolz@nvidia.com> * cont : more dim3 exercises --------- Co-authored-by: Jeff Bolz <jbolz@nvidia.com>	2026-02-04 12:45:21 +02:00
Concedo	7b393fa487	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # AUTHORS # ci/run.sh # docs/backend/SYCL.md # docs/build.md # docs/multimodal/minicpmo2.6.md # docs/multimodal/minicpmo4.0.md # docs/multimodal/minicpmv2.5.md # docs/multimodal/minicpmv2.6.md # docs/multimodal/minicpmv4.0.md # docs/multimodal/minicpmv4.5.md # docs/ops.md # docs/ops/SYCL.csv # docs/speculative.md # examples/deprecation-warning/README.md # examples/deprecation-warning/deprecation-warning.cpp # examples/model-conversion/Makefile # examples/model-conversion/scripts/causal/convert-model.sh # ggml/include/ggml-cann.h # ggml/src/ggml-cann/acl_tensor.cpp # ggml/src/ggml-cann/acl_tensor.h # ggml/src/ggml-cann/aclnn_ops.cpp # ggml/src/ggml-cann/aclnn_ops.h # ggml/src/ggml-cann/common.h # ggml/src/ggml-cann/ggml-cann.cpp # ggml/src/ggml-metal/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-opencl/kernels/concat.cl # ggml/src/ggml-opencl/kernels/repeat.cl # ggml/src/ggml-opencl/kernels/scale.cl # ggml/src/ggml-opencl/kernels/tanh.cl # ggml/src/ggml-sycl/CMakeLists.txt # ggml/src/ggml-sycl/dpct/helper.hpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-sycl/outprod.cpp # ggml/src/ggml-sycl/rope.cpp # ggml/src/ggml-sycl/wkv.cpp # src/llama-vocab.cpp # tests/test-autorelease.cpp # tests/test-backend-ops.cpp # tools/cvector-generator/pca.hpp # tools/export-lora/export-lora.cpp # tools/perplexity/README.md	2026-02-03 19:00:42 +08:00
Sid Mohan	0dfcd3b607	jinja : add missing 'in' test to template engine (#19004) (#19239) * jinja : add missing 'in' test to template engine (#19004) The jinja template parser was missing the 'in' test from global_builtins(), causing templates using reject("in", ...), select("in", ...), or 'x is in(y)' to fail with "selectattr: unknown test 'in'". This broke tool-calling for Qwen3-Coder and any other model whose chat template uses the 'in' test. Added test_is_in supporting array, string, and object containment checks, mirroring the existing 'in' operator logic in runtime.cpp. Includes test cases for all three containment types plus reject/select filter usage. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * reuse test_is_in in binary op --------- Co-authored-by: Sid Mohan <sidmohan0@users.noreply.github.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2026-02-02 21:00:55 +01:00
Aman Gupta	9f682fb640	ggml-cpu: FA split across kv for faster TG (#19209) * ggml-cpu: split across kv for faster TG * simplify sinks application * add ref impl	2026-02-03 01:19:55 +08:00
Christian Kastner	7a4ca3cbd9	docs : Minor cleanups (#19252) * Update old URLs to github.com/ggml-org/ * Bump copyrights	2026-02-02 08:38:55 +02:00
Georgi Gerganov	c3b87cebff	tests : add GQA=20 FA test (#19095)	2026-01-30 13:52:57 +02:00
Aldehir Rojas	7b7ae857f6	chat : add parsing for solar-open-100b (#18540) * chat : add parsing for solar-open-100b * add comments to rules * cont : make assistant start optional * cont : remove assistant start prefix altogether --------- Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>	2026-01-29 16:06:15 +01:00
Concedo	7e755014b2	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/winget.yml # CODEOWNERS # common/CMakeLists.txt # common/arg.cpp # docs/ops/SYCL.csv # examples/lookup/lookup-create.cpp # examples/lookup/lookup-stats.cpp # examples/lookup/lookup.cpp # examples/speculative-simple/speculative-simple.cpp # examples/speculative/speculative.cpp # ggml/src/ggml-hip/CMakeLists.txt # ggml/src/ggml-sycl/dpct/helper.hpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-sycl/norm.cpp # ggml/src/ggml-zendnn/ggml-zendnn.cpp # tests/test-chat-template.cpp	2026-01-29 23:05:05 +08:00
Concedo	46cd17c17e	Merge commit '`88d23ad515`' into concedo_experimental # Conflicts: # CODEOWNERS # docs/build.md # ggml/CMakeLists.txt # ggml/src/CMakeLists.txt # ggml/src/ggml-webgpu/ggml-webgpu.cpp # ggml/src/ggml-zendnn/CMakeLists.txt # tests/test-chat-template.cpp	2026-01-29 22:25:56 +08:00
Sigbjørn Skjæret	b45ef2702c	jinja : do not pass empty tools and add some none filters (#19176)	2026-01-29 14:06:54 +01:00
Sigbjørn Skjæret	60368e1d73	jinja : undefined should be treated as sequence/iterable (return string/array) by filters/tests (#19147) * undefined is treated as iterable (string/array) by filters `tojson` is not a supported `undefined` filter * add tests * add sequence and iterable tests keep it DRY and fix some types	2026-01-28 14:40:29 +01:00
Sigbjørn Skjæret	2b4cbd2834	jinja : implement mixed type object keys (#18955) * implement mixed type object keys * add tests * refactor * minor fixes * massive refactor * add more tests * forgotten tuples * fix array/object is_hashable * correct (albeit broken) jinja responses verified with transformers * improved hashing and equality * refactor hash function * more exhausive test case * clean up * cont * cont (2) * missing cstring --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2026-01-27 19:50:42 +01:00
Johannes Gäßler	b0311c16d2	CUDA: fix padding of GQA to power of 2 in FA (#19115)	2026-01-26 23:24:58 +01:00
Johannes Gäßler	4e5b83b226	GGUF: check that tensor size is representable (#19072)	2026-01-24 21:57:51 +01:00
Concedo	e8e7c357c9	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build-cache.yml # .github/workflows/build-cmake-pkg.yml # .github/workflows/build-linux-cross.yml # .github/workflows/build.yml # .github/workflows/check-vendor.yml # .github/workflows/close-issue.yml # .github/workflows/copilot-setup-steps.yml # .github/workflows/docker.yml # .github/workflows/editorconfig.yml # .github/workflows/gguf-publish.yml # .github/workflows/labeler.yml # .github/workflows/pre-tokenizer-hashes.yml # .github/workflows/python-check-requirements.yml # .github/workflows/python-lint.yml # .github/workflows/python-type-check.yml # .github/workflows/release.yml # .github/workflows/server-webui.yml # .github/workflows/server.yml # .github/workflows/update-ops-docs.yml # .github/workflows/winget.yml # ggml/src/ggml-opencl/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-zdnn/ggml-zdnn.cpp # requirements/requirements-tool_bench.txt # src/CMakeLists.txt # src/llama-quant.cpp # tests/test-backend-ops.cpp # tests/test-chat.cpp # tools/cli/cli.cpp # tools/server/README.md	2026-01-23 14:27:04 +08:00
Xuan-Son Nguyen	51fa458a92	server : support preserving reasoning_content in assistant message (#18994) * support reasoning_content input * report template caps to webui * add docs * rm commented code	2026-01-22 21:30:06 +01:00
Georgi Gerganov	a5eaa1d6a3	mla : make the V tensor a view of K (#18986) * mla : pass V as a view of K to the FA op * cuda : adjust mla logic to new layout * kv-cache : fix rope shift * tests : remove comment * cuda : fix reusable_cutoff Co-authored-by: Johannes Gäßler <johannesg@5d6.de> --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2026-01-22 22:09:01 +02:00
Piotr Wilkin (ilintar)	c301172f66	jinja: support none|string (#18995) * jinja: support none|string * Update common/jinja/value.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update tests/test-jinja.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Add as_string() --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-01-21 19:24:37 +01:00
Jeff Bolz	33f890e579	vulkan: support flash attention GQA/split_k with small batches (#18938)	2026-01-21 17:43:43 +01:00
Concedo	4984c9bc16	Merge commit '`12a4a47e6a`' into concedo_experimental # Conflicts: # ci/run.sh # examples/model-conversion/scripts/causal/run-converted-model-embeddings-logits.sh # examples/model-conversion/scripts/causal/run-converted-model.sh # examples/model-conversion/scripts/embedding/run-converted-model.sh # ggml/src/ggml-cann/ggml-cann.cpp # ggml/src/ggml-hexagon/ggml-hexagon.cpp # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-webgpu/ggml-webgpu.cpp # ggml/src/ggml-zdnn/ggml-zdnn.cpp # ggml/src/ggml-zendnn/ggml-zendnn.cpp # tests/CMakeLists.txt # tests/test-chat-parser.cpp # tests/test-chat-peg-parser.cpp # tests/test-chat.cpp # tools/cli/cli.cpp	2026-01-21 21:00:44 +08:00
Xuan-Son Nguyen	2c1f199653	cli : fix reasoning responses in CLI (#18961) * cli : fix reasoning responses in CLI * fix build * fix build (2)	2026-01-20 18:23:25 +01:00
Sigbjørn Skjæret	959ecf7f23	jinja : fix undefined keys and attributes and int/float as bool (#18924) * fix undefined keys and attributes * add falsy tests * as_bool for integers and floats * more falsy/truthy tests * --typo	2026-01-19 20:29:43 +01:00
Sigbjørn Skjæret	4037093c66	ci : run test-jinja -py on high perf [no ci] (#18916)	2026-01-19 20:29:15 +01:00
Concedo	7f618454ff	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/labeler.yml # CODEOWNERS # docs/backend/OPENCL.md # docs/ops.md # docs/ops/CANN.csv # docs/ops/WebGPU.csv # ggml/src/ggml-blas/CMakeLists.txt # ggml/src/ggml-opencl/kernels/mul_mv_q6_k.cl # ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp # ggml/src/ggml-webgpu/ggml-webgpu.cpp # ggml/src/ggml-webgpu/wgsl-shaders/cpy.tmpl.wgsl # ggml/src/ggml-webgpu/wgsl-shaders/set_rows.wgsl # tests/test-backend-ops.cpp	2026-01-18 23:24:29 +08:00
Xuan-Son Nguyen	fe44d35574	tests : add test-jinja -py option for cross-checking (#18906) * tests : add test-jinja -py option or cross-checking * Update tests/test-jinja.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * fix + add source * SandboxedEnvironment * fix array.map case --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-01-18 08:14:27 +01:00
Sigbjørn Skjæret	d03c45c9c5	jinja : attribute support for join, map and sort (#18883) * support negative array index and default value * attribute support (int and str) for join, map and sort * add tests * update CODEOWNERS * improve fixme sorting comment	2026-01-18 02:53:01 +01:00
Sigbjørn Skjæret	10c98cbdf6	jinja : add missing tojson filter for bool (#18900) * add missing tojson for bool * add more literal tests	2026-01-18 01:05:09 +01:00
Sigbjørn Skjæret	420960ab92	jinja : fix lexing of float literals with sign (#18901) * fix lexing of float literals with sign * add test * consume_numeric	2026-01-18 00:57:51 +01:00
Xuan-Son Nguyen	f55b033ae6	jinja: correct member access rule (#18905)	2026-01-18 00:48:55 +01:00
Concedo	8855a7f52b	Merge commit '`c945aaaef2`' into concedo_experimental # Conflicts: # .devops/cann.Dockerfile # .github/workflows/build.yml # .github/workflows/release.yml # README.md # common/CMakeLists.txt # common/chat.cpp # docs/function-calling.md # ggml/src/ggml-cann/aclnn_ops.cpp # ggml/src/ggml-cann/aclnn_ops.h # ggml/src/ggml-cann/common.h # ggml/src/ggml-cann/ggml-cann.cpp # models/templates/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16.jinja # scripts/sync_vendor.py # tests/CMakeLists.txt # tests/peg-parser/tests.h # tests/test-chat-peg-parser.cpp # tests/test-chat-template.cpp # tests/test-chat.cpp # tests/testing.h # tools/llama-bench/llama-bench.cpp	2026-01-17 10:24:03 +08:00

752 commits