koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-08 18:30:50 +00:00

Author	SHA1	Message	Date
Concedo	b6bb9c914e	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/winget.yml # CMakeLists.txt # common/CMakeLists.txt # examples/model-conversion/scripts/causal/run-org-model.py # ggml/src/ggml-cpu/CMakeLists.txt # tools/perplexity/perplexity.cpp # tools/server/CMakeLists.txt	2026-02-17 19:41:28 +08:00
Adrien Gallouët	65cede7c70	build : cleanup library linking logic (#19665) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-02-17 08:36:45 +01:00
Ivan Chikish	cceb1b4e33	common : inline functions (#18639)	2026-02-16 17:52:24 +02:00
Concedo	8d61ff4530	revert "build : remove LLAMA_HTTPLIB option (#19623)"	2026-02-16 18:32:29 +08:00
Concedo	dbe0604dfc	tools broken currently	2026-02-16 18:10:42 +08:00
Concedo	72f7e01b27	Merge commit '01d8eaa28d' into concedo_experimental # Conflicts: # build-xcframework.sh # scripts/sync_vendor.py # tests/test-backend-ops.cpp # tools/mtmd/CMakeLists.txt # tools/rpc/rpc-server.cpp	2026-02-16 15:36:59 +08:00
Adrien Gallouët	9e118b97c4	build : remove LLAMA_HTTPLIB option (#19623) This option was introduced as a workaround because cpp-httplib could not build on visionOS. Since it has been fixed and now compiles on all platforms, we can remove it and simplify many things. Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-02-15 15:38:50 +01:00
iMil	badba89320	NetBSD build support (#19589)	2026-02-14 09:47:01 +01:00
agent-enemy-2	2d8015e8a4	llama : update LoRA API. + fix excessive graph reserves (#19280) * Refactoring to use new llama_put_adapter_loras * cont : alternative lora API --------- Co-authored-by: Jake Chavis <jakechavis6@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2026-02-14 10:06:27 +02:00
Concedo	45dc155530	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/ISSUE_TEMPLATE/010-bug-compilation.yml # .github/ISSUE_TEMPLATE/011-bug-results.yml # AGENTS.md # SECURITY.md # ggml/src/ggml-hexagon/htp/flash-attn-ops.c # ggml/src/ggml-hexagon/htp/main.c # ggml/src/ggml-opencl/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-opencl/kernels/cvt.cl # scripts/sync_vendor.py # src/unicode.cpp # tests/test-backend-ops.cpp # tools/cli/cli.cpp	2026-02-14 12:44:16 +08:00
Adrien Gallouët	b48e80f677	common : update download code (#19573) * common : remove legacy .json to .etag migration code Signed-off-by: Adrien Gallouët <angt@huggingface.co> * common : simplify common_download_file_single_online This commit also forces a redownload if the file exists but has no .etag file. Signed-off-by: Adrien Gallouët <angt@huggingface.co> --------- Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-02-13 15:10:46 +01:00
Concedo	bff3fd3e34	Merge branch 'upstream' into concedo_experimental # Conflicts: # common/common.cpp # docs/backend/snapdragon/README.md # ggml/src/ggml-hexagon/htp/htp-ops.h # ggml/src/ggml-hexagon/htp/matmul-ops.c # ggml/src/ggml-opencl/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # scripts/pr2wt.sh # tests/test-backend-ops.cpp # tools/server/README.md	2026-02-13 14:00:45 +08:00
Concedo	55524e160b	temp merge, not working	2026-02-13 12:11:26 +08:00
Georgi Gerganov	338085c69e	args : add -kvu to llama-parallel (#19577)	2026-02-12 21:52:41 +02:00
Concedo	261d78eaaa	Merge branch 'upstream' into concedo_experimental # Conflicts: # CMakeLists.txt # README.md # docs/speculative.md # ggml/src/ggml-cann/aclnn_ops.cpp # ggml/src/ggml-cann/ggml-cann.cpp # tests/CMakeLists.txt # tests/test-backend-ops.cpp # tools/mtmd/clip.cpp	2026-02-12 18:05:20 +08:00
Adrien Gallouët	4ae1b7517a	common : replace deprecated codecvt using parse_utf8_codepoint (#19517) Signed-off-by: Adrien Gallouët <adrien@gallouet.fr>	2026-02-12 07:27:52 +01:00
Daniel Bevenius	3136a849db	common : remove unused token util functions (#19506) This commit removes two unused functions, `common_lcp` and `common_lcs`. The last usages of these functions were removed in commit 33eff40240 ("server : vision support via libmtmd"), and they are no longer used anywhere in the codebase.	2026-02-11 17:41:35 +01:00
Adrien Gallouët	0c1f39a9ae	common : improve download error reporting (#19491) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-02-11 09:27:55 +01:00
thecaptain789	8ee538ce73	llama : correct typos 'occured' and 'occurences' (#19414) Co-authored-by: thecaptain789 <thecaptain789@users.noreply.github.com>	2026-02-11 07:05:31 +01:00
Xuan-Son Nguyen	98e57ca422	chat: fix case where template accepts type content only (#19419) * chat: fix case where template accepts type content only * rm stray log * reuse render_message_to_json	2026-02-09 22:14:12 +01:00
Sascha Rogmann	292f6908cd	spec : remove check rate (#19377) * spec: remove parameter spec-ngram-check-rate * spec : renamed statistics vars * spec : add n_call_begin, n_call_accept * spec : don't enable key-map-stats	2026-02-09 15:30:50 +02:00
Concedo	a0a78dacc4	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # docs/ops.md # docs/ops/SYCL.csv # ggml/src/ggml-sycl/element_wise.cpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp # ggml/src/ggml-webgpu/ggml-webgpu.cpp # pyproject.toml # requirements/requirements-convert_legacy_llama.txt # src/CMakeLists.txt # src/llama-vocab.cpp # tests/test-backend-ops.cpp	2026-02-07 15:54:02 +08:00
Georgi Gerganov	dfde5993ea	common : add common_speculative_is_compat() (#19270) * llama : add llama_memory_can_rm_suffix() * Revert "llama : add llama_memory_can_rm_suffix()" This reverts commit d30e59b62a15ef4266a6503e3f4eba770aec001b. * spec : check if the target context is compatible for spec decoding	2026-02-06 16:47:22 +02:00
Concedo	157fac7bd0	Merge commit 'c342c3b93d' into concedo_experimental # Conflicts: # .github/workflows/build.yml # CODEOWNERS # scripts/sync_vendor.py	2026-02-05 22:23:05 +08:00
Xuan-Son Nguyen	e0c93af2a0	debug: make common_debug_print_tensor readable (#19331) * debug: make common_debug_print_tensor readable * editorconfig	2026-02-04 17:55:31 +01:00
Concedo	1a36ef20c3	Merge branch 'upstream' into concedo_experimental # Conflicts: # tests/test-backend-ops.cpp	2026-02-04 20:53:35 +08:00
Georgi Gerganov	d838c22bb3	spec : fix the check-rate logic of ngram-simple (#19261) * spec : fix the check-rate logic of ngram-simple * cont : refactor + fix checks	2026-02-04 10:39:53 +02:00
Concedo	7b393fa487	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # AUTHORS # ci/run.sh # docs/backend/SYCL.md # docs/build.md # docs/multimodal/minicpmo2.6.md # docs/multimodal/minicpmo4.0.md # docs/multimodal/minicpmv2.5.md # docs/multimodal/minicpmv2.6.md # docs/multimodal/minicpmv4.0.md # docs/multimodal/minicpmv4.5.md # docs/ops.md # docs/ops/SYCL.csv # docs/speculative.md # examples/deprecation-warning/README.md # examples/deprecation-warning/deprecation-warning.cpp # examples/model-conversion/Makefile # examples/model-conversion/scripts/causal/convert-model.sh # ggml/include/ggml-cann.h # ggml/src/ggml-cann/acl_tensor.cpp # ggml/src/ggml-cann/acl_tensor.h # ggml/src/ggml-cann/aclnn_ops.cpp # ggml/src/ggml-cann/aclnn_ops.h # ggml/src/ggml-cann/common.h # ggml/src/ggml-cann/ggml-cann.cpp # ggml/src/ggml-metal/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-opencl/kernels/concat.cl # ggml/src/ggml-opencl/kernels/repeat.cl # ggml/src/ggml-opencl/kernels/scale.cl # ggml/src/ggml-opencl/kernels/tanh.cl # ggml/src/ggml-sycl/CMakeLists.txt # ggml/src/ggml-sycl/dpct/helper.hpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-sycl/outprod.cpp # ggml/src/ggml-sycl/rope.cpp # ggml/src/ggml-sycl/wkv.cpp # src/llama-vocab.cpp # tests/test-autorelease.cpp # tests/test-backend-ops.cpp # tools/cvector-generator/pca.hpp # tools/export-lora/export-lora.cpp # tools/perplexity/README.md	2026-02-03 19:00:42 +08:00
Georgi Gerganov	aeb827a3cc	spec : simplify time measurement using common_time_meas (#19262)	2026-02-03 08:20:15 +02:00
Sid Mohan	0dfcd3b607	jinja : add missing 'in' test to template engine (#19004) (#19239) * jinja : add missing 'in' test to template engine (#19004) The jinja template parser was missing the 'in' test from global_builtins(), causing templates using reject("in", ...), select("in", ...), or 'x is in(y)' to fail with "selectattr: unknown test 'in'". This broke tool-calling for Qwen3-Coder and any other model whose chat template uses the 'in' test. Added test_is_in supporting array, string, and object containment checks, mirroring the existing 'in' operator logic in runtime.cpp. Includes test cases for all three containment types plus reject/select filter usage. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * reuse test_is_in in binary op --------- Co-authored-by: Sid Mohan <sidmohan0@users.noreply.github.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2026-02-02 21:00:55 +01:00
Sascha Rogmann	b4d05a3d2f	spec : various improvements to ngram-map + docs (#19253) * spec: ngram-map and reasoning chats * spec: add t_begin and t_accept * ngram-map : add internal hash map * docs : update ngram-map, add ngram-mod * docs : fix ngram-map-k * docs : differences between implementations	2026-02-02 08:26:58 +02:00
Concedo	ddce19db72	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/nix/package-gguf-py.nix # .devops/nix/scope.nix # common/CMakeLists.txt # docs/backend/SYCL.md # examples/lookahead/lookahead.cpp # examples/lookup/lookup.cpp # examples/sycl/run-llama2.sh # examples/sycl/win-run-llama2.bat # examples/sycl/win-test.bat # ggml/src/ggml-hexagon/CMakeLists.txt # ggml/src/ggml-hexagon/htp/flash-attn-ops.c # ggml/src/ggml-hexagon/htp/hvx-dump.h # ggml/src/ggml-hexagon/htp/hvx-reduce.h # ggml/src/ggml-hexagon/htp/matmul-ops.c # ggml/src/ggml-hexagon/htp/softmax-ops.c # ggml/src/ggml-hexagon/htp/unary-ops.c # ggml/src/ggml-opencl/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-opencl/kernels/cvt.cl # scripts/sync-ggml.last	2026-02-01 22:35:25 +08:00
Georgi Gerganov	4927795810	ngram-mod : fix build [no ci] (#19216)	2026-01-30 21:27:27 +02:00
Georgi Gerganov	dabaa2e77a	spec : add ngram-mod (#19164) * spec : add ngram-mod * cont : simplify + keep track of occupancy * cont : cleanup * cont : move initialization to common/speculative * cont : cleanup * cont : cleanup * cont : fix	2026-01-30 18:21:48 +02:00
Marcello Seri	2e916f996a	jinja : add unordered_map include to value.h [no ci] (#19205) On macOS Sequoia 15.7.3, x86_64, the build has recently started failing with ``` In file included from .../code/cpp/llama.cpp/common/jinja/string.cpp:2: .../code/cpp/llama.cpp/common/./jinja/value.h:478:10: error: no template named 'unordered_map' in namespace 'std' 478 | std::unordered_map<value, value, value_hasher, value_equivalence> unordered; | ~~~~~^ In file included from .../code/cpp/llama.cpp/common/jinja/caps.cpp:1: .../code/cpp/llama.cpp/common/jinja/value.h:478:10: error: no template named 'unordered_map' in namespace 'std' 478 | std::unordered_map<value, value, value_hasher, value_equivalence> unordered; | ~~~~~^ In file included from .../code/cpp/llama.cpp/common/jinja/value.cpp:1: In file included from .../code/cpp/llama.cpp/common/jinja/runtime.h:4: .../code/cpp/llama.cpp/common/jinja/value.h:478:10: error: no template named 'unordered_map' in namespace 'std' 478 | std::unordered_map<value, value, value_hasher, value_equivalence> unordered; [...] ``` After a bit of digging to make sure all the appropriate flags were used, I noticed that the necessary header was not included. This fixes the build for me and should not negatively affect other builds that, for some reason, were already succeeding.	2026-01-30 16:09:44 +01:00
Concedo	8d173f50c2	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # docs/backend/SYCL.md # docs/backend/snapdragon/CMakeUserPresets.json # docs/backend/snapdragon/README.md # docs/backend/snapdragon/developer.md # docs/ops.md # docs/ops/SYCL.csv # embd_res/templates/upstage-Solar-Open-100B.jinja # ggml/src/CMakeLists.txt # ggml/src/ggml-hexagon/CMakeLists.txt # ggml/src/ggml-hexagon/ggml-hexagon.cpp # ggml/src/ggml-sycl/element_wise.cpp # ggml/src/ggml-sycl/element_wise.hpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-webgpu/wgsl-shaders/flash_attn.wgsl # tests/test-chat.cpp	2026-01-30 15:32:59 +08:00
Aldehir Rojas	7b7ae857f6	chat : add parsing for solar-open-100b (#18540) * chat : add parsing for solar-open-100b * add comments to rules * cont : make assistant start optional * cont : remove assistant start prefix altogether --------- Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>	2026-01-29 16:06:15 +01:00
Concedo	7e755014b2	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/winget.yml # CODEOWNERS # common/CMakeLists.txt # common/arg.cpp # docs/ops/SYCL.csv # examples/lookup/lookup-create.cpp # examples/lookup/lookup-stats.cpp # examples/lookup/lookup.cpp # examples/speculative-simple/speculative-simple.cpp # examples/speculative/speculative.cpp # ggml/src/ggml-hip/CMakeLists.txt # ggml/src/ggml-sycl/dpct/helper.hpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-sycl/norm.cpp # ggml/src/ggml-zendnn/ggml-zendnn.cpp # tests/test-chat-template.cpp	2026-01-29 23:05:05 +08:00
Concedo	46cd17c17e	Merge commit '88d23ad515' into concedo_experimental # Conflicts: # CODEOWNERS # docs/build.md # ggml/CMakeLists.txt # ggml/src/CMakeLists.txt # ggml/src/ggml-webgpu/ggml-webgpu.cpp # ggml/src/ggml-zendnn/CMakeLists.txt # tests/test-chat-template.cpp	2026-01-29 22:25:56 +08:00
Sigbjørn Skjæret	b45ef2702c	jinja : do not pass empty tools and add some none filters (#19176)	2026-01-29 14:06:54 +01:00
Georgi Gerganov	eed25bc6b0	arg : add -kvu to llama-batched-bench (#19172)	2026-01-29 08:50:47 +02:00
Sascha Rogmann	72d3b1898a	spec : add self-speculative decoding (no draft model required) + refactor (#18471) * server: introduce self-speculative decoding * server: moved self-call into speculative.cpp * can_speculate() includes self-speculation Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * server: can_speculate() tests self-spec * server: replace can_speculate() with slot.can_speculate() Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * common: use %zu format specifier for size_t in logging Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * server: can_speculate() requires a task instance * common: ngram map, config self-speculative decoding * common: add enum common_speculative_type * common: add vector of speculative states * common: add option --spec-draftless * server: cleanup (remove slot.batch_spec, rename) * common: moved self-spec impl to ngram-map * common: cleanup (use common_speculative_state_draft) * spec : refactor * cont : naming * spec: remove --spec-config * doc: (draftless) speculative decoding * common: print performance in spec decoding * minor : cleanup * common : better names * minor : cleanup + fix build * minor: comments * CODEOWNERS: add common/ngram-map.* (#18471) * common : rename speculative.draftless_type -> speculative.type * ngram-map : fix uninitialized values * ngram-map : take into account the input can become shorter * ngram-map : revert len check for now * arg : change `--spec-draftless` -> `--spec-type` * spec : add common_speculative_state::accept() * spec : refactor + add common_speculative_begin() * spec : fix begin() call with mtmd * spec : additional refactor + remove common_speculative_params --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-01-28 19:42:42 +02:00
Sigbjørn Skjæret	60368e1d73	jinja : undefined should be treated as sequence/iterable (return string/array) by filters/tests (#19147) * undefined is treated as iterable (string/array) by filters `tojson` is not a supported `undefined` filter * add tests * add sequence and iterable tests keep it DRY and fix some types	2026-01-28 14:40:29 +01:00
Georgi Gerganov	631cbfcc7a	cuda : fix "V is K view" check for non-unified KV cache (#19145)	2026-01-28 09:15:27 +02:00
Georgi Gerganov	c5c64f72ac	llama : disable Direct IO by default (#19109) * llama : disable Direct IO by default * cont : override mmap if supported	2026-01-28 09:11:13 +02:00
Sigbjørn Skjæret	2b4cbd2834	jinja : implement mixed type object keys (#18955) * implement mixed type object keys * add tests * refactor * minor fixes * massive refactor * add more tests * forgotten tuples * fix array/object is_hashable * correct (albeit broken) jinja responses verified with transformers * improved hashing and equality * refactor hash function * more exhaustive test case * clean up * cont * cont (2) * missing cstring --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2026-01-27 19:50:42 +01:00
Concedo	f6ece6fd37	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/check-vendor.yml # .github/workflows/close-issue.yml # .github/workflows/editorconfig.yml # .github/workflows/gguf-publish.yml # .github/workflows/labeler.yml # .github/workflows/pre-tokenizer-hashes.yml # .github/workflows/python-check-requirements.yml # .github/workflows/python-lint.yml # .github/workflows/python-type-check.yml # .github/workflows/server.yml # .github/workflows/update-ops-docs.yml # README.md # docs/build.md # examples/model-conversion/scripts/utils/perplexity-gen.sh # examples/model-conversion/scripts/utils/perplexity-run-simple.sh # examples/model-conversion/scripts/utils/perplexity-run.sh # examples/model-conversion/scripts/utils/quantize.sh # examples/model-conversion/scripts/utils/run-embedding-server.sh # ggml/src/ggml-cpu/ggml-cpu.c # ggml/src/ggml-hexagon/htp/flash-attn-ops.c # ggml/src/ggml-opencl/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-opencl/kernels/cvt.cl # ggml/src/ggml-opencl/kernels/mul_mv_q6_k_f32.cl # ggml/src/ggml-sycl/ggml-sycl.cpp # scripts/compare-llama-bench.py # tests/test-backend-ops.cpp # tests/test-gguf.cpp # tools/cli/README.md # tools/completion/README.md # tools/server/README.md	2026-01-27 23:06:13 +08:00
Daniel Bevenius	fc3cdf32ce	common : clarify HTTPS build options in error message (#19103) * common : clarify HTTPS build options in error message This commit updates the HTTPS error message to provide clearer instructions for users who encounter the "HTTPS is not supported" error. The motivation for this is that it might not be clear to users that only one of these options is needed to enable HTTPS support. The LLAMA_OPENSSL option is also added to the message to cover all possible build configurations. * clarify that OpenSSL is the default for HTTPS support	2026-01-27 06:16:00 +01:00
Daniel Bevenius	16639ba217	common : use two decimal places for float arg help messages (#19048) * common : use two decimal places for float arg help messages This commit updates the help messages for various command-line arguments in arg.cpp to display floating-point default values with two decimal places instead of one. The motivation for this change is that with only one decimal place, values generated using --help or llama-gen-docs do not display the correct defaults. For example, the value of top-p in tools/server/README.md is currently `0.9`, but the default value is actually `0.95`. Running llama-gen-docs does not update this value because it uses the output of the help message, which shows only one decimal place, so the values appear unchanged. * docs : run llama-gen-docs to update docs	2026-01-25 07:31:42 +01:00
Johannes Gäßler	e9fd8dcab4	llama-fit-params: keep explicit --ctx-size 0 (#19070)	2026-01-24 22:13:08 +01:00

1064 commits