koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-10 20:31:01 +00:00

Author	SHA1	Message	Date
Concedo	e4abf643fa	Merge branch 'upstream' into concedo_experimental # Conflicts: # ggml/src/ggml-hexagon/htp/act-ops.c # ggml/src/ggml-rpc/ggml-rpc.cpp # src/CMakeLists.txt # src/llama-vocab.cpp	2026-01-03 15:37:30 +08:00
Wagner Bruna	0ef55844d3	sd: sync to master-453-4ff2c8c (#1907)	2026-01-03 15:28:27 +08:00
Shouyu	bcfc8c3cec	ggml-hexagon: optimize activation function (#18393) * refactor: refactor silu * refactor: optimize swiglu * refactor: remove unncessary if in swiglu * refactor: refactor swiglu_oai * chore: fix formatting issue	2026-01-02 21:24:24 -08:00
Jeff Bolz	18ddaea2ae	vulkan: Optimize GGML_OP_CUMSUM (#18417) * vulkan: Optimize GGML_OP_CUMSUM There are two paths: The preexisting one that does a whole row per workgroup in a single shader, and one that splits each row into multiple blocks and does two passes. The first pass computes partials within a block, the second adds the block partials to compute the final result. The multipass shader is used when there are a small number of large rows. In the whole-row shader, handle multiple elements per invocation. * use 2 ELEM_PER_THREAD for AMD/Intel * address feedback	2026-01-02 15:32:30 -06:00
Jeff Bolz	706e3f93a6	vulkan: Implement mmvq for iq1_s/iq1_m (#18450)	2026-01-02 20:19:04 +01:00
Prabod	5755e52d15	model : Maincoder-1B support (#18534) * Add Maincoder model support * Removed SPM model vocabulary setting and MOE related GGUF parameters Removed trailing spaces from maincoder.cpp * removed set_vocab * added new line * Fix formatting * Add a new line for PEP8	2026-01-02 20:11:59 +01:00
Georgi Gerganov	f38de16341	metal : adjust extra size for FA buffer to avoid reallocations (#18545)	2026-01-02 19:02:18 +02:00
Georgi Gerganov	af1e8e1a6c	graph : reduce topology branching (#18548)	2026-01-02 19:01:56 +02:00
Concedo	77082dddfb	mcp image handling	2026-01-03 00:03:05 +08:00
Georgi Gerganov	d84a6a98be	vocab : reduce debug logs about non-EOG control tokens (#18541) * vocab : reduce debug logs about non-EOG control tokens * cont : add comment	2026-01-02 16:17:33 +02:00
Concedo	107def07c8	updated lite and sdui (+1 squashed commits) Squashed commits: [3172b5d19] updated lite (+1 squashed commits) Squashed commits: [45081b0e2] updated glm nothink template	2026-01-02 18:11:32 +08:00
Chris Rohlf	c6f0e832da	rpc : use unordered_map::reserve and emplace (#18513)	2026-01-02 12:09:36 +02:00
Concedo	d8942cde14	smartcache allow custom number of slots	2026-01-02 17:19:40 +08:00
Concedo	7e1ae49e7d	Merge branch 'upstream' into concedo_experimental # Conflicts: # ggml/src/ggml-cuda/ggml-cuda.cu # tests/test-backend-ops.cpp # tools/mtmd/CMakeLists.txt	2026-01-02 11:05:20 +08:00
Concedo	0a23388e7d	added images in tool call queries	2026-01-02 10:48:34 +08:00
MeeMin	e86f3c2221	cuda : fix copy of large tensors (ggml_nbytes <= INT_MAX assertion) (#18433) * ggml-cuda: fixed assertion in ggml_cuda_cpy (#18140) * ggml-cuda: changes in data types to int64_t * ggml-cuda: added asserts for CUDA block numbers * ggml-cuda: changed the condition for y and z dimension	2026-01-02 00:24:20 +01:00
Sigbjørn Skjæret	169ee68ffb	model : remove modern-bert iswa template (#18529) * remove modern-bert iswa template * forgotten	2026-01-02 00:06:42 +01:00
tt	ced765be44	model: support youtu-vl model (#18479) * Support Youtu-VL Model * merge code * fix bug * revert qwen2 code & support rsplit in minja.hpp * update warm info * fix annotation * u * revert minja.hpp * fix * Do not write routed_scaling_factor to gguf when routed_scaling_factor is None * fix expert_weights_scale * LGTM after whitespace fixes * fix * fix * fix * layers to layer_index * enum fix --------- Co-authored-by: Xuan-Son Nguyen <son@huggingface.co> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-01-01 19:25:54 +01:00
Piotr Wilkin (ilintar)	3ccccc83f7	Add conversion support for IQuestCoderForCausalLM (#18524)	2026-01-01 18:45:55 +01:00
o7si	d0a6a31470	model : add support for JinaBertModel with non-gated ffn (#18475) * WIP: Initial commit for fixing JinaBert original FF type support * convert: add jina-v2-de tokenizer variant for German_Semantic_V3 * convert: fix token collision in BERT phantom vocab conversion * convert: add feed_forward_type metadata * model: add feed_forward_type metadata for jina-bert-v2 * model: jina-bert-v2 support standard GELU FFN variant * model: remove ffn_type, detect FFN variant from tensor dimensions * Update src/llama-model.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update src/llama-model.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update src/models/bert.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update src/models/bert.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * revert collision fix to be handled in separate PR --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-01-01 18:38:51 +01:00
o7si	2b2afade9f	convert : fix encoding of WPM vocab for BERT models (#18500) * convert: avoid token collision when stripping ## prefix * convert: use token types for BERT special tokens check * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-01-01 18:27:07 +01:00
HelloKS	f4f5019254	model: add Solar Open model (#18511) * model: add Solar-Open model * vocab: add solar-open to end eog blacklist * model: add proper llm type * chat: basic template for solar open * typo: fix comment about vocab * convert: sugested changes * convert: suggested changes * chat: change reasoning end tag for solar-open * llama-chat: add solar-open template	2026-01-01 18:01:43 +01:00
Concedo	bfa2ae7744	fixed smartcache bug when used with images	2026-01-02 00:35:05 +08:00
Concedo	774841ffd6	clear the images array from kcpp chat completions	2026-01-01 22:51:00 +08:00
Concedo	51edb6ae61	allow clip fa for anything besides cuda on gpu	2026-01-01 21:09:51 +08:00
Anri Lombard	d5574c919c	webui: fix code copy stripping XML/HTML tags (#18518) * webui: fix code copy stripping XML/HTML tags * webui: update static build	2026-01-01 13:44:11 +01:00
Aman Gupta	26831bded9	ggml-cuda: remove unneccesary prints on ggml_cuda_init (#18502)	2026-01-01 19:18:43 +08:00
Concedo	442fa7cd7c	support for circular textures in sdcpp	2026-01-01 16:34:09 +08:00
Jeff Bolz	be47fb9285	vulkan: extend topk_moe to handle sigmoid w/exp_probs_b for nemotron (#18295) * vulkan: extend topk_moe to handle sigmoid w/exp_probs_b for nemotron Also handle GGML_OP_SCALE at the end (nemotron, deepseek2). Fewer pipeline variants and spec constants, just use push constants. In test_topk_moe, change exp_probs_b to be 1D, matching real networks. Update test-backend-ops and ggml-backend to allow verifying multiple outputs in a fusion test (topk_moe has two outputs). Previously only the final node was verified. * change test_topk_moe to allow results in arbitrary order * disable sigmoid fusion for moltenvk	2026-01-01 08:58:27 +01:00
Concedo	54e419f587	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/docker.yml # docs/ops.md # docs/ops/Metal.csv # ggml/CMakeLists.txt # ggml/src/ggml-sycl/CMakeLists.txt # grammars/README.md # models/templates/llama-cpp-deepseek-r1.jinja # scripts/sync-ggml.last # tests/test-chat.cpp	2026-01-01 15:34:10 +08:00
Concedo	66ccf8f6b8	Merge commit 'f14f4e421b' into concedo_experimental # Conflicts: # .github/workflows/docker.yml # AGENTS.md # CONTRIBUTING.md # docs/build.md # examples/llama.android/app/build.gradle.kts # examples/llama.android/app/src/main/java/com/example/llama/MainActivity.kt # examples/llama.android/app/src/main/res/layout/activity_main.xml # examples/llama.android/gradle/libs.versions.toml # examples/llama.android/lib/src/main/cpp/ai_chat.cpp # examples/llama.android/lib/src/main/java/com/arm/aichat/InferenceEngine.kt # examples/llama.android/lib/src/main/java/com/arm/aichat/internal/InferenceEngineImpl.kt # examples/model-conversion/scripts/causal/compare-embeddings-logits.sh # examples/model-conversion/scripts/embedding/run-original-model.py # examples/retrieval/retrieval.cpp # ggml/src/CMakeLists.txt # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-cpu/kleidiai/kernels.cpp # ggml/src/ggml-cpu/kleidiai/kleidiai.cpp # ggml/src/ggml-cuda/CMakeLists.txt # ggml/src/ggml-cuda/mmq.cu # ggml/src/ggml-cuda/mmq.cuh # src/CMakeLists.txt # tools/llama-bench/llama-bench.cpp # tools/server/CMakeLists.txt	2026-01-01 15:20:56 +08:00
triplenom	9e10bd2eaf	llama: handle short reads in direct I/O path (#18504)	2026-01-01 10:24:43 +08:00
Anri Lombard	4cd162a123	chat: make tool description and parameters optional per OpenAI spec (#18478) * chat: make tool description and parameters optional per OpenAI spec Per the OpenAI API specification, both 'description' and 'parameters' fields in tool function definitions are optional. Previously, the parser would throw an exception if these fields were missing. Attempts to fix #17667 * refactor: use value() for cleaner optional field access	2025-12-31 17:21:37 -06:00
Concedo	03df0c40f3	if gendefaults is set, horde has debug flag	2026-01-01 00:54:57 +08:00
Georgi Gerganov	13814eb370	sync : ggml	2025-12-31 18:54:43 +02:00
Georgi Gerganov	54f67b9b66	ggml : bump version to 0.9.5 (ggml/1410)	2025-12-31 18:54:43 +02:00
Anri Lombard	33ded988ba	quantize: prevent input/output file collision (#18451) Check if input and output files are the same before quantizing to prevent file corruption when mmap reads from a file being written to. Fixes #12753	2025-12-31 23:29:03 +08:00
Concedo	4c3cf7ba56	updated lite	2025-12-31 23:07:25 +08:00
Sigbjørn Skjæret	0db8109849	convert : lint fix (#18507)	2025-12-31 14:28:21 +01:00
Henry147147	9b8329de7a	mtmd : Adding support for Nvidia Music Flamingo Model (#18470) * Inital commit, debugging q5_k_s quant * Made hf_to_gguf extend whisper to reduce code duplication * addressed convert_hf_to_gguf pull request issue --------- Co-authored-by: Henry D <henrydorsey147@gmail.com>	2025-12-31 12:13:23 +01:00
Concedo	76ef726ec8	adaptive p sharpness to 10.0f	2025-12-31 17:28:30 +08:00
gatbontonpc	9a6369bb60	metal : add count_equal op (#18314) * add count equal for metal * remove trailing whitespace * updated doc ops table * changed shmem to i32 * added multi tg and templating * removed BLAS support from Metal docs * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * add memset to set dst to 0 * metal : cleanup --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-12-31 10:39:48 +02:00
Johannes Gäßler	ecc343de63	CUDA: fix KQ max calculation (#18487)	2025-12-31 09:37:00 +01:00
Georgi Gerganov	01ade96e71	metal : remove BF16 x F16 kernels (#18456)	2025-12-31 09:53:48 +02:00
Aman Gupta	7bcaf815c2	sycl: add newline at the end of CMakeLists.txt (#18503)	2025-12-31 14:23:44 +08:00
Rahul Sathe	c8a3798041	Work around broken IntelSYCLConfig.cmake in Intel oneAPI 2025.x (#18345) * cmake: work around broken IntelSYCLConfig.cmake in oneAPI 2025.x * [AI] sycl: auto-detect and skip incompatible IntelSYCL package Automatically detect compiler versions with incompatible IntelSYCL CMake configuration files and fall back to manual SYCL flags instead of requiring users to set options manually. Fixes build failures with oneAPI 2025.x where IntelSYCLConfig.cmake has SYCL_FEATURE_TEST_EXTRACT invocation errors. * refactor: improve SYCL provider handling and error messages in CMake configuration * refactor: enhance SYCL provider validation and error handling in CMake configuration * ggml-sycl: wrap find_package(IntelSYCL) to prevent build crashes	2025-12-31 09:08:44 +08:00
Sigbjørn Skjæret	4849661d98	docker : add CUDA 13.1 image build (#18441) * add updated cuda-new.Dockerfile for Ubuntu 24.04 compatibilty * add cuda13 build	2025-12-30 22:28:53 +01:00
Bart Louwers	6e0c8cbc40	docs : document that JSON Schema is not available to model when using response_format (#18492) * Document unsupported JSON Schema annotations Add note about unsupported JSON Schema annotations. * Update README.md * Update README.md * Update README.md	2025-12-30 15:13:49 -06:00
Aldehir Rojas	0f89d2ecf1	common : default content to an empty string (#18485) * common : default content to an empty string * common : fix tests that break when content != null	2025-12-30 12:00:57 -06:00
Daniel Bevenius	ac1d0eb7bf	llama : fix typo in comment in llama-kv-cache.h [no ci] (#18489)	2025-12-30 17:20:14 +01:00

11109 commits