Commit graph

11129 commits

Author SHA1 Message Date
Concedo
c9308570b2 added mcp to list of capabilities, allow it to run standalone 2026-01-05 20:32:25 +08:00
Concedo
b762036388 indicate unofficial builds 2026-01-05 16:12:54 +08:00
Concedo
301a04adfc Merge branch 'concedo' into concedo_experimental 2026-01-05 15:24:43 +08:00
Concedo
9a4eeafbfc hotfix 1.105.3 2026-01-05 15:24:21 +08:00
Concedo
ad6c53aeff Merge commit '908a9e5a1e' into concedo 2026-01-05 15:01:49 +08:00
Concedo
4d3866a016 mcp proxy is done 2026-01-05 12:24:43 +08:00
Aman Gupta
908a9e5a1e
CUDA: disable cuda graph when using n-cpu-moe (#18593)
* CUDA: disable cuda graph when using n-cpu-moe

* call ggml_cuda_set_device
2026-01-05 01:37:48 +08:00
Aman Gupta
5126c41c1c
ggml-cuda: remove unused params in ggml_cuda_graph (#18579) 2026-01-05 01:37:09 +08:00
Concedo
91089ad1bd wip on mcp 2026-01-04 22:52:47 +08:00
Concedo
a82c89b065 minimax template 2026-01-04 20:51:16 +08:00
Concedo
acfc1e56d2 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	tests/test-regex-partial.cpp
2026-01-04 11:14:33 +08:00
Concedo
01c70a7d3d allow transcribe to be used with the LLM instead if no whisper model exists 2026-01-04 11:06:05 +08:00
Aldehir Rojas
cef1d23c5a
common/grammar : replace problematic backtracking regex [\s\S]* (#18342)
* grammar : add support for std::regex_search() with trigger patterns

* common : update hermes2 pro trigger to search instead of match

* common : use regex_search with anchoring for partial matching

* common : adjust regex partial tests to use new pattern

* grammar : check pattern directly instead of adding a type

* common : adjust existing patterns to match new semantics
2026-01-03 16:02:43 -06:00
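The switch above from matching to searching changes where a trigger pattern may sit in the input: `std::regex_match` must cover the string from its start, while `std::regex_search` finds the pattern anywhere. A minimal sketch using Python's `re` module as a stand-in for `std::regex` (the `<tool_call>` trigger text is illustrative, not the exact hermes2-pro pattern):

```python
import re

trigger = re.compile(r"<tool_call>")
text = 'Sure, let me look that up. <tool_call>{"name": "search"}'

# match() anchors at the start of the string: the mid-stream trigger is missed.
assert trigger.match(text) is None

# search() scans the whole string: the trigger fires wherever it appears.
m = trigger.search(text)
assert m is not None and m.start() == 27
```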
Georgi Gerganov
c69c7ebc90
graph : fix graph reuse logic when n_pos_per_embd > 1 (#18566) 2026-01-03 23:59:06 +02:00
Concedo
04f5445bef fix for macos asserting on exit 2026-01-03 23:26:04 +08:00
Aman Gupta
e57f52334b
ggml-cuda: fixes for concurrent streams (#18496) 2026-01-03 23:15:01 +08:00
Concedo
5a505cbc62 disable blackwell mma for now 2026-01-03 22:45:06 +08:00
Georgi Gerganov
a554a1ecc7
context : fix reserve token padding to n_seqs (#18536) 2026-01-03 15:45:34 +02:00
Johannes Gäßler
0f2e42ca1d
CUDA: only allocate FA tmp buffer if needed (#18564) 2026-01-03 13:55:53 +01:00
pl752
9dba9f5352
(Bugfix, ggml-cuda) Pool alloc count fix + small size computation type adjustment (#18559)
* CUDA: Fixed obj byte size instead of obj count being passed to pool alloc (fattn-common, dst_tmp_meta)

* CUDA: Explicitly casted some of the int alloc counts before multiplication in argsort

---------

Co-authored-by: pl752 <maximpl752@gmail.com>
2026-01-03 11:13:40 +01:00
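The first fix above is a unit mix-up: a byte size was passed to a pool allocator that expects an element count and computes the byte size itself. A toy illustration of that bug class (the `pool_alloc` helper here is hypothetical, not the actual CUDA pool API):

```python
ELEM_SIZE = 8  # bytes per element, stand-in for sizeof(obj)

def pool_alloc(count, elem_size=ELEM_SIZE):
    # Hypothetical pool API: takes an *element count* and sizes
    # the buffer itself by multiplying with the element size.
    return bytearray(count * elem_size)

n = 16
over = pool_alloc(n * ELEM_SIZE)  # bug: byte size passed where a count belongs
good = pool_alloc(n)              # fix: pass the element count

assert len(good) == n * ELEM_SIZE
assert len(over) == n * ELEM_SIZE * ELEM_SIZE  # silently over-allocates 8x
```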
Concedo
e4abf643fa Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	ggml/src/ggml-hexagon/htp/act-ops.c
#	ggml/src/ggml-rpc/ggml-rpc.cpp
#	src/CMakeLists.txt
#	src/llama-vocab.cpp
2026-01-03 15:37:30 +08:00
Wagner Bruna
0ef55844d3
sd: sync to master-453-4ff2c8c (#1907) 2026-01-03 15:28:27 +08:00
Shouyu
bcfc8c3cec
ggml-hexagon: optimize activation function (#18393)
* refactor: refactor silu

* refactor: optimize swiglu

* refactor: remove unnecessary if in swiglu

* refactor: refactor swiglu_oai

* chore: fix formatting issue
2026-01-02 21:24:24 -08:00
Jeff Bolz
18ddaea2ae
vulkan: Optimize GGML_OP_CUMSUM (#18417)
* vulkan: Optimize GGML_OP_CUMSUM

There are two paths: The preexisting one that does a whole row per workgroup
in a single shader, and one that splits each row into multiple blocks and does
two passes. The first pass computes partials within a block, the second adds
the block partials to compute the final result. The multipass shader is used
when there are a small number of large rows.

In the whole-row shader, handle multiple elements per invocation.

* use 2 ELEM_PER_THREAD for AMD/Intel

* address feedback
2026-01-02 15:32:30 -06:00
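The two-pass strategy described above can be sketched sequentially: pass one computes an inclusive prefix sum inside each block plus the block total, pass two adds the running sum of preceding block totals to every element. A plain-Python sketch (block size and data are illustrative; the real shader does pass one in parallel, one workgroup per block):

```python
def cumsum_two_pass(row, block_size):
    # Pass 1: inclusive prefix sum within each block + per-block totals.
    partials, totals = [], []
    for start in range(0, len(row), block_size):
        acc, out = 0, []
        for x in row[start:start + block_size]:
            acc += x
            out.append(acc)
        partials.append(out)
        totals.append(acc)
    # Pass 2: add the sum of all preceding block totals to each element.
    result, offset = [], 0
    for out, total in zip(partials, totals):
        result.extend(v + offset for v in out)
        offset += total
    return result

row = [3, 1, 4, 1, 5, 9, 2, 6]
assert cumsum_two_pass(row, 3) == [3, 4, 8, 9, 14, 23, 25, 31]
```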
Jeff Bolz
706e3f93a6
vulkan: Implement mmvq for iq1_s/iq1_m (#18450) 2026-01-02 20:19:04 +01:00
Prabod
5755e52d15
model : Maincoder-1B support (#18534)
* Add Maincoder model support

* Removed SPM model vocabulary setting and MOE related GGUF parameters
Removed trailing spaces from maincoder.cpp

* removed set_vocab

* added new line

* Fix formatting

* Add a new line for PEP8
2026-01-02 20:11:59 +01:00
Georgi Gerganov
f38de16341
metal : adjust extra size for FA buffer to avoid reallocations (#18545) 2026-01-02 19:02:18 +02:00
Georgi Gerganov
af1e8e1a6c
graph : reduce topology branching (#18548) 2026-01-02 19:01:56 +02:00
Concedo
77082dddfb mcp image handling 2026-01-03 00:03:05 +08:00
Georgi Gerganov
d84a6a98be
vocab : reduce debug logs about non-EOG control tokens (#18541)
* vocab : reduce debug logs about non-EOG control tokens

* cont : add comment
2026-01-02 16:17:33 +02:00
Concedo
107def07c8 updated lite and sdui (+1 squashed commits)
Squashed commits:

[3172b5d19] updated lite (+1 squashed commits)

Squashed commits:

[45081b0e2] updated glm nothink template
2026-01-02 18:11:32 +08:00
Chris Rohlf
c6f0e832da
rpc : use unordered_map::reserve and emplace (#18513) 2026-01-02 12:09:36 +02:00
Concedo
d8942cde14 smartcache allow custom number of slots 2026-01-02 17:19:40 +08:00
Concedo
7e1ae49e7d Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	ggml/src/ggml-cuda/ggml-cuda.cu
#	tests/test-backend-ops.cpp
#	tools/mtmd/CMakeLists.txt
2026-01-02 11:05:20 +08:00
Concedo
0a23388e7d added images in tool call queries 2026-01-02 10:48:34 +08:00
MeeMin
e86f3c2221
cuda : fix copy of large tensors (ggml_nbytes <= INT_MAX assertion) (#18433)
* ggml-cuda: fixed assertion in ggml_cuda_cpy (#18140)

* ggml-cuda: changes in data types to int64_t

* ggml-cuda: added asserts for CUDA block numbers

* ggml-cuda: changed the condition for y and z dimension
2026-01-02 00:24:20 +01:00
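The `ggml_nbytes <= INT_MAX` assertion above trips when index arithmetic is done in 32-bit before widening, which is why the fix moves to `int64_t`. The bug class, emulated in Python (Python integers do not overflow, so 32-bit signed wraparound is simulated; the dimensions are illustrative):

```python
INT32_MAX = 2**31 - 1

def mul_i32(a, b):
    # Emulate C's 32-bit signed multiplication (wraps around on overflow).
    r = (a * b) & 0xFFFFFFFF
    return r - 2**32 if r >= 2**31 else r

ne0, ne1 = 70_000, 40_000      # large-tensor dimensions
nbytes = ne0 * ne1             # correct 64-bit result: 2_800_000_000
assert nbytes > INT32_MAX      # exceeds INT_MAX, so 32-bit math is unsafe
assert mul_i32(ne0, ne1) == nbytes - 2**32  # wraps to a negative value
```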
Sigbjørn Skjæret
169ee68ffb
model : remove modern-bert iswa template (#18529)
* remove modern-bert iswa template

* forgotten
2026-01-02 00:06:42 +01:00
tt
ced765be44
model: support youtu-vl model (#18479)
* Support Youtu-VL Model

* merge code

* fix bug

* revert qwen2 code & support rsplit in minja.hpp

* update warm info

* fix annotation

* u

* revert minja.hpp

* fix

* Do not write routed_scaling_factor to gguf when routed_scaling_factor is None

* fix expert_weights_scale

* LGTM after whitespace fixes

* fix

* fix

* fix

* layers to layer_index

* enum fix

---------

Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-01-01 19:25:54 +01:00
Piotr Wilkin (ilintar)
3ccccc83f7
Add conversion support for IQuestCoderForCausalLM (#18524) 2026-01-01 18:45:55 +01:00
o7si
d0a6a31470
model : add support for JinaBertModel with non-gated ffn (#18475)
* WIP: Initial commit for fixing JinaBert original FF type support

* convert: add jina-v2-de tokenizer variant for German_Semantic_V3

* convert: fix token collision in BERT phantom vocab conversion

* convert: add feed_forward_type metadata

* model: add feed_forward_type metadata for jina-bert-v2

* model: jina-bert-v2 support standard GELU FFN variant

* model: remove ffn_type, detect FFN variant from tensor dimensions

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update src/models/bert.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update src/models/bert.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* revert collision fix to be handled in separate PR

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-01-01 18:38:51 +01:00
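The gated/non-gated distinction these commits handle: a standard GELU FFN computes `down(gelu(up(x)))`, while a gated (GEGLU-style) FFN multiplies in a separate gate projection, `down(gelu(gate(x)) * up(x))`. A scalar toy sketch of the two variants (single numbers stand in for the projection matrices; this is not the llama.cpp graph code):

```python
import math

def gelu(x):
    # Exact GELU via the error function.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def ffn_plain(x, w_up, w_down):
    # Non-gated variant: down(gelu(up(x)))
    return gelu(x * w_up) * w_down

def ffn_gated(x, w_gate, w_up, w_down):
    # Gated (GEGLU-style) variant: down(gelu(gate(x)) * up(x))
    return gelu(x * w_gate) * (x * w_up) * w_down

assert abs(ffn_plain(1.0, 1.0, 2.0) - 2.0 * gelu(1.0)) < 1e-12
```

Detecting which variant a checkpoint uses from tensor shapes (as the last refactor above does) works because the gated form carries an extra gate projection of the same shape as the up projection.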
o7si
2b2afade9f
convert : fix encoding of WPM vocab for BERT models (#18500)
* convert: avoid token collision when stripping ## prefix

* convert: use token types for BERT special tokens check

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-01-01 18:27:07 +01:00
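The collision this fix avoids: WordPiece vocabularies mark continuation pieces with a `##` prefix, so naively stripping it can map two distinct tokens onto the same string. A toy illustration of the failure mode (not the converter's actual code):

```python
def strip_wpm_prefix(vocab):
    # Buggy conversion: dropping the WordPiece "##" continuation marker
    # can collide with an existing standalone token.
    out = {}
    for tok, idx in vocab.items():
        out[tok.removeprefix("##")] = idx  # the later entry silently wins
    return out

vocab = {"ing": 7, "##ing": 42}
stripped = strip_wpm_prefix(vocab)
assert len(vocab) == 2 and len(stripped) == 1  # one token id was lost
```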
HelloKS
f4f5019254
model: add Solar Open model (#18511)
* model: add Solar-Open model

* vocab: add solar-open to end eog blacklist

* model: add proper llm type

* chat: basic template for solar open

* typo: fix comment about vocab

* convert: suggested changes

* convert: suggested changes

* chat: change reasoning end tag for solar-open

* llama-chat: add solar-open template
2026-01-01 18:01:43 +01:00
Concedo
bfa2ae7744 fixed smartcache bug when used with images 2026-01-02 00:35:05 +08:00
Concedo
774841ffd6 clear the images array from kcpp chat completions 2026-01-01 22:51:00 +08:00
Concedo
51edb6ae61 allow clip fa for anything besides cuda on gpu 2026-01-01 21:09:51 +08:00
Anri Lombard
d5574c919c
webui: fix code copy stripping XML/HTML tags (#18518)
* webui: fix code copy stripping XML/HTML tags

* webui: update static build
2026-01-01 13:44:11 +01:00
Aman Gupta
26831bded9
ggml-cuda: remove unnecessary prints on ggml_cuda_init (#18502) 2026-01-01 19:18:43 +08:00
Concedo
442fa7cd7c support for circular textures in sdcpp 2026-01-01 16:34:09 +08:00
Jeff Bolz
be47fb9285
vulkan: extend topk_moe to handle sigmoid w/exp_probs_b for nemotron (#18295)
* vulkan: extend topk_moe to handle sigmoid w/exp_probs_b for nemotron

Also handle GGML_OP_SCALE at the end (nemotron, deepseek2).

Fewer pipeline variants and spec constants, just use push constants.

In test_topk_moe, change exp_probs_b to be 1D, matching real networks.

Update test-backend-ops and ggml-backend to allow verifying multiple outputs
in a fusion test (topk_moe has two outputs). Previously only the final node
was verified.

* change test_topk_moe to allow results in arbitrary order

* disable sigmoid fusion for moltenvk
2026-01-01 08:58:27 +01:00
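The sigmoid-with-bias routing that the fused `topk_moe` path above covers works, in DeepSeek-V3-style routers, by letting the bias (`exp_probs_b`) influence only which experts are *selected*, while the mixing weights come from the unbiased sigmoid scores, normalized and then scaled (the trailing `GGML_OP_SCALE`). A reference sketch of that routing, not the Vulkan shader:

```python
import math

def topk_moe_sigmoid(logits, bias, k, scale=1.0):
    gates = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    # Bias is added for expert *selection* only.
    order = sorted(range(len(gates)),
                   key=lambda i: gates[i] + bias[i], reverse=True)
    top = order[:k]
    # Weights use the unbiased gates, normalized, then scaled.
    total = sum(gates[i] for i in top)
    return {i: scale * gates[i] / total for i in top}

w = topk_moe_sigmoid([0.0, 1.0, -1.0, 2.0], [0.0] * 4, k=2)
assert set(w) == {1, 3}                   # two highest-scoring experts
assert abs(sum(w.values()) - 1.0) < 1e-9  # weights normalized
```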
Concedo
54e419f587 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/docker.yml
#	docs/ops.md
#	docs/ops/Metal.csv
#	ggml/CMakeLists.txt
#	ggml/src/ggml-sycl/CMakeLists.txt
#	grammars/README.md
#	models/templates/llama-cpp-deepseek-r1.jinja
#	scripts/sync-ggml.last
#	tests/test-chat.cpp
2026-01-01 15:34:10 +08:00