koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-10 12:11:08 +00:00

Author	SHA1	Message	Date
Concedo	645b09ea20	renamed promptlimit to genlimit, now applies to API requests as well, can be set in the ui. hide API info display if running in CLI mode.	2025-08-30 00:26:05 +08:00
Concedo	3060dfb99f	Merge branch 'upstream' into concedo_experimental # Conflicts: # examples/model-conversion/Makefile # examples/model-conversion/scripts/causal/convert-model.sh # ggml/src/ggml-cann/aclnn_ops.cpp # ggml/src/ggml-cann/common.h # ggml/src/ggml-cann/ggml-cann.cpp # ggml/src/ggml-cuda/CMakeLists.txt # scripts/compare-commits.sh	2025-08-28 23:17:29 +08:00
Concedo	3655ecf9b3	minor template and tts ui fixes	2025-08-27 22:30:09 +08:00
Concedo	205a0b8d4c	fix kokoro replacement, add 4096 batch size option	2025-08-25 15:57:13 +08:00
Concedo	b0a8d11584	add tts max length for kokoro (+1 squashed commits) Squashed commits: [c1c6feaf] add tts max length for kokoro	2025-08-24 17:57:29 +08:00
Concedo	a6aa47322b	csv fix	2025-08-23 12:48:11 +08:00
Concedo	80dabbb689	minor adjustments for sdquant: allow backend to do the translation for the type more defensively, adjust the UI dropdown for clarity.	2025-08-22 23:23:32 +08:00
Wagner Bruna	2f8b0ec538	Support q8_0 quantization for image model loading (#1692) * Support q8_0 quantization for image model loading q4_0 may degrade quality significantly, especially for smaller models like SD 1.5 and SDXL. q8_0 provides a middle-ground, giving half the memory savings of q4_0 but loading faster and with less quality loss. * Accept --sdquant with no parameters * Use numerical values for the sdquant option	2025-08-22 22:17:15 +08:00
Concedo	7fef0bc949	fix filename regex for whisper	2025-08-22 22:04:05 +08:00
Concedo	9dd6b4c930	improve whisper transcribe apt regex	2025-08-22 17:13:51 +08:00
liuyunrui123	c13db49d5b	Log output supports utf8 encoding display (#1700)	2025-08-21 16:52:03 +08:00
Concedo	3210b378e8	better tool calls	2025-08-20 22:11:31 +08:00
Concedo	eb33467c8c	fixed text	2025-08-20 12:25:04 +08:00
Wagner Bruna	6003e90e50	Add flash attention and conv2d direct controls for image generation (#1678) * Add separate flash attention config for image generation * Add config option for Conv2D Direct	2025-08-20 12:17:57 +08:00
Concedo	9fb0611115	handle contractions correctly, bump defaults	2025-08-18 22:33:44 +08:00
Concedo	2abe11071b	custom voice handling	2025-08-18 16:57:34 +08:00
Concedo	685129fb5a	add missing title, set max tts length to 1024, updated lite (+2 squashed commit) Squashed commit: [0737a028] add missing title [a42328b0] add max tts length 1024	2025-08-17 21:42:56 +08:00
Concedo	bcaf379509	tts.cpp merged and working in kcpp!	2025-08-17 18:09:28 +08:00
Concedo	52606e9b1d	tts cpp model is now loadable in kcpp	2025-08-17 15:47:22 +08:00
Concedo	5a921a40f9	add overridenativecontext flag, stop nagging me	2025-08-14 22:54:45 +08:00
Concedo	4b2ca1169c	more consistency fixes	2025-08-13 19:28:53 +08:00
Concedo	955cf66bbc	load embedding at current maxctx instead of max trained ctx by default	2025-08-13 18:42:14 +08:00
Concedo	06a3ee4c3b	populate better server identifier headers.	2025-08-13 16:10:30 +08:00
Concedo	30e2f25c05	alias tensorsplit , fixed python error	2025-08-10 22:38:14 +08:00
Concedo	8e6d27f629	handle if assistant_message_gen and assistant_message_gen!=assistant_message_start, replace final output tag with unspaced (gen) version if exists	2025-08-10 16:51:34 +08:00
kallewoof	204739e7f1	Adapter fixes (#1659) * test adapters * add assistant_gen adapter key * add support for chat templates stored as .jinja files * removed mistakenly commited gated-tokenizers link * autoguess: Harmony: add missing newline prefixes to system_end	2025-08-10 16:19:50 +08:00
Concedo	89266ac6b8	autoguess adapter make case insensitive	2025-08-10 00:58:47 +08:00
Concedo	487d509b44	try fix oldpc cuda broken without flash attn since upstream pr14361 between 1.94 and 1.95 (+1 squashed commits) Squashed commits: [940f0c639] try fix oldpc cuda broken without flash attn since upstream pr14361 between 1.94 and 1.95	2025-08-10 00:10:37 +08:00
Concedo	4c1faf61b2	increment version (+1 squashed commits) Squashed commits: [6e5080ad2] increment version	2025-08-09 20:53:26 +08:00
Concedo	ced98823a1	kai api tool calling	2025-08-09 10:51:10 +08:00
Concedo	9e7a940ce4	Merge branch 'upstream' into concedo_experimental # Conflicts: # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-opencl/kernels/softmax_4_f16.cl # ggml/src/ggml-opencl/kernels/softmax_4_f32.cl # ggml/src/ggml-opencl/kernels/softmax_f16.cl # ggml/src/ggml-opencl/kernels/softmax_f32.cl # ggml/src/ggml-rpc/ggml-rpc.cpp # ggml/src/ggml-sycl/ggml-sycl.cpp	2025-08-09 01:24:52 +08:00
Concedo	8a71eb03c0	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # ggml/cmake/ggml-config.cmake.in # ggml/src/ggml-cann/CMakeLists.txt # ggml/src/ggml-cann/common.h # ggml/src/ggml-cann/ggml-cann.cpp # ggml/src/ggml-cuda/fattn.cu # ggml/src/ggml-opencl/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # requirements/requirements-convert_hf_to_gguf.txt # scripts/compare-llama-bench.py # tests/test-chat-template.cpp # tests/test-chat.cpp # tools/llama-bench/llama-bench.cpp	2025-08-07 21:23:09 +08:00
Concedo	e40d26b9e7	allow offloading moe to cpu with --moecpu	2025-08-05 23:42:42 +08:00
Concedo	9fbbd9e127	half measure for mistral spaced formats	2025-08-04 23:48:11 +08:00
Concedo	6cb8f95b5b	tool calling params have been ported over to KAI api and can be used, same syntax as OAI endpoint	2025-08-03 16:21:57 +08:00
Concedo	fa815f76c9	updated model recs (+1 squashed commits) Squashed commits: [3e0431ae1] updated model recs	2025-08-02 11:41:37 +08:00
Concedo	cd0dc0abec	allow tool calls to be triggered by any role	2025-08-02 10:00:35 +08:00
Concedo	a87c05f8c1	move function call determination to separate method	2025-07-31 14:14:38 +08:00
Concedo	cade9f42bc	bump defaults	2025-07-31 12:05:57 +08:00
Concedo	1976bb3f53	fixes for tool calling	2025-07-30 19:25:39 +08:00
Concedo	abf527a207	clearer multimodal capability display	2025-07-28 22:54:49 +08:00
Concedo	ecb2cbf547	fix url params parse search	2025-07-27 16:41:42 +08:00
Concedo	8192cd6747	handle multi tool calls	2025-07-25 23:06:23 +08:00
Concedo	f25339c92b	handle empty objects returned by tool calls, also remove misinterpretation of the tools calls instruct tag within ChatML autoguess	2025-07-25 22:22:27 +08:00
Concedo	0d72c794fa	Merge commit 'c8ade30036' into concedo_experimental # Conflicts: # ggml/src/ggml-cuda/CMakeLists.txt # ggml/src/ggml-opencl/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-opencl/kernels/im2col_f16.cl # ggml/src/ggml-opencl/kernels/im2col_f32.cl # ggml/src/ggml-sycl/im2col.cpp # tools/mtmd/clip.cpp	2025-07-25 19:42:45 +08:00
Concedo	8f622cfb50	debugmode longer prints	2025-07-23 19:28:39 +08:00
Concedo	4b348d0b7e	add 2 more save slots	2025-07-22 21:07:19 +08:00
Concedo	75154a3d91	add ping endpoint	2025-07-22 18:55:35 +08:00
Concedo	30675b0798	Merge branch 'upstream' into concedo_experimental # Conflicts: # CODEOWNERS # docs/build.md # scripts/sync-ggml.last # tests/test-backend-ops.cpp # tools/imatrix/README.md # tools/imatrix/imatrix.cpp	2025-07-20 22:47:31 +08:00
Concedo	b028dd4e84	minor fixes	2025-07-18 13:22:59 +08:00

1188 commits