koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2025-09-10 17:14:36 +00:00

Author	SHA1	Message	Date
Concedo	38a8778f24	wip cfg scale	2025-05-06 23:06:25 +08:00
Concedo	13cee48740	embed aria2c for windows, add slowness check with highpriority recommendation (+1 squashed commits) Squashed commits: [b9b695217] embed aria2c for windows, add slowness check with highpriority recommendation (+1 squashed commits) Squashed commits: [90b5d389d] embed aria2c for windows, add slowness check with highpriority recommendation (+1 squashed commits) Squashed commits: [fbbaa989f] embed aria2c for windows	2025-05-06 18:56:02 +08:00
Concedo	f59b5eb561	added toggle for guidance	2025-05-05 22:21:46 +08:00
Concedo	9cd6a1add2	allow mmproj to be run on cpu	2025-04-21 21:03:10 +08:00
Concedo	2ed6850c0b	added override tensor	2025-04-20 20:56:17 +08:00
Concedo	c67510718e	kv override option (+1 squashed commits) Squashed commits: [e615fc01] kv override option	2025-04-17 14:22:30 +08:00
Concedo	27f575dc83	inpainting support completed, invert mask added	2025-04-09 23:50:17 +08:00
Concedo	23339ace9b	inpainting works in kcpp!	2025-04-09 23:01:05 +08:00
Concedo	e37f27632f	clear cpu flag manually for templates, added truncation for embeddings	2025-04-02 00:18:30 +08:00
Concedo	2bdf1dacff	embeddings done	2025-03-25 22:41:46 +08:00
Concedo	3992fb79cc	wip adding embeddings support	2025-03-24 18:01:23 +08:00
Concedo	c1e58419c7	support for voice cloning is done (+2 squashed commit) Squashed commit: [e7301628] support for voice cloning is done [1653c576] wip adding voice cloning	2025-03-21 22:28:59 +08:00
Concedo	e84596ec1a	add config for default gen tokens and bos toggle	2025-03-15 19:53:06 +08:00
Concedo	eb1809c105	add more perf stats	2025-03-12 18:58:27 +08:00
Concedo	f2ac10c014	added nsigma to lite	2025-02-21 15:11:24 +08:00
EquinoxPsychosis	2740af3660	add top n sigma sampler from llama.cpp (#1384) * Add N Sigma Sampler * update nsigma sampler chain * xtc position fix * remove stray newline --------- Co-authored-by: CasualAutopsy <casual_autopsy@outlook.com>	2025-02-21 14:31:42 +08:00
Concedo	71016db617	remove tts audio caching	2025-02-12 11:37:43 +08:00
Concedo	70f1d8d746	vision can set max res (+1 squashed commits) Squashed commits: [938fc655] vision can set max res	2025-01-30 00:19:49 +08:00
Concedo	558bc5c901	tts can now set a length limit	2025-01-28 22:06:59 +08:00
Concedo	0e45d3bb7a	quiet flags now set at load time	2025-01-25 16:46:56 +08:00
Concedo	fa7e661133	various fixes	2025-01-18 23:52:39 +08:00
Concedo	e8570de0e6	improved tts default voices quality and sample rate	2025-01-17 18:45:16 +08:00
Concedo	8e3cad1aa2	added audio caching, as a hacky fix for ST TTS bug	2025-01-16 12:04:58 +08:00
Concedo	b3de1598e7	Fixed some GGUFv1 loading bugs, long overdue cleanup for compiling, integrated TTS tts is functional (+6 squashed commit) Squashed commit: [22396311] wip tts [3a883027] tts not yet working [0dcfab0e] fix silly bug [a378d9ef] some long overdue cleanup [fc5a6fb5] Wip tts [39f50497] wip TTS integration	2025-01-13 14:23:25 +08:00
Concedo	91b6e29af3	added multilingual support for whisper	2025-01-09 23:28:52 +08:00
Concedo	0cb599546e	increase max supported llava images to 8	2025-01-09 22:12:06 +08:00
Concedo	568e476997	added toggle for vae tiling, use custom memory buffer	2025-01-08 13:12:03 +08:00
Concedo	60cd68a39d	draft model sets gpu split instead of id, made mmq default for cli	2024-12-14 23:58:45 +08:00
Concedo	595cc6975f	added new flags --moeexperts --failsafe --draftgpulayers and --draftgpuid	2024-12-13 17:11:59 +08:00
Concedo	e9d2332dd8	improved tool calls and whisper	2024-12-06 14:34:31 +08:00
Concedo	32ac3153e4	default speculative set to 8. added more adapter fields	2024-11-30 16:18:27 +08:00
Concedo	e0c59486ee	default to 12 tokens drafted	2024-11-30 11:52:07 +08:00
Concedo	b21d0fe3ac	customizable speculative size	2024-11-30 11:28:19 +08:00
Concedo	f75bbb945f	speculative decoding initial impl completed (+6 squashed commit) Squashed commit: [0a6306ca0] draft wip dont use (will be squashed) [a758a1c9c] wip dont use (will be squashed) [e1994d3ce] wip dont use [f59690d68] wip [77228147d] wip on spec decoding. dont use yet [2445bca54] wip adding speculative decoding (+1 squashed commits) Squashed commits: [50e341bb7] wip adding speculative decoding	2024-11-30 10:41:10 +08:00
Concedo	3813f6c517	added new flag nofastforward allowing users to disable fast forwarding	2024-11-13 10:59:01 +08:00
Concedo	ccbd630a42	allow custom t5, clipl and clipg	2024-11-06 19:05:48 +08:00
Concedo	aa26a58085	added logprobs api and logprobs viewer	2024-11-01 00:22:15 +08:00
Concedo	90f5cd0f67	wip logprobs data	2024-10-30 00:59:34 +08:00
Maya	8bb220329c	Dynamic sizes for sequences (#1157) * Dynamic sizes for sequences * cleanup PR - move all dynamic fields to end of payload, ensure correct null handling to match existing behavior, add anti abuse limit of max 512 for dynamic fields * adjust anti abuse limits --------- Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>	2024-10-16 23:55:11 +08:00
Concedo	1d40303050	increase again	2024-10-14 22:09:26 +08:00
Concedo	5ad826b82a	updated lite (+2 squashed commit) Squashed commit: [31a99e1f] bump baned phrase a bit more again [c999736b] small fix	2024-10-11 11:05:04 +08:00
Concedo	a3b104a422	further increase some limits	2024-10-10 22:27:28 +08:00
Concedo	d75cbd671d	alias banned_tokens with banned_strings from ST increase max bans to 32 for now	2024-10-10 21:52:46 +08:00
Concedo	fe5479f286	unify antislop and token bans	2024-10-10 18:21:07 +08:00
Concedo	65f3c68399	wip antislop	2024-10-07 20:19:22 +08:00
Concedo	5bf527a6ae	added xtc sampler	2024-08-21 23:57:15 +08:00
Concedo	f289fb494a	bump size of some payload arr sequences from 16 to 24	2024-07-28 20:29:39 +08:00
Llama	264575426e	Add the DRY dynamic N-gram anti-repetition sampler (#982 ) * Add the DRY dynamic N-gram anti-repetition sampler The DRY (Do not Repeat Yourself) sampler is a dynamic N-gram repetition penalty that negatively scores tokens that would extend sequences that already appear in the context. See this discussion for a motivation and explanation of the sampler: https://github.com/oobabooga/text-generation-webui/pull/5677 This implementation of DRY mostly aligns with the obabooga version with a few modifications. It uses a more efficient linear scanning algorithm to identify repetitions. It also supports multi-token sequence breakers. As a limitation, this implementation reuses the rep pen range parameter, rather than introducing a new range just for the DRY sampler. There is a separate change to lite.koboldai.net that exposes the DRY sampler parameters to KoboldAI Lite, so none of the embed files have been changed as part of this commit. * Update default DRY parameters to match lite * Improve DRY token debug logging * Replace `and` with `&&` to fix MSVC compile error Little known fact: The C++98 standard defines `and` as an alternative token for the `&&` operator (along with a bunch of other digraphs). MSVC does not allow these without using the /Za option or including the <iso646.h> header. Change to the more standard operator to make this code more portable. * Fix MSVC compile error because log is not constexpr Replace the compile-time computation with a floating-point approximation of log(std::numeric_limits<float>::max()). * Remove unused llama sampler variables and clean up sequence breakers. * Remove KCPP_SAMPLER_DRY as a separate enum entry The DRY sampler is effectively a repetition penalty and there are very few reasons to apply it at a different place in sampler order than the standard single-token penalty. There are also multiple projects that have dependencies on the existing sampler IDs, including KoboldAI, KoboldAI Lite, and Silly Tavern. 
In order to minimize the impact of the dependencies of adding the DRY sampler to koboldcpp, it makes the most sense to not add a new ID for now, and instead to piggyback on KCPP_SAMPLER_REP_PEN. In the future if we find a use case for splitting the application of rep pen and DRY we can introduce a new enum entry then. * Add the dry_penalty_last_n to independently control DRY penalty range This parameter follows the oobabooga semantics: it's optional, with a default value of zero. Zero means that DRY should sample the entire context. Otherwise, it's the number of tokens from the end of the context that are scanned for repetitions. * Limit sequence breaker lengths in tokens and characters The core DRY sampler algorithm is linear in the context length, but there are several parts of the sampler related to multi-token sequence breakers that are potentially quadratic. Without any restrictions, a suitably crafted context and sequence breaker could result in a denial-of-service attack on a server running koboldcpp. This change limits the maximum number of characters and the maximum token length of a sequence breaker in order to limit the maximum overhead associated with the sampler. This change also improves some comments, adding more detail and changing the wording to increase clarity.	2024-07-13 19:08:23 +08:00
Lexi	8ac8abb720	expose.h: initialise constants (#895) This avoids compile-time warnings with clang: ./expose.h:66:15: note: const member 'seed' will never be initialized 66 | const int seed; | ^ No functional change intended.	2024-06-09 15:16:33 +08:00
Concedo	10a1d628ad	added new binding fields for quant k and quant v	2024-06-03 14:35:59 +08:00
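
The DRY commit above (264575426e) describes the sampler as penalizing tokens that would extend a sequence already present in the context. The following is only a minimal sketch of that idea, not the code from the commit: it uses a naive quadratic scan rather than the linear scan the commit mentions, it handles single-token sequence breakers only, and the function name `dry_penalties` and its parameter names (`multiplier`, `base`, `allowed_length`, `breakers`, `last_n`) are hypothetical. The exponential penalty form follows the oobabooga discussion linked in the commit message, and `last_n` mirrors the described `dry_penalty_last_n` semantics, where zero means the whole context is scanned.

```python
from typing import Dict, FrozenSet, Sequence


def dry_penalties(
    context: Sequence[int],
    multiplier: float,
    base: float,
    allowed_length: int,
    breakers: FrozenSet[int] = frozenset(),
    last_n: int = 0,
) -> Dict[int, float]:
    """Return {token_id: penalty} for tokens that would extend a repeat.

    Hypothetical helper for illustration; not the koboldcpp implementation.
    """
    # last_n == 0 scans the whole context; otherwise only the last_n tokens.
    if last_n > 0:
        context = context[-last_n:]
    n = len(context)
    # No penalty when there is nothing to match or the last token is a breaker.
    if n < 2 or context[-1] in breakers:
        return {}

    longest: Dict[int, int] = {}
    for i in range(n - 1):
        candidate = context[i + 1]  # token that followed this earlier position
        if candidate in breakers:
            continue
        # Count how many tokens ending at i match the end of the context,
        # stopping at the start of the context or at a sequence breaker.
        length = 0
        while (
            length <= i
            and context[i - length] == context[n - 1 - length]
            and context[i - length] not in breakers
        ):
            length += 1
        if length > longest.get(candidate, 0):
            longest[candidate] = length

    # Penalty grows exponentially once the match reaches allowed_length.
    return {
        tok: multiplier * base ** (match - allowed_length)
        for tok, match in longest.items()
        if match >= allowed_length
    }
```

A caller would subtract each returned penalty from the corresponding token's logit before sampling. The multiplier, base and allowed length are passed explicitly here because the defaults used by koboldcpp are not stated in this log.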

139 commits