Commit graph

139 commits

Author SHA1 Message Date
Concedo
38a8778f24 wip cfg scale 2025-05-06 23:06:25 +08:00
Concedo
13cee48740 embed aria2c for windows, add slowness check with highpriority recommendation (+1 squashed commits)
Squashed commits:

[b9b695217] embed aria2c for windows, add slowness check with highpriority recommendation (+1 squashed commits)

Squashed commits:

[90b5d389d] embed aria2c for windows, add slowness check with highpriority recommendation (+1 squashed commits)

Squashed commits:

[fbbaa989f] embed aria2c for windows
2025-05-06 18:56:02 +08:00
Concedo
f59b5eb561 added toggle for guidance 2025-05-05 22:21:46 +08:00
Concedo
9cd6a1add2 allow mmproj to be run on cpu 2025-04-21 21:03:10 +08:00
Concedo
2ed6850c0b added override tensor 2025-04-20 20:56:17 +08:00
Concedo
c67510718e kv override option (+1 squashed commits)
Squashed commits:

[e615fc01] kv override option
2025-04-17 14:22:30 +08:00
Concedo
27f575dc83 inpainting support completed, invert mask added 2025-04-09 23:50:17 +08:00
Concedo
23339ace9b inpainting works in kcpp! 2025-04-09 23:01:05 +08:00
Concedo
e37f27632f clear cpu flag manually for templates, added truncation for embeddings 2025-04-02 00:18:30 +08:00
Concedo
2bdf1dacff embeddings done 2025-03-25 22:41:46 +08:00
Concedo
3992fb79cc wip adding embeddings support 2025-03-24 18:01:23 +08:00
Concedo
c1e58419c7 support for voice cloning is done (+2 squashed commit)
Squashed commit:

[e7301628] support for voice cloning is done

[1653c576] wip adding voice cloning
2025-03-21 22:28:59 +08:00
Concedo
e84596ec1a add config for default gen tokens and bos toggle 2025-03-15 19:53:06 +08:00
Concedo
eb1809c105 add more perf stats 2025-03-12 18:58:27 +08:00
Concedo
f2ac10c014 added nsigma to lite 2025-02-21 15:11:24 +08:00
EquinoxPsychosis
2740af3660
add top n sigma sampler from llama.cpp (#1384)
* Add N Sigma Sampler

* update nsigma sampler chain

* xtc position fix

* remove stray newline

---------

Co-authored-by: CasualAutopsy <casual_autopsy@outlook.com>
2025-02-21 14:31:42 +08:00
Concedo
71016db617 remove tts audio caching 2025-02-12 11:37:43 +08:00
Concedo
70f1d8d746 vision can set max res (+1 squashed commits)
Squashed commits:

[938fc655] vision can set max res
2025-01-30 00:19:49 +08:00
Concedo
558bc5c901 tts can now set a length limit 2025-01-28 22:06:59 +08:00
Concedo
0e45d3bb7a quiet flags now set at load time 2025-01-25 16:46:56 +08:00
Concedo
fa7e661133 various fixes 2025-01-18 23:52:39 +08:00
Concedo
e8570de0e6 improved tts default voices quality and sample rate 2025-01-17 18:45:16 +08:00
Concedo
8e3cad1aa2 added audio caching, as a hacky fix for ST TTS bug 2025-01-16 12:04:58 +08:00
Concedo
b3de1598e7 Fixed some GGUFv1 loading bugs, long overdue cleanup for compiling, integrated TTS
tts is functional (+6 squashed commit)

Squashed commit:

[22396311] wip tts

[3a883027] tts not yet working

[0dcfab0e] fix silly bug

[a378d9ef] some long overdue cleanup

[fc5a6fb5] Wip tts

[39f50497] wip TTS integration
2025-01-13 14:23:25 +08:00
Concedo
91b6e29af3 added multilingual support for whisper 2025-01-09 23:28:52 +08:00
Concedo
0cb599546e increase max supported llava images to 8 2025-01-09 22:12:06 +08:00
Concedo
568e476997 added toggle for vae tiling, use custom memory buffer 2025-01-08 13:12:03 +08:00
Concedo
60cd68a39d draft model sets gpu split instead of id, made mmq default for cli 2024-12-14 23:58:45 +08:00
Concedo
595cc6975f added new flags --moeexperts --failsafe --draftgpulayers and --draftgpuid 2024-12-13 17:11:59 +08:00
Concedo
e9d2332dd8 improved tool calls and whisper 2024-12-06 14:34:31 +08:00
Concedo
32ac3153e4 default speculative set to 8. added more adapter fields 2024-11-30 16:18:27 +08:00
Concedo
e0c59486ee default to 12 tokens drafted 2024-11-30 11:52:07 +08:00
Concedo
b21d0fe3ac customizable speculative size 2024-11-30 11:28:19 +08:00
Concedo
f75bbb945f speculative decoding initial impl completed (+6 squashed commit)
Squashed commit:

[0a6306ca0] draft wip dont use (will be squashed)

[a758a1c9c] wip dont use (will be squashed)

[e1994d3ce] wip dont use

[f59690d68] wip

[77228147d] wip on spec decoding. dont use yet

[2445bca54] wip adding speculative decoding (+1 squashed commits)

Squashed commits:

[50e341bb7] wip adding speculative decoding
2024-11-30 10:41:10 +08:00
Concedo
3813f6c517 added new flag nofastforward allowing users to disable fast forwarding 2024-11-13 10:59:01 +08:00
Concedo
ccbd630a42 allow custom t5, clipl and clipg 2024-11-06 19:05:48 +08:00
Concedo
aa26a58085 added logprobs api and logprobs viewer 2024-11-01 00:22:15 +08:00
Concedo
90f5cd0f67 wip logprobs data 2024-10-30 00:59:34 +08:00
Maya
8bb220329c
Dynamic sizes for sequences (#1157)
* Dynamic sizes for sequences

* cleanup PR - move all dynamic fields to end of payload, ensure correct null handling to match existing behavior, add anti-abuse limit of max 512 for dynamic fields

* adjust anti abuse limits

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-10-16 23:55:11 +08:00
Concedo
1d40303050 increase again 2024-10-14 22:09:26 +08:00
Concedo
5ad826b82a updated lite (+2 squashed commit)
Squashed commit:

[31a99e1f] bump banned phrase a bit more again

[c999736b] small fix
2024-10-11 11:05:04 +08:00
Concedo
a3b104a422 further increase some limits 2024-10-10 22:27:28 +08:00
Concedo
d75cbd671d alias banned_tokens with banned_strings from ST
increase max bans to 32 for now
2024-10-10 21:52:46 +08:00
Concedo
fe5479f286 unify antislop and token bans 2024-10-10 18:21:07 +08:00
Concedo
65f3c68399 wip antislop 2024-10-07 20:19:22 +08:00
Concedo
5bf527a6ae added xtc sampler 2024-08-21 23:57:15 +08:00
Concedo
f289fb494a bump size of some payload arr sequences from 16 to 24 2024-07-28 20:29:39 +08:00
Llama
264575426e
Add the DRY dynamic N-gram anti-repetition sampler (#982)
* Add the DRY dynamic N-gram anti-repetition sampler

The DRY (Do not Repeat Yourself) sampler is a dynamic N-gram
repetition penalty that negatively scores tokens that would extend
sequences that already appear in the context.

See this discussion for a motivation and explanation of the sampler:
https://github.com/oobabooga/text-generation-webui/pull/5677

This implementation of DRY mostly aligns with the oobabooga version
with a few modifications. It uses a more efficient linear scanning
algorithm to identify repetitions. It also supports multi-token
sequence breakers. As a limitation, this implementation reuses
the rep pen range parameter, rather than introducing a new range
just for the DRY sampler.

There is a separate change to lite.koboldai.net that exposes the DRY
sampler parameters to KoboldAI Lite, so none of the embed files have
been changed as part of this commit.
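
A minimal sketch of the penalty DRY assigns, written as a naive quadratic scan rather than the linear algorithm described above, and ignoring sequence breakers entirely. The penalty shape, multiplier * base^(L - allowed_length) for a token that would extend a repeat of length L >= allowed_length, follows our reading of the oobabooga discussion linked above; the struct, field names, and default values are illustrative assumptions, not koboldcpp's actual fields.

```cpp
#include <cassert>
#include <cmath>
#include <unordered_map>
#include <vector>

// Illustrative parameters (hypothetical names, not koboldcpp's actual fields).
struct DryParams {
    float multiplier = 0.8f;  // overall strength; 0 disables the penalty
    float base = 1.75f;       // growth factor per extra repeated token
    int allowed_length = 2;   // repeats shorter than this are not penalized
};

// Naive O(n^2) DRY sketch without sequence breakers. Returns, for each token
// that would extend a sequence already present in the context, the amount to
// subtract from that token's logit.
std::unordered_map<int, float> dry_penalties(const std::vector<int>& ctx, const DryParams& p) {
    assert(p.base >= 1.0f && p.allowed_length >= 1);
    const int n = static_cast<int>(ctx.size());
    std::unordered_map<int, int> longest;  // candidate token -> longest repeat it extends
    for (int i = 0; i + 1 < n; ++i) {
        // How many tokens ending at position i match the tokens ending at the tail.
        int len = 0;
        while (i - len >= 0 && ctx[i - len] == ctx[n - 1 - len]) ++len;
        if (len == 0) continue;
        int& best = longest[ctx[i + 1]];  // ctx[i + 1] would continue the repeat
        if (len > best) best = len;
    }
    std::unordered_map<int, float> penalties;
    for (const auto& [token, len] : longest) {
        if (len >= p.allowed_length) {
            const float exponent = static_cast<float>(len - p.allowed_length);
            penalties[token] = p.multiplier * std::pow(p.base, exponent);
        }
    }
    return penalties;
}
```

With ctx = {1, 2, 3, 4, 1, 2, 3}, the tail 1 2 3 already appeared just before token 4, so only token 4 is penalized, by 0.8 * 1.75^(3 - 2) = 1.4 under these illustrative values.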

* Update default DRY parameters to match lite

* Improve DRY token debug logging

* Replace `and` with `&&` to fix MSVC compile error

Little-known fact: the C++98 standard defines `and` as an
alternative token for the `&&` operator (along with a number
of other alternative tokens and digraphs). MSVC does not accept
these without the /Za option or the <iso646.h> header. Switch to
the more common operator to make this code more portable.
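
To illustrate the portability point: both functions below mean the same thing in standard C++, but according to this commit MSVC accepts the `and` spelling only with /Za or after including <iso646.h>, which is why this sketch includes that header. The function names are hypothetical and do not come from koboldcpp.

```cpp
#include <cassert>
#include <iso646.h>  // on MSVC, defines `and` etc. as macros; no effect in conforming C++

// Uses the alternative token `and` (valid standard C++, rejected by MSVC without help).
bool in_range_alt(int x, int lo, int hi) { return x >= lo and x <= hi; }

// Uses `&&`, the spelling the commit switched to; compiles everywhere.
bool in_range(int x, int lo, int hi) {
    assert(lo <= hi);
    return x >= lo && x <= hi;
}
```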

* Fix MSVC compile error because log is not constexpr

Replace the compile-time computation with a floating-point
approximation of log(std::numeric_limits<float>::max()).
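
A sketch of how such a hard-coded constant can be used, assuming (our assumption; the commit does not say) that it caps the exponent of the DRY penalty so that base^exponent cannot overflow a float. The constant name and helper are hypothetical; 88.72283905206835 is ln(FLT_MAX), written here as a double literal so the derived cap stays safely below the overflow point.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// ln(std::numeric_limits<float>::max()), hard-coded because std::log is not
// usable in a constant expression on MSVC (hypothetical constant name).
constexpr double kLogFloatMax = 88.72283905206835;

// Computes multiplier * base^exponent, capping the exponent so that the power
// base^exponent stays below FLT_MAX (the multiplier is assumed to be modest).
float capped_penalty(float multiplier, float base, int exponent) {
    assert(base > 1.0f && exponent >= 0);
    // Largest integer e with base^e <= FLT_MAX: floor(ln(FLT_MAX) / ln(base)).
    const int max_exponent = static_cast<int>(kLogFloatMax / std::log(static_cast<double>(base)));
    const int e = std::min(exponent, max_exponent);
    return multiplier * std::pow(base, static_cast<float>(e));
}
```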

* Remove unused llama sampler variables and clean up sequence breakers.

* Remove KCPP_SAMPLER_DRY as a separate enum entry

The DRY sampler is effectively a repetition penalty and there
are very few reasons to apply it at a different place in sampler
order than the standard single-token penalty. There are also
multiple projects that have dependencies on the existing sampler
IDs, including KoboldAI, KoboldAI Lite, and SillyTavern. To minimize
the impact on those dependents of adding the DRY sampler to
koboldcpp, it makes the most sense not to add a new ID for now,
and instead to piggyback on KCPP_SAMPLER_REP_PEN. In the future
if we find a use case for splitting the application of rep pen and DRY
we can introduce a new enum entry then.

* Add dry_penalty_last_n to control the DRY penalty range independently

This parameter follows the oobabooga semantics: it's optional, with a
default value of zero. Zero means that DRY should scan the entire
context. Otherwise, it's the number of tokens from the end of the
context that are scanned for repetitions.
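
A minimal sketch of the range semantics described above: with a value of zero the whole context is scanned, otherwise only the trailing dry_penalty_last_n tokens. The helper name is hypothetical, and rejecting negative values is this sketch's own choice, not something the commit specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the part of the context that DRY scans for repetitions.
// last_n == 0 -> the entire context; otherwise only the last `last_n` tokens.
std::vector<int> dry_scan_window(const std::vector<int>& ctx, int last_n) {
    assert(last_n >= 0);  // this sketch does not define a meaning for negatives
    const std::size_t n = static_cast<std::size_t>(last_n);
    if (n == 0 || n >= ctx.size()) return ctx;  // scan everything
    return std::vector<int>(ctx.end() - static_cast<std::ptrdiff_t>(n), ctx.end());
}
```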

* Limit sequence breaker lengths in tokens and characters

The core DRY sampler algorithm is linear in the context length, but
there are several parts of the sampler related to multi-token
sequence breakers that are potentially quadratic. Without any
restrictions, a suitably crafted context and sequence breaker could
result in a denial-of-service attack on a server running koboldcpp.
This change limits the maximum number of characters and the maximum
token length of a sequence breaker in order to limit the maximum
overhead associated with the sampler.

This change also improves some comments, adding more detail and
changing the wording to increase clarity.
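
A sketch of such a guard, assuming oversize sequence breakers are simply dropped (the commit says only that their length is limited; truncating them would be another reading). The two limit values and the tokenizer callback are hypothetical placeholders, not koboldcpp's actual constants or API.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical caps; the commit does not state koboldcpp's actual values.
constexpr std::size_t kMaxBreakerChars = 40;
constexpr std::size_t kMaxBreakerTokens = 8;

// Keeps only sequence breakers within both caps, bounding the cost of the
// multi-token matching that could otherwise grow quadratically.
// `tokenize` is any callable mapping a string to a token sequence.
template <typename Tokenize>
std::vector<std::string> limit_breakers(const std::vector<std::string>& breakers, Tokenize tokenize) {
    std::vector<std::string> kept;
    for (const std::string& b : breakers) {
        // Character cap (empty breakers are also skipped, a choice of this sketch).
        if (b.empty() || b.size() > kMaxBreakerChars) continue;
        if (tokenize(b).size() > kMaxBreakerTokens) continue;  // token cap
        kept.push_back(b);
    }
    return kept;
}
```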
2024-07-13 19:08:23 +08:00
Lexi
8ac8abb720
expose.h: initialise constants (#895)
This avoids compile-time warnings with clang:

./expose.h:66:15: note: const member 'seed' will never be initialized
   66 |     const int seed;
      |               ^

No functional change intended.
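
A minimal illustration of the note above and of one way to silence it: clang flags a const data member that has neither a constructor nor a default member initializer to set it. The struct name and the value 0 are hypothetical; the commit text does not show which values expose.h now uses.

```cpp
// Before (triggers the clang note): a const member with nothing to initialize it.
//   struct inputs_before { const int seed; };

// After: a default member initializer gives the constant a defined value.
// Hypothetical struct name and value, for illustration only.
struct inputs_after {
    const int seed = 0;
};
```

Aggregate initialization such as inputs_after{42} can still override the default (C++14 and later).
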
2024-06-09 15:16:33 +08:00
Concedo
10a1d628ad added new binding fields for quant k and quant v 2024-06-03 14:35:59 +08:00