Commit graph

798 commits

Author SHA1 Message Date
Concedo
e2b36aa6cf fixed dry loading seq when not in use, set kcppt to -1 layers by default 2024-07-22 15:44:34 +08:00
Concedo
4d9ccddc2c don't unpack pyd 2024-07-20 18:58:49 +08:00
Concedo
1a23d49c32 serve tags endpoint 2024-07-19 16:08:54 +08:00
Concedo
a998588f3a improved estimation 2024-07-19 00:20:11 +08:00
Concedo
caab9cb8ae fixed unwanted removal 2024-07-18 22:27:22 +08:00
BBC-Esq
621801da0e
Streamline misc (#1007)
* fix typo and streamline a little

* streamline togglehorde

* oops
2024-07-18 22:25:38 +08:00
Concedo
8b0a9f7e56 remove keys, use tuple 2024-07-18 22:11:13 +08:00
BBC-Esq
7de1ebf897
Streamline with dictionaries (#1005)
* dictionary #1

* dictionary #2
2024-07-18 22:05:30 +08:00
BBC-Esq
ce971a0f3d
Streamline with fstrings (#1006)
* fstring #1

* fstring #2
2024-07-18 21:48:46 +08:00
Concedo
90c1bbbcb9 more url download support 2024-07-18 11:56:05 +08:00
Concedo
ad86b1aeb8 Implemented Kcpp Launch Templates (+1 squashed commits)
Squashed commits:

[5ea4c1de] wip integrating skcpps templates (+1 squashed commits)

Squashed commits:

[737daa7f] skcpps wip
2024-07-18 00:22:59 +08:00
Concedo
8ccc0144d2 ability to set -1 as gpulayers and determine at runtime (+1 squashed commits)
Squashed commits:

[594263c3] ability to set -1 as gpulayers and determine at runtime
2024-07-17 20:31:19 +08:00
Concedo
6c883a4803 dummy skcpps format 2024-07-17 18:35:27 +08:00
Concedo
eca7521c13 allowed embedded chat adapters 2024-07-17 18:08:43 +08:00
Concedo
5988243aee fix wrong order, fix llava debug mode failure 2024-07-17 15:30:19 +08:00
Concedo
e99fa531a2 reorder items 2024-07-17 00:28:48 +08:00
Concedo
d775a419b2 updated lite with chat inject, added layer detect, added more console logging 2024-07-16 23:10:15 +08:00
Concedo
516fd35e93 error popups on python exits 2024-07-16 00:46:32 +08:00
Concedo
21179d675b try ci for avx1, up ver (+2 squashed commits)
Squashed commits:

[74150175] up version

[97b6163c] try ci for avx1 linux
2024-07-15 23:07:07 +08:00
teddybear082
c08309e773
Rudimentary support of openai chat completions tools calls (#981)
* Rudimentary support of openai chat completions tools calls

-Most small models are not smart enough to do this, especially a combined tool call + role play response, but at least this allows experimentation along these lines with koboldcpp

* try to also support specified function and tool choice set to none

Allow tools start and end messages to be configured in adapter

Try to force grammar to specific function call if specified (untested)

* ensure tools get listed right after user content and before end of user message content

* omit grammars approach, try prompting instead

-use more extensive json parsing and direct instructions to models to try to obtain the desired result

-seems to work relatively well with Mistral-7B-Instruct-v0.3.Q4_K_M.gguf and neuralhermes-2.5-mistral-7b.Q4_K_M.gguf

-question of whether this is too opinionated of an approach, should the instructions be things that can be passed with the prompt template?

* add back llamacpp recommended json grammar

Go back to adding grammar but use "official" llamacpp grammar only not a custom one just for openai

* Tidy up, remove unnecessary globals

* clarity

* fix missing local variable error

This worked to fix the error I mentioned in my last comment

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-07-14 11:22:45 +08:00
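The tool-calls change above relies on parsing a JSON function call out of free-form model output and reshaping it into an OpenAI-style tool_calls entry. A minimal sketch of that post-processing step, with illustrative names that are not koboldcpp's actual API:

```python
import json
import re

def extract_tool_call(model_output):
    """Find the first JSON object in the text and, if it looks like a
    function call (has a "name" field), return an OpenAI-style
    tool_calls list; otherwise return None."""
    match = re.search(r'\{.*\}', model_output, re.DOTALL)
    if not match:
        return None
    try:
        obj = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or "name" not in obj:
        return None
    return [{
        "id": "call_0",  # placeholder id, real servers generate one
        "type": "function",
        "function": {
            "name": obj["name"],
            # the OpenAI schema returns arguments as a JSON string
            "arguments": json.dumps(obj.get("arguments", {})),
        },
    }]
```

As the commit notes, small models often fail to emit clean JSON at all, which is why the PR also experimented with grammar constraints to force well-formed output.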
Llama
264575426e
Add the DRY dynamic N-gram anti-repetition sampler (#982)
* Add the DRY dynamic N-gram anti-repetition sampler

The DRY (Do not Repeat Yourself) sampler is a dynamic N-gram
repetition penalty that negatively scores tokens that would extend
sequences that already appear in the context.

See this discussion for a motivation and explanation of the sampler:
https://github.com/oobabooga/text-generation-webui/pull/5677

This implementation of DRY mostly aligns with the oobabooga version
with a few modifications. It uses a more efficient linear scanning
algorithm to identify repetitions. It also supports multi-token
sequence breakers. As a limitation, this implementation reuses
the rep pen range parameter, rather than introducing a new range
just for the DRY sampler.

There is a separate change to lite.koboldai.net that exposes the DRY
sampler parameters to KoboldAI Lite, so none of the embed files have
been changed as part of this commit.

* Update default DRY parameters to match lite

* Improve DRY token debug logging

* Replace `and` with `&&` to fix MSVC compile error

Little known fact: The C++98 standard defines `and` as an
alternative token for the `&&` operator (along with a bunch
of other digraphs). MSVC does not allow these without using
the /Za option or including the <iso646.h> header. Change to
the more standard operator to make this code more portable.

* Fix MSVC compile error because log is not constexpr

Replace the compile-time computation with a floating-point
approximation of log(std::numeric_limits<float>::max()).

* Remove unused llama sampler variables and clean up sequence breakers.

* Remove KCPP_SAMPLER_DRY as a separate enum entry

The DRY sampler is effectively a repetition penalty and there
are very few reasons to apply it at a different place in sampler
order than the standard single-token penalty. There are also
multiple projects that have dependencies on the existing sampler
IDs, including KoboldAI, KoboldAI Lite, and Silly Tavern. In order
to minimize the impact of the dependencies of adding the DRY sampler
to koboldcpp, it makes the most sense to not add a new ID for now,
and instead to piggyback on KCPP_SAMPLER_REP_PEN. In the future
if we find a use case for splitting the application of rep pen and DRY
we can introduce a new enum entry then.

* Add the dry_penalty_last_n to independently control DRY penalty range

This parameter follows the oobabooga semantics: it's optional, with a
default value of zero. Zero means that DRY should sample the entire
context. Otherwise, it's the number of tokens from the end of the
context that are scanned for repetitions.

* Limit sequence breaker lengths in tokens and characters

The core DRY sampler algorithm is linear in the context length, but
there are several parts of the sampler related to multi-token
sequence breakers that are potentially quadratic. Without any
restrictions, a suitably crafted context and sequence breaker could
result in a denial-of-service attack on a server running koboldcpp.
This change limits the maximum number of characters and the maximum
token length of a sequence breaker in order to limit the maximum
overhead associated with the sampler.

This change also improves some comments, adding more detail and
changing the wording to increase clarity.
2024-07-13 19:08:23 +08:00
Concedo
f529ef26df alias completion to completions 2024-07-12 22:53:15 +08:00
Concedo
1bf07ceabd remove unused 2024-07-12 00:17:41 +08:00
Concedo
116d5fe58e updated lite 2024-07-09 20:42:51 +08:00
Concedo
7f48ed39c2 allow unpacking in CLI 2024-07-06 23:00:45 +08:00
Concedo
8e5fd6f509 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.gitignore
#	README.md
#	docs/backend/BLIS.md
#	docs/backend/SYCL.md
#	docs/development/llama-star/idea-arch.key
#	docs/development/llama-star/idea-arch.pdf
#	docs/development/token_generation_performance_tips.md
#	src/llama.cpp
#	tests/test-tokenizer-0.cpp
#	tests/test-tokenizer-1-bpe.cpp
#	tests/test-tokenizer-1-spm.cpp
#	tests/test-tokenizer-random.py
2024-07-06 19:39:24 +08:00
Concedo
6b0756506b improvements to model downloader and chat completions adapter loader 2024-07-04 15:34:08 +08:00
Concedo
3fdbe3351d adjust some defaults and gui launcher 2024-07-04 00:52:21 +08:00
Concedo
0fc18d2d82 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/nix/package.nix
#	CMakePresets.json
#	README.md
#	flake.lock
#	ggml/src/CMakeLists.txt
#	tests/test-backend-ops.cpp
#	tests/test-chat-template.cpp
2024-07-02 21:05:45 +08:00
Concedo
8a07ce306a edit sampler order warning 2024-06-30 11:46:09 +08:00
Concedo
18df56b8cf add tensor split gui input for vulkan 2024-06-30 11:16:48 +08:00
Concedo
5e671c2162 ignore utf decode errors 2024-06-28 16:39:48 +08:00
Concedo
1801594972 allow forced positive prompt 2024-06-27 20:21:17 +08:00
Concedo
73b99a7266 add premade chat completions adapter 2024-06-27 00:13:06 +08:00
Concedo
e42bc5d677 add negative prompt support to chat completions adapter 2024-06-26 11:12:24 +08:00
Concedo
151ff95a67 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
#	README.md
#	ggml-cuda.cu
#	ggml-cuda/common.cuh
2024-06-25 19:25:14 +08:00
Concedo
13398477a1 fix ubatch, autoselect vulkan dgpu if possible 2024-06-22 00:23:46 +08:00
Nexesenex
153527745b
Augmented benchmark stats (#929)
* Augmented benchmark stats v1

* output instead of coherence

* populate bench flags as a flags field instead of multiple lines

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-06-18 21:30:36 +08:00
Concedo
ba9ef4d01b fix to allow clblast to work even after blas backend splitoff 2024-06-17 15:02:55 +08:00
Concedo
623390e4ab allow sdui when img model not loaded, allow sdclamped to provide a custom clamp size (+1 squashed commits)
Squashed commits:

[957c9c9c] allow sdui when img model not loaded, allow sdclamped to provide a custom clamp size
2024-06-14 16:58:50 +08:00
Concedo
e69da9c9d8 strings rename kobold lite to koboldai lite 2024-06-13 20:00:28 +08:00
Concedo
49e4c3fd7b adjust lite default port, disable double BOS warning, whisper and SD go quiet when horde mode is set too 2024-06-13 15:10:35 +08:00
Concedo
02357eadf8 Merge commit '7672adeec7' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
#	kompute-shaders/op_rope_f16.comp
#	kompute-shaders/op_rope_f32.comp
#	kompute-shaders/rope_common.comp
#	tests/test-backend-ops.cpp
#	tests/test-grad0.cpp
#	tests/test-rope.cpp
2024-06-09 15:35:51 +08:00
Concedo
813cf829b5 allow selecting multigpu on vulkan 2024-06-06 18:36:56 +08:00
Concedo
10b148f4c2 added skip bos for tokenize endpoint 2024-06-05 10:49:11 +08:00
Concedo
a541a3d509 quantkv will not trigger if fa is off or ctx shift is on 2024-06-03 19:14:22 +08:00
Concedo
efee37a708 gui for quantkv 2024-06-03 18:25:57 +08:00
Concedo
10a1d628ad added new binding fields for quant k and quant v 2024-06-03 14:35:59 +08:00
Concedo
267ee78651 change max payload to 32mb 2024-06-02 16:44:19 +08:00
Concedo
b0a7d1aba6 fixed makefile (+1 squashed commits)
Squashed commits:

[ef6ddaf5] try fix makefile
2024-06-02 15:21:48 +08:00