Commit graph

798 commits

Author SHA1 Message Date
Concedo
e2b36aa6cf fixed dry loading seq when not in use, set kcppt to -1 layers by default 2024-07-22 15:44:34 +08:00
Concedo
4d9ccddc2c don't unpack pyd 2024-07-20 18:58:49 +08:00
Concedo
1a23d49c32 serve tags endpoint 2024-07-19 16:08:54 +08:00
Concedo
a998588f3a improved estimation 2024-07-19 00:20:11 +08:00
Concedo
caab9cb8ae fixed unwanted removal 2024-07-18 22:27:22 +08:00
BBC-Esq
621801da0e
Streamline misc (#1007)
* fix typo and streamline a little

* streamline togglehorde

* oops
2024-07-18 22:25:38 +08:00
Concedo
8b0a9f7e56 remove keys, use tuple 2024-07-18 22:11:13 +08:00
BBC-Esq
7de1ebf897
Streamline with dictionaries (#1005)
* dictionary #1

* dictionary #2
2024-07-18 22:05:30 +08:00
BBC-Esq
ce971a0f3d
Streamline with fstrings (#1006)
* fstring #1

* fstring #2
2024-07-18 21:48:46 +08:00
Concedo
90c1bbbcb9 more url download support 2024-07-18 11:56:05 +08:00
Concedo
ad86b1aeb8 Implemented Kcpp Launch Templates (+1 squashed commits)
Squashed commits:

[5ea4c1de] wip integrating skcpps templates (+1 squashed commits)

Squashed commits:

[737daa7f] skcpps wip
2024-07-18 00:22:59 +08:00
Concedo
8ccc0144d2 ability to set -1 as gpulayers and determine at runtime (+1 squashed commits)
Squashed commits:

[594263c3] ability to set -1 as gpulayers and determine at runtime
2024-07-17 20:31:19 +08:00
Concedo
6c883a4803 dummy skcpps format 2024-07-17 18:35:27 +08:00
Concedo
eca7521c13 allowed embedded chat adapters 2024-07-17 18:08:43 +08:00
Concedo
5988243aee fix wrong order, fix llava debug mode failure 2024-07-17 15:30:19 +08:00
Concedo
e99fa531a2 reorder items 2024-07-17 00:28:48 +08:00
Concedo
d775a419b2 updated lite with chat inject, added layer detect, added more console logging 2024-07-16 23:10:15 +08:00
Concedo
516fd35e93 error popups on python exits 2024-07-16 00:46:32 +08:00
Concedo
21179d675b try ci for avx1, up ver (+2 squashed commits)
Squashed commits:

[74150175] up version

[97b6163c] try ci for avx1 linux
2024-07-15 23:07:07 +08:00
teddybear082
c08309e773
Rudimentary support of openai chat completions tools calls (#981)
* Rudimentary support of openai chat completions tools calls

-Most small models are not smart enough to do this, especially a combined tool call + role play response, but at least this allows experimentation along these lines with koboldcpp

* try to also support specified function and tool choice set to none

Allow tools start and end messages to be configured in adapter

Try to force grammar to specific function call if specified (untested)

* ensure tools get listed right after user content and before end of user message content

* omit grammars approach, try prompting instead

-use more extensive json parsing and direct instructions to models to try to obtain the desired result

-seems to work relatively well with Mistral-7B-Instruct-v0.3.Q4_K_M.gguf and neuralhermes-2.5-mistral-7b.Q4_K_M.gguf

-question of whether this is too opinionated of an approach, should the instructions be things that can be passed with the prompt template?

* add back llamacpp recommended json grammar

Go back to adding grammar but use "official" llamacpp grammar only not a custom one just for openai

* Tidy up, remove unnecessary globals

* clarity

* fix missing local variable error

This worked to fix the error I mentioned in my last comment

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-07-14 11:22:45 +08:00
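The tool-calls change above relies on parsing a JSON function call out of free-form model output and reshaping it into an OpenAI-style tool_calls entry. A minimal sketch of that post-processing step, with illustrative names that are not koboldcpp's actual API:

```python
import json
import re

def extract_tool_call(model_output):
    """Find the first JSON object in the text and, if it looks like a
    function call (has a "name" field), return an OpenAI-style
    tool_calls list; otherwise return None."""
    match = re.search(r'\{.*\}', model_output, re.DOTALL)
    if not match:
        return None
    try:
        obj = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or "name" not in obj:
        return None
    return [{
        "id": "call_0",  # placeholder id, real servers generate one
        "type": "function",
        "function": {
            "name": obj["name"],
            # the OpenAI schema returns arguments as a JSON string
            "arguments": json.dumps(obj.get("arguments", {})),
        },
    }]
```

As the commit notes, small models often fail to emit clean JSON at all, which is why the PR also experimented with grammar constraints to force well-formed output.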
Llama
264575426e
Add the DRY dynamic N-gram anti-repetition sampler (#982)
* Add the DRY dynamic N-gram anti-repetition sampler

The DRY (Do not Repeat Yourself) sampler is a dynamic N-gram
repetition penalty that negatively scores tokens that would extend
sequences that already appear in the context.

See this discussion for a motivation and explanation of the sampler:
https://github.com/oobabooga/text-generation-webui/pull/5677

This implementation of DRY mostly aligns with the oobabooga version
with a few modifications. It uses a more efficient linear scanning
algorithm to identify repetitions. It also supports multi-token
sequence breakers. As a limitation, this implementation reuses
the rep pen range parameter, rather than introducing a new range
just for the DRY sampler.

There is a separate change to lite.koboldai.net that exposes the DRY
sampler parameters to KoboldAI Lite, so none of the embed files have
been changed as part of this commit.

* Update default DRY parameters to match lite

* Improve DRY token debug logging

* Replace `and` with `&&` to fix MSVC compile error

Little known fact: The C++98 standard defines `and` as an
alternative token for the `&&` operator (along with a bunch
of other digraphs). MSVC does not allow these without using
the /Za option or including the <iso646.h> header. Change to
the more standard operator to make this code more portable.

* Fix MSVC compile error because log is not constexpr

Replace the compile-time computation with a floating-point
approximation of log(std::numeric_limits<float>::max()).

* Remove unused llama sampler variables and clean up sequence breakers.

* Remove KCPP_SAMPLER_DRY as a separate enum entry

The DRY sampler is effectively a repetition penalty and there
are very few reasons to apply it at a different place in sampler
order than the standard single-token penalty. There are also
multiple projects that have dependencies on the existing sampler
IDs, including KoboldAI, KoboldAI Lite, and Silly Tavern. In order
to minimize the impact of the dependencies of adding the DRY sampler
to koboldcpp, it makes the most sense to not add a new ID for now,
and instead to piggyback on KCPP_SAMPLER_REP_PEN. In the future
if we find a use case for splitting the application of rep pen and DRY
we can introduce a new enum entry then.

* Add the dry_penalty_last_n to independently control DRY penalty range

This parameter follows the oobabooga semantics: it's optional, with a
default value of zero. Zero means that DRY should sample the entire
context. Otherwise, it's the number of tokens from the end of the
context that are scanned for repetitions.

* Limit sequence breaker lengths in tokens and characters

The core DRY sampler algorithm is linear in the context length, but
there are several parts of the sampler related to multi-token
sequence breakers that are potentially quadratic. Without any
restrictions, a suitably crafted context and sequence breaker could
result in a denial-of-service attack on a server running koboldcpp.
This change limits the maximum number of characters and the maximum
token length of a sequence breaker in order to limit the maximum
overhead associated with the sampler.

This change also improves some comments, adding more detail and
changing the wording to increase clarity.
2024-07-13 19:08:23 +08:00
Concedo
f529ef26df alias completion to completions 2024-07-12 22:53:15 +08:00
Concedo
1bf07ceabd remove unused 2024-07-12 00:17:41 +08:00
Concedo
116d5fe58e updated lite 2024-07-09 20:42:51 +08:00
Concedo
7f48ed39c2 allow unpacking in CLI 2024-07-06 23:00:45 +08:00
Concedo
8e5fd6f509 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.gitignore
#	README.md
#	docs/backend/BLIS.md
#	docs/backend/SYCL.md
#	docs/development/llama-star/idea-arch.key
#	docs/development/llama-star/idea-arch.pdf
#	docs/development/token_generation_performance_tips.md
#	src/llama.cpp
#	tests/test-tokenizer-0.cpp
#	tests/test-tokenizer-1-bpe.cpp
#	tests/test-tokenizer-1-spm.cpp
#	tests/test-tokenizer-random.py
2024-07-06 19:39:24 +08:00
Concedo
6b0756506b improvements to model downloader and chat completions adapter loader 2024-07-04 15:34:08 +08:00
Concedo
3fdbe3351d adjust some defaults and gui launcher 2024-07-04 00:52:21 +08:00
Concedo
0fc18d2d82 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/nix/package.nix
#	CMakePresets.json
#	README.md
#	flake.lock
#	ggml/src/CMakeLists.txt
#	tests/test-backend-ops.cpp
#	tests/test-chat-template.cpp
2024-07-02 21:05:45 +08:00
Concedo
8a07ce306a edit sampler order warning 2024-06-30 11:46:09 +08:00
Concedo
18df56b8cf add tensor split gui input for vulkan 2024-06-30 11:16:48 +08:00
Concedo
5e671c2162 ignore utf decode errors 2024-06-28 16:39:48 +08:00
Concedo
1801594972 allow forced positive prompt 2024-06-27 20:21:17 +08:00
Concedo
73b99a7266 add premade chat completions adapter 2024-06-27 00:13:06 +08:00
Concedo
e42bc5d677 add negative prompt support to chat completions adapter 2024-06-26 11:12:24 +08:00
Concedo
151ff95a67 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
#	README.md
#	ggml-cuda.cu
#	ggml-cuda/common.cuh
2024-06-25 19:25:14 +08:00
Concedo
13398477a1 fix ubatch, autoselect vulkan dgpu if possible 2024-06-22 00:23:46 +08:00
Nexesenex
153527745b
Augmented benchmark stats (#929)
* Augmented benchmark stats v1

* output instead of coherence

* populate bench flags as a flags field instead of multiple lines

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-06-18 21:30:36 +08:00
Concedo
ba9ef4d01b fix to allow clblast to work even after blas backend splitoff 2024-06-17 15:02:55 +08:00
Concedo
623390e4ab allow sdui when img model not loaded, allow sdclamped to provide a custom clamp size (+1 squashed commits)
Squashed commits:

[957c9c9c] allow sdui when img model not loaded, allow sdclamped to provide a custom clamp size
2024-06-14 16:58:50 +08:00
Concedo
e69da9c9d8 strings rename kobold lite to koboldai lite 2024-06-13 20:00:28 +08:00
Concedo
49e4c3fd7b adjust lite default port, disable double BOS warning, whisper and SD go quiet when horde mode is set too 2024-06-13 15:10:35 +08:00
Concedo
02357eadf8 Merge commit '7672adeec7' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
#	kompute-shaders/op_rope_f16.comp
#	kompute-shaders/op_rope_f32.comp
#	kompute-shaders/rope_common.comp
#	tests/test-backend-ops.cpp
#	tests/test-grad0.cpp
#	tests/test-rope.cpp
2024-06-09 15:35:51 +08:00
Concedo
813cf829b5 allow selecting multigpu on vulkan 2024-06-06 18:36:56 +08:00
Concedo
10b148f4c2 added skip bos for tokenize endpoint 2024-06-05 10:49:11 +08:00
Concedo
a541a3d509 quantkv will not trigger if fa is off or ctx shift is on 2024-06-03 19:14:22 +08:00
Concedo
efee37a708 gui for quantkv 2024-06-03 18:25:57 +08:00
Concedo
10a1d628ad added new binding fields for quant k and quant v 2024-06-03 14:35:59 +08:00
Concedo
267ee78651 change max payload to 32mb 2024-06-02 16:44:19 +08:00
Concedo
b0a7d1aba6 fixed makefile (+1 squashed commits)
Squashed commits:

[ef6ddaf5] try fix makefile
2024-06-02 15:21:48 +08:00