Commit graph

727 commits

Author SHA1 Message Date
Concedo
52cc908f7f default trim_stop to true, which trims any tokens after a stop sequence and the stop sequence itself. This is potentially a breaking change. 2024-12-03 22:44:10 +08:00
Concedo
2ba5949054 updated sdcpp, also set euler as default sampler 2024-12-01 17:00:20 +08:00
Concedo
42228b9746 warning when selecting non gguf models 2024-12-01 13:35:51 +08:00
Concedo
b7cd210cd2 more linting with Ruff (+1 squashed commits)
Squashed commits:

[43802cfe2] Applied default Ruff linting
2024-12-01 01:23:13 +08:00
Concedo
409e393d10 fixed critical bug in image model loader 2024-11-30 23:28:24 +08:00
Concedo
0028e71993 special handling to resolve incomplete utf8 token sequences in qwen 2024-11-30 16:54:01 +08:00
Concedo
32ac3153e4 default speculative set to 8. added more adapter fields 2024-11-30 16:18:27 +08:00
Concedo
e0c59486ee default to 12 tokens drafted 2024-11-30 11:52:07 +08:00
Concedo
b21d0fe3ac customizable speculative size 2024-11-30 11:28:19 +08:00
Concedo
f75bbb945f speculative decoding initial impl completed (+6 squashed commit)
Squashed commit:

[0a6306ca0] draft wip dont use (will be squashed)

[a758a1c9c] wip dont use (will be squashed)

[e1994d3ce] wip dont use

[f59690d68] wip

[77228147d] wip on spec decoding. dont use yet

[2445bca54] wip adding speculative decoding (+1 squashed commits)

Squashed commits:

[50e341bb7] wip adding speculative decoding
2024-11-30 10:41:10 +08:00
kallewoof
fd320f6682
/props endpoint: provide context size through default_generation_settings (#1237) 2024-11-26 16:15:27 +08:00
Concedo
1e0792a3ef comfyui emulation also done 2024-11-24 15:39:03 +08:00
Concedo
9bd27323e7 emulate comfyui txt2img 2024-11-24 11:28:12 +08:00
Concedo
bf28d956ae ollama chat api done 2024-11-24 00:10:15 +08:00
Concedo
62dde8cfb2 ollama sync completions mostly working. stupid api. 2024-11-23 23:31:37 +08:00
Concedo
2c1a06a07d wip ollama emulation, added detokenize endpoint 2024-11-23 22:48:03 +08:00
Concedo
c0da7e4dcf multiplayer activity tracking 2024-11-23 19:59:55 +08:00
Concedo
1dd37933e3 fixed grammar not resetting correctly 2024-11-23 09:55:12 +08:00
Concedo
18f227625b multiplayer fixes 2024-11-22 19:02:31 +08:00
mkarr
ac6a0cde91
Support chunked encoding. (#1226)
* Support chunked encoding.

The koboldcpp API does not support HTTP chunked encoding. Some HTTP
libraries, notable Go's net/http can automatically choose to use chunked
encoding. This adds support for chunked encoding within the do_POST()
handler.

* refactor slightly to add additional safety checks and follow original format

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-11-21 18:24:04 +08:00
Concedo
c2ca2ec2bc updated docs, fixed a few issues with multiplayer 2024-11-21 18:16:13 +08:00
Concedo
272828cab0 tweaks to chat template 2024-11-21 11:10:30 +08:00
kallewoof
547ab2aebb
API: add /props route (#1222)
* API: add an /extra/chat_template route

A lot of manual tweaking is done when swapping between models. We can automate or make better assumptions about some of them by having more information, such as chat template. This PR adds an endpoint /extra/chat_template which returns the model chat template string as is in a 'chat_template' key. The front end can then use this to derive the proper templates or use it as is, or at least warn the user when they are trying to use e.g. a Mistral preset with a Llama 3.1 model.

* switch to pre-established /props endpoint for chat template

* bug-fix (upstream): one-off in string juggling
2024-11-21 10:58:32 +08:00
Concedo
8ab3eb89a8 updated lite 2024-11-21 10:43:48 +08:00
Concedo
a439dcb38e multiplayer error handling 2024-11-19 23:31:48 +08:00
Concedo
1b663e10c8 first functional multiplayer 2024-11-19 22:49:28 +08:00
Concedo
14cbd07eaa more wip multiplayer 2024-11-19 18:09:26 +08:00
Concedo
39124828ab wip multiplayer 2024-11-17 23:29:25 +08:00
Concedo
a8694698fd accept gguf text encoders for sd 2024-11-16 17:23:02 +08:00
Concedo
70aee82552 attempts a backflip, but does he stick the landing? 2024-11-16 17:05:45 +08:00
Concedo
a5f8e596d3 unset sc if ff off 2024-11-16 10:52:33 +08:00
Concedo
3813f6c517 added new flag nofastforward allowing users to disable fast forwarding 2024-11-13 10:59:01 +08:00
Concedo
df7c2b9923 renamed some labels 2024-11-11 19:40:47 +08:00
Concedo
c9977a5cb5 model downloading for new params 2024-11-07 14:41:25 +08:00
Concedo
ccbd630a42 allow custom t5, clipl and clipg 2024-11-06 19:05:48 +08:00
Concedo
f153a14daf add common identity provider /.well-known/serviceinfo, updated docs 2024-11-04 21:29:26 +08:00
Concedo
847689e74c fixed incorrect makefile flags 2024-11-04 20:39:10 +08:00
Concedo
6ac8b2bdb3 tweak ratios 2024-11-02 12:35:04 +08:00
Concedo
2a07f2dc2c minor fix 2024-11-01 22:42:57 +08:00
Concedo
bbebc76817 fix top picks bug, lower input anti abuse thresholds (+1 squashed commits)
Squashed commits:

[a81d9b21] fix top picks bug, lower input anti abuse thresholds
2024-11-01 16:42:13 +08:00
Concedo
6a27003a06 logprobs feature completed 2024-11-01 15:24:07 +08:00
Concedo
aa26a58085 added logprobs api and logprobs viewer 2024-11-01 00:22:15 +08:00
Concedo
6731dd64f1 quick fix for trim stop 2024-10-30 11:24:55 +08:00
Concedo
90f5cd0f67 wip logprobs data 2024-10-30 00:59:34 +08:00
Concedo
bd05efd648 fix trim_stop failing on some edge cases 2024-10-27 21:41:47 +08:00
Concedo
4ec12756b3 multiuser fixes 2024-10-26 09:33:11 +08:00
Concedo
d0a6a52855 hide flash attention in quick launch for vulkan, updated lite 2024-10-24 22:00:09 +08:00
Concedo
6da5a63852 fix for uploaded wav files being incomplete due to fragmentation when converting to b64 2024-10-20 17:47:19 +08:00
Concedo
a9dbcdd3ec Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	README.md
#	docs/build.md
#	examples/infill/infill.cpp
#	examples/main/README.md
#	examples/server/README.md
#	flake.lock
#	scripts/sync-ggml.last
#	src/llama.cpp
#	tests/test-json-schema-to-grammar.cpp
#	tests/test-sampling.cpp
2024-10-17 16:36:02 +08:00
Maya
8bb220329c
Dynamic sizes for sequences (#1157)
* Dynamic sizes for sequences

* cleanup PR - move all dynamic fields to end of payload, ensure correct null handling to match existing behavior, add anti abuse limit of max 512 for dynamic fields

* adjust anti abuse limits

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-10-16 23:55:11 +08:00