Commit graph

713 commits

Author SHA1 Message Date
Concedo
62dde8cfb2 ollama sync completions mostly working. stupid api. 2024-11-23 23:31:37 +08:00
Concedo
2c1a06a07d wip ollama emulation, added detokenize endpoint 2024-11-23 22:48:03 +08:00
Concedo
c0da7e4dcf multiplayer activity tracking 2024-11-23 19:59:55 +08:00
Concedo
1dd37933e3 fixed grammar not resetting correctly 2024-11-23 09:55:12 +08:00
Concedo
18f227625b multiplayer fixes 2024-11-22 19:02:31 +08:00
mkarr
ac6a0cde91
Support chunked encoding. (#1226)
* Support chunked encoding.

The koboldcpp API does not support HTTP chunked encoding. Some HTTP
libraries, notable Go's net/http can automatically choose to use chunked
encoding. This adds support for chunked encoding within the do_POST()
handler.

* refactor slightly to add additional safety checks and follow original format

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-11-21 18:24:04 +08:00
Concedo
c2ca2ec2bc updated docs, fixed a few issues with multiplayer 2024-11-21 18:16:13 +08:00
Concedo
272828cab0 tweaks to chat template 2024-11-21 11:10:30 +08:00
kallewoof
547ab2aebb
API: add /props route (#1222)
* API: add an /extra/chat_template route

A lot of manual tweaking is done when swapping between models. We can automate or make better assumptions about some of them by having more information, such as chat template. This PR adds an endpoint /extra/chat_template which returns the model chat template string as is in a 'chat_template' key. The front end can then use this to derive the proper templates or use it as is, or at least warn the user when they are trying to use e.g. a Mistral preset with a Llama 3.1 model.

* switch to pre-established /props endpoint for chat template

* bug-fix (upstream): one-off in string juggling
2024-11-21 10:58:32 +08:00
Concedo
8ab3eb89a8 updated lite 2024-11-21 10:43:48 +08:00
Concedo
a439dcb38e multiplayer error handling 2024-11-19 23:31:48 +08:00
Concedo
1b663e10c8 first functional multiplayer 2024-11-19 22:49:28 +08:00
Concedo
14cbd07eaa more wip multiplayer 2024-11-19 18:09:26 +08:00
Concedo
39124828ab wip multiplayer 2024-11-17 23:29:25 +08:00
Concedo
a8694698fd accept gguf text encoders for sd 2024-11-16 17:23:02 +08:00
Concedo
70aee82552 attempts a backflip, but does he stick the landing? 2024-11-16 17:05:45 +08:00
Concedo
a5f8e596d3 unset sc if ff off 2024-11-16 10:52:33 +08:00
Concedo
3813f6c517 added new flag nofastforward allowing users to disable fast forwarding 2024-11-13 10:59:01 +08:00
Concedo
df7c2b9923 renamed some labels 2024-11-11 19:40:47 +08:00
Concedo
c9977a5cb5 model downloading for new params 2024-11-07 14:41:25 +08:00
Concedo
ccbd630a42 allow custom t5, clipl and clipg 2024-11-06 19:05:48 +08:00
Concedo
f153a14daf add common identity provider /.well-known/serviceinfo, updated docs 2024-11-04 21:29:26 +08:00
Concedo
847689e74c fixed incorrect makefile flags 2024-11-04 20:39:10 +08:00
Concedo
6ac8b2bdb3 tweak ratios 2024-11-02 12:35:04 +08:00
Concedo
2a07f2dc2c minor fix 2024-11-01 22:42:57 +08:00
Concedo
bbebc76817 fix top picks bug, lower input anti abuse thresholds (+1 squashed commits)
Squashed commits:

[a81d9b21] fix top picks bug, lower input anti abuse thresholds
2024-11-01 16:42:13 +08:00
Concedo
6a27003a06 logprobs feature completed 2024-11-01 15:24:07 +08:00
Concedo
aa26a58085 added logprobs api and logprobs viewer 2024-11-01 00:22:15 +08:00
Concedo
6731dd64f1 quick fix for trim stop 2024-10-30 11:24:55 +08:00
Concedo
90f5cd0f67 wip logprobs data 2024-10-30 00:59:34 +08:00
Concedo
bd05efd648 fix trim_stop failing on some edge cases 2024-10-27 21:41:47 +08:00
Concedo
4ec12756b3 multiuser fixes 2024-10-26 09:33:11 +08:00
Concedo
d0a6a52855 hide flash attention in quick launch for vulkan, updated lite 2024-10-24 22:00:09 +08:00
Concedo
6da5a63852 fix for uploaded wav files being incomplete due to fragmentation when converting to b64 2024-10-20 17:47:19 +08:00
Concedo
a9dbcdd3ec Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	README.md
#	docs/build.md
#	examples/infill/infill.cpp
#	examples/main/README.md
#	examples/server/README.md
#	flake.lock
#	scripts/sync-ggml.last
#	src/llama.cpp
#	tests/test-json-schema-to-grammar.cpp
#	tests/test-sampling.cpp
2024-10-17 16:36:02 +08:00
Maya
8bb220329c
Dynamic sizes for sequences (#1157)
* Dynamic sizes for sequences

* cleanup PR - move all dynamic fields to end of payload, ensure correct null handling to match existing behavior, add anti abuse limit of max 512 for dynamic fields

* adjust anti abuse limits

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-10-16 23:55:11 +08:00
JR
8e5ffc5a58
Add header X-Accel-Buffering set to no for SSE stream requests (#1168) 2024-10-16 17:28:05 +08:00
Concedo
21b2f6168e Merge branch 'concedo_experimental' of https://github.com/LostRuins/koboldcpp into concedo_experimental 2024-10-14 22:09:38 +08:00
Concedo
1d40303050 increase again 2024-10-14 22:09:26 +08:00
YellowRoseCx
f029de6e46
Merge pull request #69 from matoro/main (#1165)
Fix gpulayers autodetection for cublas & clblast backends
2024-10-14 20:10:41 +08:00
Concedo
8d81519ca3 direct user to gguf model resources 2024-10-12 18:39:21 +08:00
Concedo
5ad826b82a updated lite (+2 squashed commit)
Squashed commit:

[31a99e1f] bump baned phrase a bit more again

[c999736b] small fix
2024-10-11 11:05:04 +08:00
Maya
3dab63887f
Add custom_token_bans (#1153) 2024-10-10 23:45:07 +08:00
Concedo
a3b104a422 further increase some limits 2024-10-10 22:27:28 +08:00
Concedo
d75cbd671d alias banned_tokens with banned_strings from ST
increase max bans to 32 for now
2024-10-10 21:52:46 +08:00
Concedo
fe5479f286 unify antislop and token bans 2024-10-10 18:21:07 +08:00
Concedo
a6bf568fda prevent GUI settings from being overridden 2024-10-10 11:46:57 +08:00
Concedo
65f3c68399 wip antislop 2024-10-07 20:19:22 +08:00
Concedo
3e8bb10e2d wip on rewind function 2024-10-06 16:21:03 +08:00
Concedo
d9fcb94472 do not suppress stdout if debugmode 2024-10-04 16:04:29 +08:00