Commit graph

806 commits

Author SHA1 Message Date
Concedo
272828cab0 tweaks to chat template 2024-11-21 11:10:30 +08:00
kallewoof
547ab2aebb
API: add /props route (#1222)
* API: add an /extra/chat_template route

A lot of manual tweaking is done when swapping between models. We can automate or make better assumptions about some of them by having more information, such as chat template. This PR adds an endpoint /extra/chat_template which returns the model chat template string as is in a 'chat_template' key. The front end can then use this to derive the proper templates or use it as is, or at least warn the user when they are trying to use e.g. a Mistral preset with a Llama 3.1 model.

* switch to pre-established /props endpoint for chat template

* bug-fix (upstream): one-off in string juggling
2024-11-21 10:58:32 +08:00
Concedo
8ab3eb89a8 updated lite 2024-11-21 10:43:48 +08:00
Concedo
a439dcb38e multiplayer error handling 2024-11-19 23:31:48 +08:00
Concedo
1b663e10c8 first functional multiplayer 2024-11-19 22:49:28 +08:00
Concedo
14cbd07eaa more wip multiplayer 2024-11-19 18:09:26 +08:00
Concedo
39124828ab wip multiplayer 2024-11-17 23:29:25 +08:00
Concedo
a8694698fd accept gguf text encoders for sd 2024-11-16 17:23:02 +08:00
Concedo
70aee82552 attempts a backflip, but does he stick the landing? 2024-11-16 17:05:45 +08:00
Concedo
a5f8e596d3 unset sc if ff off 2024-11-16 10:52:33 +08:00
Concedo
3813f6c517 added new flag nofastforward allowing users to disable fast forwarding 2024-11-13 10:59:01 +08:00
Concedo
df7c2b9923 renamed some labels 2024-11-11 19:40:47 +08:00
Concedo
c9977a5cb5 model downloading for new params 2024-11-07 14:41:25 +08:00
Concedo
ccbd630a42 allow custom t5, clipl and clipg 2024-11-06 19:05:48 +08:00
Concedo
f153a14daf add common identity provider /.well-known/serviceinfo, updated docs 2024-11-04 21:29:26 +08:00
Concedo
847689e74c fixed incorrect makefile flags 2024-11-04 20:39:10 +08:00
Concedo
6ac8b2bdb3 tweak ratios 2024-11-02 12:35:04 +08:00
Concedo
2a07f2dc2c minor fix 2024-11-01 22:42:57 +08:00
Concedo
bbebc76817 fix top picks bug, lower input anti abuse thresholds (+1 squashed commits)
Squashed commits:

[a81d9b21] fix top picks bug, lower input anti abuse thresholds
2024-11-01 16:42:13 +08:00
Concedo
6a27003a06 logprobs feature completed 2024-11-01 15:24:07 +08:00
Concedo
aa26a58085 added logprobs api and logprobs viewer 2024-11-01 00:22:15 +08:00
Concedo
6731dd64f1 quick fix for trim stop 2024-10-30 11:24:55 +08:00
Concedo
90f5cd0f67 wip logprobs data 2024-10-30 00:59:34 +08:00
Concedo
bd05efd648 fix trim_stop failing on some edge cases 2024-10-27 21:41:47 +08:00
Concedo
4ec12756b3 multiuser fixes 2024-10-26 09:33:11 +08:00
Concedo
d0a6a52855 hide flash attention in quick launch for vulkan, updated lite 2024-10-24 22:00:09 +08:00
Concedo
6da5a63852 fix for uploaded wav files being incomplete due to fragmentation when converting to b64 2024-10-20 17:47:19 +08:00
Concedo
a9dbcdd3ec Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	README.md
#	docs/build.md
#	examples/infill/infill.cpp
#	examples/main/README.md
#	examples/server/README.md
#	flake.lock
#	scripts/sync-ggml.last
#	src/llama.cpp
#	tests/test-json-schema-to-grammar.cpp
#	tests/test-sampling.cpp
2024-10-17 16:36:02 +08:00
Maya
8bb220329c
Dynamic sizes for sequences (#1157)
* Dynamic sizes for sequences

* cleanup PR - move all dynamic fields to end of payload, ensure correct null handling to match existing behavior, add anti abuse limit of max 512 for dynamic fields

* adjust anti abuse limits

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-10-16 23:55:11 +08:00
JR
8e5ffc5a58
Add header X-Accel-Buffering set to no for SSE stream requests (#1168) 2024-10-16 17:28:05 +08:00
Concedo
21b2f6168e Merge branch 'concedo_experimental' of https://github.com/LostRuins/koboldcpp into concedo_experimental 2024-10-14 22:09:38 +08:00
Concedo
1d40303050 increase again 2024-10-14 22:09:26 +08:00
YellowRoseCx
f029de6e46
Merge pull request #69 from matoro/main (#1165)
Fix gpulayers autodetection for cublas & clblast backends
2024-10-14 20:10:41 +08:00
Concedo
8d81519ca3 direct user to gguf model resources 2024-10-12 18:39:21 +08:00
Concedo
5ad826b82a updated lite (+2 squashed commit)
Squashed commit:

[31a99e1f] bump baned phrase a bit more again

[c999736b] small fix
2024-10-11 11:05:04 +08:00
Maya
3dab63887f
Add custom_token_bans (#1153) 2024-10-10 23:45:07 +08:00
Concedo
a3b104a422 further increase some limits 2024-10-10 22:27:28 +08:00
Concedo
d75cbd671d alias banned_tokens with banned_strings from ST
increase max bans to 32 for now
2024-10-10 21:52:46 +08:00
Concedo
fe5479f286 unify antislop and token bans 2024-10-10 18:21:07 +08:00
Concedo
a6bf568fda prevent GUI settings from being overridden 2024-10-10 11:46:57 +08:00
Concedo
65f3c68399 wip antislop 2024-10-07 20:19:22 +08:00
Concedo
3e8bb10e2d wip on rewind function 2024-10-06 16:21:03 +08:00
Concedo
d9fcb94472 do not suppress stdout if debugmode 2024-10-04 16:04:29 +08:00
Concedo
a785a91e56 every request has timestamp 2024-09-27 22:10:41 +08:00
Concedo
ea55f69dc1 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.dockerignore
#	.github/workflows/build.yml
#	.github/workflows/docker.yml
#	Makefile
#	README.md
#	examples/infill/infill.cpp
#	examples/perplexity/perplexity.cpp
#	examples/server/README.md
#	examples/speculative/speculative.cpp
#	flake.lock
#	ggml/src/CMakeLists.txt
#	scripts/sync-ggml.last
#	tests/test-backend-ops.cpp
#	tests/test-sampling.cpp
2024-09-27 11:21:28 +08:00
Concedo
5a4bc89c8d quiet mode on perf endpoint 2024-09-22 13:03:02 +08:00
Concedo
c38d1ecc8d update templates, fix rwkv 2024-09-22 01:32:12 +08:00
Concedo
229108f877 fixed incorrect auto gpu settings, fixed clblast not working 2024-09-21 17:59:52 +08:00
Concedo
004a35b16d add mutually exclusive group 2024-09-21 15:46:55 +08:00
Concedo
4b6a12e9c0 allow overriding kcpps values with explicit args 2024-09-21 11:00:10 +08:00