Commit graph

822 commits

Author SHA1 Message Date
Concedo
5ad826b82a updated lite (+2 squashed commit)
Squashed commit:

[31a99e1f] bump baned phrase a bit more again

[c999736b] small fix
2024-10-11 11:05:04 +08:00
Maya
3dab63887f Add custom_token_bans (#1153) 2024-10-10 23:45:07 +08:00
Concedo
a3b104a422 further increase some limits 2024-10-10 22:27:28 +08:00
Concedo
d75cbd671d alias banned_tokens with banned_strings from ST
increase max bans to 32 for now
2024-10-10 21:52:46 +08:00
Concedo
fe5479f286 unify antislop and token bans 2024-10-10 18:21:07 +08:00
Concedo
a6bf568fda prevent GUI settings from being overridden 2024-10-10 11:46:57 +08:00
Concedo
65f3c68399 wip antislop 2024-10-07 20:19:22 +08:00
Concedo
3e8bb10e2d wip on rewind function 2024-10-06 16:21:03 +08:00
Concedo
d9fcb94472 do not suppress stdout if debugmode 2024-10-04 16:04:29 +08:00
Concedo
a785a91e56 every request has timestamp 2024-09-27 22:10:41 +08:00
Concedo
ea55f69dc1 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.dockerignore
#	.github/workflows/build.yml
#	.github/workflows/docker.yml
#	Makefile
#	README.md
#	examples/infill/infill.cpp
#	examples/perplexity/perplexity.cpp
#	examples/server/README.md
#	examples/speculative/speculative.cpp
#	flake.lock
#	ggml/src/CMakeLists.txt
#	scripts/sync-ggml.last
#	tests/test-backend-ops.cpp
#	tests/test-sampling.cpp
2024-09-27 11:21:28 +08:00
Concedo
5a4bc89c8d quiet mode on perf endpoint 2024-09-22 13:03:02 +08:00
Concedo
c38d1ecc8d update templates, fix rwkv 2024-09-22 01:32:12 +08:00
Concedo
229108f877 fixed incorrect auto gpu settings, fixed clblast not working 2024-09-21 17:59:52 +08:00
Concedo
004a35b16d add mutually exclusive group 2024-09-21 15:46:55 +08:00
Concedo
4b6a12e9c0 allow overriding kcpps values with explicit args 2024-09-21 11:00:10 +08:00
Concedo
68aee56498 updated lite, allow GUI to import launcher args and config files with showgui 2024-09-20 17:52:52 +08:00
Concedo
e958b2f78b updated lite 2024-09-17 00:16:21 +08:00
Concedo
a4249abe5d alias noblas to usecpu 2024-09-15 21:25:48 +08:00
Concedo
53bf0fb32d removed openblas backend, merged into CPU (with llamafile for BLAS). GPU backend is now automatically selected when running from CLI unless noblas is specified. 2024-09-15 19:21:52 +08:00
Concedo
5b658ab6d4 updated lite 2024-09-12 10:47:47 +08:00
Concedo
70cdb55cc9 Merge commit '947538acb8' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/docker.yml
#	CMakePresets.json
#	examples/llama-bench/llama-bench.cpp
#	ggml/CMakeLists.txt
#	ggml/src/CMakeLists.txt
#	tests/test-backend-ops.cpp
#	tests/test-quantize-fns.cpp
2024-09-09 11:26:34 +08:00
Concedo
d777995991 able to handle kcpp protected model name endpoints 2024-09-04 16:26:28 +08:00
Concedo
5d34de0c08 fix basepath 2024-09-02 18:09:58 +08:00
Concedo
3c4fa57026 allow horde worker to work with password protected instances 2024-08-31 21:30:47 +08:00
Concedo
0f9968ef64 fixed some incorrect protocol prefix for localhost 2024-08-29 10:37:43 +08:00
Concedo
5f360f659c Add 5m timeout for horde worker 2024-08-28 23:17:06 +08:00
Concedo
6acbf1d7f4 macos default to full offload when using gpulayers auto (-1) 2024-08-26 12:12:51 +08:00
Concedo
97aa8648ed allow launching with no models loaded 2024-08-25 23:57:32 +08:00
Concedo
0b96097439 add version number into help page 2024-08-22 00:52:30 +08:00
Concedo
5bf527a6ae added xtc sampler 2024-08-21 23:57:15 +08:00
Concedo
cd69ab218e fixed DRY 2024-08-21 17:01:28 +08:00
Concedo
2cf6d16c40 adjust sleep time 2024-08-21 01:06:41 +08:00
Concedo
c1ae350e5b fixed race condition when generating 2024-08-20 20:17:55 +08:00
Concedo
7ee359a59b on multigpu setups, pick lowest free mem instead of highest for auto layers 2024-08-20 19:02:16 +08:00
Concedo
e9eb6fe51a move chat compl to models tab 2024-08-18 14:56:10 +08:00
Concedo
e2e6d892b4 fix declaration order 2024-08-18 02:15:34 +08:00
Concedo
d71b5477c5 update lite, cleanup, fix interrogate format 2024-08-18 00:48:53 +08:00
Concedo
2c108ab17e correct phrasing 2024-08-14 21:55:53 +08:00
Concedo
f4f24d0e14 small text change 2024-08-11 21:30:46 +08:00
Concedo
139ab3d198 generate passes whole object now 2024-08-11 00:08:13 +08:00
Concedo
da8a96199c add a space between the bench prompt to fix an issue with old bpe tokenizer stack overflow (+1 squashed commits)
Squashed commits:

[44a689de] add a space between the bench prompt to fix an issue with old bpe tokenizer stack overflow
2024-08-10 19:35:56 +08:00
Concedo
86e687ae8b updated lite, added promptlimit 2024-08-10 16:05:24 +08:00
Concedo
03adb90dc6 prompt command done 2024-08-07 20:52:28 +08:00
Concedo
853d57c53c wip prompt 2024-08-06 21:54:08 +08:00
Concedo
6b8b50b350 try fix ipv6 (+1 squashed commits)
Squashed commits:

[8d95a639] try fix ipv6
2024-08-06 15:36:46 +08:00
Concedo
381b4a1844 default multiuser true 2024-08-05 20:03:29 +08:00
Concedo
bd4e55eb74 add used memory checks, add gpulayers for metal 2024-08-05 16:32:05 +08:00
Concedo
23caa63f94 up ver 2024-08-04 23:42:22 +08:00
Concedo
bfdf4b021f adjust v4-v6 allocation, default back to localhost 2024-08-04 11:42:16 +08:00