Concedo
a4249abe5d
alias noblas to usecpu
2024-09-15 21:25:48 +08:00
Concedo
53bf0fb32d
removed openblas backend, merged into CPU (with llamafile for BLAS). GPU backend is now automatically selected when running from CLI unless noblas is specified.
2024-09-15 19:21:52 +08:00
Concedo
5b658ab6d4
updated lite
2024-09-12 10:47:47 +08:00
Concedo
70cdb55cc9
Merge commit '947538acb8' into concedo_experimental
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/docker.yml
# CMakePresets.json
# examples/llama-bench/llama-bench.cpp
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-quantize-fns.cpp
2024-09-09 11:26:34 +08:00
Concedo
d777995991
able to handle kcpp protected model name endpoints
2024-09-04 16:26:28 +08:00
Concedo
5d34de0c08
fix basepath
2024-09-02 18:09:58 +08:00
Concedo
3c4fa57026
allow horde worker to work with password protected instances
2024-08-31 21:30:47 +08:00
Concedo
0f9968ef64
fixed some incorrect protocol prefix for localhost
2024-08-29 10:37:43 +08:00
Concedo
5f360f659c
Add 5m timeout for horde worker
2024-08-28 23:17:06 +08:00
Concedo
6acbf1d7f4
macos default to full offload when using gpulayers auto (-1)
2024-08-26 12:12:51 +08:00
Concedo
97aa8648ed
allow launching with no models loaded
2024-08-25 23:57:32 +08:00
Concedo
0b96097439
add version number into help page
2024-08-22 00:52:30 +08:00
Concedo
5bf527a6ae
added xtc sampler
2024-08-21 23:57:15 +08:00
Concedo
cd69ab218e
fixed DRY
2024-08-21 17:01:28 +08:00
Concedo
2cf6d16c40
adjust sleep time
2024-08-21 01:06:41 +08:00
Concedo
c1ae350e5b
fixed race condition when generating
2024-08-20 20:17:55 +08:00
Concedo
7ee359a59b
on multigpu setups, pick lowest free mem instead of highest for auto layers
2024-08-20 19:02:16 +08:00
Concedo
e9eb6fe51a
move chat compl to models tab
2024-08-18 14:56:10 +08:00
Concedo
e2e6d892b4
fix declaration order
2024-08-18 02:15:34 +08:00
Concedo
d71b5477c5
update lite, cleanup, fix interrogate format
2024-08-18 00:48:53 +08:00
Concedo
2c108ab17e
correct phrasing
2024-08-14 21:55:53 +08:00
Concedo
f4f24d0e14
small text change
2024-08-11 21:30:46 +08:00
Concedo
139ab3d198
generate passes whole object now
2024-08-11 00:08:13 +08:00
Concedo
da8a96199c
add a space between the bench prompt to fix an issue with old bpe tokenizer stack overflow (+1 squashed commits)
Squashed commits:
[44a689de] add a space between the bench prompt to fix an issue with old bpe tokenizer stack overflow
2024-08-10 19:35:56 +08:00
Concedo
86e687ae8b
updated lite, added promptlimit
2024-08-10 16:05:24 +08:00
Concedo
03adb90dc6
prompt command done
2024-08-07 20:52:28 +08:00
Concedo
853d57c53c
wip prompt
2024-08-06 21:54:08 +08:00
Concedo
6b8b50b350
try fix ipv6 (+1 squashed commits)
Squashed commits:
[8d95a639] try fix ipv6
2024-08-06 15:36:46 +08:00
Concedo
381b4a1844
default multiuser true
2024-08-05 20:03:29 +08:00
Concedo
bd4e55eb74
add used memory checks, add gpulayers for metal
2024-08-05 16:32:05 +08:00
Concedo
23caa63f94
up ver
2024-08-04 23:42:22 +08:00
Concedo
bfdf4b021f
adjust v4-v6 allocation, default back to localhost
2024-08-04 11:42:16 +08:00
Concedo
40481abf0c
allow ipv6 as well
2024-08-04 00:53:19 +08:00
Concedo
9a0976761e
use loopback ip instead of localhost
2024-08-03 00:41:32 +08:00
Concedo
6bf78967f9
more janky nonsense
2024-08-02 21:58:28 +08:00
Concedo
3a72410804
Added vulkan support for SD (+1 squashed commits)
Squashed commits:
[13f42f83] Added vulkan support for SD
2024-08-01 17:12:33 +08:00
Concedo
9a04060aaa
also apply even if tensor split is set
2024-07-30 23:01:50 +08:00
Concedo
2f04f848e1
if gpuid is specified, force specific order
2024-07-30 22:58:25 +08:00
Concedo
43c55bb7e2
hack to fix bad unicode fragments corrupting streamed output
2024-07-30 22:18:22 +08:00
Concedo
102eec3d22
more bugfixes in auto gpu layers selection
2024-07-29 20:38:24 +08:00
Llama
26f1df5e5f
Fix the penultimate token sometimes being lost with SSE streaming (#1031)
The token immediately before an eot token was lost when SSE streaming
was enabled if that token was contained entirely within a stop sequence.
As an example of when this could happen, consider this prompt:
Type the phrase 'pleas' once.
In a Llama 3-derived model, 'pleas' tokenizes as 'ple' + 'as'. The token
'as' is contained within this instruct mode stop sequence:
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
due to the word 'assistant'. Since `string_contains_sequence_substring`
returns True for 'as', this token is added to `tokenReserve` instead of
being streamed immediately. If the '<|eot_id|>' token was generated
next, the text in `tokenReserve` would be discarded.
2024-07-29 20:16:47 +08:00
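The buffering behavior described in the entry above can be sketched roughly as follows. This is a simplified model, not the actual koboldcpp code: apart from `string_contains_sequence_substring` (named in the commit message), all names and the exact control flow here are hypothetical. A token that might be the start of a stop sequence is held in a reserve instead of being streamed; the bug was that this reserve was discarded when the EOT token arrived, and the fix is to flush it first.

```python
EOT = "<|eot_id|>"
STOP_SEQUENCES = ["<|eot_id|><|start_header_id|>assistant<|end_header_id|>"]

def string_contains_sequence_substring(text, sequences):
    # True if text appears inside any stop sequence, e.g. 'as' inside
    # 'assistant' — meaning it cannot safely be streamed yet.
    return any(text in seq for seq in sequences)

def sse_stream(tokens):
    out = []
    token_reserve = ""  # text held back because it may begin a stop sequence
    for tok in tokens:
        if tok == EOT:
            # The fix: flush the held-back text instead of discarding it,
            # so the penultimate token ('as' in the example) is not lost.
            if token_reserve:
                out.append(token_reserve)
            break
        candidate = token_reserve + tok
        if string_contains_sequence_substring(candidate, STOP_SEQUENCES):
            token_reserve = candidate  # hold back; might be a stop sequence
        else:
            out.append(candidate)  # safe to stream, including any reserve
            token_reserve = ""
    return "".join(out)
```

With the commit's example prompt, the tokens 'ple', 'as', '<|eot_id|>' now stream as "pleas"; before the fix, the reserve holding 'as' was dropped at EOT and only "ple" reached the client.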
Concedo
948646ff7a
do not offload if auto layers is less than 2, as it's usually slower
2024-07-29 20:13:43 +08:00
Concedo
e39b8aab8b
improvements to auto layer calcs
2024-07-29 18:51:10 +08:00
Concedo
f289fb494a
bump size of some payload arr sequences from 16 to 24
2024-07-28 20:29:39 +08:00
Concedo
01afb28a63
not working
2024-07-28 11:43:10 +08:00
Concedo
eaa702852d
increased padding, it is still way too little but whatever
2024-07-27 22:32:13 +08:00
Concedo
4531ab5465
refactor some fields
2024-07-27 00:04:29 +08:00
Concedo
9f2076b4b3
fix rocminfo error
2024-07-25 22:23:36 +08:00
Concedo
57a98ba308
fixed dict loading
2024-07-25 11:41:05 +08:00
Concedo
0024d9d682
fixed order of selection
2024-07-25 11:15:30 +08:00