Commit graph

277 commits

Concedo
f288c6b5e3 Merge branch 'master' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
#	build.zig
#	scripts/sync-ggml.sh
2023-10-10 00:09:46 +08:00
Matěj Štágl
96e9539f05
OpenAI compat API adapter (#466)
* feat: oai-adapter

* simplify optional adapter for instruct start and end tags

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2023-10-09 23:24:48 +08:00
Concedo
4e5b6293ab adjust streaming timings 2023-10-08 23:12:45 +08:00
Concedo
a2b8473354 force flush sse 2023-10-08 15:12:07 +08:00
Concedo
07a114de63 force debugmode to be indicated on horde, allow 64k context for gguf 2023-10-07 10:23:33 +08:00
Concedo
120695ddf7 add update link 2023-10-07 01:33:18 +08:00
Concedo
2a36c85558 abort has multiuser support via genkey too 2023-10-06 23:27:00 +08:00
Concedo
1d1232ffbc show horde job count 2023-10-06 18:42:59 +08:00
Concedo
efd0567f10 Merge branch 'concedo' into concedo_experimental
# Conflicts:
#	koboldcpp.py
2023-10-06 11:22:01 +08:00
grawity
9d0dd7ab11
avoid leaving a zombie process for --onready (#462)
Popen() needs to be used with 'with', or have .wait() called, or be
destroyed; otherwise a zombie child sticks around until the object is
GC'd.
2023-10-06 11:06:37 +08:00
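A minimal sketch of the pattern this commit describes; the command and surrounding code are illustrative assumptions, not the actual --onready handling in koboldcpp.py, but they show the two ways of reaping the child that the message mentions.

```python
import subprocess

# Hypothetical command to run once the model has finished loading;
# the real --onready argument in koboldcpp.py may be wired differently.
onready_cmd = ["echo", "koboldcpp is ready"]

# Option 1: the context manager waits for the child on exit, so it is reaped.
with subprocess.Popen(onready_cmd) as proc:
    pass  # proc.wait() is invoked implicitly when the block exits

# Option 2: keep the handle and call .wait() explicitly; without it,
# a zombie child lingers until the Popen object is garbage-collected.
proc = subprocess.Popen(onready_cmd)
proc.wait()
```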
Concedo
da8a09ba10 use filename as default model name 2023-10-05 22:24:20 +08:00
Concedo
a0c1ba7747 Merge branch 'concedo_experimental' of https://github.com/LostRuins/llamacpp-for-kobold into concedo_experimental
# Conflicts:
#	koboldcpp.py
2023-10-05 21:20:21 +08:00
Concedo
b4b5c35074 add documentation for koboldcpp 2023-10-05 21:17:36 +08:00
teddybear082
f9f4cdf3c0
Implement basic chat/completions openai endpoint (#461)
* Implement basic chat/completions openai endpoint

-Basic support for openai chat/completions endpoint documented at: https://platform.openai.com/docs/api-reference/chat/create

-Tested with example code from openai for chat/completions and chat/completions with stream=True parameter found here: https://cookbook.openai.com/examples/how_to_stream_completions.

-Tested with Mantella, the Skyrim mod that turns all the NPCs into AI-chattable characters, which uses openai's acreate / async completions method: https://github.com/art-from-the-machine/Mantella/blob/main/src/output_manager.py

-Tested default koboldcpp api behavior with streaming and non-streaming generate endpoints and with the GUI running, and it seems to be fine.

-Still TODO / evaluate before merging:

(1) implement the rest of the openai chat/completions parameters to the extent possible, mapping them to koboldcpp parameters

(2) determine if there is a way to use kobold's prompt formats for certain models when translating the openai messages format into a prompt string. (Not sure if possible or where these are in the code)

(3) have chat/completions responses include the actual local model the user is using instead of just koboldcpp (Not sure if this is possible)

Note I am a Python noob, so if there is a more elegant way of doing this, hopefully I have at least done some of the grunt work for you to implement on your own.

* Fix typographical error on deleted streaming argument

-Mistakenly left code relating to streaming argument from main branch in experimental.

* add additional openai chat completions parameters

-support stop parameter mapped to koboldai stop_sequence parameter

-make default max_length / max_tokens parameter consistent with default 80 token length in generate function

-add support for providing name of local model in openai responses

* Revert "add additional openai chat completions parameters"

This reverts commit 443a6f7ff6.

* add additional openai chat completions parameters

-support stop parameter mapped to koboldai stop_sequence parameter

-make default max_length / max_tokens parameter consistent with default 80 token length in generate function

-add support for providing name of local model in openai responses

* add \n after formatting prompts from openaiformat

to conform with the alpaca standard used as default in lite.koboldai.net

* tidy up and simplify code, do not set globals for streaming

* oai endpoints must start with v1

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2023-10-05 20:13:10 +08:00
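A rough sketch of the kind of translation this PR describes: an OpenAI chat/completions request flattened into a single prompt (with a trailing newline per turn) and its parameters mapped onto koboldcpp-style names (stop to stop_sequence, max_tokens to max_length with the 80-token default). The function name, payload fields, and defaults here are illustrative assumptions, not the actual code in koboldcpp.py.

```python
def oai_to_kobold(oai_body: dict, model_name: str = "koboldcpp") -> dict:
    """Illustrative translation of an OpenAI chat/completions request
    into a koboldcpp-style generate payload (field names assumed)."""
    # Flatten the messages array into one prompt string, ending each
    # turn with a newline, as the PR describes.
    prompt = ""
    for msg in oai_body.get("messages", []):
        prompt += f"{msg.get('role', 'user')}: {msg.get('content', '')}\n"

    return {
        "prompt": prompt,
        # OpenAI max_tokens -> kobold max_length, defaulting to 80 tokens
        # to stay consistent with the generate endpoint's default length.
        "max_length": oai_body.get("max_tokens", 80),
        # OpenAI stop -> kobold stop_sequence.
        "stop_sequence": oai_body.get("stop", []),
        # Assumed default; pass through whatever the client sent otherwise.
        "temperature": oai_body.get("temperature", 0.7),
        # The PR adds the local model name to responses; carried here only
        # so the caller can echo it back.
        "model": model_name,
    }
```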
Concedo
ce065d39d0 allow drag and drop kcpps file and openwith 2023-10-05 11:38:37 +08:00
Concedo
47f7ebb632 adjust horde worker and debugmode 2023-10-04 14:00:07 +08:00
Concedo
ea726fcffa cleanup threaded horde submit 2023-10-04 00:34:26 +08:00
Concedo
0cc740115d updated lite, improve horde worker (+1 squashed commits)
Squashed commits:

[a7c25999] improve horde worker
2023-10-03 23:44:27 +08:00
Concedo
ae8ccdc1be Remove old tkinter gui (+1 squashed commits)
Squashed commits:

[0933c1da] Remove old tkinter gui
2023-10-03 22:05:44 +08:00
Concedo
d10470a1e3 Breaking Change: Remove deprecated commands 2023-10-03 17:16:09 +08:00
Concedo
5d3e142145 use_default_badwordsids defaults to false if the parameter is missing 2023-10-02 19:41:07 +08:00
Concedo
23b9d3af49 force oai endpoints to return json 2023-10-02 12:45:14 +08:00
Concedo
0c47e79537 updated the API routing path and fixed a bug with threads 2023-10-02 11:05:19 +08:00
Concedo
dffc6bee74 deprecate some launcher arguments. 2023-10-01 22:30:48 +08:00
Concedo
b49a5bc546 formatting of text 2023-10-01 18:38:32 +08:00
Concedo
bc841ec302 flag to retain grammar, fix makefile (+2 squashed commit)
Squashed commit:

[d5cd3f28] flag to retain grammar, fix makefile

[b3352963] updated lite to v73
2023-10-01 14:39:56 +08:00
Concedo
191de1e8a3 allow launching with kcpps files 2023-09-30 19:35:03 +08:00
Concedo
ca8b315202 increase context for gguf to 32k, horde worker stats, fixed glitch in horde launcher ui, oai freq penalty, updated lite 2023-09-28 23:50:08 +08:00
Concedo
6a821b268a improved SSE streaming 2023-09-28 17:33:34 +08:00
Concedo
cf31658cbf added a flag to keep console in foreground 2023-09-27 01:53:30 +08:00
Concedo
eb86cd4027 bump token limits 2023-09-27 01:26:00 +08:00
Concedo
8bf6f7f8b0 added simulated OAI endpoint 2023-09-27 00:49:24 +08:00
Concedo
7f112e2cd4 support genkeys in polled streaming 2023-09-26 23:46:07 +08:00
Concedo
6c2134a860 improved makefile, allowing building without k quants 2023-09-25 22:10:47 +08:00
Concedo
17ee719c56 improved remotelink cmd, fixed lib unload, updated class.py 2023-09-25 17:50:00 +08:00
Concedo
8ecf505d5d improved embedded horde worker (+2 squashed commit)
Squashed commit:

[99234379] improved embedded horde worker

[ebcd1968] update lite
2023-09-24 15:16:49 +08:00
Concedo
32cf02487e colab use mmq, update lite and ver 2023-09-23 23:32:00 +08:00
Concedo
bfc696fcc4 update lite, update ver 2023-09-23 12:35:23 +08:00
Concedo
14295922f9 updated ver, updated lite (+1 squashed commits)
Squashed commits:

[891291bc] updated lite to v67
2023-09-21 17:44:01 +08:00
Concedo
b63cf223c9 add queue info 2023-09-20 21:07:21 +08:00
Concedo
8c453d1e4e added grammar sampling 2023-09-18 23:02:00 +08:00
Concedo
951614bfc6 library unloading is working 2023-09-18 15:03:52 +08:00
Concedo
53885de6db added multiuser mode 2023-09-16 11:23:39 +08:00
YellowRoseCx
4218641d97
Separate CuBLAS/hipBLAS (#438) 2023-09-16 10:13:44 +08:00
Concedo
63fcbbb3f1 Change label to avoid confusion - ROCm hipBLAS users should obtain binaries from the yellowrosecx fork. The ROCm support in this repo requires self-compilation 2023-09-16 00:04:11 +08:00
Concedo
4d3a64fbb2 add endpoint to fetch true max context 2023-09-14 23:27:12 +08:00
Concedo
3d50c6fe0b only add dll directory on windows 2023-09-13 18:45:54 +08:00
Concedo
8f8a530b83 add additional paths to look for DLLs inside 2023-09-13 14:30:13 +08:00
Concedo
74384cfbb5 added onready argument to execute a command after load is done 2023-09-12 17:10:52 +08:00
Concedo
6667fdcec8 add option for 4th gpu, also fixed missing case in auto rope scaling 2023-09-11 11:43:54 +08:00