* update .gitignore
Remove the .idea folder created by JetBrains products.
* Front-end and partial back-end
Tensor Split is pulled in and shows in the console, but is then not respected on model load.
* UI Tweak + Tensor Split Fix
Made the Tensor Split input match similar boxes around it. Also fixed Tensor Split to populate the correct argument.
* Changed int to float for tensor split
Accidentally used int; the tensor split args need to be float.
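A minimal illustrative sketch of the idea (not the actual koboldcpp code): tensor split values are per-GPU proportions, so parsing them as int would truncate fractional values.

```python
# Illustrative only: parse a comma-separated tensor split string as floats.
# Using int() here would truncate "0.6,0.4" to [0, 0] and break the split.
def parse_tensor_split(text):
    return [float(x) for x in text.split(",") if x.strip()]
```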
* If the ROCm/HIP SDK is installed on Windows, include the SDK
as a potential location to load the hipBLAS/rocBLAS .dlls from. This
makes running koboldcpp.py directly with Python after building
work on Windows, without having to build the .exe and run that or
copy .dlls around.
Co-authored-by: one-lithe-rune <skapusniak@lithe-runes.com>
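A hedged sketch of the lookup idea above; HIP_PATH is the environment variable set by the AMD HIP SDK installer, and the exact search logic in koboldcpp.py may differ.

```python
import os

# Illustrative only: candidate directories for hipBLAS/rocBLAS DLLs on Windows.
def candidate_dll_dirs():
    dirs = [os.path.dirname(os.path.abspath(__file__))]  # next to koboldcpp.py
    hip_path = os.environ.get("HIP_PATH")  # set by the HIP SDK installer
    if os.name == "nt" and hip_path:
        dirs.append(os.path.join(hip_path, "bin"))  # assumed DLL location
    return dirs
```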
Popen() needs to be used with 'with', or have .wait() called, or be
destroyed; otherwise there is a zombie child process that sticks around
until the object is GC'd.
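For illustration, the simplest pattern that avoids the zombie:

```python
import subprocess, sys

# Using Popen as a context manager (or calling .wait()) reaps the child,
# so no zombie process lingers until the Popen object is garbage collected.
with subprocess.Popen([sys.executable, "--version"]) as proc:
    proc.wait()
```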
* Implement basic OpenAI chat/completions endpoint
-Basic support for the OpenAI chat/completions endpoint documented at: https://platform.openai.com/docs/api-reference/chat/create
-Tested with OpenAI's example code for chat/completions, including the stream=True parameter, found here: https://cookbook.openai.com/examples/how_to_stream_completions.
-Tested with Mantella, the Skyrim mod that turns all the NPCs into AI-chattable characters, which uses OpenAI's acreate / async completions method: https://github.com/art-from-the-machine/Mantella/blob/main/src/output_manager.py
-Tested the default koboldcpp API behavior with the streaming and non-streaming generate endpoints and with the GUI running; it seems to be fine.
-Still TODO / evaluate before merging:
(1) implement the rest of the OpenAI chat/completions parameters to the extent possible, mapping them to koboldcpp parameters
(2) determine if there is a way to use kobold's prompt formats for certain models when translating the OpenAI messages format into a prompt string (not sure if this is possible or where these live in the code; a hedged sketch of one possible translation follows these notes)
(3) have chat/completions responses include the actual local model the user is using instead of just koboldcpp (not sure if this is possible)
Note: I am a Python noob, so if there is a more elegant way of doing this, hopefully I have at least done some of the grunt work for you to implement on your own.
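As mentioned in (2), a hedged sketch of one possible translation from the OpenAI messages format into a single Alpaca-style prompt string (the function name and role handling are illustrative, not the actual koboldcpp code):

```python
# Illustrative only: flatten OpenAI chat messages into one Alpaca-style prompt.
def messages_to_prompt(messages):
    parts = []
    for msg in messages:
        role = msg.get("role", "user")
        content = msg.get("content", "")
        if role == "system":
            parts.append(content)
        elif role == "user":
            parts.append("### Instruction:\n" + content)
        else:  # assistant or any other role
            parts.append("### Response:\n" + content)
    # Trailing newline and response header cue the model to reply next.
    return "\n".join(parts) + "\n### Response:\n"
```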
* Fix typographical error on deleted streaming argument
-Mistakenly left code relating to the streaming argument from the main branch in experimental.
* add additional openai chat completions parameters
-support the stop parameter, mapped to the koboldai stop_sequence parameter
-make the default max_length / max_tokens parameter consistent with the default 80-token length in the generate function
-add support for providing the name of the local model in OpenAI responses
* Revert "add additional openai chat completions parameters"
This reverts commit 443a6f7ff6.
* add additional openai chat completions parameters
-support the stop parameter, mapped to the koboldai stop_sequence parameter
-make the default max_length / max_tokens parameter consistent with the default 80-token length in the generate function
-add support for providing the name of the local model in OpenAI responses
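A hedged sketch of the mapping described above (the kobold-side field names are assumptions and may not match the actual generate payload):

```python
# Illustrative only: map OpenAI chat/completions fields onto kobold-style
# parameters; stop -> stop_sequence, max_tokens -> max_length (default 80).
def openai_to_kobold_params(body, default_max_tokens=80):
    stop = body.get("stop") or []
    if isinstance(stop, str):  # OpenAI allows a single string or a list
        stop = [stop]
    return {
        "stop_sequence": stop,
        "max_length": body.get("max_tokens") or default_max_tokens,
    }
```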
* add \n after formatting prompts from openaiformat
to conform with the Alpaca standard used as the default in lite.koboldai.net
* tidy up and simplify code, do not set globals for streaming
* OAI endpoints must start with v1
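For illustration, a minimal path check matching that rule (the path handling here is an assumption; OpenAI client libraries call /v1/chat/completions):

```python
# Illustrative only: accept the /v1-prefixed path that OpenAI clients use.
def is_chat_completions_path(path):
    return path.rstrip("/").endswith("/v1/chat/completions")
```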
---------
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>