Commit graph

56 commits

Author SHA1 Message Date
Concedo
08e0745e7e added singleinstance flag and local shutdown api 2025-05-31 11:37:32 +08:00
Concedo
7a7bdeab6d json to gbnf endpoint added 2025-04-12 11:41:11 +08:00
Concedo
30e3d24ead embd include name 2025-04-02 00:40:38 +08:00
Concedo
396875e1c4 update api docs and lite 2025-03-29 15:39:25 +08:00
Concedo
2bdf1dacff embeddings done 2025-03-25 22:41:46 +08:00
Concedo
6a1dd57435 gemma3 template, updated lite, fixed tool calling, reenable ctx shift for gemma3 2025-03-14 17:47:01 +08:00
Concedo
6b7d2349a7 Rewrite history to fix bad vulkan shader commits without increasing repo size
added dpe colab (+8 squashed commit)

Squashed commit:

[b8362da4] updated lite

[ed6c037d] move nsigma into the regular sampler stack

[ac5f61c6] relative filepath fixed

[05fe96ab] export template

[ed0a5a3e] nix_example.md: refactor (#1401)

* nix_example.md: add override example

* nix_example.md: drop graphics example, already basic nixos knowledge

* nix_example.md: format

* nix_example.md: Vulkan is disabled on macOS

Disabled in: 1ccd253acc

* nix_examples.md: nixpkgs.config.cuda{Arches -> Capabilities}

Fixes: https://github.com/LostRuins/koboldcpp/issues/1367

[675c62f7] AutoGuess: Phi 4 (mini) (#1402)

[4bf56982] phrasing

[b8c0df04] Add Rep Pen to Top N Sigma sampler chain (#1397)

- place after nsigma and before xtc (+3 squashed commit)

Squashed commit:

[87c52b97] disable VMM from HIP

[ee8906f3] edit description

[e85c0e69] Remove Unnecessary Rep Counting (#1394)

* stop counting reps

* fix range-based initializer

* strike that - reverse it
2025-03-05 00:02:20 +08:00
Concedo
ccd2dbe020 added support for server side save slots 2025-02-24 00:20:16 +08:00
Concedo
e8570de0e6 improved tts default voices quality and sample rate 2025-01-17 18:45:16 +08:00
Concedo
f8a9634aa2 better xtts and oai speech (+1 squashed commits)
Squashed commits:

[34b9c15f] better xtts and oai speech
2025-01-16 00:26:21 +08:00
Concedo
e07de2ea92 try fix webbrowser again 2025-01-15 00:53:24 +08:00
Concedo
636beac6d2 added a nicer built in voice 2025-01-13 23:26:54 +08:00
Concedo
91b6e29af3 added multilingual support for whisper 2025-01-09 23:28:52 +08:00
Concedo
c73d99ccac updated lite 2025-01-08 13:35:59 +08:00
Concedo
3fea11675d websearch integrated into lite, changed to POST 2024-12-30 17:30:41 +08:00
Concedo
52cc908f7f default trim_stop to true, which trims any tokens after a stop sequence and the stop sequence itself. This is potentially a breaking change. 2024-12-03 22:44:10 +08:00
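The trimming behaviour described in the entry above can be sketched as follows. This is a minimal illustration only: `trim_at_stop` is a hypothetical helper, not a function from the project, and cutting at the earliest match is an assumption.

```python
from typing import Sequence


def trim_at_stop(text: str, stop_sequences: Sequence[str]) -> str:
    """Cut text at the earliest stop sequence, dropping the sequence and everything after it."""
    cut = len(text)
    for stop in stop_sequences:
        if not stop:
            continue  # skip empty strings, which would match at index 0
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)  # keep the earliest match across all stop sequences
    return text[:cut]


# Hypothetical example input; the stop sequence itself is removed too.
print(trim_at_stop("Hello there.\nUser: hi", ["\nUser:"]))
```

Clients that relied on the stop sequence still being present in the returned text would see different output once this becomes the default, which is why the entry flags it as potentially breaking.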
Concedo
bf28d956ae ollama chat api done 2024-11-24 00:10:15 +08:00
Concedo
c0da7e4dcf multiplayer activity tracking 2024-11-23 19:59:55 +08:00
Concedo
c2ca2ec2bc updated docs, fixed a few issues with multiplayer 2024-11-21 18:16:13 +08:00
Concedo
e7897f3257 update docs 2024-11-17 11:43:49 +08:00
Concedo
df7c2b9923 renamed some labels 2024-11-11 19:40:47 +08:00
Concedo
f153a14daf add common identity provider /.well-known/serviceinfo, updated docs 2024-11-04 21:29:26 +08:00
Concedo
6a27003a06 logprobs feature completed 2024-11-01 15:24:07 +08:00
Concedo
2da346a5ff updated docs 2024-10-11 20:42:05 +08:00
Concedo
4f57862040 update docs 2024-08-31 22:52:09 +08:00
Concedo
efb8be013e fixed swagger 2024-08-25 23:29:55 +08:00
Concedo
d775a419b2 updated lite with chat inject, added layer detect, added more console logging 2024-07-16 23:10:15 +08:00
Concedo
e69da9c9d8 strings rename kobold lite to koboldai lite 2024-06-13 20:00:28 +08:00
Concedo
a97f7d5f91 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/full-cuda.Dockerfile
#	.devops/full-rocm.Dockerfile
#	.devops/full.Dockerfile
#	.devops/main-cuda.Dockerfile
#	.devops/main-intel.Dockerfile
#	.devops/main-rocm.Dockerfile
#	.devops/main.Dockerfile
#	.devops/server-cuda.Dockerfile
#	.devops/server-intel.Dockerfile
#	.devops/server-rocm.Dockerfile
#	.devops/server.Dockerfile
#	.devops/tools.sh
#	.github/workflows/docker.yml
#	CMakeLists.txt
#	Makefile
#	README-sycl.md
#	README.md
#	ci/run.sh
#	llama.cpp
#	requirements.txt
#	requirements/requirements-convert-hf-to-gguf-update.txt
#	requirements/requirements-convert-hf-to-gguf.txt
#	requirements/requirements-convert-legacy-llama.txt
#	requirements/requirements-convert-llama-ggml-to-gguf.txt
#	scripts/check-requirements.sh
#	scripts/compare-llama-bench.py
#	scripts/convert-gg.sh
#	scripts/pod-llama.sh
#	scripts/sync-ggml-am.sh
#	scripts/sync-ggml.last
#	scripts/sync-ggml.sh
#	tests/CMakeLists.txt
#	tests/test-backend-ops.cpp
#	tests/test-tokenizer-0.sh
#	tests/test-tokenizer-random.py
2024-06-02 12:28:38 +08:00
Concedo
a65e0800ab update docs, added gui for whisper 2024-06-01 02:01:49 +08:00
Concedo
868446bd1a replace sdconfig and hordeconfig 2024-05-09 22:43:50 +08:00
Concedo
0d1cd0171a update docs 2024-05-06 21:17:11 +08:00
Concedo
6c3fd5b685 updated lite (+2 squashed commit)
Squashed commit:

[d10a731e] update lite

[2554b8e6] update docs
2024-04-28 10:40:57 +08:00
Concedo
c230b78906 refactored a lot of code, remove bantokens, move it to api 2024-04-27 17:57:13 +08:00
Concedo
b4d2031215 merged, added ability to render special tokens 2024-04-22 18:19:58 +08:00
Concedo
0061299cce fixed quant tools not compiling, updated docs 2024-04-06 23:11:05 +08:00
Concedo
6c6ad93f01 added basic support for password protection (+2 squashed commit)
Squashed commit:

[ff91ca72] added basic support for password protection

[91b0b208] updated docs
2024-03-12 19:47:12 +08:00
Concedo
d59ec68753 added interrogate endpoint (+1 squashed commits)
Squashed commits:

[7bf96261] added interrogate endpoint
2024-03-11 18:50:18 +08:00
Concedo
d4a12133e7 added SD samplers endpoint 2024-03-04 14:26:49 +08:00
Concedo
0c59c1ed90 allow specifying width and height 2024-03-03 15:44:15 +08:00
Concedo
59c5448ac8 fixed colab (+1 squashed commits)
Squashed commits:

[1d1c686f] updated colab and docs
2024-03-02 10:09:07 +08:00
Concedo
f75e479db0 WIP on sdcpp integration 2024-02-29 00:40:07 +08:00
Concedo
488777114a added json mode 2024-02-12 17:08:34 +08:00
Concedo
332c5e713b json self format 2024-02-12 16:50:27 +08:00
Concedo
4cd571db89 vulkan multigpu, show uptime 2024-02-08 16:54:38 +08:00
Concedo
504300784f updated lite 2024-02-03 21:11:06 +08:00
kalomaze
123bff9a0f Full DynaTemp implementation + UI (#600)
* move Dynatemp changes to new branch

* fix float header

* Properly reintroduce variable expert count

Controllable through experts.txt

* first pass at DynaTemp UI

Checkbox partial implemented, Min and Max Temp implemented

* DynaTemp UI Checkbox

Trigger DynaTemp on checkbox

* DynaTemp UI checkbox edition

Hell Yeah! DynaTemp!

* Remove greedy dynatemp

* Fix race condition caused by debug print

* Fixed broken presets and miro

Fixes broken presets and mirostat

* Remove debug function + HHI temp

Also removed unnecessary softmax double precision

* Fix whitespace (?) for generate function

* epic upstream renaming scheme fix

* fix stupid indents

* Other cleanup

Reintroduce unused rep pen function, move temp functions first before entropy dynamic temp

* Slight indent fix

* revert batch pyinstaller maker to mainline

and also delete experts.txt since adjustable routing is also being removed for the PR

* compact dynatemp into a single value, dynatemp_range. This is a float giving the allowed deviation below and above the base temperature when using dynatemp. For example, to get dynatemp_min=0.3 and dynatemp_max=0.5, set temperature=0.4 and dynatemp_range=0.1. Dynatemp behaves the same as before, but usage is reduced to a single, easy-to-adjust value.
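
The arithmetic in the last item can be sketched as below. This is an illustration under stated assumptions, not the project's code: `dynatemp_bounds` is a hypothetical helper, and clamping the lower bound at zero is an assumption made here so the result is never negative.

```python
from typing import Tuple


def dynatemp_bounds(temperature: float, dynatemp_range: float) -> Tuple[float, float]:
    """Return the (min, max) temperature implied by a base temperature and a range."""
    # Assumption: clamp at zero, since a negative temperature is not meaningful.
    low = max(0.0, temperature - dynatemp_range)
    high = temperature + dynatemp_range
    return low, high


# Values taken from the commit message: temperature=0.4, dynatemp_range=0.1.
low, high = dynatemp_bounds(0.4, 0.1)
print(f"min={low:.1f} max={high:.1f}")
```

With a range of 0 the two bounds coincide with the base temperature, which corresponds to a fixed temperature.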

---------

Co-authored-by: Alexander Abushady <aabushady214@gmail.com>
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-01-06 11:13:16 +08:00
DebuggingLife46
e733a9e425 Add logit_bias to the OpenAI api (#577)
* Add logit_bias to the OpenAI api

* Cleanup and refactor, test in swagger.

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2023-12-27 00:26:19 +08:00
Concedo
2810151b98 update docs 2023-12-13 22:48:29 +08:00
Concedo
a012342a77 updated docs, shifted kv extra space to be subtracted from user's ctx value instead of added on load. 2023-11-30 14:19:40 +08:00