64 commits

Author SHA1 Message Date
Concedo
35111ce01a row split mode is now a toggle 2024-02-09 18:35:58 +08:00
Concedo
4cd571db89 vulkan multigpu, show uptime 2024-02-08 16:54:38 +08:00
Alexander Abushady
4cb956c7db Quadratic Sampling UI (#652)
* Quadratic Sampling UI

Kalomaze's Quadratic Sampling now has a UI within KCPP.

* remove debug prints

* cleanup, add smooth sampler to dynatemp

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-02-04 16:26:27 +08:00
Concedo
2a4a7241e6 Merge branch 'vulkan_test' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
#	llama.cpp
2024-01-25 23:01:44 +08:00
Concedo
08236ccc97 better abort handling, added support for dynatemp exponent 2024-01-23 16:56:12 +08:00
kalomaze
123bff9a0f Full DynaTemp implementation + UI (#600)
* move Dynatemp changes to new branch

* fix float header

* Properly reintroduce variable expert count

Controllable through experts.txt

* first pass at DynaTemp UI

Checkbox partially implemented, Min and Max Temp implemented

* DynaTemp UI Checkbox

Trigger DynaTemp on checkbox

* DynaTemp UI checkbox edition

Hell Yeah! DynaTemp!

* Remove greedy dynatemp

* Fix race condition caused by debug print

* Fixed broken presets and miro

Fixes broken presets and mirostat

* Remove debug function + HHI temp

Also removed unnecessary softmax double precision

* Fix whitespace (?) for generate function

* epic upstream renaming scheme fix

* fix stupid indents

* Other cleanup

Reintroduce unused rep pen function, move temp functions first before entropy dynamic temp

* Slight indent fix

* revert batch pyinstaller maker to mainline

and also delete experts.txt since adjustable routing is also being removed for the PR

* compact dynatemp into a single value dynatemp_range. This is a float which represents the allowed deviation from the min and max temperature when using dynatemp. Thus, if we want a value of dynatemp_min=0.3, dynatemp_max=0.5, then we would simply set temperature=0.4 and dynatemp_range=0.1. Functionally dynatemp would operate the same, but it would simplify usage and make it a single easy to adjust value.

---------

Co-authored-by: Alexander Abushady <aabushady214@gmail.com>
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-01-06 11:13:16 +08:00
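The dynatemp_range arithmetic described in commit 123bff9a0f can be sketched in a few lines. This is a minimal Python illustration, not code from the repository: the function names dynatemp_bounds and range_from_bounds are hypothetical, and the sketch assumes the bounds are simply temperature minus and plus dynatemp_range (matching the 0.3/0.5 worked example in the commit message), with no clamping at zero.

```python
from typing import Tuple


def dynatemp_bounds(temperature: float, dynatemp_range: float) -> Tuple[float, float]:
    """Hypothetical helper: expand (temperature, dynatemp_range) into (min, max).

    Assumes min = temperature - range and max = temperature + range,
    as in the commit's example; no clamping is applied.
    """
    return temperature - dynatemp_range, temperature + dynatemp_range


def range_from_bounds(dynatemp_min: float, dynatemp_max: float) -> Tuple[float, float]:
    """Hypothetical inverse: choose temperature and dynatemp_range for a desired [min, max]."""
    temperature = (dynatemp_min + dynatemp_max) / 2    # midpoint of the interval
    dynatemp_range = (dynatemp_max - dynatemp_min) / 2  # half-width of the interval
    return temperature, dynatemp_range


if __name__ == "__main__":
    lo, hi = dynatemp_bounds(0.4, 0.1)
    print(f"min={lo:.2f} max={hi:.2f}")  # prints "min=0.30 max=0.50"
```

Going the other way, range_from_bounds(0.3, 0.5) yields a temperature of 0.4 and a dynatemp_range of 0.1 (up to floating-point rounding), which is the single-value setting the commit message recommends.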
Concedo
94e68fe474 added field to show recent seed 2024-01-02 15:35:04 +08:00
DebuggingLife46
e733a9e425 Add logit_bias to the OpenAI api (#577)
* Add logit_bias to the OpenAI api

* Cleanup and refactor, test in swagger.

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2023-12-27 00:26:19 +08:00
Concedo
77463e0e9c batch size improvements 2023-12-22 15:27:40 +08:00
Concedo
3f863eed72 add presence penalty 2023-12-19 23:18:56 +08:00
Concedo
ec21fa7712 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	.gitignore
#	CMakeLists.txt
#	Makefile
#	Package.swift
#	README.md
#	ggml-cuda.cu
#	llama.cpp
#	llama.h
#	scripts/sync-ggml.sh
#	tests/CMakeLists.txt
2023-12-08 17:42:26 +08:00
Concedo
6570a2005b token count includes ids 2023-12-03 15:44:53 +08:00
Concedo
bffa78116d explore quiet mode 2023-11-26 23:57:27 +08:00
Concedo
be92cfa125 added preloadstory 2023-11-10 13:05:22 +08:00
Concedo
fb3bcac368 handle memory separately for kcpp 2023-11-07 17:15:14 +08:00
Concedo
ae2cd56de8 kobold integration of min_p sampler (+1 squashed commit)
Squashed commit:

[8ad2e349] kobold integration for min_p sampler
2023-11-01 19:08:45 +08:00
Concedo
7924592a83 context shift feature done 2023-10-29 18:21:39 +08:00
Concedo
d10470a1e3 Breaking Change: Remove deprecated commands 2023-10-03 17:16:09 +08:00
Concedo
bc841ec302 flag to retain grammar, fix makefile (+2 squashed commits)
Squashed commits:

[d5cd3f28] flag to retain grammar, fix makefile

[b3352963] updated lite to v73
2023-10-01 14:39:56 +08:00
Concedo
eb86cd4027 bump token limits 2023-09-27 01:26:00 +08:00
Concedo
8c453d1e4e added grammar sampling 2023-09-18 23:02:00 +08:00
Concedo
89495c0716 handle token unbanning over api 2023-08-30 10:51:49 +08:00
Concedo
18bb0ab127 up ver, support 16k ctx 2023-08-04 21:47:17 +08:00
Concedo
46682e5cb3 added mmq launch flag 2023-08-01 17:57:13 +08:00
Concedo
c7136f03d9 added support for tensor_split parameter as an advanced parameter. 2023-07-24 17:16:19 +08:00
Concedo
280abaf029 added stop reason in the perf endpoint 2023-07-24 11:55:35 +08:00
Concedo
39dc1a46c4 added token count, updated lite 2023-07-20 14:41:06 +08:00
Concedo
374fffb9c6 Reworking rope WIP 2023-07-19 00:54:41 +08:00
Concedo
1d1111e10f expose timing info in web api 2023-07-11 18:56:06 +08:00
Concedo
7222877069 Merge remote-tracking branch 'ren/concedo' into concedo_experimental 2023-07-11 18:45:36 +08:00
Concedo
4be167915a added linear rope option, added warning for bad samplers 2023-07-11 18:08:19 +08:00
callMeMakerRen
4e46673f80 Merge branch 'LostRuins:concedo' into concedo 2023-07-08 09:33:26 +08:00
shutup
1727e652f1 expose some useful info that can be used in performance statistics 2023-07-07 11:52:58 +08:00
Concedo
8424a35c62 added the ability to ban any substring tokens 2023-07-06 23:24:21 +08:00
Concedo
27a0907cfa backport MM256_SET_M128I to ggml_v2, updated lite, added support for selecting the GPU for cublas 2023-07-06 22:33:46 +08:00
Concedo
c6c0afdf18 refactor to avoid code duplication 2023-07-04 18:35:54 +08:00
Ycros
309534dcd0 implement sampler order, expose sampler order and mirostat in api 2023-07-02 18:15:34 +00:00
YellowRoseCx
8afa800fb6 Expose low_vram for CUDA
Enabling --lowvram instructs the program not to allocate a VRAM scratch buffer for holding temporary results. This reduces VRAM usage at the cost of performance, particularly prompt processing speed. Requires CUDA.
2023-06-26 16:47:22 -05:00
Concedo
8775dd99f4 various debug logging improvements 2023-06-18 15:24:58 +08:00
Concedo
66a3f4e421 added support for lora base 2023-06-10 19:29:45 +08:00
Concedo
43f7e40470 added extra endpoints for abort gen and polled streaming 2023-06-10 18:13:26 +08:00
SammCheese
e6231c3055 back to http.server, improved implementation 2023-06-09 12:17:55 +02:00
SammCheese
9a8da35ec4 working streaming. TODO: fix lite 2023-06-08 18:34:23 +02:00
SammCheese
97971291e9 draft: token streaming 2023-06-08 18:34:08 +02:00
Concedo
abfdfb702e added top_a sampler 2023-05-27 17:32:37 +08:00
Concedo
466cd21368 test cmakefile for cublas. 2023-05-15 14:50:38 +08:00
Concedo
8a964e76c8 integrated mirostat as a launch parameter, works on all models 2023-05-06 00:47:17 +08:00
Concedo
851f55325a Merge remote-tracking branch 'temp/concedo' into concedo_experimental 2023-05-05 23:55:53 +08:00
Concedo
2edbcebe27 added optional force versioning flag 2023-05-05 22:02:00 +08:00
Hendrik Langer
8131bc8b56 add new sampling algorithm mirostat 2023-05-05 13:23:47 +02:00