Commit graph

27 commits

Author SHA1 Message Date
YellowRoseCx
8afa800fb6 Expose low_vram for CUDA
Enabling --lowvram instructs the program to not allocate a VRAM scratch buffer for holding temporary results. Reduces VRAM usage at the cost of performance, particularly prompt processing speed. Requires CUDA
2023-06-26 16:47:22 -05:00
Concedo
8775dd99f4 various debug logging improvements 2023-06-18 15:24:58 +08:00
Concedo
66a3f4e421 added support for lora base 2023-06-10 19:29:45 +08:00
Concedo
43f7e40470 added extra endpoints for abort gen and polled streaming 2023-06-10 18:13:26 +08:00
SammCheese
e6231c3055
back to http.server, improved implementation 2023-06-09 12:17:55 +02:00
SammCheese
9a8da35ec4
working streaming. TODO: fix lite 2023-06-08 18:34:23 +02:00
SammCheese
97971291e9
draft: token streaming 2023-06-08 18:34:08 +02:00
Concedo
abfdfb702e added top_a sampler 2023-05-27 17:32:37 +08:00
Concedo
466cd21368 test cmakefile for cublas. 2023-05-15 14:50:38 +08:00
Concedo
8a964e76c8 integrated mirostat as a launch parameter, works on all models 2023-05-06 00:47:17 +08:00
Concedo
851f55325a Merge remote-tracking branch 'temp/concedo' into concedo_experimental 2023-05-05 23:55:53 +08:00
Concedo
2edbcebe27 added optional force versioning flag 2023-05-05 22:02:00 +08:00
Hendrik Langer
8131bc8b56 add new sampling algorithm mirostat 2023-05-05 13:23:47 +02:00
Concedo
4857739ab5 allow specifying a different thread count for GPU blas 2023-05-03 21:19:59 +08:00
Concedo
966cd2ce91 Merge remote-tracking branch 'temp/concedo' into concedo_experimental
# Conflicts:
#	koboldcpp.py
2023-05-02 22:43:34 +08:00
Concedo
7afad2b9b5 integrated the new samplers 2023-04-29 19:41:41 +08:00
Concedo
e8a389f85b updated kobold lite, added debug mode, changed streaming mode to now use the same url when launching 2023-04-28 11:41:03 +08:00
Concedo
3962eb39c7 added token unbanning 2023-04-24 21:50:20 +08:00
Concedo
6e908c1792 added lora support 2023-04-22 12:29:38 +08:00
Concedo
c200b674f4 updated kobold lite, work on rwkv, added exe path to model load params, added launch parameter 2023-04-18 17:36:44 +08:00
Concedo
525184930d added a kobold API compatible implementation of stopping sequences 2023-04-16 18:37:49 +08:00
Concedo
ad5676810a merge CLBlast improvements - GPU dequant 2023-04-16 01:17:40 +08:00
Concedo
adb4df78d6 Added SmartContext mode, a way of prompt context manipulation that avoids frequent context recalculation. 2023-04-14 21:24:16 +08:00
Concedo
23c675b2e6 integrated optional (experimentl) CLBlast support 2023-04-11 23:33:44 +08:00
Concedo
f53238f570 Merged the upstream updates for model loading code, and ditched the legacy llama loaders since they were no longer needed. 2023-04-10 12:00:34 +08:00
Concedo
085a9f90a7 still refactoring 2023-04-01 11:56:34 +08:00
Concedo
6b86f5ea22 halfway refactoring, wip adding other model types 2023-04-01 01:13:05 +08:00