YellowRoseCx
8afa800fb6
Expose low_vram for CUDA
Enabling --lowvram instructs the program not to allocate a VRAM scratch buffer for holding temporary results. This reduces VRAM usage at the cost of performance, particularly prompt processing speed. Requires CUDA.
2023-06-26 16:47:22 -05:00
Concedo
8775dd99f4
various debug logging improvements
2023-06-18 15:24:58 +08:00
Concedo
66a3f4e421
added support for lora base
2023-06-10 19:29:45 +08:00
Concedo
43f7e40470
added extra endpoints for abort gen and polled streaming
2023-06-10 18:13:26 +08:00
SammCheese
e6231c3055
back to http.server, improved implementation
2023-06-09 12:17:55 +02:00
SammCheese
9a8da35ec4
working streaming. TODO: fix lite
2023-06-08 18:34:23 +02:00
SammCheese
97971291e9
draft: token streaming
2023-06-08 18:34:08 +02:00
Concedo
abfdfb702e
added top_a sampler
2023-05-27 17:32:37 +08:00
Concedo
466cd21368
test cmakefile for cublas.
2023-05-15 14:50:38 +08:00
Concedo
8a964e76c8
integrated mirostat as a launch parameter, works on all models
2023-05-06 00:47:17 +08:00
Concedo
851f55325a
Merge remote-tracking branch 'temp/concedo' into concedo_experimental
2023-05-05 23:55:53 +08:00
Concedo
2edbcebe27
added optional force versioning flag
2023-05-05 22:02:00 +08:00
Hendrik Langer
8131bc8b56
add new sampling algorithm mirostat
2023-05-05 13:23:47 +02:00
Concedo
4857739ab5
allow specifying a different thread count for GPU blas
2023-05-03 21:19:59 +08:00
Concedo
966cd2ce91
Merge remote-tracking branch 'temp/concedo' into concedo_experimental
# Conflicts:
# koboldcpp.py
2023-05-02 22:43:34 +08:00
Concedo
7afad2b9b5
integrated the new samplers
2023-04-29 19:41:41 +08:00
Concedo
e8a389f85b
updated kobold lite, added debug mode, changed streaming mode to now use the same url when launching
2023-04-28 11:41:03 +08:00
Concedo
3962eb39c7
added token unbanning
2023-04-24 21:50:20 +08:00
Concedo
6e908c1792
added lora support
2023-04-22 12:29:38 +08:00
Concedo
c200b674f4
updated kobold lite, work on rwkv, added exe path to model load params, added launch parameter
2023-04-18 17:36:44 +08:00
Concedo
525184930d
added a kobold API compatible implementation of stopping sequences
2023-04-16 18:37:49 +08:00
Concedo
ad5676810a
merge CLBlast improvements - GPU dequant
2023-04-16 01:17:40 +08:00
Concedo
adb4df78d6
Added SmartContext mode, a way of prompt context manipulation that avoids frequent context recalculation.
2023-04-14 21:24:16 +08:00
Concedo
23c675b2e6
integrated optional (experimental) CLBlast support
2023-04-11 23:33:44 +08:00
Concedo
f53238f570
Merged the upstream updates for model loading code, and ditched the legacy llama loaders since they were no longer needed.
2023-04-10 12:00:34 +08:00
Concedo
085a9f90a7
still refactoring
2023-04-01 11:56:34 +08:00
Concedo
6b86f5ea22
halfway refactoring, wip adding other model types
2023-04-01 01:13:05 +08:00