Commit graph

49 commits

Author SHA1 Message Date
Concedo
ae2cd56de8 kobold integration of min_p sampler (+1 squashed commits)
Squashed commits:

[8ad2e349] kobold integration for min_p sampler
2023-11-01 19:08:45 +08:00
Concedo
7924592a83 context shift feature done 2023-10-29 18:21:39 +08:00
Concedo
d10470a1e3 Breaking Change: Remove deprecated commands 2023-10-03 17:16:09 +08:00
Concedo
bc841ec302 flag to retain grammar, fix makefile (+2 squashed commit)
Squashed commit:

[d5cd3f28] flag to retain grammar, fix makefile

[b3352963] updated lite to v73
2023-10-01 14:39:56 +08:00
Concedo
eb86cd4027 bump token limits 2023-09-27 01:26:00 +08:00
Concedo
8c453d1e4e added grammar sampling 2023-09-18 23:02:00 +08:00
Concedo
89495c0716 handle token unbanning over api 2023-08-30 10:51:49 +08:00
Concedo
18bb0ab127 up ver, support 16k ctx 2023-08-04 21:47:17 +08:00
Concedo
46682e5cb3 added mmq launch flag 2023-08-01 17:57:13 +08:00
Concedo
c7136f03d9 added support for tensor_split parameter as an advanced parameter. 2023-07-24 17:16:19 +08:00
Concedo
280abaf029 added stop reason in the perf endpoint 2023-07-24 11:55:35 +08:00
Concedo
39dc1a46c4 added token count, updated lite 2023-07-20 14:41:06 +08:00
Concedo
374fffb9c6 Reworking rope WIP 2023-07-19 00:54:41 +08:00
Concedo
1d1111e10f expose timing info in web api 2023-07-11 18:56:06 +08:00
Concedo
7222877069 Merge remote-tracking branch 'ren/concedo' into concedo_experimental 2023-07-11 18:45:36 +08:00
Concedo
4be167915a added linear rope option, added warning for bad samplers 2023-07-11 18:08:19 +08:00
callMeMakerRen
4e46673f80 Merge branch 'LostRuins:concedo' into concedo 2023-07-08 09:33:26 +08:00
shutup
1727e652f1 expose some useful info that can be used in statistics of performence 2023-07-07 11:52:58 +08:00
Concedo
8424a35c62 added the ability to ban any substring tokens 2023-07-06 23:24:21 +08:00
Concedo
27a0907cfa backport MM256_SET_M128I to ggml_v2, updated lite, added support for selecting the GPU for cublas 2023-07-06 22:33:46 +08:00
Concedo
c6c0afdf18 refactor to avoid code duplication 2023-07-04 18:35:54 +08:00
Ycros
309534dcd0 implement sampler order, expose sampler order and mirostat in api 2023-07-02 18:15:34 +00:00
YellowRoseCx
8afa800fb6 Expose low_vram for CUDA
Enabling --lowvram instructs the program to not allocate a VRAM scratch buffer for holding temporary results. Reduces VRAM usage at the cost of performance, particularly prompt processing speed. Requires CUDA
2023-06-26 16:47:22 -05:00
Concedo
8775dd99f4 various debug logging improvements 2023-06-18 15:24:58 +08:00
Concedo
66a3f4e421 added support for lora base 2023-06-10 19:29:45 +08:00
Concedo
43f7e40470 added extra endpoints for abort gen and polled streaming 2023-06-10 18:13:26 +08:00
SammCheese
e6231c3055 back to http.server, improved implementation 2023-06-09 12:17:55 +02:00
SammCheese
9a8da35ec4 working streaming. TODO: fix lite 2023-06-08 18:34:23 +02:00
SammCheese
97971291e9 draft: token streaming 2023-06-08 18:34:08 +02:00
Concedo
abfdfb702e added top_a sampler 2023-05-27 17:32:37 +08:00
Concedo
466cd21368 test cmakefile for cublas. 2023-05-15 14:50:38 +08:00
Concedo
8a964e76c8 integrated mirostat as a launch parameter, works on all models 2023-05-06 00:47:17 +08:00
Concedo
851f55325a Merge remote-tracking branch 'temp/concedo' into concedo_experimental 2023-05-05 23:55:53 +08:00
Concedo
2edbcebe27 added optional force versioning flag 2023-05-05 22:02:00 +08:00
Hendrik Langer
8131bc8b56 add new sampling algorithm mirostat 2023-05-05 13:23:47 +02:00
Concedo
4857739ab5 allow specifying a different thread count for GPU blas 2023-05-03 21:19:59 +08:00
Concedo
966cd2ce91 Merge remote-tracking branch 'temp/concedo' into concedo_experimental
# Conflicts:
#	koboldcpp.py
2023-05-02 22:43:34 +08:00
Concedo
7afad2b9b5 integrated the new samplers 2023-04-29 19:41:41 +08:00
Concedo
e8a389f85b updated kobold lite, added debug mode, changed streaming mode to now use the same url when launching 2023-04-28 11:41:03 +08:00
Concedo
3962eb39c7 added token unbanning 2023-04-24 21:50:20 +08:00
Concedo
6e908c1792 added lora support 2023-04-22 12:29:38 +08:00
Concedo
c200b674f4 updated kobold lite, work on rwkv, added exe path to model load params, added launch parameter 2023-04-18 17:36:44 +08:00
Concedo
525184930d added a kobold API compatible implementation of stopping sequences 2023-04-16 18:37:49 +08:00
Concedo
ad5676810a merge CLBlast improvements - GPU dequant 2023-04-16 01:17:40 +08:00
Concedo
adb4df78d6 Added SmartContext mode, a way of prompt context manipulation that avoids frequent context recalculation. 2023-04-14 21:24:16 +08:00
Concedo
23c675b2e6 integrated optional (experimentl) CLBlast support 2023-04-11 23:33:44 +08:00
Concedo
f53238f570 Merged the upstream updates for model loading code, and ditched the legacy llama loaders since they were no longer needed. 2023-04-10 12:00:34 +08:00
Concedo
085a9f90a7 still refactoring 2023-04-01 11:56:34 +08:00
Concedo
6b86f5ea22 halfway refactoring, wip adding other model types 2023-04-01 01:13:05 +08:00