Concedo
ae2cd56de8
kobold integration of min_p sampler (+1 squashed commits)
...
Squashed commits:
[8ad2e349] kobold integration for min_p sampler
2023-11-01 19:08:45 +08:00
Concedo
7924592a83
context shift feature done
2023-10-29 18:21:39 +08:00
Concedo
d10470a1e3
Breaking Change: Remove deprecated commands
2023-10-03 17:16:09 +08:00
Concedo
bc841ec302
flag to retain grammar, fix makefile (+2 squashed commit)
...
Squashed commit:
[d5cd3f28] flag to retain grammar, fix makefile
[b3352963] updated lite to v73
2023-10-01 14:39:56 +08:00
Concedo
eb86cd4027
bump token limits
2023-09-27 01:26:00 +08:00
Concedo
8c453d1e4e
added grammar sampling
2023-09-18 23:02:00 +08:00
Concedo
89495c0716
handle token unbanning over api
2023-08-30 10:51:49 +08:00
Concedo
18bb0ab127
up ver, support 16k ctx
2023-08-04 21:47:17 +08:00
Concedo
46682e5cb3
added mmq launch flag
2023-08-01 17:57:13 +08:00
Concedo
c7136f03d9
added support for tensor_split parameter as an advanced parameter.
2023-07-24 17:16:19 +08:00
Concedo
280abaf029
added stop reason in the perf endpoint
2023-07-24 11:55:35 +08:00
Concedo
39dc1a46c4
added token count, updated lite
2023-07-20 14:41:06 +08:00
Concedo
374fffb9c6
Reworking rope WIP
2023-07-19 00:54:41 +08:00
Concedo
1d1111e10f
expose timing info in web api
2023-07-11 18:56:06 +08:00
Concedo
7222877069
Merge remote-tracking branch 'ren/concedo' into concedo_experimental
2023-07-11 18:45:36 +08:00
Concedo
4be167915a
added linear rope option, added warning for bad samplers
2023-07-11 18:08:19 +08:00
callMeMakerRen
4e46673f80
Merge branch 'LostRuins:concedo' into concedo
2023-07-08 09:33:26 +08:00
shutup
1727e652f1
expose some useful info that can be used for performance statistics
2023-07-07 11:52:58 +08:00
Concedo
8424a35c62
added the ability to ban any substring tokens
2023-07-06 23:24:21 +08:00
Concedo
27a0907cfa
backport MM256_SET_M128I to ggml_v2, updated lite, added support for selecting the GPU for cublas
2023-07-06 22:33:46 +08:00
Concedo
c6c0afdf18
refactor to avoid code duplication
2023-07-04 18:35:54 +08:00
Ycros
309534dcd0
implement sampler order, expose sampler order and mirostat in api
2023-07-02 18:15:34 +00:00
YellowRoseCx
8afa800fb6
Expose low_vram for CUDA
...
Enabling --lowvram instructs the program not to allocate a VRAM scratch buffer for holding temporary results. This reduces VRAM usage at the cost of performance, particularly prompt processing speed. Requires CUDA.
2023-06-26 16:47:22 -05:00
Concedo
8775dd99f4
various debug logging improvements
2023-06-18 15:24:58 +08:00
Concedo
66a3f4e421
added support for lora base
2023-06-10 19:29:45 +08:00
Concedo
43f7e40470
added extra endpoints for abort gen and polled streaming
2023-06-10 18:13:26 +08:00
SammCheese
e6231c3055
back to http.server, improved implementation
2023-06-09 12:17:55 +02:00
SammCheese
9a8da35ec4
working streaming. TODO: fix lite
2023-06-08 18:34:23 +02:00
SammCheese
97971291e9
draft: token streaming
2023-06-08 18:34:08 +02:00
Concedo
abfdfb702e
added top_a sampler
2023-05-27 17:32:37 +08:00
Concedo
466cd21368
test cmakefile for cublas.
2023-05-15 14:50:38 +08:00
Concedo
8a964e76c8
integrated mirostat as a launch parameter, works on all models
2023-05-06 00:47:17 +08:00
Concedo
851f55325a
Merge remote-tracking branch 'temp/concedo' into concedo_experimental
2023-05-05 23:55:53 +08:00
Concedo
2edbcebe27
added optional force versioning flag
2023-05-05 22:02:00 +08:00
Hendrik Langer
8131bc8b56
add new sampling algorithm mirostat
2023-05-05 13:23:47 +02:00
Concedo
4857739ab5
allow specifying a different thread count for GPU blas
2023-05-03 21:19:59 +08:00
Concedo
966cd2ce91
Merge remote-tracking branch 'temp/concedo' into concedo_experimental
...
# Conflicts:
# koboldcpp.py
2023-05-02 22:43:34 +08:00
Concedo
7afad2b9b5
integrated the new samplers
2023-04-29 19:41:41 +08:00
Concedo
e8a389f85b
updated kobold lite, added debug mode, changed streaming mode to now use the same url when launching
2023-04-28 11:41:03 +08:00
Concedo
3962eb39c7
added token unbanning
2023-04-24 21:50:20 +08:00
Concedo
6e908c1792
added lora support
2023-04-22 12:29:38 +08:00
Concedo
c200b674f4
updated kobold lite, work on rwkv, added exe path to model load params, added launch parameter
2023-04-18 17:36:44 +08:00
Concedo
525184930d
added a kobold API compatible implementation of stopping sequences
2023-04-16 18:37:49 +08:00
Concedo
ad5676810a
merge CLBlast improvements - GPU dequant
2023-04-16 01:17:40 +08:00
Concedo
adb4df78d6
Added SmartContext mode, a way of prompt context manipulation that avoids frequent context recalculation.
2023-04-14 21:24:16 +08:00
Concedo
23c675b2e6
integrated optional (experimental) CLBlast support
2023-04-11 23:33:44 +08:00
Concedo
f53238f570
Merged the upstream updates for model loading code, and ditched the legacy llama loaders since they were no longer needed.
2023-04-10 12:00:34 +08:00
Concedo
085a9f90a7
still refactoring
2023-04-01 11:56:34 +08:00
Concedo
6b86f5ea22
halfway refactoring, wip adding other model types
2023-04-01 01:13:05 +08:00