Concedo
374fffb9c6
Reworking rope WIP
2023-07-19 00:54:41 +08:00
Concedo
a286776435
updated lite
2023-07-11 21:48:01 +08:00
Concedo
1d1111e10f
expose timing info in web api
2023-07-11 18:56:06 +08:00
Concedo
7222877069
Merge remote-tracking branch 'ren/concedo' into concedo_experimental
2023-07-11 18:45:36 +08:00
Concedo
4be167915a
added linear rope option, added warning for bad samplers
2023-07-11 18:08:19 +08:00
Concedo
2827920044
fix compile errors, rwkv not working
2023-07-10 18:23:25 +08:00
callMeMakerRen
4e46673f80
Merge branch 'LostRuins:concedo' into concedo
2023-07-08 09:33:26 +08:00
shutup
1727e652f1
expose some useful info that can be used in performance statistics
2023-07-07 11:52:58 +08:00
Concedo
8424a35c62
added the ability to ban any substring tokens
2023-07-06 23:24:21 +08:00
Concedo
27a0907cfa
backport MM256_SET_M128I to ggml_v2, updated lite, added support for selecting the GPU for cublas
2023-07-06 22:33:46 +08:00
Concedo
fff705d4f6
Merge remote-tracking branch 'ycros/improve-sampler-api-access' into concedo_experimental
2023-07-04 18:42:02 +08:00
Concedo
c6c0afdf18
refactor to avoid code duplication
2023-07-04 18:35:54 +08:00
Concedo
784628a2be
Merge remote-tracking branch 'ycros/improve-sampler-api-access' into concedo_experimental
2023-07-04 16:38:32 +08:00
Concedo
ca9a11697c
possibly slower, but cannot use larger batches without modifying ggml library.
2023-07-04 00:35:02 +08:00
Ycros
309534dcd0
implement sampler order, expose sampler order and mirostat in api
2023-07-02 18:15:34 +00:00
Concedo
ef3b8dc0d9
GPU accel for rwkv is slow, disable it
2023-07-02 00:41:46 +08:00
Concedo
e1a7042943
try out the new rwkv but it seems worse, may revert
2023-07-02 00:10:56 +08:00
YellowRoseCx
8afa800fb6
Expose low_vram for CUDA
Enabling --lowvram instructs the program not to allocate a VRAM scratch buffer for holding temporary results. Reduces VRAM usage at the cost of performance, particularly prompt processing speed. Requires CUDA.
2023-06-26 16:47:22 -05:00
Concedo
d2034ced7b
Merge branch 'master' into concedo_experimental
# Conflicts:
# README.md
# build.zig
# flake.nix
# tests/test-grad0.c
# tests/test-sampling.cpp
# tests/test-tokenizer-0.cpp
2023-06-25 17:01:15 +08:00
Concedo
0485fa65a2
wstring convert for mpt
2023-06-24 11:43:42 +08:00
Concedo
f39a746089
bug fixes for openblas
2023-06-23 22:45:22 +08:00
Concedo
43c2891afa
option to not use scratch
2023-06-23 19:01:36 +08:00
Concedo
df9135e3a9
fixing memory bugs
2023-06-23 18:41:23 +08:00
Concedo
1b71752a9f
Implemented basic GPU offloading for MPT, GPT-2, GPT-J and GPT-NeoX
2023-06-22 00:43:25 +08:00
Concedo
537ff22ec9
fixed a bug with token timings, updated lite
2023-06-20 20:41:42 +08:00
Concedo
8e2dc19dc6
updated tokenizer, added support for scratch buffers for neox and gpt2
2023-06-19 21:29:06 +08:00
Concedo
b08b371983
allow hordeconfig to set a max ctx length too.
2023-06-18 16:42:32 +08:00
Concedo
8775dd99f4
various debug logging improvements
2023-06-18 15:24:58 +08:00
Concedo
8bc4143e14
Merge branch 'concedo' into concedo_experimental
2023-06-17 22:29:38 +08:00
YellowRoseCx
971fe9f007
add tokens per second output (#246)
* add tokens per second output
* Update gpttype_adapter.cpp
simplify
---------
Co-authored-by: LostRuins <39025047+LostRuins@users.noreply.github.com>
2023-06-17 19:54:29 +08:00
Concedo
0971f83bca
added eos token id handling for starcoder models, as they use a different EOS ID
2023-06-15 22:57:14 +08:00
Concedo
3ed3e7b7e2
reverted sequence mode for rwkv due to multiple speed-loss issues with bigger quantized models
2023-06-14 20:03:14 +08:00
Concedo
82cf97ce92
hotfix for rwkv
2023-06-13 23:38:41 +08:00
Concedo
871009dfab
integrated world tokenizer for RWKV
2023-06-13 20:06:19 +08:00
Concedo
9b6c35b651
rwkv speed enhancements (batch processing), fixed a rwkv token processing bug
2023-06-13 16:02:12 +08:00
Concedo
66a3f4e421
added support for lora base
2023-06-10 19:29:45 +08:00
Concedo
43f7e40470
added extra endpoints for abort gen and polled streaming
2023-06-10 18:13:26 +08:00
Concedo
b92f9fe3a2
Merge remote-tracking branch 'sammcheese/sammcheese/tokenstreaming' into concedo_experimental
2023-06-09 20:41:02 +08:00
12Boti
e1ab14c4ab
fix format string vulnerability (#223)
2023-06-09 20:16:03 +08:00
SammCheese
e6231c3055
back to http.server, improved implementation
2023-06-09 12:17:55 +02:00
SammCheese
9a8da35ec4
working streaming. TODO: fix lite
2023-06-08 18:34:23 +02:00
SammCheese
97971291e9
draft: token streaming
2023-06-08 18:34:08 +02:00
Concedo
a6a0fa338a
cleanup indentation, fixing cublas build
2023-06-08 22:40:53 +08:00
Concedo
6f82e17b7a
added MPT support
2023-06-03 16:14:08 +08:00
Concedo
37659d2c4e
allow blasbatchsize -1 which disables blas, but keeps benefits like gpu offloads.
2023-06-01 22:33:50 +08:00
Concedo
49272e3c53
adjusted defaults
2023-06-01 20:03:44 +08:00
Concedo
ea336bfa33
rwkv eos
2023-05-29 22:40:27 +08:00
Concedo
28f1196f65
adjust default rep pen range
2023-05-28 19:36:21 +08:00
Concedo
5d9f5b28a6
rwkv integration completed
2023-05-28 00:48:56 +08:00
Concedo
55e0fbf024
wip integrating new rwkv
2023-05-27 22:45:28 +08:00