Commit graph

111 commits

Author SHA1 Message Date
Concedo
374fffb9c6 Reworking rope WIP 2023-07-19 00:54:41 +08:00
Concedo
a286776435 updated lite 2023-07-11 21:48:01 +08:00
Concedo
1d1111e10f expose timing info in web api 2023-07-11 18:56:06 +08:00
Concedo
7222877069 Merge remote-tracking branch 'ren/concedo' into concedo_experimental 2023-07-11 18:45:36 +08:00
Concedo
4be167915a added linear rope option, added warning for bad samplers 2023-07-11 18:08:19 +08:00
Concedo
2827920044 fix compile errors, rwkv not working 2023-07-10 18:23:25 +08:00
callMeMakerRen
4e46673f80 Merge branch 'LostRuins:concedo' into concedo 2023-07-08 09:33:26 +08:00
shutup
1727e652f1 expose some useful info that can be used in statistics of performance 2023-07-07 11:52:58 +08:00
Concedo
8424a35c62 added the ability to ban any substring tokens 2023-07-06 23:24:21 +08:00
Concedo
27a0907cfa backport MM256_SET_M128I to ggml_v2, updated lite, added support for selecting the GPU for cublas 2023-07-06 22:33:46 +08:00
Concedo
fff705d4f6 Merge remote-tracking branch 'ycros/improve-sampler-api-access' into concedo_experimental 2023-07-04 18:42:02 +08:00
Concedo
c6c0afdf18 refactor to avoid code duplication 2023-07-04 18:35:54 +08:00
Concedo
784628a2be Merge remote-tracking branch 'ycros/improve-sampler-api-access' into concedo_experimental 2023-07-04 16:38:32 +08:00
Concedo
ca9a11697c possibly slower, but cannot use larger batches without modifying ggml library. 2023-07-04 00:35:02 +08:00
Ycros
309534dcd0 implement sampler order, expose sampler order and mirostat in api 2023-07-02 18:15:34 +00:00
Concedo
ef3b8dc0d9 GPU accel for rwkv is slow, disable it 2023-07-02 00:41:46 +08:00
Concedo
e1a7042943 try out the new rwkv but it seems worse, may revert 2023-07-02 00:10:56 +08:00
YellowRoseCx
8afa800fb6 Expose low_vram for CUDA
Enabling --lowvram instructs the program to not allocate a VRAM scratch buffer for holding temporary results. Reduces VRAM usage at the cost of performance, particularly prompt processing speed. Requires CUDA
2023-06-26 16:47:22 -05:00
Concedo
d2034ced7b Merge branch 'master' into concedo_experimental
# Conflicts:
#	README.md
#	build.zig
#	flake.nix
#	tests/test-grad0.c
#	tests/test-sampling.cpp
#	tests/test-tokenizer-0.cpp
2023-06-25 17:01:15 +08:00
Concedo
0485fa65a2 wstring convert for mpt 2023-06-24 11:43:42 +08:00
Concedo
f39a746089 bug fixes for openblas 2023-06-23 22:45:22 +08:00
Concedo
43c2891afa option to not use scratch 2023-06-23 19:01:36 +08:00
Concedo
df9135e3a9 fixing memory bugs 2023-06-23 18:41:23 +08:00
Concedo
1b71752a9f Implemented basic GPU offloading for MPT, GPT-2, GPT-J and GPT-NeoX 2023-06-22 00:43:25 +08:00
Concedo
537ff22ec9 fixed a bug with token timings, updated lite 2023-06-20 20:41:42 +08:00
Concedo
8e2dc19dc6 updated tokenizer, added support for scratch buffers for neox and gpt2 2023-06-19 21:29:06 +08:00
Concedo
b08b371983 allow hordeconfig to set a max ctx length too. 2023-06-18 16:42:32 +08:00
Concedo
8775dd99f4 various debug logging improvements 2023-06-18 15:24:58 +08:00
Concedo
8bc4143e14 Merge branch 'concedo' into concedo_experimental 2023-06-17 22:29:38 +08:00
YellowRoseCx
971fe9f007 add tokens per second output (#246)
* add tokens per second output

* Update gpttype_adapter.cpp

simplify

---------

Co-authored-by: LostRuins <39025047+LostRuins@users.noreply.github.com>
2023-06-17 19:54:29 +08:00
Concedo
0971f83bca added eos token id handling for starcoder models, as they use a different EOS ID 2023-06-15 22:57:14 +08:00
Concedo
3ed3e7b7e2 reverted sequence mode for rwkv due to multiple issues with speed loss with bigger quantized models 2023-06-14 20:03:14 +08:00
Concedo
82cf97ce92 hotfix for rwkv 2023-06-13 23:38:41 +08:00
Concedo
871009dfab integrated world tokenizer for RWKV 2023-06-13 20:06:19 +08:00
Concedo
9b6c35b651 rwkv speed enhancements (batch processing), fixed a rwkv token processing bug 2023-06-13 16:02:12 +08:00
Concedo
66a3f4e421 added support for lora base 2023-06-10 19:29:45 +08:00
Concedo
43f7e40470 added extra endpoints for abort gen and polled streaming 2023-06-10 18:13:26 +08:00
Concedo
b92f9fe3a2 Merge remote-tracking branch 'sammcheese/sammcheese/tokenstreaming' into concedo_experimental 2023-06-09 20:41:02 +08:00
12Boti
e1ab14c4ab fix format string vulnerability (#223) 2023-06-09 20:16:03 +08:00
SammCheese
e6231c3055 back to http.server, improved implementation 2023-06-09 12:17:55 +02:00
SammCheese
9a8da35ec4 working streaming. TODO: fix lite 2023-06-08 18:34:23 +02:00
SammCheese
97971291e9 draft: token streaming 2023-06-08 18:34:08 +02:00
Concedo
a6a0fa338a cleanup indentation, fixing cublas build 2023-06-08 22:40:53 +08:00
Concedo
6f82e17b7a added MPT support 2023-06-03 16:14:08 +08:00
Concedo
37659d2c4e allow blasbatchsize -1 which disables blas, but keeps benefits like gpu offloads. 2023-06-01 22:33:50 +08:00
Concedo
49272e3c53 adjusted defaults 2023-06-01 20:03:44 +08:00
Concedo
ea336bfa33 rwkv eos 2023-05-29 22:40:27 +08:00
Concedo
28f1196f65 adjust default rep pen range 2023-05-28 19:36:21 +08:00
Concedo
5d9f5b28a6 rwkv integration completed 2023-05-28 00:48:56 +08:00
Concedo
55e0fbf024 wip integrating new rwkv 2023-05-27 22:45:28 +08:00