90 commits

Author SHA1 Message Date
Concedo
43c2891afa option to not use scratch 2023-06-23 19:01:36 +08:00
Concedo
df9135e3a9 fixing memory bugs 2023-06-23 18:41:23 +08:00
Concedo
1b71752a9f Implemented basic GPU offloading for MPT, GPT-2, GPT-J and GPT-NeoX 2023-06-22 00:43:25 +08:00
Concedo
537ff22ec9 fixed a bug with token timings, updated lite 2023-06-20 20:41:42 +08:00
Concedo
8e2dc19dc6 updated tokenizer, added support for scratch buffers for neox and gpt2 2023-06-19 21:29:06 +08:00
Concedo
b08b371983 allow hordeconfig to set a max ctx length too. 2023-06-18 16:42:32 +08:00
Concedo
8775dd99f4 various debug logging improvements 2023-06-18 15:24:58 +08:00
Concedo
8bc4143e14 Merge branch 'concedo' into concedo_experimental 2023-06-17 22:29:38 +08:00
YellowRoseCx
971fe9f007 add tokens per second output (#246)
* add tokens per second output

* Update gpttype_adapter.cpp

simplify

---------

Co-authored-by: LostRuins <39025047+LostRuins@users.noreply.github.com>
2023-06-17 19:54:29 +08:00
Concedo
0971f83bca added eos token id handling for starcoder models, as they use a different EOS ID 2023-06-15 22:57:14 +08:00
Concedo
3ed3e7b7e2 reverted sequence mode for rwkv due to multiple issues with speed loss with bigger quantized models 2023-06-14 20:03:14 +08:00
Concedo
82cf97ce92 hotfix for rwkv 2023-06-13 23:38:41 +08:00
Concedo
871009dfab integrated world tokenizer for RWKV 2023-06-13 20:06:19 +08:00
Concedo
9b6c35b651 rwkv speed enhancements (batch processing), fixed a rwkv token processing bug 2023-06-13 16:02:12 +08:00
Concedo
66a3f4e421 added support for lora base 2023-06-10 19:29:45 +08:00
Concedo
43f7e40470 added extra endpoints for abort gen and polled streaming 2023-06-10 18:13:26 +08:00
Concedo
b92f9fe3a2 Merge remote-tracking branch 'sammcheese/sammcheese/tokenstreaming' into concedo_experimental 2023-06-09 20:41:02 +08:00
12Boti
e1ab14c4ab fix format string vulnerability (#223) 2023-06-09 20:16:03 +08:00
SammCheese
e6231c3055 back to http.server, improved implementation 2023-06-09 12:17:55 +02:00
SammCheese
9a8da35ec4 working streaming. TODO: fix lite 2023-06-08 18:34:23 +02:00
SammCheese
97971291e9 draft: token streaming 2023-06-08 18:34:08 +02:00
Concedo
a6a0fa338a cleanup indentation, fixing cublas build 2023-06-08 22:40:53 +08:00
Concedo
6f82e17b7a added MPT support 2023-06-03 16:14:08 +08:00
Concedo
37659d2c4e allow blasbatchsize -1 which disables blas, but keeps benefits like gpu offloads. 2023-06-01 22:33:50 +08:00
Concedo
49272e3c53 adjusted defaults 2023-06-01 20:03:44 +08:00
Concedo
ea336bfa33 rwkv eos 2023-05-29 22:40:27 +08:00
Concedo
28f1196f65 adjust default rep pen range 2023-05-28 19:36:21 +08:00
Concedo
5d9f5b28a6 rwkv integration completed 2023-05-28 00:48:56 +08:00
Concedo
55e0fbf024 wip integrating new rwkv 2023-05-27 22:45:28 +08:00
Concedo
abfdfb702e added top_a sampler 2023-05-27 17:32:37 +08:00
Concedo
bd4fe936f5 cleanup sampling code 2023-05-27 11:58:39 +08:00
Concedo
3c8f404243 integrated token probability viewer in debugmode 2023-05-26 16:40:26 +08:00
Concedo
cd4012c3ed minor fixes to debug logging, fixed a typo, added a new failsafe mode 2023-05-23 21:31:42 +08:00
Concedo
d418146535 fixed a token decoding bug 2023-05-21 00:53:20 +08:00
Concedo
5032e0fd64 trying to fix ggjt v3 2023-05-21 00:29:50 +08:00
Concedo
c048bcfec4 remove old filever checks (+7 squashed commit)
Squashed commit:

[b72627a] new format not working

[e568870] old ver works

[7053b77] compile errors fixed, fixing linkers

[4ae8889] add new ver

[ff82dfd] file format checks

[25b8aa8] refactoring type names

[931063b] still merging
2023-05-21 00:15:39 +08:00
Concedo
a0cfed1e30 still merging in process 2023-05-20 15:58:33 +08:00
Concedo
a8958f6b76 merging, do not use 2023-05-20 15:12:31 +08:00
Concedo
010b2753d9 Merge commit '6986c7835a' into concedo_experimental
# Conflicts:
#	README.md
2023-05-20 11:30:51 +08:00
Concedo
487ac226b4 need to set the unshuffle before loading the model 2023-05-17 17:58:21 +08:00
Concedo
2c6ac06936 gpu offload not working for other arch. debug in future. 2023-05-17 17:13:01 +08:00
Concedo
00da2a5f4e neox is updated 2023-05-17 14:56:54 +08:00
Concedo
90fe9096b4 clean and refactoring pass before supporting newer models for different arch 2023-05-17 11:23:29 +08:00
Concedo
466cd21368 test cmakefile for cublas. 2023-05-15 14:50:38 +08:00
Concedo
b692e4d2a4 wip 2023-05-14 17:21:07 +08:00
Concedo
8a5fe628df recognize q8_0 as an older format as the new clblast doesnt work correctly with it 2023-05-14 11:06:23 +08:00
Concedo
e05455f852 fixed wrong sized struct from legacy q8_1, fixed opencl varsize arrays 2023-05-13 23:56:08 +08:00
Concedo
05cf5f7d6e partially working, but the blas matmul is broken 2023-05-13 11:35:38 +08:00
Concedo
54194911ac Merge branch 'master' into concedo_experimental
# Conflicts:
#	README.md
2023-05-09 16:50:43 +08:00
Concedo
2f2eff6e13 the dark gods have been sated, and redpajama is integrated... but at what cost? 2023-05-08 20:58:00 +08:00