Concedo
43c2891afa
option to not use scratch
2023-06-23 19:01:36 +08:00
Concedo
df9135e3a9
fixing memory bugs
2023-06-23 18:41:23 +08:00
Concedo
1b71752a9f
Implemented basic GPU offloading for MPT, GPT-2, GPT-J and GPT-NeoX
2023-06-22 00:43:25 +08:00
Concedo
537ff22ec9
fixed a bug with token timings, updated lite
2023-06-20 20:41:42 +08:00
Concedo
8e2dc19dc6
updated tokenizer, added support for scratch buffers for neox and gpt2
2023-06-19 21:29:06 +08:00
Concedo
b08b371983
allow hordeconfig to set a max ctx length too.
2023-06-18 16:42:32 +08:00
Concedo
8775dd99f4
various debug logging improvements
2023-06-18 15:24:58 +08:00
Concedo
8bc4143e14
Merge branch 'concedo' into concedo_experimental
2023-06-17 22:29:38 +08:00
YellowRoseCx
971fe9f007
add tokens per second output (#246)
* add tokens per second output
* Update gpttype_adapter.cpp
simplify
---------
Co-authored-by: LostRuins <39025047+LostRuins@users.noreply.github.com>
2023-06-17 19:54:29 +08:00
Concedo
0971f83bca
added eos token id handling for starcoder models, as they use a different EOS ID
2023-06-15 22:57:14 +08:00
Concedo
3ed3e7b7e2
reverted sequence mode for rwkv due to multiple issues with speed loss with bigger quantized models
2023-06-14 20:03:14 +08:00
Concedo
82cf97ce92
hotfix for rwkv
2023-06-13 23:38:41 +08:00
Concedo
871009dfab
integrated world tokenizer for RWKV
2023-06-13 20:06:19 +08:00
Concedo
9b6c35b651
rwkv speed enhancements (batch processing), fixed a rwkv token processing bug
2023-06-13 16:02:12 +08:00
Concedo
66a3f4e421
added support for lora base
2023-06-10 19:29:45 +08:00
Concedo
43f7e40470
added extra endpoints for abort gen and polled streaming
2023-06-10 18:13:26 +08:00
Concedo
b92f9fe3a2
Merge remote-tracking branch 'sammcheese/sammcheese/tokenstreaming' into concedo_experimental
2023-06-09 20:41:02 +08:00
12Boti
e1ab14c4ab
fix format string vulnerability (#223)
2023-06-09 20:16:03 +08:00
SammCheese
e6231c3055
back to http.server, improved implementation
2023-06-09 12:17:55 +02:00
SammCheese
9a8da35ec4
working streaming. TODO: fix lite
2023-06-08 18:34:23 +02:00
SammCheese
97971291e9
draft: token streaming
2023-06-08 18:34:08 +02:00
Concedo
a6a0fa338a
cleanup indentation, fixing cublas build
2023-06-08 22:40:53 +08:00
Concedo
6f82e17b7a
added MPT support
2023-06-03 16:14:08 +08:00
Concedo
37659d2c4e
allow blasbatchsize -1 which disables blas, but keeps benefits like gpu offloads.
2023-06-01 22:33:50 +08:00
Concedo
49272e3c53
adjusted defaults
2023-06-01 20:03:44 +08:00
Concedo
ea336bfa33
rwkv eos
2023-05-29 22:40:27 +08:00
Concedo
28f1196f65
adjust default rep pen range
2023-05-28 19:36:21 +08:00
Concedo
5d9f5b28a6
rwkv integration completed
2023-05-28 00:48:56 +08:00
Concedo
55e0fbf024
wip integrating new rwkv
2023-05-27 22:45:28 +08:00
Concedo
abfdfb702e
added top_a sampler
2023-05-27 17:32:37 +08:00
Concedo
bd4fe936f5
cleanup sampling code
2023-05-27 11:58:39 +08:00
Concedo
3c8f404243
integrated token probability viewer in debugmode
2023-05-26 16:40:26 +08:00
Concedo
cd4012c3ed
minor fixes to debug logging, fixed a typo, added a new failsafe mode
2023-05-23 21:31:42 +08:00
Concedo
d418146535
fixed a token decoding bug
2023-05-21 00:53:20 +08:00
Concedo
5032e0fd64
trying to fix ggjt v3
2023-05-21 00:29:50 +08:00
Concedo
c048bcfec4
remove old filever checks (+7 squashed commit)
Squashed commit:
[b72627a] new format not working
[e568870] old ver works
[7053b77] compile errors fixed, fixing linkers
[4ae8889] add new ver
[ff82dfd] file format checks
[25b8aa8] refactoring type names
[931063b] still merging
2023-05-21 00:15:39 +08:00
Concedo
a0cfed1e30
still merging in process
2023-05-20 15:58:33 +08:00
Concedo
a8958f6b76
merging, do not use
2023-05-20 15:12:31 +08:00
Concedo
010b2753d9
Merge commit '6986c7835a' into concedo_experimental
# Conflicts:
# README.md
2023-05-20 11:30:51 +08:00
Concedo
487ac226b4
need to set the unshuffle before loading the model
2023-05-17 17:58:21 +08:00
Concedo
2c6ac06936
gpu offload not working for other arch. debug in future.
2023-05-17 17:13:01 +08:00
Concedo
00da2a5f4e
neox is updated
2023-05-17 14:56:54 +08:00
Concedo
90fe9096b4
clean and refactoring pass before supporting newer models for different arch
2023-05-17 11:23:29 +08:00
Concedo
466cd21368
test cmakefile for cublas.
2023-05-15 14:50:38 +08:00
Concedo
b692e4d2a4
wip
2023-05-14 17:21:07 +08:00
Concedo
8a5fe628df
recognize q8_0 as an older format as the new clblast doesnt work correctly with it
2023-05-14 11:06:23 +08:00
Concedo
e05455f852
fixed wrong sized struct from legacy q8_1, fixed opencl varsize arrays
2023-05-13 23:56:08 +08:00
Concedo
05cf5f7d6e
partially working, but the blas matmul is broken
2023-05-13 11:35:38 +08:00
Concedo
54194911ac
Merge branch 'master' into concedo_experimental
# Conflicts:
# README.md
2023-05-09 16:50:43 +08:00
Concedo
2f2eff6e13
the dark gods have been sated, and redpajama is integrated... but at what cost?
2023-05-08 20:58:00 +08:00