Concedo
886f4eed79
updated lite, up ver, remove bell
2023-08-10 22:01:33 +08:00
Concedo
6659652c9f
lower actual temp used when temp=0
2023-08-07 11:05:06 +08:00
Concedo
bcfdd0e662
fixed bbs -1 and allow bbs = 2048
2023-08-06 17:47:05 +08:00
Concedo
18bb0ab127
up ver, support 16k ctx
2023-08-04 21:47:17 +08:00
Concedo
46682e5cb3
added mmq launch flag
2023-08-01 17:57:13 +08:00
Concedo
e221843147
trying out mmq
Merge branch 'master' into concedo_experimental
# Conflicts:
# CMakeLists.txt
# README.md
2023-07-31 22:51:15 +08:00
Concedo
c7136f03d9
added support for tensor_split parameter as an advanced parameter.
2023-07-24 17:16:19 +08:00
Concedo
280abaf029
added stop reason in the perf endpoint
2023-07-24 11:55:35 +08:00
Concedo
910744e2c0
Merge branch 'master' into concedo_experimental
# Conflicts:
# Makefile
# README.md
# flake.nix
# llama.cpp
2023-07-23 22:37:38 +08:00
Ycros
56995caa48
Fix mirostatv2. (#338)
2023-07-23 09:52:03 +08:00
Concedo
39dc1a46c4
added token count, updated lite
2023-07-20 14:41:06 +08:00
Concedo
e9467f5a44
auto rope scale adjustments, added sched yield fix for apple, adjust warning for mirostat
2023-07-19 16:44:44 +08:00
Concedo
374fffb9c6
Reworking rope WIP
2023-07-19 00:54:41 +08:00
Concedo
a286776435
updated lite
2023-07-11 21:48:01 +08:00
Concedo
1d1111e10f
expose timing info in web api
2023-07-11 18:56:06 +08:00
Concedo
7222877069
Merge remote-tracking branch 'ren/concedo' into concedo_experimental
2023-07-11 18:45:36 +08:00
Concedo
4be167915a
added linear rope option, added warning for bad samplers
2023-07-11 18:08:19 +08:00
Concedo
2827920044
fix compile errors, rwkv not working
2023-07-10 18:23:25 +08:00
callMeMakerRen
4e46673f80
Merge branch 'LostRuins:concedo' into concedo
2023-07-08 09:33:26 +08:00
shutup
1727e652f1
expose some useful info that can be used in performance statistics
2023-07-07 11:52:58 +08:00
Concedo
8424a35c62
added the ability to ban any substring tokens
2023-07-06 23:24:21 +08:00
Concedo
27a0907cfa
backport MM256_SET_M128I to ggml_v2, updated lite, added support for selecting the GPU for cublas
2023-07-06 22:33:46 +08:00
Concedo
fff705d4f6
Merge remote-tracking branch 'ycros/improve-sampler-api-access' into concedo_experimental
2023-07-04 18:42:02 +08:00
Concedo
c6c0afdf18
refactor to avoid code duplication
2023-07-04 18:35:54 +08:00
Concedo
784628a2be
Merge remote-tracking branch 'ycros/improve-sampler-api-access' into concedo_experimental
2023-07-04 16:38:32 +08:00
Concedo
ca9a11697c
possibly slower, but cannot use larger batches without modifying ggml library.
2023-07-04 00:35:02 +08:00
Ycros
309534dcd0
implement sampler order, expose sampler order and mirostat in api
2023-07-02 18:15:34 +00:00
Concedo
ef3b8dc0d9
GPU accel for rwkv is slow, disable it
2023-07-02 00:41:46 +08:00
Concedo
e1a7042943
try out the new rwkv but it seems worse, may revert
2023-07-02 00:10:56 +08:00
YellowRoseCx
8afa800fb6
Expose low_vram for CUDA
Enabling --lowvram instructs the program not to allocate a VRAM scratch buffer for holding temporary results. This reduces VRAM usage at the cost of performance, particularly prompt processing speed. Requires CUDA.
2023-06-26 16:47:22 -05:00
Concedo
d2034ced7b
Merge branch 'master' into concedo_experimental
# Conflicts:
# README.md
# build.zig
# flake.nix
# tests/test-grad0.c
# tests/test-sampling.cpp
# tests/test-tokenizer-0.cpp
2023-06-25 17:01:15 +08:00
Concedo
0485fa65a2
wstring convert for mpt
2023-06-24 11:43:42 +08:00
Concedo
f39a746089
bug fixes for openblas
2023-06-23 22:45:22 +08:00
Concedo
43c2891afa
option to not use scratch
2023-06-23 19:01:36 +08:00
Concedo
df9135e3a9
fixing memory bugs
2023-06-23 18:41:23 +08:00
Concedo
1b71752a9f
Implemented basic GPU offloading for MPT, GPT-2, GPT-J and GPT-NeoX
2023-06-22 00:43:25 +08:00
Concedo
537ff22ec9
fixed a bug with token timings, updated lite
2023-06-20 20:41:42 +08:00
Concedo
8e2dc19dc6
updated tokenizer, added support for scratch buffers for neox and gpt2
2023-06-19 21:29:06 +08:00
Concedo
b08b371983
allow hordeconfig to set a max ctx length too.
2023-06-18 16:42:32 +08:00
Concedo
8775dd99f4
various debug logging improvements
2023-06-18 15:24:58 +08:00
Concedo
8bc4143e14
Merge branch 'concedo' into concedo_experimental
2023-06-17 22:29:38 +08:00
YellowRoseCx
971fe9f007
add tokens per second output (#246)
* add tokens per second output
* Update gpttype_adapter.cpp
simplify
---------
Co-authored-by: LostRuins <39025047+LostRuins@users.noreply.github.com>
2023-06-17 19:54:29 +08:00
Concedo
0971f83bca
added eos token id handling for starcoder models, as they use a different EOS ID
2023-06-15 22:57:14 +08:00
Concedo
3ed3e7b7e2
reverted sequence mode for rwkv due to multiple issues with speed loss with bigger quantized models
2023-06-14 20:03:14 +08:00
Concedo
82cf97ce92
hotfix for rwkv
2023-06-13 23:38:41 +08:00
Concedo
871009dfab
integrated world tokenizer for RWKV
2023-06-13 20:06:19 +08:00
Concedo
9b6c35b651
rwkv speed enhancements (batch processing), fixed a rwkv token processing bug
2023-06-13 16:02:12 +08:00
Concedo
66a3f4e421
added support for lora base
2023-06-10 19:29:45 +08:00
Concedo
43f7e40470
added extra endpoints for abort gen and polled streaming
2023-06-10 18:13:26 +08:00
Concedo
b92f9fe3a2
Merge remote-tracking branch 'sammcheese/sammcheese/tokenstreaming' into concedo_experimental
2023-06-09 20:41:02 +08:00