229 commits

Author SHA1 Message Date
askmyteapot
8263fd7bdb Update llama_v3.cpp (#393)
Fixing C2065 compiler error.
Missed '3' on 3 separate identifiers (kB > kB3, MB > MB3)
2023-08-23 22:15:48 +08:00
Concedo
af170fc2db Merge branch 'master' into concedo_experimental
# Conflicts:
#	README.md
#	llama.cpp
#	scripts/sync-ggml.sh
#	tests/test-tokenizer-0.cpp
2023-08-23 17:08:09 +08:00
Concedo
981c9131f0 gguf for llama is working 2023-08-23 16:07:07 +08:00
Concedo
39cc83e8c9 incomplete merge, compiles but generates rubbish 2023-08-22 23:12:47 +08:00
Concedo
a07e6dd3ad revert cuda changes as they are buggy 2023-08-09 22:36:41 +08:00
Concedo
18bb0ab127 up ver, support 16k ctx 2023-08-04 21:47:17 +08:00
Concedo
ba2040d1df compile fix for ARM NEON 2023-08-03 12:52:06 +08:00
Concedo
34e60be41a compile fix 2023-08-03 10:36:14 +08:00
Concedo
c58ffc92e5 fixed compile error 2023-08-01 18:28:49 +08:00
Concedo
45456fa6ca switch noavx2 to not use openblas, as it has incompatible instructions 2023-07-30 16:47:33 +08:00
Concedo
2807d98fd4 touchup (+2 squashed commit)
Squashed commit:

[8b06458] fixed broken param order

[7eabdc0] very broken, do not use
2023-07-22 22:57:56 +08:00
Concedo
374fffb9c6 Reworking rope WIP 2023-07-19 00:54:41 +08:00
Concedo
523fc3be52 fixed rwkv, standardized new ctx usage 2023-07-10 20:05:53 +08:00
Concedo
2827920044 fix compile errors, rwkv not working 2023-07-10 18:23:25 +08:00
Concedo
27a0907cfa backport MM256_SET_M128I to ggml_v2, updated lite, added support for selecting the GPU for cublas 2023-07-06 22:33:46 +08:00
Concedo
ca9a11697c possibly slower, but cannot use larger batches without modifying ggml library. 2023-07-04 00:35:02 +08:00
Concedo
bfeb3471d7 fix typos 2023-07-03 21:36:42 +08:00
Concedo
3d2907d208 make gptneox and gptj work with extended context too 2023-07-02 18:28:09 +08:00
Concedo
ef3b8dc0d9 GPU accel for rwkv is slow, disable it 2023-07-02 00:41:46 +08:00
Concedo
e1a7042943 try out the new rwkv but it seems worse, may revert 2023-07-02 00:10:56 +08:00
Concedo
86469d15c4 fix for yr-rocm, large gpu scratch 2023-06-30 12:40:08 +08:00
Concedo
86b061b98c wip on unified cublas integration, add all the small libraries but exclude the large ones 2023-06-29 18:35:31 +08:00
Concedo
c2f1ed6556 fix compile errors 2023-06-29 17:54:12 +08:00
Concedo
b4698abafc Wip, CUDA porting malloc improvements, gpu accel for non-llama, backport old quants 2023-06-28 18:20:46 +08:00
Concedo
9527a783ea fix rope inplace 2023-06-27 19:44:33 +08:00
Concedo
8342fe81b1 revert the wstring tokenization. coherency was affected 2023-06-24 12:58:49 +08:00
Concedo
0485fa65a2 wstring convert for mpt 2023-06-24 11:43:42 +08:00
Concedo
490cf395f8 better alloc error 2023-06-23 22:51:51 +08:00
Concedo
f39a746089 bug fixes for openblas 2023-06-23 22:45:22 +08:00
Concedo
43c2891afa option to not use scratch 2023-06-23 19:01:36 +08:00
Concedo
d5e4cf7ffe handle ctx manip 2023-06-23 19:01:15 +08:00
Concedo
df9135e3a9 fixing memory bugs 2023-06-23 18:41:23 +08:00
Concedo
e6ddb15c3a cleanup 2023-06-22 10:38:27 +08:00
Concedo
1b71752a9f Implemented basic GPU offloading for MPT, GPT-2, GPT-J and GPT-NeoX 2023-06-22 00:43:25 +08:00
Concedo
dfdd20240c gpt j use scratch buffers 2023-06-21 16:10:31 +08:00
Concedo
8e2dc19dc6 updated tokenizer, added support for scratch buffers for neox and gpt2 2023-06-19 21:29:06 +08:00
Concedo
3ed3e7b7e2 reverted sequence mode for rwkv due to multiple issues with speed loss with bigger quantized models 2023-06-14 20:03:14 +08:00
Concedo
871009dfab integrated world tokenizer for RWKV 2023-06-13 20:06:19 +08:00
Concedo
860fb026df rwkv compile fix (+1 squashed commits)
Squashed commits:

[8b0ebb1] upgraded rwkv + added memory overheads + added state_out bufs
2023-06-12 23:04:40 +08:00
Concedo
c44b9c3ecf added the llama_v2 cuda back (+2 squashed commit)
Squashed commit:

[1c97fd4] Revert "fix for cublas"

This reverts commit 994be9a4db.

[fce03c3] Revert "fix for cublas"

This reverts commit 33528f5b1d.
2023-06-11 23:23:24 +08:00
Concedo
a6a0fa338a cleanup indentation, fixing cublas build 2023-06-08 22:40:53 +08:00
Concedo
c046db5197 lite bugfixes, buffer size changes, fixed a topk bug. 2023-06-06 22:38:25 +08:00
Concedo
9270056269 fixed compile error in cmake VS 2023-06-05 11:48:04 +08:00
Concedo
9aa2d8535b hide gpu input box when dropdown not selected, minor memory fix for neox and gptj 2023-06-04 21:47:17 +08:00
Concedo
20803c221e cleaning up some old junk 2023-06-04 11:05:46 +08:00
Concedo
b62279cb39 buf size for starcoder still not good 2023-06-04 00:41:08 +08:00
Concedo
c1b293d31a fixed MPT ooms 2023-06-03 18:37:13 +08:00
Concedo
6f82e17b7a added MPT support 2023-06-03 16:14:08 +08:00
Concedo
234270bd83 back to 32 block size, not better 2023-06-01 00:14:22 +08:00
Concedo
446e42a8c6 change dmmv block size 2023-05-31 21:40:12 +08:00