8263fd7bdb  askmyteapot  2023-08-23 22:15:48 +08:00
    Update llama_v3.cpp (#393)
    Fixing C2065 compiler error.
    Missed '3' on 3 separate identifiers (kB > kB3, MB > MB3)

af170fc2db  Concedo  2023-08-23 17:08:09 +08:00
    Merge branch 'master' into concedo_experimental
    # Conflicts:
    #   README.md
    #   llama.cpp
    #   scripts/sync-ggml.sh
    #   tests/test-tokenizer-0.cpp

981c9131f0  Concedo  2023-08-23 16:07:07 +08:00
    gguf for llama is working

39cc83e8c9  Concedo  2023-08-22 23:12:47 +08:00
    incomplete merge, compiles but generates rubbish

a07e6dd3ad  Concedo  2023-08-09 22:36:41 +08:00
    revert cuda changes as they are bugggy

18bb0ab127  Concedo  2023-08-04 21:47:17 +08:00
    up ver, support 16k ctx

ba2040d1df  Concedo  2023-08-03 12:52:06 +08:00
    compile fix for ARM NEON

34e60be41a  Concedo  2023-08-03 10:36:14 +08:00
    compile fix

c58ffc92e5  Concedo  2023-08-01 18:28:49 +08:00
    fixed compile error

45456fa6ca  Concedo  2023-07-30 16:47:33 +08:00
    switch noavx2 to not use openblas, as it has incompatible instructions

2807d98fd4  Concedo  2023-07-22 22:57:56 +08:00
    touchup (+2 squashed commit)
    Squashed commit:
    [8b06458] fixed broken param order
    [7eabdc0] very broken, do not use

374fffb9c6  Concedo  2023-07-19 00:54:41 +08:00
    Reworking rope WIP

523fc3be52  Concedo  2023-07-10 20:05:53 +08:00
    fixed rwkv, standardized new ctx usage

2827920044  Concedo  2023-07-10 18:23:25 +08:00
    fix compile errors, rwkv not working

27a0907cfa  Concedo  2023-07-06 22:33:46 +08:00
    backport MM256_SET_M128I to ggml_v2, updated lite, added support for selecting the GPU for cublas

ca9a11697c  Concedo  2023-07-04 00:35:02 +08:00
    possibly slower, but cannot use larger batches without modifying ggml library.

bfeb3471d7  Concedo  2023-07-03 21:36:42 +08:00
    fix typos

3d2907d208  Concedo  2023-07-02 18:28:09 +08:00
    make gptneox and gptj work with extended context too

ef3b8dc0d9  Concedo  2023-07-02 00:41:46 +08:00
    GPU accel for rwkv is slow, disable it

e1a7042943  Concedo  2023-07-02 00:10:56 +08:00
    try out the new rwkv but it seems worse, may revert

86469d15c4  Concedo  2023-06-30 12:40:08 +08:00
    fix for yr-rocm, large gpu scratch

86b061b98c  Concedo  2023-06-29 18:35:31 +08:00
    wip on unified cublas integration, add all the small libraries but exclude the large ones

c2f1ed6556  Concedo  2023-06-29 17:54:12 +08:00
    fix compile errors

b4698abafc  Concedo  2023-06-28 18:20:46 +08:00
    Wip, CUDA porting malloc improvements, gpu accel for non-llama, backport old quants

9527a783ea  Concedo  2023-06-27 19:44:33 +08:00
    fix rope inplace

8342fe81b1  Concedo  2023-06-24 12:58:49 +08:00
    revert the wstring tokenization. coherency was affected

0485fa65a2  Concedo  2023-06-24 11:43:42 +08:00
    wstring convert for mpt

490cf395f8  Concedo  2023-06-23 22:51:51 +08:00
    better alloc error

f39a746089  Concedo  2023-06-23 22:45:22 +08:00
    bug fixes for openblas

43c2891afa  Concedo  2023-06-23 19:01:36 +08:00
    option to not use scratch

d5e4cf7ffe  Concedo  2023-06-23 19:01:15 +08:00
    handle ctx manip

df9135e3a9  Concedo  2023-06-23 18:41:23 +08:00
    fixing memory bugs

e6ddb15c3a  Concedo  2023-06-22 10:38:27 +08:00
    cleanup

1b71752a9f  Concedo  2023-06-22 00:43:25 +08:00
    Implemented basic GPU offloading for MPT, GPT-2, GPT-J and GPT-NeoX

dfdd20240c  Concedo  2023-06-21 16:10:31 +08:00
    gpt j use scratch buffers

8e2dc19dc6  Concedo  2023-06-19 21:29:06 +08:00
    updated tokenizer, added support for scratch buffers for neox and gpt2

3ed3e7b7e2  Concedo  2023-06-14 20:03:14 +08:00
    reverted sequence mode for rwkv due to multiple issues with speed loss with bigger quantized models

871009dfab  Concedo  2023-06-13 20:06:19 +08:00
    integrated world tokenizer for RWKV

860fb026df  Concedo  2023-06-12 23:04:40 +08:00
    rwkv compile fix (+1 squashed commits)
    Squashed commits:
    [8b0ebb1] upgraded rwkv + added memory overheads + added state_out bufs

c44b9c3ecf  Concedo  2023-06-11 23:23:24 +08:00
    added the llama_v2 cuda back (+2 squashed commit)
    Squashed commit:
    [1c97fd4] Revert "fix for cublas"
    This reverts commit 994be9a4db.
    [fce03c3] Revert "fix for cublas"
    This reverts commit 33528f5b1d.

a6a0fa338a  Concedo  2023-06-08 22:40:53 +08:00
    cleanup indentation, fixing cublas build

c046db5197  Concedo  2023-06-06 22:38:25 +08:00
    lite bugfixes, buffer size changes, fixed a topk bug.

9270056269  Concedo  2023-06-05 11:48:04 +08:00
    fixed compile error in cmake VS

9aa2d8535b  Concedo  2023-06-04 21:47:17 +08:00
    hide gpu input box when dropdown not selected, minor memory fix for neox and gptj

20803c221e  Concedo  2023-06-04 11:05:46 +08:00
    cleaning up some old junk

b62279cb39  Concedo  2023-06-04 00:41:08 +08:00
    buf size for starcoder still not good

c1b293d31a  Concedo  2023-06-03 18:37:13 +08:00
    fixed MPT ooms

6f82e17b7a  Concedo  2023-06-03 16:14:08 +08:00
    added MPT support

234270bd83  Concedo  2023-06-01 00:14:22 +08:00
    back to 32 block size, not better

446e42a8c6  Concedo  2023-05-31 21:40:12 +08:00
    change dmmv block size