Concedo | 1b9b9068b1 | merged q4_2 and q4_3 dequants and FIXED CLBLAST SLOWNESS! | 2023-04-24 21:33:01 +08:00
Concedo | eb73b4c261 | remove writing to cl_buffer_c and change it to a writeonly buffer - should work since beta is always zero. | 2023-04-22 23:19:17 +08:00
Concedo | cd6c121357 | reinstated the reusable buffers -> approx 10% speedup for prompt processing | 2023-04-22 22:49:27 +08:00
Concedo | 8bf2e50a11 | converted the cl file to be a string literal instead | 2023-04-16 15:57:30 +08:00
0cc4m | 57d046eeb6 | Enable dequantization on GPU for ClBlast | 2023-04-15 18:04:24 +02:00
0cc4m | 67d220210f | Revert buffer changes, no improvements in benchmarks | 2023-04-12 23:10:35 +02:00
0cc4m | c7e5c4f7b2 | Improve ClBlast implementation, avoid recreating buffers, remove redundant transfers | 2023-04-12 23:10:33 +02:00
Concedo | 4faae0afa9 | Merged upstream, fixed OSX compile errors, integrated noavx2 build into main | 2023-04-12 18:08:55 +08:00
Concedo | ca69e05d1f | update readme and fixed typos | 2023-04-11 23:53:21 +08:00
Concedo | 23c675b2e6 | integrated optional (experimental) CLBlast support | 2023-04-11 23:33:44 +08:00