Commit graph

13 commits

Author SHA1 Message Date
LostRuins
2b700749e5
Merge branch 'master' into concedo-opencl-dev
2023-06-04 18:00:06 +08:00
0cc4m
dcb2ed4826
OpenCL: Fix duplication of layers in VRAM and RAM, add GPU mul kernel (#1653)
* Use events instead of clFinish, where possible

* OpenCL: Don't load gpu layers into RAM, add mul_f32 kernel

* Reduce queueing overhead for contiguous tensors by using single mul kernel call

* Adapt to #1612 cl_mem malloc changes

* Reduce code duplication between cuda and opencl branches

* Improve implementation
2023-06-04 08:12:05 +02:00
Concedo
59fe16877d
Clblast fixes + enhancements to save VRAM:
1. Change all Clblast buffers to CL_MEM_READ_WRITE, as the pool malloc currently doesn't properly handle them.
2. When recycling buffers in pool malloc, always assign the SMALLEST available buffer that fits, instead of the FIRST available buffer
3. When failing to recycle a buffer in pool malloc (all too small), instead recycle the largest available free buffer by resizing it.
2023-06-02 22:10:49 +08:00
0cc4m
24239f0df7
Improve implementation
2023-06-01 18:57:08 +02:00
0cc4m
5e1eecfe12
Adapt to #1612 cl_mem malloc changes
2023-05-31 07:07:47 +02:00
0cc4m
49aaf08387
Merge remote-tracking branch 'origin/master' into opencl-dev
2023-05-31 06:58:51 +02:00
0cc4m
ac6b49ed45
Reduce queueing overhead for contiguous tensors by using single mul kernel call
2023-05-30 18:49:53 +02:00
Howard Su
bb051d9723
opencl : no need to allocate cl_mem on heap (#1612)
2023-05-28 20:13:36 +03:00
Howard Su
ca74884f66
opencl : use strstr to check if fp16 supported (#1611)
* Use strstr to check if fp16 supported

* Ensure ext_buffer is null terminated
2023-05-28 20:09:56 +03:00
0cc4m
97c5cca4e5
OpenCL: Don't load gpu layers into RAM, add mul_f32 kernel
2023-05-27 12:00:56 +02:00
0cc4m
ebc5d0651a
Use events instead of clFinish, where possible
2023-05-27 10:03:35 +02:00
Maarten ter Huurne
7d873811f3
Fix handling of "invalid property" when creating OpenCL command queue (#1565)
The `clCreateCommandQueue()` function will return the code
`CL_INVALID_QUEUE_PROPERTIES` when passed unsupported properties,
not `CL_INVALID_PROPERTY` as the original code was checking for.
2023-05-23 19:01:15 +03:00
0cc4m
2e6cd4b025
OpenCL Token Generation Acceleration (#1459)
* Move back to C++ for OpenCL

* Refactor OpenCL code to work more like the CUDA code, add missing functions

* Deduplicate dequant kernels

* Add OpenCL compile options

* Use compile args for preprocessing constants

* Restore default platform + device selection by id behavior

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Co-authored-by: Henri Vasserman <henv@hot.ee>
2023-05-23 00:33:24 +03:00