Commit graph

8 commits

Author SHA1 Message Date
0cc4m
49aaf08387 Merge remote-tracking branch 'origin/master' into opencl-dev 2023-05-31 06:58:51 +02:00
0cc4m
ac6b49ed45 Reduce queueing overhead for contiguous tensors by using single mul kernel call 2023-05-30 18:49:53 +02:00
Howard Su
bb051d9723 opencl : no need to allocate cl_mem on heap (#1612) 2023-05-28 20:13:36 +03:00
Howard Su
ca74884f66
opencl : use strstr to check if fp16 supported (#1611)
* Use strstr to check if fp16 supported

* Ensure ext_buffer is null terminated
2023-05-28 20:09:56 +03:00
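
The two bullet points in the commit above (search with `strstr`, terminate the buffer first) can be sketched as follows. This is a minimal illustration, not the code from the commit: the helper name `supports_extension` and its parameters are hypothetical, and it assumes the extension list has already been copied into a writable buffer by a device query.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (an assumption, not taken from the commit).
// buf:  writable buffer holding a space-separated extension list
// size: capacity of buf in bytes
// name: extension to look for, e.g. "cl_khr_fp16"
bool supports_extension(char* buf, std::size_t size, const char* name) {
    if (buf == nullptr || size == 0 || name == nullptr) {
        return false;
    }
    // Force termination so strstr cannot read past the end of the buffer.
    // This overwrites the last byte if the list filled the buffer completely.
    buf[size - 1] = '\0';
    // Plain substring search: true if name occurs anywhere in the list.
    return std::strstr(buf, name) != nullptr;
}
```

Because `strstr` matches substrings, a query for `cl_khr_fp16` would also match a longer extension name that merely starts with it; a stricter check would compare whole space-delimited tokens.
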
0cc4m
97c5cca4e5 OpenCL: Don't load gpu layers into RAM, add mul_f32 kernel 2023-05-27 12:00:56 +02:00
0cc4m
ebc5d0651a Use events instead of clFinish, where possible 2023-05-27 10:03:35 +02:00
Maarten ter Huurne
7d873811f3
Fix handling of "invalid property" when creating OpenCL command queue (#1565)
The `clCreateCommandQueue()` function will return the code
`CL_INVALID_QUEUE_PROPERTIES` when passed unsupported properties,
not `CL_INVALID_PROPERTY` as the original code was checking for.
2023-05-23 19:01:15 +03:00
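
The error-code distinction described in the commit above can be illustrated with a small sketch. Everything here is an assumption made for illustration: `CreateQueueFn` is a hypothetical stand-in for `clCreateCommandQueue()` so that the example compiles without an OpenCL SDK, the constants are local copies of the values in the OpenCL headers, and the fall-back to an in-order queue is one plausible reaction to the error, not necessarily what the commit does.

```cpp
#include <functional>

// Local copies of OpenCL header values, so no OpenCL SDK is needed.
constexpr int kSuccess = 0;                       // CL_SUCCESS
constexpr int kInvalidQueueProperties = -35;      // CL_INVALID_QUEUE_PROPERTIES
constexpr unsigned long kOutOfOrder = 1ul << 0;   // CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE

// Hypothetical stand-in for clCreateCommandQueue(): takes the requested
// queue properties and returns an OpenCL-style error code.
using CreateQueueFn = std::function<int(unsigned long properties)>;

// Request an out-of-order queue first. If the call reports that the
// properties are unsupported, retry with no properties (in-order queue).
// The properties of the last attempt are written to *used_properties.
int create_queue_with_fallback(const CreateQueueFn& create,
                               unsigned long* used_properties) {
    int err = create(kOutOfOrder);
    if (err == kInvalidQueueProperties) {
        // Checking for CL_INVALID_PROPERTY (-64) here would never match.
        *used_properties = 0;
        return create(0);
    }
    *used_properties = kOutOfOrder;
    return err;
}
```

Any other error code is passed through unchanged, so the caller can still tell a real failure apart from an unsupported property.
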
0cc4m
2e6cd4b025
OpenCL Token Generation Acceleration (#1459)
* Move back to C++ for OpenCL

* Refactor OpenCL code to work more like the CUDA code, add missing functions

* Deduplicate dequant kernels

* Add OpenCL compile options

* Use compile args for preprocessing constants

* Restore default platform + device selection by id behavior

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Co-authored-by: Henri Vasserman <henv@hot.ee>
2023-05-23 00:33:24 +03:00