Commit graph

106 commits

Author SHA1 Message Date
0cc4m
dcb2ed4826
OpenCL: Fix duplication of layers in VRAM and RAM, add GPU mul kernel (#1653)
* Use events instead of clFinish, where possible

* OpenCL: Don't load gpu layers into RAM, add mul_f32 kernel

* Reduce queueing overhead for contiguous tensors by using single mul kernel call

* Adapt to #1612 cl_mem malloc changes

* Reduce code duplication between cuda and opencl branches

* Improve implementation
2023-06-04 08:12:05 +02:00
Concedo
96b0e536b7 Merge branch 'opencl-dev-concedo' into concedo_experimental 2023-06-02 22:12:14 +08:00
Concedo
59fe16877d CLBlast fixes + enhancements to save VRAM:
1. Change all CLBlast buffers to CL_MEM_READ_WRITE, as the pool malloc currently doesn't properly handle them.
2. When recycling buffers in pool malloc, always assign the SMALLEST available buffer that fits, instead of the FIRST available buffer
3. When failing to recycle a buffer in pool malloc (all too small), instead recycle the largest available free buffer by resizing it.
2023-06-02 22:10:49 +08:00
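
The buffer recycling policy described in points 2 and 3 of commit 59fe16877d can be sketched as follows. This is only an illustration under stated assumptions, not the code from that commit: `PoolBuffer` and `pool_acquire` are hypothetical names, sizes stand in for real `cl_mem` allocations, and the final fallback (growing the pool when no buffer is free) is an assumption the commit message does not spell out.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one pooled device buffer.
// A real pool would also hold the cl_mem handle.
struct PoolBuffer {
    std::size_t size;  // capacity in bytes
    bool in_use;       // true while handed out to a caller
};

// Returns the index of the buffer handed out for a request of `needed` bytes.
std::size_t pool_acquire(std::vector<PoolBuffer>& pool, std::size_t needed) {
    const std::size_t none = static_cast<std::size_t>(-1);
    std::size_t best = none;     // smallest free buffer that fits
    std::size_t largest = none;  // largest free buffer overall

    for (std::size_t i = 0; i < pool.size(); ++i) {
        if (pool[i].in_use) {
            continue;
        }
        if (pool[i].size >= needed &&
            (best == none || pool[i].size < pool[best].size)) {
            best = i;
        }
        if (largest == none || pool[i].size > pool[largest].size) {
            largest = i;
        }
    }

    if (best != none) {
        // Point 2: take the smallest buffer that fits, not the first one.
        pool[best].in_use = true;
        return best;
    }
    if (largest != none) {
        // Point 3: every free buffer is too small, so resize the largest one.
        // A real implementation would release and reallocate the device memory here.
        pool[largest].size = needed;
        pool[largest].in_use = true;
        return largest;
    }

    // Assumed fallback: nothing is free, so add a new buffer to the pool.
    pool.push_back({needed, true});
    return pool.size() - 1;
}
```

Choosing the smallest fitting buffer keeps large buffers available for large requests, and resizing the largest free buffer avoids leaving an undersized allocation unused next to a fresh one.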
Concedo
8d0c81e7cc Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental 2023-06-02 12:19:59 +08:00
0cc4m
24239f0df7 Improve implementation 2023-06-01 18:57:08 +02:00
Concedo
234270bd83 back to 32 block size, not better 2023-06-01 00:14:22 +08:00
Concedo
446e42a8c6 change dmmv block size 2023-05-31 21:40:12 +08:00
Concedo
077ee4e989 Revert "Revert "opencl : no need to allocate cl_mem on heap (#1612)""
This reverts commit 4afa38e744.
2023-05-31 18:00:52 +08:00
Concedo
50c85bea4c Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental 2023-05-31 17:53:14 +08:00
0cc4m
5e1eecfe12 Adapt to #1612 cl_mem malloc changes 2023-05-31 07:07:47 +02:00
0cc4m
49aaf08387 Merge remote-tracking branch 'origin/master' into opencl-dev 2023-05-31 06:58:51 +02:00
Concedo
85c9f7df41 Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental 2023-05-31 10:20:32 +08:00
Concedo
4afa38e744 Revert "opencl : no need to allocate cl_mem on heap (#1612)"
This reverts commit bb051d9723.
2023-05-31 10:20:23 +08:00
0cc4m
ac6b49ed45 Reduce queueing overhead for contiguous tensors by using single mul kernel call 2023-05-30 18:49:53 +02:00
Concedo
3a73ebe8d2 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.devops/full.Dockerfile
#	.devops/main.Dockerfile
#	Makefile
2023-05-29 16:47:32 +08:00
Concedo
254a9ff12c Merge commit 'ebc5d0651a' into concedo_experimental
# Conflicts:
#	ggml-opencl.cpp
2023-05-29 16:26:24 +08:00
Howard Su
bb051d9723
opencl : no need to allocate cl_mem on heap (#1612) 2023-05-28 20:13:36 +03:00
Howard Su
ca74884f66
opencl : use strstr to check if fp16 supported (#1611)
* Use strstr to check if fp16 supported

* Ensure ext_buffer is null terminated
2023-05-28 20:09:56 +03:00
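
The two bullet points of commit ca74884f66 can be illustrated with a minimal sketch. The helper name `has_fp16` and its parameters are hypothetical; the sketch assumes the extension list has already been copied into a caller-owned buffer (as a device info query would do) and that half-precision support is advertised by the standard extension name `cl_khr_fp16`.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: `ext` holds `len` bytes of space-separated extension
// names in a buffer of capacity `cap`. The bytes after `len` may be garbage.
bool has_fp16(char* ext, std::size_t len, std::size_t cap) {
    if (cap == 0) {
        return false;
    }
    // strstr scans until it finds a terminator, so make sure one exists
    // inside the buffer before searching.
    ext[len < cap ? len : cap - 1] = '\0';
    return std::strstr(ext, "cl_khr_fp16") != nullptr;
}
```

Without the explicit terminator, `strstr` could read past the end of the buffer when the query result fills it completely.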
0cc4m
97c5cca4e5 OpenCL: Don't load gpu layers into RAM, add mul_f32 kernel 2023-05-27 12:00:56 +02:00
0cc4m
ebc5d0651a Use events instead of clFinish, where possible 2023-05-27 10:03:35 +02:00
Concedo
6d7749c98f no difference 2023-05-27 12:42:19 +08:00
Concedo
bd4fe936f5 cleanup sampling code 2023-05-27 11:58:39 +08:00
Concedo
c97e10c50c Merge branch 'master' into concedo_experimental 2023-05-24 00:36:30 +08:00
Maarten ter Huurne
7d873811f3
Fix handling of "invalid property" when creating OpenCL command queue (#1565)
The `clCreateCommandQueue()` function will return the code
`CL_INVALID_QUEUE_PROPERTIES` when passed unsupported properties,
not `CL_INVALID_PROPERTY` as the original code was checking for.
2023-05-23 19:01:15 +03:00
Concedo
5bf9784381 Merge branch 'master' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
#	ggml-opencl.cpp
#	llama.cpp
2023-05-23 18:19:16 +08:00
0cc4m
2e6cd4b025
OpenCL Token Generation Acceleration (#1459)
* Move back to C++ for OpenCL

* Refactor OpenCL code to work more like the CUDA code, add missing functions

* Deduplicate dequant kernels

* Add OpenCL compile options

* Use compile args for preprocessing constants

* Restore default platform + device selection by id behavior

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Co-authored-by: Henri Vasserman <henv@hot.ee>
2023-05-23 00:33:24 +03:00
Concedo
981d5ba866 Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	CMakeLists.txt
#	Makefile
#	README.md
#	ggml-opencl.cpp
#	llama.cpp
#	otherarch/ggml_v2-opencl-legacy.c
2023-05-22 16:16:48 +08:00
Concedo
587308a202 fixed some build errors on linux, changed icon resolution, added more error printing 2023-05-22 12:18:42 +08:00
0cc4m
18e9dd87da Explicitly set GEMM type 2023-05-21 08:34:17 +02:00
0cc4m
b6b39960c0 Use compile args for preprocessing constants 2023-05-21 08:17:17 +02:00
0cc4m
a1657d0233 Add OpenCL compile options 2023-05-21 07:53:22 +02:00
0cc4m
e41a7ae40c Fix convert_row_f16 kernel issue 2023-05-21 07:53:22 +02:00
0cc4m
457eff920e Deduplicate dequant kernels 2023-05-21 07:53:22 +02:00
0cc4m
42e1a2ba3d Fix tensor load to device
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2023-05-21 07:50:27 +02:00
0cc4m
cda2d488f9 Fix error in convert f16 to f32 kernel call 2023-05-21 07:49:54 +02:00
0cc4m
915d0d1168 Generate dequant_mul_mat kernels from simple templates 2023-05-21 07:49:24 +02:00
0cc4m
cb588e2aa4 Add remaining dequant_mul_mat functions 2023-05-21 07:47:18 +02:00
0cc4m
8c7a7cea2e Fix dequant_mul_mat kernel 2023-05-21 07:44:49 +02:00
0cc4m
5f610c90bf Fix bugs in dequant_mul_mat code 2023-05-21 07:44:48 +02:00
0cc4m
17e53dbb7e Refactor OpenCL code to work more like the CUDA code, add missing functions 2023-05-21 07:42:06 +02:00
0cc4m
a7e3bee4cc Move back to C++ for OpenCL 2023-05-21 06:17:31 +02:00
Concedo
c048bcfec4 remove old filever checks (+7 squashed commit)
Squashed commit:

[b72627a] new format not working

[e568870] old ver works

[7053b77] compile errors fixed, fixing linkers

[4ae8889] add new ver

[ff82dfd] file format checks

[25b8aa8] refactoring type names

[931063b] still merging
2023-05-21 00:15:39 +08:00
0cc4m
02914698f0 Update Q4_0, Q4_1 and Q8_0 to use half instead of float 2023-05-20 07:45:56 +02:00
0cc4m
285f8f990b Explicitly set CLBlast GEMM type 2023-05-20 07:26:38 +02:00
0cc4m
78b1d8351f Add OpenCL compile options 2023-05-19 21:18:57 +02:00
0cc4m
b73c437e83 Fix convert_row_f16 kernel issue 2023-05-18 08:05:19 +02:00
0cc4m
0df55da4ca Deduplicate dequant kernels 2023-05-18 07:35:40 +02:00
0cc4m
67dbd356b6 Remove redundant constant values 2023-05-17 19:20:46 +02:00
0cc4m
de10afa80f Fix tensor load to device
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2023-05-16 18:49:49 +02:00
0cc4m
b3ff66d87f Fix error in convert f16 to f32 kernel call 2023-05-16 13:05:33 +02:00