0cc4m
dcb2ed4826
OpenCL: Fix duplication of layers in VRAM and RAM, add GPU mul kernel (#1653)
...
* Use events instead of clFinish, where possible
* OpenCL: Don't load gpu layers into RAM, add mul_f32 kernel
* Reduce queueing overhead for contiguous tensors by using single mul kernel call
* Adapt to #1612 cl_mem malloc changes
* Reduce code duplication between cuda and opencl branches
* Improve implementation
2023-06-04 08:12:05 +02:00
Concedo
96b0e536b7
Merge branch 'opencl-dev-concedo' into concedo_experimental
2023-06-02 22:12:14 +08:00
Concedo
59fe16877d
CLBlast fixes + enhancements to save VRAM:
...
1. Change all CLBlast buffers to CL_MEM_READ_WRITE, as the pool malloc currently doesn't properly handle them.
2. When recycling buffers in pool malloc, always assign the SMALLEST available buffer that fits, instead of the FIRST available buffer.
3. When failing to recycle a buffer in pool malloc (all too small), instead recycle the largest available free buffer by resizing it.
2023-06-02 22:10:49 +08:00
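The recycling policy in points 2 and 3 of the commit above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `PoolBuffer`, `BufferPool`, `acquire`, and `release` are hypothetical names, and a plain size field stands in for the real `cl_mem` handle so the sketch runs without an OpenCL runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one pooled device buffer.
// A real pool would also hold the cl_mem handle.
struct PoolBuffer {
    std::size_t size;  // capacity in bytes
    bool in_use;       // true while handed out to a caller
};

// Hypothetical pool illustrating the policy from the commit message.
struct BufferPool {
    std::vector<PoolBuffer> buffers;

    // Returns the index of the buffer handed out for a request of `wanted` bytes.
    std::size_t acquire(std::size_t wanted) {
        const std::size_t none = buffers.size();  // sentinel: "no candidate yet"
        std::size_t best = none;     // smallest free buffer that fits
        std::size_t largest = none;  // largest free buffer overall
        for (std::size_t i = 0; i < buffers.size(); ++i) {
            if (buffers[i].in_use) continue;
            if (buffers[i].size >= wanted &&
                (best == none || buffers[i].size < buffers[best].size)) {
                best = i;
            }
            if (largest == none || buffers[i].size > buffers[largest].size) {
                largest = i;
            }
        }
        if (best != none) {
            // Point 2: best fit instead of first fit.
            buffers[best].in_use = true;
            return best;
        }
        if (largest != none) {
            // Point 3: every free buffer is too small, so resize the largest one.
            // Real code would release and reallocate the device memory here.
            buffers[largest].size = wanted;
            buffers[largest].in_use = true;
            return largest;
        }
        // No free buffer at all: grow the pool.
        buffers.push_back(PoolBuffer{wanted, true});
        return buffers.size() - 1;
    }

    // Marks a buffer as free so later requests can recycle it.
    void release(std::size_t index) {
        assert(index < buffers.size());
        buffers[index].in_use = false;
    }
};
```

Picking the smallest fitting buffer keeps large buffers available for large requests, and resizing the largest free buffer avoids holding an unused allocation alongside a fresh one, which is presumably where the VRAM saving comes from.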
Concedo
8d0c81e7cc
Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental
2023-06-02 12:19:59 +08:00
0cc4m
24239f0df7
Improve implementation
2023-06-01 18:57:08 +02:00
Concedo
234270bd83
back to 32 block size, not better
2023-06-01 00:14:22 +08:00
Concedo
446e42a8c6
change dmmv block size
2023-05-31 21:40:12 +08:00
Concedo
077ee4e989
Revert "Revert "opencl : no need to allocate cl_mem on heap (#1612)""
...
This reverts commit 4afa38e744.
2023-05-31 18:00:52 +08:00
Concedo
50c85bea4c
Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental
2023-05-31 17:53:14 +08:00
0cc4m
5e1eecfe12
Adapt to #1612 cl_mem malloc changes
2023-05-31 07:07:47 +02:00
0cc4m
49aaf08387
Merge remote-tracking branch 'origin/master' into opencl-dev
2023-05-31 06:58:51 +02:00
Concedo
85c9f7df41
Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental
2023-05-31 10:20:32 +08:00
Concedo
4afa38e744
Revert "opencl : no need to allocate cl_mem on heap (#1612)"
...
This reverts commit bb051d9723.
2023-05-31 10:20:23 +08:00
0cc4m
ac6b49ed45
Reduce queueing overhead for contiguous tensors by using single mul kernel call
2023-05-30 18:49:53 +02:00
Concedo
3a73ebe8d2
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# .devops/full.Dockerfile
# .devops/main.Dockerfile
# Makefile
2023-05-29 16:47:32 +08:00
Concedo
254a9ff12c
Merge commit 'ebc5d0651a' into concedo_experimental
...
# Conflicts:
# ggml-opencl.cpp
2023-05-29 16:26:24 +08:00
Howard Su
bb051d9723
opencl : no need to allocate cl_mem on heap (#1612)
2023-05-28 20:13:36 +03:00
Howard Su
ca74884f66
opencl : use strstr to check if fp16 supported (#1611)
...
* Use strstr to check if fp16 supported
* Ensure ext_buffer is null terminated
2023-05-28 20:09:56 +03:00
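The two bullets of the commit above (substring search with `strstr`, plus forcing a terminator on the buffer) might look roughly like this. It is only a sketch under stated assumptions: the function name `supports_fp16` and the parameters `capacity` and `bytes_written` are hypothetical, and the buffer is filled by hand rather than by `clGetDeviceInfo` with `CL_DEVICE_EXTENSIONS`, so that the example runs without an OpenCL runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: reports whether the extension list in `ext_buffer`
// mentions cl_khr_fp16. `capacity` is the total buffer size in bytes and
// `bytes_written` is how many bytes the query stored (both assumed names).
bool supports_fp16(char *ext_buffer, std::size_t capacity, std::size_t bytes_written) {
    assert(ext_buffer != nullptr);
    if (capacity == 0) {
        return false;
    }
    // Make sure the buffer is null terminated before handing it to strstr,
    // even if the query filled it completely.
    const std::size_t end = bytes_written < capacity ? bytes_written : capacity - 1;
    ext_buffer[end] = '\0';
    // A substring search is enough to detect the extension name.
    return std::strstr(ext_buffer, "cl_khr_fp16") != nullptr;
}
```

Without the explicit terminator, `strstr` could read past the end of the buffer when the extension string is not terminated inside it.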
0cc4m
97c5cca4e5
OpenCL: Don't load gpu layers into RAM, add mul_f32 kernel
2023-05-27 12:00:56 +02:00
0cc4m
ebc5d0651a
Use events instead of clFinish, where possible
2023-05-27 10:03:35 +02:00
Concedo
6d7749c98f
no difference
2023-05-27 12:42:19 +08:00
Concedo
bd4fe936f5
cleanup sampling code
2023-05-27 11:58:39 +08:00
Concedo
c97e10c50c
Merge branch 'master' into concedo_experimental
2023-05-24 00:36:30 +08:00
Maarten ter Huurne
7d873811f3
Fix handling of "invalid property" when creating OpenCL command queue (#1565)
...
The `clCreateCommandQueue()` function will return the code
`CL_INVALID_QUEUE_PROPERTIES` when passed unsupported properties,
not `CL_INVALID_PROPERTY` as the original code was checking for.
2023-05-23 19:01:15 +03:00
Concedo
5bf9784381
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# CMakeLists.txt
# Makefile
# ggml-opencl.cpp
# llama.cpp
2023-05-23 18:19:16 +08:00
0cc4m
2e6cd4b025
OpenCL Token Generation Acceleration (#1459)
...
* Move back to C++ for OpenCL
* Refactor OpenCL code to work more like the CUDA code, add missing functions
* Deduplicate dequant kernels
* Add OpenCL compile options
* Use compile args for preprocessing constants
* Restore default platform + device selection by id behavior
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Co-authored-by: Henri Vasserman <henv@hot.ee>
2023-05-23 00:33:24 +03:00
Concedo
981d5ba866
Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# CMakeLists.txt
# Makefile
# README.md
# ggml-opencl.cpp
# llama.cpp
# otherarch/ggml_v2-opencl-legacy.c
2023-05-22 16:16:48 +08:00
Concedo
587308a202
fixed some build errors on linux, changed icon resolution, added more error printing
2023-05-22 12:18:42 +08:00
0cc4m
18e9dd87da
Explicitly set GEMM type
2023-05-21 08:34:17 +02:00
0cc4m
b6b39960c0
Use compile args for preprocessing constants
2023-05-21 08:17:17 +02:00
0cc4m
a1657d0233
Add OpenCL compile options
2023-05-21 07:53:22 +02:00
0cc4m
e41a7ae40c
Fix convert_row_f16 kernel issue
2023-05-21 07:53:22 +02:00
0cc4m
457eff920e
Deduplicate dequant kernels
2023-05-21 07:53:22 +02:00
0cc4m
42e1a2ba3d
Fix tensor load to device
...
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2023-05-21 07:50:27 +02:00
0cc4m
cda2d488f9
Fix error in convert f16 to f32 kernel call
2023-05-21 07:49:54 +02:00
0cc4m
915d0d1168
Generate dequant_mul_mat kernels from simple templates
2023-05-21 07:49:24 +02:00
0cc4m
cb588e2aa4
Add remaining dequant_mul_mat functions
2023-05-21 07:47:18 +02:00
0cc4m
8c7a7cea2e
Fix dequant_mul_mat kernel
2023-05-21 07:44:49 +02:00
0cc4m
5f610c90bf
Fix bugs in dequant_mul_mat code
2023-05-21 07:44:48 +02:00
0cc4m
17e53dbb7e
Refactor OpenCL code to work more like the CUDA code, add missing functions
2023-05-21 07:42:06 +02:00
0cc4m
a7e3bee4cc
Move back to C++ for OpenCL
2023-05-21 06:17:31 +02:00
Concedo
c048bcfec4
remove old filever checks (+7 squashed commit)
...
Squashed commit:
[b72627a] new format not working
[e568870] old ver works
[7053b77] compile errors fixed, fixing linkers
[4ae8889] add new ver
[ff82dfd] file format checks
[25b8aa8] refactoring type names
[931063b] still merging
2023-05-21 00:15:39 +08:00
0cc4m
02914698f0
Update Q4_0, Q4_1 and Q8_0 to use half instead of float
2023-05-20 07:45:56 +02:00
0cc4m
285f8f990b
Explicitly set CLBlast GEMM type
2023-05-20 07:26:38 +02:00
0cc4m
78b1d8351f
Add OpenCL compile options
2023-05-19 21:18:57 +02:00
0cc4m
b73c437e83
Fix convert_row_f16 kernel issue
2023-05-18 08:05:19 +02:00
0cc4m
0df55da4ca
Deduplicate dequant kernels
2023-05-18 07:35:40 +02:00
0cc4m
67dbd356b6
Remove redundant constant values
2023-05-17 19:20:46 +02:00
0cc4m
de10afa80f
Fix tensor load to device
...
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2023-05-16 18:49:49 +02:00
0cc4m
b3ff66d87f
Fix error in convert f16 to f32 kernel call
2023-05-16 13:05:33 +02:00