Commit graph

126 commits

Author SHA1 Message Date
Concedo
f558e4c297 Finish dequant kernels 2023-06-13 12:05:35 +02:00
Concedo
56151bb875 Replace uchar with uint8_t 2023-06-13 12:05:35 +02:00
0cc4m
a4ee2b89d2 Fix q4_k opencl struct order 2023-06-13 12:05:35 +02:00
Concedo
1506affd0a Added q6_k kernel 2023-06-13 12:05:35 +02:00
0cc4m
44422fd567 Set global and local sizes for kernel calls for dequantizing k-quants 2023-06-13 12:05:35 +02:00
Concedo
9b41865312 Porting q2_k kernel to OpenCL 2023-06-13 12:05:23 +02:00
Concedo
9830871d0f pulled all Occam's fixes and the kquants are all working now 2023-06-13 16:15:13 +08:00
Concedo
215edf420b Merge branch 'master' into concedo_experimental 2023-06-12 21:53:13 +08:00
Concedo
9c08017051 this patch is a work in progress implementation for the k-quants. the dequant kernels are working, but the DMMV ones are not. 2023-06-12 21:47:57 +08:00
Howard Su
58970a4c39
Leverage mmap for offloading tensors to GPU (#1597)
* Rebase to latest

* Show progress

* Add assert to make sure we only allocate temp buffer for non-CPU backend tensor

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2023-06-12 14:44:16 +02:00
Concedo
b9f74db89e Merge branch 'master' into concedo_experimental
# Conflicts:
#	Makefile
2023-06-10 21:07:20 +08:00
Robert Sung-wook Shin
98ed165574
OpenCL: Add release memory (#1741)
* Add opencl release memory

* Rename function name
2023-06-09 18:24:40 +02:00
Concedo
e78c675a6e Merge branch 'master' into concedo_experimental
# Conflicts:
#	README.md
#	flake.lock
#	flake.nix
#	ggml-opencl.cpp
2023-06-07 15:23:29 +08:00
Johannes Gäßler
17366df842
Multi GPU support, CUDA refactor, CUDA scratch buffer (#1703)
* CUDA multi GPU + scratch

ggml_cuda_compute_forward

Tensor parallelism

ggml_cuda_add

ggml_cuda_rms_norm

ggml_cuda_silu

CUDA scratch buffer

--main-gpu CLI option
2023-06-06 21:33:23 +02:00
LostRuins
d5b111f53d
Clblast fixes + enhancements to save VRAM and offload more layers (#1675)
* Use events instead of clFinish, where possible

* OpenCL: Don't load gpu layers into RAM, add mul_f32 kernel

* Reduce queueing overhead for contiguous tensors by using single mul kernel call

* Adapt to #1612 cl_mem malloc changes

* Reduce code duplication between cuda and opencl branches

* Improve implementation

* Clblast fixes + enhancements to save VRAM:

1. Change all Clblast buffers to CL_MEM_READ_WRITE, as the pool malloc currently doesn't properly handle them.
2. When recycling buffers in pool malloc, always assign the SMALLEST available buffer that fits, instead of the FIRST available buffer
3. When failing to recycle a buffer in pool malloc (all too small), instead recycle the largest available free buffer by resizing it.

* change max value size_t to use limits

* removed flags from the CL pool malloc, apply code tidying suggestions.
2023-06-06 19:00:01 +02:00
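Note on d5b111f53d: the buffer-recycling policy described in points 2 and 3 of that commit message can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from ggml-opencl.cpp. `PoolBuffer`, `BufferPool`, `acquire`, and `release` are hypothetical names, and plain byte counts stand in for the `cl_mem` handles the real pool manages. The `std::numeric_limits` sentinel mirrors the "change max value size_t to use limits" item.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for a device buffer; the real pool tracks cl_mem handles.
struct PoolBuffer {
    std::size_t size;  // capacity in bytes
    bool in_use;       // true while handed out to a caller
};

// Hypothetical pool illustrating the policy from the commit message.
struct BufferPool {
    std::vector<PoolBuffer> buffers;

    // Returns the index of the buffer handed out for a request of `size` bytes.
    std::size_t acquire(std::size_t size) {
        assert(size > 0);
        const std::size_t none = std::numeric_limits<std::size_t>::max();
        std::size_t best = none;     // smallest free buffer that fits
        std::size_t largest = none;  // largest free buffer of any size

        for (std::size_t i = 0; i < buffers.size(); ++i) {
            const PoolBuffer &b = buffers[i];
            if (b.in_use) continue;
            if (b.size >= size && (best == none || b.size < buffers[best].size)) {
                best = i;
            }
            if (largest == none || b.size > buffers[largest].size) {
                largest = i;
            }
        }

        if (best != none) {
            // Point 2: take the SMALLEST free buffer that fits, not the first one.
            buffers[best].in_use = true;
            return best;
        }
        if (largest != none) {
            // Point 3: every free buffer is too small, so recycle the largest
            // one by resizing it (real code would release and reallocate).
            buffers[largest].size = size;
            buffers[largest].in_use = true;
            return largest;
        }
        // No free buffer at all: grow the pool.
        buffers.push_back(PoolBuffer{size, true});
        return buffers.size() - 1;
    }

    void release(std::size_t index) { buffers[index].in_use = false; }
};
```

With free buffers of 64, 256, and 128 bytes, a 100-byte request in this sketch takes the 128-byte buffer and leaves the 256-byte one for a larger request, which is how the best-fit rule can save VRAM compared with first-fit.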
Concedo
54dc75ce73 Merge branch 'concedo-opencl-dev' into concedo_experimental 2023-06-05 13:31:53 +08:00
Concedo
f6431ded5d removed flags from the CL pool malloc, apply code tidying suggestions. 2023-06-05 13:31:37 +08:00
Concedo
1ddbb9acd9 Merge branch 'concedo-opencl-dev' into concedo_experimental
# Conflicts:
#	ggml-opencl.cpp
2023-06-04 18:07:27 +08:00
Concedo
64e3e74556 change max value size_t to use limits 2023-06-04 18:04:52 +08:00
LostRuins
2b700749e5
Merge branch 'master' into concedo-opencl-dev 2023-06-04 18:00:06 +08:00
0cc4m
dcb2ed4826
OpenCL: Fix duplication of layers in VRAM and RAM, add GPU mul kernel (#1653)
* Use events instead of clFinish, where possible

* OpenCL: Don't load gpu layers into RAM, add mul_f32 kernel

* Reduce queueing overhead for contiguous tensors by using single mul kernel call

* Adapt to #1612 cl_mem malloc changes

* Reduce code duplication between cuda and opencl branches

* Improve implementation
2023-06-04 08:12:05 +02:00
Concedo
96b0e536b7 Merge branch 'opencl-dev-concedo' into concedo_experimental 2023-06-02 22:12:14 +08:00
Concedo
59fe16877d Clblast fixes + enhancements to save VRAM:
1. Change all Clblast buffers to CL_MEM_READ_WRITE, as the pool malloc currently doesn't properly handle them.
2. When recycling buffers in pool malloc, always assign the SMALLEST available buffer that fits, instead of the FIRST available buffer
3. When failing to recycle a buffer in pool malloc (all too small), instead recycle the largest available free buffer by resizing it.
2023-06-02 22:10:49 +08:00
Concedo
8d0c81e7cc Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental 2023-06-02 12:19:59 +08:00
0cc4m
24239f0df7 Improve implementation 2023-06-01 18:57:08 +02:00
Concedo
234270bd83 back to 32 block size, not better 2023-06-01 00:14:22 +08:00
Concedo
446e42a8c6 change dmmv block size 2023-05-31 21:40:12 +08:00
Concedo
077ee4e989 Revert "Revert "opencl : no need to allocate cl_mem on heap (#1612)""
This reverts commit 4afa38e744.
2023-05-31 18:00:52 +08:00
Concedo
50c85bea4c Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental 2023-05-31 17:53:14 +08:00
0cc4m
5e1eecfe12 Adapt to #1612 cl_mem malloc changes 2023-05-31 07:07:47 +02:00
0cc4m
49aaf08387 Merge remote-tracking branch 'origin/master' into opencl-dev 2023-05-31 06:58:51 +02:00
Concedo
85c9f7df41 Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental 2023-05-31 10:20:32 +08:00
Concedo
4afa38e744 Revert "opencl : no need to allocate cl_mem on heap (#1612)"
This reverts commit bb051d9723.
2023-05-31 10:20:23 +08:00
0cc4m
ac6b49ed45 Reduce queueing overhead for contiguous tensors by using single mul kernel call 2023-05-30 18:49:53 +02:00
Concedo
3a73ebe8d2 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.devops/full.Dockerfile
#	.devops/main.Dockerfile
#	Makefile
2023-05-29 16:47:32 +08:00
Concedo
254a9ff12c Merge commit 'ebc5d0651a' into concedo_experimental
# Conflicts:
#	ggml-opencl.cpp
2023-05-29 16:26:24 +08:00
Howard Su
bb051d9723
opencl : no need to allocate cl_mem on heap (#1612) 2023-05-28 20:13:36 +03:00
Howard Su
ca74884f66
opencl : use strstr to check if fp16 supported (#1611)
* Use strstr to check if fp16 supported

* Ensure ext_buffer is null terminated
2023-05-28 20:09:56 +03:00
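Note on ca74884f66: the two bullet points of that commit (search the extension list with `strstr`, and make sure the buffer is null terminated first) can be illustrated with the sketch below. It is an assumption-laden example rather than the repository's code: `has_fp16` and its `len`/`cap` parameters are hypothetical, and the extension string is passed in directly instead of being queried from an OpenCL device. `cl_khr_fp16` is the standard OpenCL extension name for half-precision support.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: returns true if the space-separated extension list
// in `ext_buffer` mentions cl_khr_fp16.
//   len: number of bytes actually written into the buffer
//   cap: total capacity of the buffer in bytes
bool has_fp16(char *ext_buffer, std::size_t len, std::size_t cap) {
    if (cap == 0) {
        return false;  // nowhere to place a terminator
    }
    if (len >= cap) {
        len = cap - 1;  // clamp so the terminator stays inside the buffer
    }
    // strstr reads until it finds '\0', so terminate explicitly first.
    ext_buffer[len] = '\0';
    return std::strstr(ext_buffer, "cl_khr_fp16") != nullptr;
}
```

Terminating the buffer before the search matters because `strstr` would otherwise read past the end of the data when the source did not write a trailing `'\0'`.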
0cc4m
97c5cca4e5 OpenCL: Don't load gpu layers into RAM, add mul_f32 kernel 2023-05-27 12:00:56 +02:00
0cc4m
ebc5d0651a Use events instead of clFinish, where possible 2023-05-27 10:03:35 +02:00
Concedo
6d7749c98f no difference 2023-05-27 12:42:19 +08:00
Concedo
bd4fe936f5 cleanup sampling code 2023-05-27 11:58:39 +08:00
Concedo
c97e10c50c Merge branch 'master' into concedo_experimental 2023-05-24 00:36:30 +08:00
Maarten ter Huurne
7d873811f3
Fix handling of "invalid property" when creating OpenCL command queue (#1565)
The `clCreateCommandQueue()` function will return the code
`CL_INVALID_QUEUE_PROPERTIES` when passed unsupported properties,
not `CL_INVALID_PROPERTY` as the original code was checking for.
2023-05-23 19:01:15 +03:00
Concedo
5bf9784381 Merge branch 'master' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
#	ggml-opencl.cpp
#	llama.cpp
2023-05-23 18:19:16 +08:00
0cc4m
2e6cd4b025
OpenCL Token Generation Acceleration (#1459)
* Move back to C++ for OpenCL

* Refactor OpenCL code to work more like the CUDA code, add missing functions

* Deduplicate dequant kernels

* Add OpenCL compile options

* Use compile args for preprocessing constants

* Restore default platform + device selection by id behavior

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Co-authored-by: Henri Vasserman <henv@hot.ee>
2023-05-23 00:33:24 +03:00
Concedo
981d5ba866 Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	CMakeLists.txt
#	Makefile
#	README.md
#	ggml-opencl.cpp
#	llama.cpp
#	otherarch/ggml_v2-opencl-legacy.c
2023-05-22 16:16:48 +08:00
Concedo
587308a202 fixed some build errors on linux, changed icon resolution, added more error printing 2023-05-22 12:18:42 +08:00
0cc4m
18e9dd87da Explicitly set GEMM type 2023-05-21 08:34:17 +02:00
0cc4m
b6b39960c0 Use compile args for preprocessing constants 2023-05-21 08:17:17 +02:00