Commit graph

106 commits

Author SHA1 Message Date
Concedo
32102c2064 Merge branch 'master' into concedo_experimental
# Conflicts:
#	README.md
2023-07-07 14:15:39 +08:00
Howard Su
481f793acc
Fix opencl by wrap #if-else-endif with \n (#2086) 2023-07-07 05:34:18 +02:00
Concedo
69add28324 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
2023-07-04 18:51:42 +08:00
Govlzkoy
14a2cc71f6
[ggml] fix index for ne03 value in ggml_cl_mul_f32 (#2088) 2023-07-04 07:50:00 +08:00
LostRuins
96a712ca1b
Porting the improved K-Quant CUDA kernels to OpenCL (#1966)
* Added broken new q4k quant

* xx + ib0

* Fix q2_k fast kernel

* Use preprocessor for QK_K

* Add q6_k fast matmul kernel

* ported q3k speedup successfully

* ported q2k and q5k speedups

* remove old dot kernels and template

* fixed global const struct types

* fixing address spaces

* fixed string too long CI issue

---------

Co-authored-by: 0cc4m <picard12@live.de>
2023-06-29 05:56:43 +02:00
Concedo
6d718525c4 Merge branch 'optimize_quants_upstream' into concedo_experimental 2023-06-23 23:56:31 +08:00
Concedo
f7b096374d fixed string too long CI issue 2023-06-23 23:56:22 +08:00
Concedo
da668e685f fixing address spaces 2023-06-20 22:46:11 +08:00
Concedo
cce6e67f44 fixing address spaces 2023-06-20 22:45:16 +08:00
Concedo
1f1735f5ad Merge branch 'optimize_quants_upstream' into concedo_experimental 2023-06-20 21:39:35 +08:00
Concedo
6b75fc48b9 fixed global const struct types 2023-06-20 21:38:48 +08:00
Concedo
c5ae3f50a7 Merge branch 'optimize_quants_upstream' into concedo_experimental 2023-06-20 18:41:13 +08:00
Concedo
a6e8b0216d remove old dot kernels and template 2023-06-20 18:37:48 +08:00
Concedo
93247a11cd ported q2k and q5k speedups 2023-06-20 18:37:41 +08:00
Concedo
029bed6446 ported q3k speedup successfully 2023-06-20 18:37:26 +08:00
Concedo
d754915269 Merge branch 'optimize_quants_upstream' into concedo_experimental 2023-06-20 17:26:39 +08:00
0cc4m
8d816d19d1 Add q6_k fast matmul kernel 2023-06-20 08:41:35 +02:00
0cc4m
34a4917984 Use preprocessor for QK_K 2023-06-20 08:04:16 +02:00
0cc4m
069cbe530d Fix q2_k fast kernel 2023-06-20 08:01:40 +02:00
Concedo
c94a438328 xx + ib0 2023-06-19 23:01:49 +08:00
Concedo
266d436746 Added broken new q4k quant 2023-06-19 22:41:35 +08:00
Concedo
278427d9a4 Merge branch 'master' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
#	README.md
2023-06-18 15:29:44 +08:00
Howard Su
3d59ec5935
ggml : fix warnings under MSVC (#1908) 2023-06-17 18:46:15 +03:00
0cc4m
d411968e99
opencl : support k-quants (#1836)
* Porting q2_k kernel to OpenCL

* Set global and local sizes for kernel calls for dequantizing k-quants

* Added q6_k kernel

* Fix q4_k opencl struct order

* Replace uchar with uint8_t

* Finish dequant kernels

* Added OpenCL DMMV kernels

* Fix q2_k, improve code

* Fix q3_k

* Shorten switch statements

* Improve code formatting

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2023-06-16 21:59:49 +03:00
Concedo
2b4a286e56 Merge remote-tracking branch 'occam/kquant-opencl' into concedo_experimental 2023-06-14 11:34:53 +08:00
0cc4m
0e3cc8e6f7 Improve code formatting 2023-06-13 16:10:25 +02:00
0cc4m
f1ac03ed37 Shorten switch statements 2023-06-13 15:21:44 +02:00
0cc4m
2a972f3649 Fix q3_k 2023-06-13 12:05:35 +02:00
0cc4m
fc8c823f34 Fix q2_k, improve code 2023-06-13 12:05:35 +02:00
Concedo
6e20827f93 Added OpenCL DMMV kernels 2023-06-13 12:05:35 +02:00
Concedo
f558e4c297 Finish dequant kernels 2023-06-13 12:05:35 +02:00
Concedo
56151bb875 Replace uchar with uint8_t 2023-06-13 12:05:35 +02:00
0cc4m
a4ee2b89d2 Fix q4_k opencl struct order 2023-06-13 12:05:35 +02:00
Concedo
1506affd0a Added q6_k kernel 2023-06-13 12:05:35 +02:00
0cc4m
44422fd567 Set global and local sizes for kernel calls for dequantizing k-quants 2023-06-13 12:05:35 +02:00
Concedo
9b41865312 Porting q2_k kernel to OpenCL 2023-06-13 12:05:23 +02:00
Concedo
9830871d0f pulled all Occam's fixes and the kquants are all working now 2023-06-13 16:15:13 +08:00
Concedo
215edf420b Merge branch 'master' into concedo_experimental 2023-06-12 21:53:13 +08:00
Concedo
9c08017051 this patch is a work in progress implementation for the k-quants. the dequant kernels are working, but the DMMV ones are not. 2023-06-12 21:47:57 +08:00
Howard Su
58970a4c39
Leverage mmap for offloading tensors to GPU (#1597)
* Rebase to latest

* Show progress

* Add assert to make sure we only allocate temp buffer for non-CPU backend tensor

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2023-06-12 14:44:16 +02:00
Concedo
b9f74db89e Merge branch 'master' into concedo_experimental
# Conflicts:
#	Makefile
2023-06-10 21:07:20 +08:00
Robert Sung-wook Shin
98ed165574
OpenCL: Add release memory (#1741)
* Add opencl release memory

* Rename function name
2023-06-09 18:24:40 +02:00
Concedo
e78c675a6e Merge branch 'master' into concedo_experimental
# Conflicts:
#	README.md
#	flake.lock
#	flake.nix
#	ggml-opencl.cpp
2023-06-07 15:23:29 +08:00
Johannes Gäßler
17366df842
Multi GPU support, CUDA refactor, CUDA scratch buffer (#1703)
* CUDA multi GPU + scratch

ggml_cuda_compute_forward

Tensor parallelism

ggml_cuda_add

ggml_cuda_rms_norm

ggml_cuda_silu

CUDA scratch buffer

--main-gpu CLI option
2023-06-06 21:33:23 +02:00
LostRuins
d5b111f53d
Clblast fixes + enhancements to save VRAM and offload more layers (#1675)
* Use events instead of clFinish, where possible

* OpenCL: Don't load gpu layers into RAM, add mul_f32 kernel

* Reduce queueing overhead for contiguous tensors by using single mul kernel call

* Adapt to #1612 cl_mem malloc changes

* Reduce code duplication between cuda and opencl branches

* Improve implementation

* Clblast fixes + enhancements to save VRAM:

1. Change all Clblast buffers to CL_MEM_READ_WRITE, as the pool malloc currently doesn't properly handle them.
2. When recycling buffers in pool malloc, always assign the SMALLEST available buffer that fits, instead of the FIRST available buffer
3. When failing to recycle a buffer in pool malloc (all too small), instead recycle the largest available free buffer by resizing it.

* change max value size_t to use limits

* removed flags from the CL pool malloc, apply code tidying suggestions.
2023-06-06 19:00:01 +02:00
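
Note on d5b111f53d: items 2 and 3 of the pool malloc change above describe a best-fit recycling policy with a resize fallback. A minimal sketch of that policy follows. It is not the code from ggml-opencl.cpp: the names PoolBuffer, PoolResult and pool_acquire are hypothetical, plain sizes stand in for cl_mem handles, and appending a new slot when no free buffer exists is an assumption the commit message does not spell out.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for one pooled device buffer.
struct PoolBuffer {
    std::size_t size;  // capacity in bytes
    bool in_use;       // true while handed out to a caller
};

// Hypothetical result type: which slot was handed out, and whether
// it had to be (re)allocated rather than recycled as-is.
struct PoolResult {
    int index;
    bool resized;
};

// Hand out a buffer of at least `size` bytes.
//  - Prefer the SMALLEST free buffer that fits (best fit).
//  - If every free buffer is too small, resize the LARGEST free one.
//  - If nothing is free at all, append a new slot (assumption).
PoolResult pool_acquire(std::vector<PoolBuffer>& pool, std::size_t size) {
    assert(size > 0);

    int best = -1;
    std::size_t best_size = std::numeric_limits<std::size_t>::max();
    int largest = -1;
    std::size_t largest_size = 0;

    for (int i = 0; i < static_cast<int>(pool.size()); ++i) {
        const PoolBuffer& b = pool[i];
        if (b.in_use) {
            continue;
        }
        // Track the smallest free buffer that is still big enough.
        if (b.size >= size && b.size < best_size) {
            best = i;
            best_size = b.size;
        }
        // Track the largest free buffer as the resize fallback.
        if (largest == -1 || b.size > largest_size) {
            largest = i;
            largest_size = b.size;
        }
    }

    if (best != -1) {
        pool[best].in_use = true;
        return PoolResult{best, false};
    }
    if (largest != -1) {
        // All free buffers are too small: grow the largest one.
        pool[largest].size = size;
        pool[largest].in_use = true;
        return PoolResult{largest, true};
    }
    // No free buffer exists: grow the pool by one slot.
    pool.push_back(PoolBuffer{size, true});
    return PoolResult{static_cast<int>(pool.size()) - 1, true};
}
```

With free buffers of 64, 16 and 32 bytes, a 20-byte request takes the 32-byte buffer rather than the first one that fits (64 bytes), which keeps the larger buffer available for later requests. A following 100-byte request finds nothing large enough and resizes the 64-byte buffer.
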
Concedo
54dc75ce73 Merge branch 'concedo-opencl-dev' into concedo_experimental 2023-06-05 13:31:53 +08:00
Concedo
f6431ded5d removed flags from the CL pool malloc, apply code tidying suggestions. 2023-06-05 13:31:37 +08:00
Concedo
1ddbb9acd9 Merge branch 'concedo-opencl-dev' into concedo_experimental
# Conflicts:
#	ggml-opencl.cpp
2023-06-04 18:07:27 +08:00
Concedo
64e3e74556 change max value size_t to use limits 2023-06-04 18:04:52 +08:00
LostRuins
2b700749e5
Merge branch 'master' into concedo-opencl-dev 2023-06-04 18:00:06 +08:00