113 commits

Author SHA1 Message Date
Concedo
bd2500db36 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	README.md
#	build.zig
#	flake.nix
2023-09-23 10:51:34 +08:00
shibe2
36b904e200 ggml-opencl.cpp: Make private functions static (#3300) 2023-09-21 14:10:26 -04:00
Ycros
f6ba36dff6 Reduce warnings. (#439) 2023-09-16 18:52:09 +08:00
Concedo
a0aa620718 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	.gitignore
#	CMakeLists.txt
#	Makefile
#	README.md
2023-09-05 21:49:24 +08:00
slaren
bd33e5ab92 ggml-opencl : store GPU buffer in ggml_tensor::extra (#2994) 2023-09-04 14:59:52 +02:00
Wentai Zhang
6460f758db opencl : fix a bug in ggml_cl_pool_malloc() for ggml_cl_mul_mat_f32() (#2955)
Co-authored-by: Wentai Zhang <wentaizhang@tencent.com>
2023-09-03 11:46:44 +03:00
Concedo
48c27a9ce1 hotfix for 70b broadcast issues 2023-07-25 01:32:47 +08:00
Concedo
32102c2064 Merge branch 'master' into concedo_experimental
# Conflicts:
#	README.md
2023-07-07 14:15:39 +08:00
Howard Su
481f793acc Fix opencl by wrap #if-else-endif with \n (#2086) 2023-07-07 05:34:18 +02:00
Concedo
69add28324 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
2023-07-04 18:51:42 +08:00
Govlzkoy
14a2cc71f6 [ggml] fix index for ne03 value in ggml_cl_mul_f32 (#2088) 2023-07-04 07:50:00 +08:00
LostRuins
96a712ca1b Porting the improved K-Quant CUDA kernels to OpenCL (#1966)
* Added broken new q4k quant

* xx + ib0

* Fix q2_k fast kernel

* Use preprocessor for QK_K

* Add q6_k fast matmul kernel

* ported q3k speedup successfully

* ported q2k and q5k speedups

* remove old dot kernels and template

* fixed global const struct types

* fixing address spaces

* fixed string too long CI issue

---------

Co-authored-by: 0cc4m <picard12@live.de>
2023-06-29 05:56:43 +02:00
Concedo
6d718525c4 Merge branch 'optimize_quants_upstream' into concedo_experimental 2023-06-23 23:56:31 +08:00
Concedo
f7b096374d fixed string too long CI issue 2023-06-23 23:56:22 +08:00
Concedo
da668e685f fixing address spaces 2023-06-20 22:46:11 +08:00
Concedo
cce6e67f44 fixing address spaces 2023-06-20 22:45:16 +08:00
Concedo
1f1735f5ad Merge branch 'optimize_quants_upstream' into concedo_experimental 2023-06-20 21:39:35 +08:00
Concedo
6b75fc48b9 fixed global const struct types 2023-06-20 21:38:48 +08:00
Concedo
c5ae3f50a7 Merge branch 'optimize_quants_upstream' into concedo_experimental 2023-06-20 18:41:13 +08:00
Concedo
a6e8b0216d remove old dot kernels and template 2023-06-20 18:37:48 +08:00
Concedo
93247a11cd ported q2k and q5k speedups 2023-06-20 18:37:41 +08:00
Concedo
029bed6446 ported q3k speedup successfully 2023-06-20 18:37:26 +08:00
Concedo
d754915269 Merge branch 'optimize_quants_upstream' into concedo_experimental 2023-06-20 17:26:39 +08:00
0cc4m
8d816d19d1 Add q6_k fast matmul kernel 2023-06-20 08:41:35 +02:00
0cc4m
34a4917984 Use preprocessor for QK_K 2023-06-20 08:04:16 +02:00
0cc4m
069cbe530d Fix q2_k fast kernel 2023-06-20 08:01:40 +02:00
Concedo
c94a438328 xx + ib0 2023-06-19 23:01:49 +08:00
Concedo
266d436746 Added broken new q4k quant 2023-06-19 22:41:35 +08:00
Concedo
278427d9a4 Merge branch 'master' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
#	README.md
2023-06-18 15:29:44 +08:00
Howard Su
3d59ec5935 ggml : fix warnings under MSVC (#1908) 2023-06-17 18:46:15 +03:00
0cc4m
d411968e99 opencl : support k-quants (#1836)
* Porting q2_k kernel to OpenCL

* Set global and local sizes for kernel calls for dequantizing k-quants

* Added q6_k kernel

* Fix q4_k opencl struct order

* Replace uchar with uint8_t

* Finish dequant kernels

* Added OpenCL DMMV kernels

* Fix q2_k, improve code

* Fix q3_k

* Shorten switch statements

* Improve code formatting

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2023-06-16 21:59:49 +03:00
Concedo
2b4a286e56 Merge remote-tracking branch 'occam/kquant-opencl' into concedo_experimental 2023-06-14 11:34:53 +08:00
0cc4m
0e3cc8e6f7 Improve code formatting 2023-06-13 16:10:25 +02:00
0cc4m
f1ac03ed37 Shorten switch statements 2023-06-13 15:21:44 +02:00
0cc4m
2a972f3649 Fix q3_k 2023-06-13 12:05:35 +02:00
0cc4m
fc8c823f34 Fix q2_k, improve code 2023-06-13 12:05:35 +02:00
Concedo
6e20827f93 Added OpenCL DMMV kernels 2023-06-13 12:05:35 +02:00
Concedo
f558e4c297 Finish dequant kernels 2023-06-13 12:05:35 +02:00
Concedo
56151bb875 Replace uchar with uint8_t 2023-06-13 12:05:35 +02:00
0cc4m
a4ee2b89d2 Fix q4_k opencl struct order 2023-06-13 12:05:35 +02:00
Concedo
1506affd0a Added q6_k kernel 2023-06-13 12:05:35 +02:00
0cc4m
44422fd567 Set global and local sizes for kernel calls for dequantizing k-quants 2023-06-13 12:05:35 +02:00
Concedo
9b41865312 Porting q2_k kernel to OpenCL 2023-06-13 12:05:23 +02:00
Concedo
9830871d0f pulled all Occam's fixes and the kquants are all working now 2023-06-13 16:15:13 +08:00
Concedo
215edf420b Merge branch 'master' into concedo_experimental 2023-06-12 21:53:13 +08:00
Concedo
9c08017051 this patch is a work in progress implementation for the k-quants. the dequant kernels are working, but the DMMV ones are not. 2023-06-12 21:47:57 +08:00
Howard Su
58970a4c39 Leverage mmap for offloading tensors to GPU (#1597)
* Rebase to latest

* Show progress

* Add assert to make sure we only allocate temp buffer for non-CPU backend tensor

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2023-06-12 14:44:16 +02:00
Concedo
b9f74db89e Merge branch 'master' into concedo_experimental
# Conflicts:
#	Makefile
2023-06-10 21:07:20 +08:00
Robert Sung-wook Shin
98ed165574 OpenCL: Add release memory (#1741)
* Add opencl release memory

* Rename function name
2023-06-09 18:24:40 +02:00
Concedo
e78c675a6e Merge branch 'master' into concedo_experimental
# Conflicts:
#	README.md
#	flake.lock
#	flake.nix
#	ggml-opencl.cpp
2023-06-07 15:23:29 +08:00