Concedo
5cfabaee25
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# CMakeLists.txt
# Makefile
# README.md
# docs/BLIS.md
2023-10-15 15:50:20 +08:00
shibe2
1e0e873c37
CLBlast: Fix matrix-vector multiplication (#3544)
2023-10-12 21:59:47 +02:00
Concedo
b5cd935cdb
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# ggml-opencl.cpp
2023-10-06 17:58:08 +08:00
shibe2
e2583cbc29
CLBlast: Fix handling of on-device tensor data
...
Fix uploading tensor data to device, including 3D, 4D, and non-contiguous tensors.
Use correct offsets into data that is already in VRAM.
Correct handling of OpenCL events when multiple commands are queued.
2023-10-05 18:25:23 +04:00
Concedo
c249f7dbc5
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# .dockerignore
# .gitignore
# CMakeLists.txt
# Makefile
# tests/CMakeLists.txt
2023-10-03 23:51:30 +08:00
shibe2
665018c749
CLBlast: Add broadcast support for matrix multiplication (#3402)
...
Broadcast src0 into src1 across dimensions 2 and 3 when needed.
This is required for models that use GQA.
2023-10-02 21:26:15 +02:00
Concedo
bd2500db36
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# README.md
# build.zig
# flake.nix
2023-09-23 10:51:34 +08:00
shibe2
36b904e200
ggml-opencl.cpp: Make private functions static (#3300)
2023-09-21 14:10:26 -04:00
Ycros
f6ba36dff6
Reduce warnings. (#439)
2023-09-16 18:52:09 +08:00
Concedo
a0aa620718
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .gitignore
# CMakeLists.txt
# Makefile
# README.md
2023-09-05 21:49:24 +08:00
slaren
bd33e5ab92
ggml-opencl : store GPU buffer in ggml_tensor::extra (#2994)
2023-09-04 14:59:52 +02:00
Wentai Zhang
6460f758db
opencl : fix a bug in ggml_cl_pool_malloc() for ggml_cl_mul_mat_f32() (#2955)
...
Co-authored-by: Wentai Zhang <wentaizhang@tencent.com>
2023-09-03 11:46:44 +03:00
Concedo
48c27a9ce1
hotfix for 70b broadcast issues
2023-07-25 01:32:47 +08:00
Concedo
32102c2064
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# README.md
2023-07-07 14:15:39 +08:00
Howard Su
481f793acc
Fix opencl by wrap #if-else-endif with \n (#2086)
2023-07-07 05:34:18 +02:00
Concedo
69add28324
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
2023-07-04 18:51:42 +08:00
Govlzkoy
14a2cc71f6
[ggml] fix index for ne03 value in ggml_cl_mul_f32 (#2088)
2023-07-04 07:50:00 +08:00
LostRuins
96a712ca1b
Porting the improved K-Quant CUDA kernels to OpenCL (#1966)
...
* Added broken new q4k quant
* xx + ib0
* Fix q2_k fast kernel
* Use preprocessor for QK_K
* Add q6_k fast matmul kernel
* ported q3k speedup successfully
* ported q2k and q5k speedups
* remove old dot kernels and template
* fixed global const struct types
* fixing address spaces
* fixed string too long CI issue
---------
Co-authored-by: 0cc4m <picard12@live.de>
2023-06-29 05:56:43 +02:00
Concedo
6d718525c4
Merge branch 'optimize_quants_upstream' into concedo_experimental
2023-06-23 23:56:31 +08:00
Concedo
f7b096374d
fixed string too long CI issue
2023-06-23 23:56:22 +08:00
Concedo
da668e685f
fixing address spaces
2023-06-20 22:46:11 +08:00
Concedo
cce6e67f44
fixing address spaces
2023-06-20 22:45:16 +08:00
Concedo
1f1735f5ad
Merge branch 'optimize_quants_upstream' into concedo_experimental
2023-06-20 21:39:35 +08:00
Concedo
6b75fc48b9
fixed global const struct types
2023-06-20 21:38:48 +08:00
Concedo
c5ae3f50a7
Merge branch 'optimize_quants_upstream' into concedo_experimental
2023-06-20 18:41:13 +08:00
Concedo
a6e8b0216d
remove old dot kernels and template
2023-06-20 18:37:48 +08:00
Concedo
93247a11cd
ported q2k and q5k speedups
2023-06-20 18:37:41 +08:00
Concedo
029bed6446
ported q3k speedup successfully
2023-06-20 18:37:26 +08:00
Concedo
d754915269
Merge branch 'optimize_quants_upstream' into concedo_experimental
2023-06-20 17:26:39 +08:00
0cc4m
8d816d19d1
Add q6_k fast matmul kernel
2023-06-20 08:41:35 +02:00
0cc4m
34a4917984
Use preprocessor for QK_K
2023-06-20 08:04:16 +02:00
0cc4m
069cbe530d
Fix q2_k fast kernel
2023-06-20 08:01:40 +02:00
Concedo
c94a438328
xx + ib0
2023-06-19 23:01:49 +08:00
Concedo
266d436746
Added broken new q4k quant
2023-06-19 22:41:35 +08:00
Concedo
278427d9a4
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# CMakeLists.txt
# Makefile
# README.md
2023-06-18 15:29:44 +08:00
Howard Su
3d59ec5935
ggml : fix warnings under MSVC (#1908)
2023-06-17 18:46:15 +03:00
0cc4m
d411968e99
opencl : support k-quants (#1836)
...
* Porting q2_k kernel to OpenCL
* Set global and local sizes for kernel calls for dequantizing k-quants
* Added q6_k kernel
* Fix q4_k opencl struct order
* Replace uchar with uint8_t
* Finish dequant kernels
* Added OpenCL DMMV kernels
* Fix q2_k, improve code
* Fix q3_k
* Shorten switch statements
* Improve code formatting
---------
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2023-06-16 21:59:49 +03:00
Concedo
2b4a286e56
Merge remote-tracking branch 'occam/kquant-opencl' into concedo_experimental
2023-06-14 11:34:53 +08:00
0cc4m
0e3cc8e6f7
Improve code formatting
2023-06-13 16:10:25 +02:00
0cc4m
f1ac03ed37
Shorten switch statements
2023-06-13 15:21:44 +02:00
0cc4m
2a972f3649
Fix q3_k
2023-06-13 12:05:35 +02:00
0cc4m
fc8c823f34
Fix q2_k, improve code
2023-06-13 12:05:35 +02:00
Concedo
6e20827f93
Added OpenCL DMMV kernels
2023-06-13 12:05:35 +02:00
Concedo
f558e4c297
Finish dequant kernels
2023-06-13 12:05:35 +02:00
Concedo
56151bb875
Replace uchar with uint8_t
2023-06-13 12:05:35 +02:00
0cc4m
a4ee2b89d2
Fix q4_k opencl struct order
2023-06-13 12:05:35 +02:00
Concedo
1506affd0a
Added q6_k kernel
2023-06-13 12:05:35 +02:00
0cc4m
44422fd567
Set global and local sizes for kernel calls for dequantizing k-quants
2023-06-13 12:05:35 +02:00
Concedo
9b41865312
Porting q2_k kernel to OpenCL
2023-06-13 12:05:23 +02:00
Concedo
9830871d0f
pulled all Occam's fixes and the kquants are all working now
2023-06-13 16:15:13 +08:00