Concedo
d754915269
Merge branch 'optimize_quants_upstream' into concedo_experimental
2023-06-20 17:26:39 +08:00
0cc4m
8d816d19d1
Add q6_k fast matmul kernel
2023-06-20 08:41:35 +02:00
0cc4m
34a4917984
Use preprocessor for QK_K
2023-06-20 08:04:16 +02:00
0cc4m
069cbe530d
Fix q2_k fast kernel
2023-06-20 08:01:40 +02:00
Concedo
c94a438328
xx + ib0
2023-06-19 23:01:49 +08:00
Concedo
266d436746
Added broken new q4k quant
2023-06-19 22:41:35 +08:00
Concedo
278427d9a4
Merge branch 'master' into concedo_experimental
# Conflicts:
# CMakeLists.txt
# Makefile
# README.md
2023-06-18 15:29:44 +08:00
Howard Su
3d59ec5935
ggml : fix warnings under MSVC ( #1908 )
2023-06-17 18:46:15 +03:00
0cc4m
d411968e99
opencl : support k-quants ( #1836 )
* Porting q2_k kernel to OpenCL
* Set global and local sizes for kernel calls for dequantizing k-quants
* Added q6_k kernel
* Fix q4_k opencl struct order
* Replace uchar with uint8_t
* Finish dequant kernels
* Added OpenCL DMMV kernels
* Fix q2_k, improve code
* Fix q3_k
* Shorten switch statements
* Improve code formatting
---------
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2023-06-16 21:59:49 +03:00
Concedo
2b4a286e56
Merge remote-tracking branch 'occam/kquant-opencl' into concedo_experimental
2023-06-14 11:34:53 +08:00
0cc4m
0e3cc8e6f7
Improve code formatting
2023-06-13 16:10:25 +02:00
0cc4m
f1ac03ed37
Shorten switch statements
2023-06-13 15:21:44 +02:00
0cc4m
2a972f3649
Fix q3_k
2023-06-13 12:05:35 +02:00
0cc4m
fc8c823f34
Fix q2_k, improve code
2023-06-13 12:05:35 +02:00
Concedo
6e20827f93
Added OpenCL DMMV kernels
2023-06-13 12:05:35 +02:00
Concedo
f558e4c297
Finish dequant kernels
2023-06-13 12:05:35 +02:00
Concedo
56151bb875
Replace uchar with uint8_t
2023-06-13 12:05:35 +02:00
0cc4m
a4ee2b89d2
Fix q4_k opencl struct order
2023-06-13 12:05:35 +02:00
Concedo
1506affd0a
Added q6_k kernel
2023-06-13 12:05:35 +02:00
0cc4m
44422fd567
Set global and local sizes for kernel calls for dequantizing k-quants
2023-06-13 12:05:35 +02:00
Concedo
9b41865312
Porting q2_k kernel to OpenCL
2023-06-13 12:05:23 +02:00
Concedo
9830871d0f
pulled all Occam's fixes and the kquants are all working now
2023-06-13 16:15:13 +08:00
Concedo
215edf420b
Merge branch 'master' into concedo_experimental
2023-06-12 21:53:13 +08:00
Concedo
9c08017051
this patch is a work in progress implementation for the k-quants. the dequant kernels are working, but the DMMV ones are not.
2023-06-12 21:47:57 +08:00
Howard Su
58970a4c39
Leverage mmap for offloading tensors to GPU ( #1597 )
* Rebase to latest
* Show progress
* Add assert to make sure we only allocate temp buffer for non-CPU backend tensor
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2023-06-12 14:44:16 +02:00
Concedo
b9f74db89e
Merge branch 'master' into concedo_experimental
# Conflicts:
# Makefile
2023-06-10 21:07:20 +08:00
Robert Sung-wook Shin
98ed165574
OpenCL: Add release memory ( #1741 )
* Add opencl release memory
* Rename function name
2023-06-09 18:24:40 +02:00
Concedo
e78c675a6e
Merge branch 'master' into concedo_experimental
# Conflicts:
# README.md
# flake.lock
# flake.nix
# ggml-opencl.cpp
2023-06-07 15:23:29 +08:00
Johannes Gäßler
17366df842
Multi GPU support, CUDA refactor, CUDA scratch buffer ( #1703 )
* CUDA multi GPU + scratch
ggml_cuda_compute_forward
Tensor parallelism
ggml_cuda_add
ggml_cuda_rms_norm
ggml_cuda_silu
CUDA scratch buffer
--main-gpu CLI option
2023-06-06 21:33:23 +02:00
LostRuins
d5b111f53d
Clblast fixes + enhancements to save VRAM and offload more layers ( #1675 )
* Use events instead of clFinish, where possible
* OpenCL: Don't load gpu layers into RAM, add mul_f32 kernel
* Reduce queueing overhead for contiguous tensors by using single mul kernel call
* Adapt to #1612 cl_mem malloc changes
* Reduce code duplication between cuda and opencl branches
* Improve implementation
* Clblast fixes + enhancements to save VRAM:
1. Change all Clblast buffers to CL_MEM_READ_WRITE, as the pool malloc currently doesn't properly handle them.
2. When recycling buffers in pool malloc, always assign the SMALLEST available buffer that fits, instead of the FIRST available buffer
3. When failing to recycle a buffer in pool malloc (all too small), instead recycle the largest available free buffer by resizing it.
* change max value size_t to use limits
* removed flags from the CL pool malloc, apply code tidying suggestions.
2023-06-06 19:00:01 +02:00
Concedo
54dc75ce73
Merge branch 'concedo-opencl-dev' into concedo_experimental
2023-06-05 13:31:53 +08:00
Concedo
f6431ded5d
removed flags from the CL pool malloc, apply code tidying suggestions.
2023-06-05 13:31:37 +08:00
Concedo
1ddbb9acd9
Merge branch 'concedo-opencl-dev' into concedo_experimental
# Conflicts:
# ggml-opencl.cpp
2023-06-04 18:07:27 +08:00
Concedo
64e3e74556
change max value size_t to use limits
2023-06-04 18:04:52 +08:00
LostRuins
2b700749e5
Merge branch 'master' into concedo-opencl-dev
2023-06-04 18:00:06 +08:00
0cc4m
dcb2ed4826
OpenCL: Fix duplication of layers in VRAM and RAM, add GPU mul kernel ( #1653 )
* Use events instead of clFinish, where possible
* OpenCL: Don't load gpu layers into RAM, add mul_f32 kernel
* Reduce queueing overhead for contiguous tensors by using single mul kernel call
* Adapt to #1612 cl_mem malloc changes
* Reduce code duplication between cuda and opencl branches
* Improve implementation
2023-06-04 08:12:05 +02:00
Concedo
96b0e536b7
Merge branch 'opencl-dev-concedo' into concedo_experimental
2023-06-02 22:12:14 +08:00
Concedo
59fe16877d
Clblast fixes + enhancements to save VRAM:
1. Change all Clblast buffers to CL_MEM_READ_WRITE, as the pool malloc currently doesn't properly handle them.
2. When recycling buffers in pool malloc, always assign the SMALLEST available buffer that fits, instead of the FIRST available buffer
3. When failing to recycle a buffer in pool malloc (all too small), instead recycle the largest available free buffer by resizing it.
2023-06-02 22:10:49 +08:00
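The recycling policy described in points 2 and 3 of the commit above can be sketched roughly as follows. This is a minimal illustration and not the code from ggml-opencl.cpp: `PoolBuffer`, `BufferPool`, `acquire` and `release` are hypothetical names, and a plain size field stands in for the `cl_mem` handle that a real pool would release and re-create when resizing.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for a device buffer. A real pool would also hold a cl_mem handle.
struct PoolBuffer {
    std::size_t size;  // capacity in bytes
    bool in_use;
};

// Hypothetical pool illustrating best-fit recycling with a resize fallback.
struct BufferPool {
    static constexpr std::size_t npos = std::numeric_limits<std::size_t>::max();
    std::vector<PoolBuffer> slots;

    // Hands out a buffer of at least `want` bytes and returns its slot index.
    std::size_t acquire(std::size_t want) {
        std::size_t best = npos;                 // smallest free buffer that fits
        std::size_t best_size = npos;
        std::size_t largest = npos;              // largest free buffer overall
        for (std::size_t i = 0; i < slots.size(); ++i) {
            const PoolBuffer &b = slots[i];
            if (b.in_use) continue;
            if (b.size >= want && b.size < best_size) {
                best = i;
                best_size = b.size;
            }
            if (largest == npos || b.size > slots[largest].size) {
                largest = i;
            }
        }
        if (best != npos) {                      // point 2: smallest fit, not first fit
            slots[best].in_use = true;
            return best;
        }
        if (largest != npos) {                   // point 3: all too small, resize the largest
            slots[largest].size = want;          // assumption: real code frees and reallocates here
            slots[largest].in_use = true;
            return largest;
        }
        slots.push_back(PoolBuffer{want, true}); // no free buffer at all: allocate a new one
        return slots.size() - 1;
    }

    // Marks a buffer as free so that a later acquire() can recycle it.
    void release(std::size_t idx) {
        assert(idx < slots.size());
        slots[idx].in_use = false;
    }
};
```

With free buffers of 100, 400 and 200 bytes (in that order), a request for 150 bytes receives the 200-byte buffer, whereas a first-fit search would hand out the 400-byte one. Resizing the largest free buffer when nothing fits keeps the pool from growing by one slot on every oversized request.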
Concedo
8d0c81e7cc
Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental
2023-06-02 12:19:59 +08:00
0cc4m
24239f0df7
Improve implementation
2023-06-01 18:57:08 +02:00
Concedo
234270bd83
back to 32 block size, not better
2023-06-01 00:14:22 +08:00
Concedo
446e42a8c6
change dmmv block size
2023-05-31 21:40:12 +08:00
Concedo
077ee4e989
Revert "Revert "opencl : no need to allocate cl_mem on heap ( #1612 )""
This reverts commit 4afa38e744.
2023-05-31 18:00:52 +08:00
Concedo
50c85bea4c
Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental
2023-05-31 17:53:14 +08:00
0cc4m
5e1eecfe12
Adapt to #1612 cl_mem malloc changes
2023-05-31 07:07:47 +02:00
0cc4m
49aaf08387
Merge remote-tracking branch 'origin/master' into opencl-dev
2023-05-31 06:58:51 +02:00
Concedo
85c9f7df41
Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental
2023-05-31 10:20:32 +08:00
Concedo
4afa38e744
Revert "opencl : no need to allocate cl_mem on heap ( #1612 )"
This reverts commit bb051d9723.
2023-05-31 10:20:23 +08:00
0cc4m
ac6b49ed45
Reduce queueing overhead for contiguous tensors by using single mul kernel call
2023-05-30 18:49:53 +02:00
Concedo
3a73ebe8d2
Merge branch 'master' into concedo_experimental
# Conflicts:
# .devops/full.Dockerfile
# .devops/main.Dockerfile
# Makefile
2023-05-29 16:47:32 +08:00