Commit graph

134 commits

Author SHA1 Message Date
Concedo
ed09a854f0 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	.gitignore
#	CMakeLists.txt
#	Makefile
#	README.md
#	ci/run.sh
#	ggml-opencl.cpp
#	tests/CMakeLists.txt
2024-01-27 11:45:07 +08:00
0cc4m
a1d6df129b
Add OpenCL add kernel (#5151)
* Add OpenCL add kernel

* Put add kernel into different string to stay within MSVC string length limit, disable float16 support due to bad results
2024-01-26 23:07:32 +01:00
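Note on the commit above: the message says the add kernel was placed in a separate string to stay within MSVC's string length limit. A minimal sketch of that general approach follows. It assumes the kernel sources are kept as separate string objects and joined at run time. The names `mul_kernel_src`, `add_kernel_src`, and `build_program_source` are hypothetical and are not taken from ggml-opencl.cpp.

```cpp
#include <string>

// Hypothetical kernel fragments. Each one is its own string object, so no
// single literal has to hold the whole OpenCL program.
static const std::string mul_kernel_src =
    "__kernel void mul_f32(__global const float* x, __global const float* y, __global float* dst) {\n"
    "    const int i = get_global_id(0);\n"
    "    dst[i] = x[i] * y[i];\n"
    "}\n";

static const std::string add_kernel_src =
    "__kernel void add_f32(__global const float* x, __global const float* y, __global float* dst) {\n"
    "    const int i = get_global_id(0);\n"
    "    dst[i] = x[i] + y[i];\n"
    "}\n";

// Join the fragments at run time into one program source string.
std::string build_program_source() {
    std::string src;
    src.reserve(mul_kernel_src.size() + add_kernel_src.size());
    src += mul_kernel_src;
    src += add_kernel_src;
    return src;
}
```

The joined string could then be handed to the OpenCL program builder as a single source. The adjacent literals inside each fragment are still merged at compile time, so each fragment has to stay below the compiler's limit on its own.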
Concedo
2a4a7241e6 Merge branch 'vulkan_test' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
#	llama.cpp
2024-01-25 23:01:44 +08:00
Concedo
7b3866f211 vulkan implementation from occam (early access, squashed) 2024-01-25 18:13:19 +08:00
Concedo
71e9a64171 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.github/workflows/nix-ci.yml
#	CMakeLists.txt
#	Makefile
#	ggml-cuda.cu
#	ggml-opencl.cpp
#	llama.cpp
2024-01-20 23:27:42 +08:00
slaren
e7e4df031b
llama : ggml-backend integration (#4766)
* llama : ggml-backend integration

* ggml-backend : add names to buffers

* fix unmap after loading

* batched-bench : add tensor_split param

* llama : check for null tensor_split

* ggml-backend : increase GGML_MAX_BACKENDS

* improve graph splitting, partial fix for --no-kv-offload

* cuda : add ggml-backend split buffer support

* cuda : do not create buffer types for devices that don't exist (fixes usage without CUDA devices available)

* ggml : fix null backend dereference (#4807)

* ggml : fix null backend dereference

* ggml : also check ggml_backend_is_cpu

* test-backend-ops : check buffer allocation failures

* llama : add cparam (split_mode) and command line argument (--split-mode, -sm) to configure the split mode (none, layer or row)

* ggml : fix mul_mat_id work size

* llama : rewrite session kv load/set without graphs

* minor

* llama : only initialize used backends, free backends on context free

* llama : abort ctx if cuda backend init fails

* llama : rewrite lora with ggml-backend and compute on CPU

ggml-ci

* llama : only map to a backend buffer the region of the file mapping containing the tensors used in the buffer

* opencl : add ggml-backend buffer type

* cuda : only use batched_cublas with batched mat muls (fixes fp16 tg perf)

* llama : on Metal, by default offload the full model

ggml-ci

* metal : page align the data ptr (#4854)

* Apply suggestions from code review

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* cuda : fix split buffer free

* address review comments

* llama-bench : add split-mode parameter

* fix whitespace

* opencl : fix double initialization

* server : add --split-mode parameter

* use async copy and compute to improve multi-gpu performance

ggml-ci

* use async memcpys to copy the graph outputs to the CPU

* fix opencl

* use a host buffer for the cpu compute buffer for faster copies to the gpu

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2024-01-12 20:07:38 +01:00
Concedo
4f40c226a0 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.devops/tools.sh
#	.gitignore
#	CMakeLists.txt
#	Makefile
#	README.md
2023-12-01 23:46:59 +08:00
Jared Van Bortel
15f5d96037
build : fix build info generation and cleanup Makefile (#3920)
* cmake : fix joining of REAL_GIT_DIR

* fix includes with help from include-what-you-use

* make : remove unneeded deps and add test-rope target

* fix C includes in C++ source files

* Revert "fix includes with help from include-what-you-use"

This reverts commit 635e9fadfd516d4604a0fecf4a854bfb25ad17ae.
2023-12-01 00:23:08 +02:00
Concedo
f344a99425 causallm is not working well on clblast, running out of mem with blas. this helps a bit but doesn't fix the problem. 2023-10-26 23:36:35 +08:00
Concedo
5db89b90b7 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.gitignore
#	CMakeLists.txt
#	Makefile
#	README.md
#	build.zig
#	ggml-opencl.cpp
#	tests/CMakeLists.txt
#	tests/test-double-float.cpp
#	tests/test-sampling.cpp
2023-10-25 23:58:15 +08:00
shibe2
465219b914 CLBlast: Add outer loops over src0 for broadcasting in mulmat
Reduce repeated dequantization of the same data.
2023-10-20 22:30:52 +04:00
Concedo
957e245285 Merge branch 'master' into concedo_experimental
# Conflicts:
#	Makefile
#	README.md
2023-10-19 23:32:52 +08:00
shibe2
1117d06607
opencl : fix element-wise multiplication (#3656) 2023-10-18 15:09:22 +03:00
Concedo
700951dbd4 Merge branch 'master' into concedo_experimental
# Conflicts:
#	README.md
2023-10-18 16:33:09 +08:00
shibe2
40e5ce054f CLBlast: Fix temporary buffer size for f16 conversion (wsize)
Fix buffer overflow.
Reduce the size to fit just one 2D slice.
Assert sufficient size.
2023-10-17 21:02:30 +04:00
Concedo
5cfabaee25 Merge branch 'master' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
#	README.md
#	docs/BLIS.md
2023-10-15 15:50:20 +08:00
shibe2
1e0e873c37
CLBlast: Fix matrix-vector multiplication (#3544) 2023-10-12 21:59:47 +02:00
Concedo
b5cd935cdb Merge branch 'master' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	ggml-opencl.cpp
2023-10-06 17:58:08 +08:00
shibe2
e2583cbc29 CLBlast: Fix handling of on-device tensor data
Fix uploading tensor data to device, including 3D, 4D, and non-contiguous tensors.
Use correct offsets into data that is already in VRAM.
Correct handling of OpenCL events when multiple commands are queued.
2023-10-05 18:25:23 +04:00
Concedo
c249f7dbc5 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.dockerignore
#	.gitignore
#	CMakeLists.txt
#	Makefile
#	tests/CMakeLists.txt
2023-10-03 23:51:30 +08:00
shibe2
665018c749
CLBlast: Add broadcast support for matrix multiplication (#3402)
Broadcast src0 into src1 across dimensions 2 and 3 when needed.
This is required for models that use GQA.
2023-10-02 21:26:15 +02:00
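Note on the commit above: with grouped-query attention (GQA), several query heads share one key/value head, so one operand of the matrix multiplication has fewer slices along a dimension than the other. A minimal sketch of the index mapping such a broadcast needs follows. It assumes the larger extent is an exact multiple of the smaller one. The helper name `broadcast_index` is hypothetical and does not come from the commit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: given a slice index into the larger operand (src1),
// return the slice of the smaller operand (src0) that it reuses.
// ne_small and ne_large are the extents of the two operands along the
// broadcast dimension; ne_large must be an exact multiple of ne_small.
inline int64_t broadcast_index(int64_t i_large, int64_t ne_small, int64_t ne_large) {
    assert(ne_small > 0 && ne_large % ne_small == 0);
    assert(i_large >= 0 && i_large < ne_large);
    const int64_t ratio = ne_large / ne_small; // src1 slices sharing one src0 slice
    return i_large / ratio;
}
```

For example, with 8 slices in src0 and 32 in src1, slices 4 through 7 of src1 all map to slice 1 of src0. The same mapping would be applied independently to dimensions 2 and 3.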
Concedo
bd2500db36 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	README.md
#	build.zig
#	flake.nix
2023-09-23 10:51:34 +08:00
shibe2
36b904e200
ggml-opencl.cpp: Make private functions static (#3300) 2023-09-21 14:10:26 -04:00
Ycros
f6ba36dff6
Reduce warnings. (#439) 2023-09-16 18:52:09 +08:00
Concedo
a0aa620718 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	.gitignore
#	CMakeLists.txt
#	Makefile
#	README.md
2023-09-05 21:49:24 +08:00
slaren
bd33e5ab92
ggml-opencl : store GPU buffer in ggml_tensor::extra (#2994) 2023-09-04 14:59:52 +02:00
Wentai Zhang
6460f758db
opencl : fix a bug in ggml_cl_pool_malloc() for ggml_cl_mul_mat_f32() (#2955)
Co-authored-by: Wentai Zhang <wentaizhang@tencent.com>
2023-09-03 11:46:44 +03:00
Concedo
48c27a9ce1 hotfix for 70b broadcast issues 2023-07-25 01:32:47 +08:00
Concedo
32102c2064 Merge branch 'master' into concedo_experimental
# Conflicts:
#	README.md
2023-07-07 14:15:39 +08:00
Howard Su
481f793acc
Fix opencl by wrap #if-else-endif with \n (#2086) 2023-07-07 05:34:18 +02:00
Concedo
69add28324 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
2023-07-04 18:51:42 +08:00
Govlzkoy
14a2cc71f6
[ggml] fix index for ne03 value in ggml_cl_mul_f32 (#2088) 2023-07-04 07:50:00 +08:00
LostRuins
96a712ca1b
Porting the improved K-Quant CUDA kernels to OpenCL (#1966)
* Added broken new q4k quant

* xx + ib0

* Fix q2_k fast kernel

* Use preprocessor for QK_K

* Add q6_k fast matmul kernel

* ported q3k speedup successfully

* ported q2k and q5k speedups

* remove old dot kernels and template

* fixed global const struct types

* fixing address spaces

* fixed string too long CI issue

---------

Co-authored-by: 0cc4m <picard12@live.de>
2023-06-29 05:56:43 +02:00
Concedo
6d718525c4 Merge branch 'optimize_quants_upstream' into concedo_experimental 2023-06-23 23:56:31 +08:00
Concedo
f7b096374d fixed string too long CI issue 2023-06-23 23:56:22 +08:00
Concedo
da668e685f fixing address spaces 2023-06-20 22:46:11 +08:00
Concedo
cce6e67f44 fixing address spaces 2023-06-20 22:45:16 +08:00
Concedo
1f1735f5ad Merge branch 'optimize_quants_upstream' into concedo_experimental 2023-06-20 21:39:35 +08:00
Concedo
6b75fc48b9 fixed global const struct types 2023-06-20 21:38:48 +08:00
Concedo
c5ae3f50a7 Merge branch 'optimize_quants_upstream' into concedo_experimental 2023-06-20 18:41:13 +08:00
Concedo
a6e8b0216d remove old dot kernels and template 2023-06-20 18:37:48 +08:00
Concedo
93247a11cd ported q2k and q5k speedups 2023-06-20 18:37:41 +08:00
Concedo
029bed6446 ported q3k speedup successfully 2023-06-20 18:37:26 +08:00
Concedo
d754915269 Merge branch 'optimize_quants_upstream' into concedo_experimental 2023-06-20 17:26:39 +08:00
0cc4m
8d816d19d1 Add q6_k fast matmul kernel 2023-06-20 08:41:35 +02:00
0cc4m
34a4917984 Use preprocessor for QK_K 2023-06-20 08:04:16 +02:00
0cc4m
069cbe530d Fix q2_k fast kernel 2023-06-20 08:01:40 +02:00
Concedo
c94a438328 xx + ib0 2023-06-19 23:01:49 +08:00
Concedo
266d436746 Added broken new q4k quant 2023-06-19 22:41:35 +08:00
Concedo
278427d9a4 Merge branch 'master' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
#	README.md
2023-06-18 15:29:44 +08:00