Commit graph

34 commits

Author SHA1 Message Date
Concedo
b63158005f All samplers moved to kcpp side 2024-09-09 18:14:11 +08:00
Concedo
70cdb55cc9 Merge commit '947538acb8' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/docker.yml
#	CMakePresets.json
#	examples/llama-bench/llama-bench.cpp
#	ggml/CMakeLists.txt
#	ggml/src/CMakeLists.txt
#	tests/test-backend-ops.cpp
#	tests/test-quantize-fns.cpp
2024-09-09 11:26:34 +08:00
Markus Tavenrath
daa9623ab0
Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend by submitting smaller cmdbuffers early. (#9118)
* Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend by submitting smaller cmdbuffers early.

* fix compile issues

* Fix issues where the last submit wasn't executed or handled properly.

* remove trailing whitespace

* Repair GGML_VULKAN_CHECK_RESULTS

* Increase submit counter only if actual work has been submitted and increase submit count to 100.

* Fix some nodes are not checked with GGML_VULKAN_CHECK_RESULTS enabled.
2024-09-08 21:43:48 +02:00
Salvatore Mesoraca
406c1a32a1 vulkan: add dryrun support to sin and cos ops (ggml/947)
sin and cos failed test-backend-ops because they
tried to dereference a context pointer that is null
on dry runs.

This commit prevents that segfault.

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
2024-09-08 11:05:55 +03:00
Salvatore Mesoraca
9cb9260861 vulkan: correctly report support for OP_CONT (ggml/946)
test-backend-ops fails because ggml_cont aborts
when invoked passing an unsupported type.

This commit makes ggml_cont tests pass

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
2024-09-08 11:05:55 +03:00
Changyeon Kim
409dc4f8bb
ggml : fix build break for the vulkan-debug (#9265)
- windows build : Ok.
- linux build : Ok.

Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>
2024-09-06 15:54:50 +03:00
Concedo
d220495dd4 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/full-cuda.Dockerfile
#	.devops/llama-cli-cuda.Dockerfile
#	.devops/llama-server-cuda.Dockerfile
#	.devops/llama-server-intel.Dockerfile
#	.devops/llama-server-rocm.Dockerfile
#	.devops/llama-server-vulkan.Dockerfile
#	.devops/llama-server.Dockerfile
#	.github/workflows/docker.yml
#	docs/docker.md
#	examples/llama-bench/llama-bench.cpp
#	flake.lock
#	ggml/include/ggml.h
#	ggml/src/CMakeLists.txt
#	scripts/sync-ggml.last
#	src/llama.cpp
#	tests/test-backend-ops.cpp
#	tests/test-grad0.cpp
#	tests/test-rope.cpp
2024-08-30 10:37:39 +08:00
Georgi Gerganov
231cff5f6f sync : ggml 2024-08-27 22:41:27 +03:00
Concedo
6200b6d64e Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.gitignore
#	README.md
#	docs/build.md
#	flake.lock
#	tests/test-backend-ops.cpp
#	tests/test-grammar-integration.cpp
2024-08-21 17:17:36 +08:00
Changyeon Kim
2f3c1466ff
llava: Add ACC OP for GPU acceleration to the Vulkan backend in the LLAVA CLIP model. (#8984)
* llava: Add ACC OP for GPU acceleration to the Vulkan backend in the LLAVA CLIP model.

- The CLIP model now prioritizes the Vulkan backend over the CPU when vulkan available.
- A GGML_OP_ACC shader has been added.
- The encoding performance of the CLIP model improved from 4.2s on the CPU to 0.9s on the GPU.

Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>

* fix-up coding style.

Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>

* Fix-up the missing initial parameter to resolve the compilation warning.

Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>

* [fix] Add missing parameters.

Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>

* [fix] Use nb1 and nb2 for dst.

Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>

* Fix check results ggml_acc call

---------

Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>
Co-authored-by: 0cc4m <picard12@live.de>
2024-08-20 21:00:00 +02:00
Concedo
1edf83761a Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/bench.yml.disabled
#	Makefile
#	README.md
#	ggml/CMakeLists.txt
#	ggml/src/CMakeLists.txt
#	ggml/src/ggml-vulkan.cpp
2024-08-17 16:21:14 +08:00
0cc4m
5fd89a70ea
Vulkan Optimizations and Fixes (#8959)
* Optimize Vulkan REPEAT performance

* Use Vulkan GLSL fused multiply-add instruction where possible

* Add GGML_VULKAN_PERF option to output performance data per operator

* Rework and fix Vulkan descriptor set and descriptor pool handling

* Fix float32 concat f16 shader validation error

* Add Vulkan GROUP_NORM eps parameter

* Fix validation error with transfer queue memory barrier flags

* Remove trailing whitespaces
2024-08-14 18:32:53 +02:00
Concedo
e8de0af3ec Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/bench.yml
#	.github/workflows/build.yml
#	.github/workflows/python-check-requirements.yml
#	README.md
#	docs/backend/SYCL.md
#	flake.lock
#	ggml/CMakeLists.txt
#	ggml/src/kompute-shaders/op_rope_f16.comp
#	ggml/src/kompute-shaders/op_rope_f32.comp
#	ggml/src/kompute-shaders/rope_common.comp
2024-08-14 22:25:43 +08:00
Daniel Bevenius
06943a69f6
ggml : move rope type enum to ggml.h (#8949)
* ggml : move rope type enum to ggml.h

This commit moves the `llama_rope_type` enum from `llama.h` to
`ggml.h` and changes its name to `ggml_rope_type`.

The motivation for this change is to address the TODO in `llama.h` and
use the enum in ggml.

Note: This commit does not change the `mode` parameter to be of type
`enum ggml_rope_type`. The name `mode` and its usage suggest that it
might be more generic and possibly used as a bit field for multiple
flags. Further investigation/discussion may be needed to determine
if `mode` should be restricted to RoPE types.

* squash! ggml : move rope type enum to ggml.h

This commit removes GGML_ROPE_TYPE_NONE and GGML_ROPE_TYPE_GLM from
ggml.h, and back the llama_rope_type enum.

I've kept the assert for GGML_ROPE_TYPE_GLM as I'm not sure if it is
safe to remove it yet.

* squash! ggml : move rope type enum to ggml.h

This commit removes the enum ggml_rope_type from ggml.h and replaces it
with a define (GGML_ROPE_TYPE_NEOX). This define is used in the code to
check if the mode is set to GPT-NeoX. Also the enum llama_rope_type has
been updated to reflect this change.

* squash! ggml : move rope type enum to ggml.h

This commit contains a suggestion enable the GGML_ROPE_TYPE_NEOX
macro/define to be passed to the shader compiler.

* squash! ggml : move rope type enum to ggml.h

This commit fixes the editorconfig-checker warnings.

* squash! ggml : move rope type enum to ggml.h

Update comment for ggml_rope function.

* Revert "squash! ggml : move rope type enum to ggml.h"

This reverts commit 6261222bd0dc0efd51f0fb0435ad3f16a5b52fd6.

* squash! ggml : move rope type enum to ggml.h

Add GGML_ROPE_TYPE_NEOX to rope_common.comp.

* remove extra line

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-08-13 21:13:15 +02:00
Markus Tavenrath
7c5bfd57f8
Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead. (#8943)
* Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead.

- Allocation overhead for the temporary std::vectors was easily detectable with a sampling profiler and simple to remove.
- ggml_vk_sync_buffer introduce a full pipeline sync which has a significant cost on the GPU side, sometimes larger than the actual kernel execution. Adding only barriers for shader read/writes and transfers seems to be sufficient looking at the code which either launches compute kernels or copies tensors.

* Fix small typo

---------

Co-authored-by: 0cc4m <picard12@live.de>
2024-08-11 10:09:09 +02:00
Concedo
bdfe8526b8 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.gitignore
#	CONTRIBUTING.md
#	Makefile
#	examples/llava/CMakeLists.txt
#	scripts/sync-ggml-am.sh
#	scripts/sync-ggml.last
#	scripts/sync-ggml.sh
#	src/llama-vocab.cpp
2024-08-10 11:42:32 +08:00
Matt Stephenson
70c0ea3560
whisper : use vulkan as gpu backend when available (whisper/2302)
* ggml: use vulkan as gpu backend when available

Signed-off-by: Matt Stephenson <mstephenson6@users.noreply.github.com>

* whisper: enable using vk as default buffer type

Signed-off-by: Matt Stephenson <mstephenson6@users.noreply.github.com>

---------

Signed-off-by: Matt Stephenson <mstephenson6@users.noreply.github.com>
2024-08-09 10:03:44 +03:00
Concedo
6dd3d5515e too much memory prints warning instead of exiting 2024-08-08 19:34:52 +08:00
Concedo
e1f97f7fb5 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/llama-server.Dockerfile
#	README.md
#	flake.lock
#	ggml/src/ggml-vulkan.cpp
#	ggml/src/vulkan-shaders/concat.comp
#	ggml/src/vulkan-shaders/pad.comp
#	ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp
#	scripts/sync-ggml-am.sh
#	scripts/sync-ggml.last
#	src/llama.cpp
#	tests/test-backend-ops.cpp
2024-08-06 16:33:26 +08:00
0cc4m
a3738b2fa7 vulkan : implement Stable Diffusion operators (ggml/904)
* Fix Vulkan repeat op

* Implement Vulkan concat op

* Delete old Vulkan shader generator

* Implement Vulkan im2col op

* Implement Vulkan unary gelu_quick op

* Implement Vulkan group_norm op

* Implement Vulkan timestep_embedding op

* Implement Vulkan upscale op

* Fix Vulkan vk_context tensor extra index issue

* Fix Vulkan matmul shader parameter bug

* Properly fix Vulkan matmul shader parameter bug

* Add Vulkan ADD f16 + f32 -> f16 operator support

* Implement Vulkan tanh op

* Fix Vulkan group count too large Validation error on non-Nvidia GPUs

* Throw error when too much memory is requested

* Fix another Vulkan group count too large Validation error on non-Nvidia GPUs

* Fix matmul MMQ condition

* Implement Vulkan pad op

* Fix Vulkan crash when tensor is used multiple times in a compute graph

* Add Vulkan CONCAT f16 + f16 -> f16 op

* Add Vulkan LEAKY_RELU op
2024-08-05 08:50:57 +03:00
Concedo
3a72410804 Added vulkan support for SD (+1 squashed commits)
Squashed commits:

[13f42f83] Added vulkan support for SD
2024-08-01 17:12:33 +08:00
Concedo
01afb28a63 not working 2024-07-28 11:43:10 +08:00
Concedo
ba5babb876 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/nix/apps.nix
#	.devops/tools.sh
#	Makefile
#	README.md
#	docs/backend/SYCL.md
#	docs/build.md
#	examples/CMakeLists.txt
#	ggml/include/ggml.h
#	src/llama-vocab.cpp
#	tests/test-backend-ops.cpp
#	tests/test-chat-template.cpp
#	tests/test-sampling.cpp
2024-07-27 23:15:54 +08:00
Tony Wasserka
203b7f1531 vulkan : initialize vk_buffer_struct members to VK_NULL_HANDLE (ggml/893)
This prevents invalid frees when destroying a partially initialized
vk_buffer_struct. For example, this could happen in ggml_vk_create_buffer
when running out of device memory.

Co-authored-by: Tony Wasserka <neobrain@users.noreply.github.com>
2024-07-27 17:43:44 +03:00
slaren
2b1f616b20
ggml : reduce hash table reset cost (#8698)
* ggml : reduce hash table reset cost

* fix unreachable code warnings after GGML_ASSERT(false)

* GGML_ASSERT(false) -> GGML_ABORT("fatal error")

* GGML_ABORT use format string
2024-07-27 04:41:55 +02:00
Concedo
c81d1623b4 Merge commit '751fcfc6c3' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	CONTRIBUTING.md
#	README.md
#	flake.lock
#	tests/CMakeLists.txt
#	tests/test-backend-ops.cpp
2024-07-23 19:18:05 +08:00
0cc4m
751fcfc6c3
Vulkan IQ4_NL Support (#8613)
* Fix Vulkan matmul tests compile errors

* Add Vulkan IQ4_NL support

* Fix Vulkan DeepSeek-Coder-V2-Lite MoE support
2024-07-23 10:56:49 +02:00
0cc4m
bda62d7999
Vulkan MMQ Fix (#8479)
* Fix incoherence by adding missing LOAD_VEC_A parameter

* Fix Vulkan op result checker build error
2024-07-15 09:38:52 +02:00
Concedo
7fb2ae3193 Merge remote-tracking branch 'lcpp/0cc4m/vulkan-mmq-fix' into concedo_experimental 2024-07-14 22:10:11 +08:00
Concedo
1c482b261d Revert "temp revert to functioning vk shaders"
This reverts commit 6abe450c82.
2024-07-14 22:06:42 +08:00
0cc4m
cc7a2f61c4 Fix Vulkan op result checker build error 2024-07-14 12:57:49 +02:00
Concedo
6abe450c82 temp revert to functioning vk shaders 2024-07-14 16:41:39 +08:00
Concedo
abf9531d08 merged but incoherent 2024-07-14 14:32:45 +08:00
Georgi Gerganov
f3f65429c4
llama : reorganize source code + improve CMake (#8006)
* scripts : update sync [no ci]

* files : relocate [no ci]

* ci : disable kompute build [no ci]

* cmake : fixes [no ci]

* server : fix mingw build

ggml-ci

* cmake : minor [no ci]

* cmake : link math library [no ci]

* cmake : build normal ggml library (not object library) [no ci]

* cmake : fix kompute build

ggml-ci

* make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE

ggml-ci

* move public backend headers to the public include directory (#8122)

* move public backend headers to the public include directory

* nix test

* spm : fix metal header

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* scripts : fix sync paths [no ci]

* scripts : sync ggml-blas.h [no ci]

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-06-26 18:33:02 +03:00
Renamed from ggml-vulkan.cpp (Browse further)