Commit graph

399 commits

Author SHA1 Message Date
Concedo
50a27793d3 upgrade windows runners to windows 2022, cu11 still uses vs2019
this should finally work (+21 squashed commit)

Squashed commit:

[5edac5b59] Revert "quick dbg"

This reverts commit fd62a997cc6684bb89242d5e7b0ae2aed83fd27f.

[fd62a997c] quick dbg

[bcccae7e6] sanity check 2

[568e2eb08] sanity check

[2f30d573a] please work 2

[cf8765221] please work

[c535e60d9] try a small trick

[d4ba79b80] 2022 test

[3f146b000] t2

[4a3b9a9b4] revert and test

[4bdc9a149] reverted test2

[5081cb4a3] reverted test

[ea9a826f3] broken test

[3c11ae389] compare 2019

[8ecec4fec] not for cu12

[0be964f3a] added vs2019 for the other runners

[5d24641cb] debugging 4

[1dee79207] debugging 3

[ab172f133] more debugging 2

[b1a895e84] more debugging

[5d21d8bd0] vs2019 setup
2025-06-06 14:02:34 +08:00
Concedo
a341188f84 add install for vs2019 2025-06-05 10:32:57 +08:00
Concedo
a74d8669b3 try hardcoded path (+1 squashed commits)
Squashed commits:

[711b43d9d] let's see if VS2019 can work
2025-06-05 10:26:02 +08:00
Concedo
f3bb947a13 cuda use wmma flash attention for turing (+1 squashed commits)
Squashed commits:

[3c5112398] 117 (+10 squashed commit)

Squashed commit:

[4f01bb2d4] 117 graphs 80v

[7549034ea] 117 graphs

[dabf9cb99] checking if cuda 11.5.2 works

[ba7ccdb7a] another try cu11.7 only

[752cf2ae5] increase aria2c download log rate

[dc4f198fd] test send turing to wmma flash attention

[496a22e83] temp build test cu11.7.0

[ca759c424] temp build test cu11.7

[c46ada17c] test build: enable virtual80 for oldcpu

[3ccfd939a] test build: with cuda graphs for all
2025-06-01 11:41:45 +08:00
henk717
b8883e254a
KoboldCpp.sh updates (#1562)
* YR makefile upstream

* Create make_portable_rocm_libs.sh

* update makefile, support llama portable, ditch all unnecessary changes

* Delete make_portable_rocm_libs.sh should not be needed

* koboldcpp.sh updates

* Small rocm fixes

* ROCm is now a cuda version not a command

* Don't commit temp file

* Don't commit temp file

* 1200 has errors, removing it for now

* Only rebuild rocm with rebuild

* Update kcpp-build-release-linux.yaml

* Fix rocm filename

* ROCm Linux CI

* We need more diskspace

* Workaround for lockfile getting stuck

Why do I have to do hacks like this....

* Update kcpp-build-release-linux-rocm.yaml

* Don't apt update rocm

You don't allow us to apt update? Better not break things github!

* Container maybe?

* Turns out we aren't root, so we use sudo

* Cleanup ROCm CI PR

* Build for Runpods GPU

* We also need rocblas

* More cleanup just in case

* Update kcpp-build-release-linux-rocm.yaml

---------

Co-authored-by: LostRuins Concedo <39025047+LostRuins@users.noreply.github.com>
2025-05-26 15:24:49 +08:00
Concedo
0dca953d78 removed winget workflow 2025-05-24 16:40:39 +08:00
Concedo
55cc9acec5 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/release.yml
#	README.md
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/ggml-cann.cpp
#	tools/mtmd/CMakeLists.txt
#	tools/mtmd/clip.cpp
#	tools/mtmd/clip.h
2025-05-24 12:10:36 +08:00
Diego Devesa
b775345d78
ci : enable winget package updates (#13734) 2025-05-23 23:14:00 +03:00
Diego Devesa
a70a8a69c2
ci : add winget package updater (#13732) 2025-05-23 22:09:38 +02:00
Diego Devesa
3079e9ac8e
release : fix windows hip release (#13707)
* release : fix windows hip release

* make single hip release with multiple targets
2025-05-23 00:21:37 +02:00
Concedo
fdca5ba71e declutter 2025-05-22 22:58:47 +08:00
Concedo
8bd6f9f9ae added a simple cross platform launch script for unpacked dirs 2025-05-22 22:09:46 +08:00
Diego Devesa
d643bb2c79
releases : build CPU backend separately (windows) (#13642) 2025-05-21 22:09:57 +02:00
Concedo
d04b4eeb04 merge not working 2025-05-21 18:06:41 +08:00
R0CKSTAR
33983057d0
musa: Upgrade MUSA SDK version to rc4.0.1 and use mudnn::Unary::IDENTITY op to accelerate D2D memory copy (#13647)
* musa: fix build warning (unused parameter)

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* musa: upgrade MUSA SDK version to rc4.0.1

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* musa: use mudnn::Unary::IDENTITY op to accelerate D2D memory copy

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* Update ggml/src/ggml-cuda/cpy.cu

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* musa: remove MUDNN_CHECK_GEN and use CUDA_CHECK_GEN instead in MUDNN_CHECK

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-05-21 09:58:49 +08:00
Alberto Cabrera Pérez
f71f40a284
ci : upgraded oneAPI version in SYCL workflows and dockerfile (#13532) 2025-05-19 11:46:09 +01:00
Concedo
59300dbdf5 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/actions/windows-setup-curl/action.yml
#	.github/workflows/build-linux-cross.yml
#	README.md
#	common/CMakeLists.txt
#	examples/parallel/README.md
#	examples/parallel/parallel.cpp
#	ggml/src/ggml-sycl/element_wise.cpp
#	ggml/src/ggml-vulkan/CMakeLists.txt
#	tools/server/README.md
2025-05-18 23:27:53 +08:00
Concedo
be3e93c76a bundle AGPL license and llama.cpp's MIT license into binaries. clarified some licensing terms, updated readme (+1 squashed commits)
Squashed commits:

[61c152daf] bundle AGPL license and llama.cpp's MIT license into binaries. clarified some licensing terms, updated readme
2025-05-18 02:21:27 +08:00
Diego Devesa
415e40a357
releases : use arm version of curl for arm releases (#13592) 2025-05-16 19:36:51 +02:00
Sigbjørn Skjæret
7c07ac244d
ci : add ppc64el to build-linux-cross (#13575) 2025-05-16 14:54:23 +02:00
Thammachart Chinvarapon
b064a51a4e
ci: free_disk_space flag enabled for intel variant (#13426)
before cleanup: 20G
after cleanup: 44G
after all built and pushed: 24G

https://github.com/Thammachart/llama.cpp/actions/runs/14945093573/job/41987371245
2025-05-10 16:34:48 +02:00
Jeff Bolz
dc1d2adfc0
vulkan: scalar flash attention implementation (#13324)
* vulkan: scalar flash attention implementation

* vulkan: always use fp32 for scalar flash attention

* vulkan: use vector loads in scalar flash attention shader

* vulkan: remove PV matrix, helps with register usage

* vulkan: reduce register usage in scalar FA, but perf may be slightly worse

* vulkan: load each Q value once. optimize O reduction. more tuning

* vulkan: support q4_0/q8_0 KV in scalar FA

* CI: increase timeout to accommodate newly-supported tests

* vulkan: for scalar FA, select between 1 and 8 rows

* vulkan: avoid using Float16 capability in scalar FA
2025-05-10 08:07:07 +02:00
Concedo
2f5f4ee65a Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	CMakeLists.txt
#	common/CMakeLists.txt
2025-05-09 14:18:20 +08:00
Diego Devesa
15e03282bb
ci : limit write permission to only the release step + fixes (#13392)
* ci : limit write permission to only the release step

* fix win cuda file name

* fix license file copy on multi-config generators
2025-05-08 23:45:22 +02:00
Concedo
2439014a03 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	examples/embedding/embedding.cpp
#	tools/imatrix/imatrix.cpp
#	tools/perplexity/perplexity.cpp
2025-05-08 23:41:02 +08:00
Diego Devesa
70a6991edf
ci : move release workflow to a separate file (#13362) 2025-05-08 13:15:28 +02:00
Diego Devesa
814f795e06
docker : disable arm64 and intel images (#13356) 2025-05-07 16:36:33 +02:00
Concedo
b951310ca5 tryout smaller binaries 2025-05-07 14:56:34 +08:00
Diego Devesa
9f2da5871f
llama : build windows releases with dl backends (#13220) 2025-05-04 14:20:49 +02:00
Diego Devesa
1d36b3670b
llama : move end-user examples to tools directory (#13249)
* llama : move end-user examples to tools directory

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-05-02 20:27:13 +02:00
Concedo
bc452da452 improved comfyui compatibility, tweaked hf search 2025-05-02 16:18:31 +08:00
bandoti
d24d592808
ci: fix cross-compile sync issues (#12804) 2025-05-01 19:06:39 -03:00
bandoti
00137157fc
Disable CI cross-compile builds (#13022) 2025-04-19 18:05:03 +02:00
hipudding
54a7272043
CANN: Add x86 build ci (#12950)
* CANN: Add x86 build ci

* CANN: fix code format
2025-04-15 12:08:55 +01:00
Concedo
c94aec1930 update workflows, update gemma default adapter sysprompt 2025-04-12 18:38:23 +08:00
Concedo
b42fa821d8 try allow build from commit hash 2025-04-12 13:37:10 +08:00
Concedo
7a7bdeab6d json to gbnf endpoint added 2025-04-12 11:41:11 +08:00
R0CKSTAR
8ac9f5d765
ci : Replace freediskspace to free_disk_space in docker.yml (#12861)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-04-11 09:26:17 +02:00
R0CKSTAR
d9a63b2f2e
musa: enable freediskspace for docker image build (#12839)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-04-09 11:22:30 +02:00
Chenguang Li
6e1c4cebdb
CANN: Support Opt CONV_TRANSPOSE_1D and ELU (#12786)
* [CANN] Support ELU and CONV_TRANSPOSE_1D

* [CANN]Modification review comments

* [CANN]Modification review comments

* [CANN]name adjustment

* [CANN]remove lambda used in template

* [CANN]Use std::func instead of template

* [CANN]Modify the code according to the review comments

---------

Signed-off-by: noemotiovon <noemotiovon@gmail.com>
2025-04-09 14:04:14 +08:00
Concedo
b99ee451f8 Merge commit '4ccea213bc' into concedo_experimental
# Conflicts:
#	.devops/cpu.Dockerfile
#	.devops/cuda.Dockerfile
#	.devops/intel.Dockerfile
#	.devops/musa.Dockerfile
#	.devops/rocm.Dockerfile
#	.github/workflows/bench.yml.disabled
#	.github/workflows/build.yml
#	.github/workflows/server.yml
#	CMakeLists.txt
#	build-xcframework.sh
#	ci/run.sh
#	common/CMakeLists.txt
#	examples/llama.android/llama/build.gradle.kts
#	examples/perplexity/perplexity.cpp
#	examples/run/CMakeLists.txt
#	examples/server/tests/README.md
#	examples/sycl/win-build-sycl.bat
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/aclnn_ops.h
#	ggml/src/ggml-cpu/CMakeLists.txt
#	ggml/src/ggml-cpu/ggml-cpu.c
#	licenses/LICENSE-linenoise
#	scripts/sync-ggml.last
#	tests/CMakeLists.txt
2025-04-08 21:26:23 +08:00
Concedo
822cf2430e Merge commit 'f1e3eb4249' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	README.md
#	docs/backend/SYCL.md
#	examples/llava/clip.cpp
#	ggml/src/ggml-sycl/CMakeLists.txt
#	ggml/src/ggml-vulkan/cmake/host-toolchain.cmake.in
2025-04-08 20:48:53 +08:00
Xuan-Son Nguyen
bd3f59f812
cmake : enable curl by default (#12761)
* cmake : enable curl by default

* no curl if no examples

* fix build

* fix build-linux-cross

* add windows-setup-curl

* fix

* shell

* fix path

* fix windows-latest-cmake*

* run: include_directories

* LLAMA_RUN_EXTRA_LIBS

* sycl: no llama_curl

* no test-arg-parser on windows

* clarification

* try riscv64 / arm64

* windows: include libcurl inside release binary

* add msg

* fix mac / ios / android build

* will this fix xcode?

* try clearing the cache

* add bunch of licenses

* revert clear cache

* fix xcode

* fix xcode (2)

* fix typo
2025-04-07 13:35:19 +02:00
Concedo
5edbacdd0e fix tools (+3 squashed commit)
Squashed commit:

[95a489ee] fix tools build

[1d3d3451] add accelerate

[2837705c] edit a line
2025-04-06 21:30:48 +08:00
Concedo
8415cac7ac add vk shaders source (+1 squashed commits)
Squashed commits:

[45359f49] add vk shaders source
2025-04-05 22:45:18 +08:00
Concedo
34ddd874fe try containerized ci (+3 squashed commit)
Squashed commit:

[f0600744] troubleshooting

[fe11073c] cap auto threads at 32 due to diminishing returns

[0c7f8a1d] troubleshooting
2025-04-05 01:51:03 +08:00
bandoti
1be76e4620
ci: add Linux cross-compile build (#12428) 2025-04-04 14:05:12 -03:00
Concedo
57e12b73af try containerized ci (+1 squashed commits)
Squashed commits:

[fc53c200] try containerized ci (+1 squashed commits)

Squashed commits:

[4b48b0d5] try containerized ci
2025-04-04 17:19:27 +08:00
0cc4m
a8a1f33567
Vulkan: Add DP4A MMQ and Q8_1 quantization shader (#12135)
* Vulkan: Add DP4A MMQ and Q8_1 quantization shader

* Add q4_0 x q8_1 matrix matrix multiplication support

* Vulkan: Add int8 coopmat MMQ support

* Vulkan: Add q4_1, q5_0 and q5_1 quants, improve integer dot code

* Add GL_EXT_integer_dot_product check

* Remove ggml changes, fix mmq pipeline picker

* Remove ggml changes, restore Intel coopmat behaviour

* Fix glsl compile attempt when integer vec dot is not supported

* Remove redundant code, use non-saturating integer dot, enable all matmul sizes for mmq

* Remove redundant comment

* Fix integer dot check

* Fix compile issue with unsupported int dot glslc

* Update Windows build Vulkan SDK version
2025-03-31 14:37:01 +02:00
Concedo
143b611274 updated workflows 2025-03-19 21:56:35 +08:00