Commit graph

420 commits

Author SHA1 Message Date
Concedo
59300dbdf5 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/actions/windows-setup-curl/action.yml
#	.github/workflows/build-linux-cross.yml
#	README.md
#	common/CMakeLists.txt
#	examples/parallel/README.md
#	examples/parallel/parallel.cpp
#	ggml/src/ggml-sycl/element_wise.cpp
#	ggml/src/ggml-vulkan/CMakeLists.txt
#	tools/server/README.md
2025-05-18 23:27:53 +08:00
Concedo
be3e93c76a bundle AGPL license and llama.cpp's MIT license into binaries. clarified some licensing terms, updated readme (+1 squashed commits)
Squashed commits:

[61c152daf] bundle AGPL license and llama.cpp's MIT license into binaries. clarified some licensing terms, updated readme
2025-05-18 02:21:27 +08:00
Diego Devesa
415e40a357
releases : use arm version of curl for arm releases (#13592) 2025-05-16 19:36:51 +02:00
Sigbjørn Skjæret
7c07ac244d
ci : add ppc64el to build-linux-cross (#13575) 2025-05-16 14:54:23 +02:00
Thammachart Chinvarapon
b064a51a4e
ci: free_disk_space flag enabled for intel variant (#13426)
before cleanup: 20G
after cleanup: 44G
after all built and pushed: 24G

https://github.com/Thammachart/llama.cpp/actions/runs/14945093573/job/41987371245
2025-05-10 16:34:48 +02:00
Jeff Bolz
dc1d2adfc0
vulkan: scalar flash attention implementation (#13324)
* vulkan: scalar flash attention implementation

* vulkan: always use fp32 for scalar flash attention

* vulkan: use vector loads in scalar flash attention shader

* vulkan: remove PV matrix, helps with register usage

* vulkan: reduce register usage in scalar FA, but perf may be slightly worse

* vulkan: load each Q value once. optimize O reduction. more tuning

* vulkan: support q4_0/q8_0 KV in scalar FA

* CI: increase timeout to accommodate newly-supported tests

* vulkan: for scalar FA, select between 1 and 8 rows

* vulkan: avoid using Float16 capability in scalar FA
2025-05-10 08:07:07 +02:00
Concedo
2f5f4ee65a Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	CMakeLists.txt
#	common/CMakeLists.txt
2025-05-09 14:18:20 +08:00
Diego Devesa
15e03282bb
ci : limit write permission to only the release step + fixes (#13392)
* ci : limit write permission to only the release step

* fix win cuda file name

* fix license file copy on multi-config generators
2025-05-08 23:45:22 +02:00
Concedo
2439014a03 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	examples/embedding/embedding.cpp
#	tools/imatrix/imatrix.cpp
#	tools/perplexity/perplexity.cpp
2025-05-08 23:41:02 +08:00
Diego Devesa
70a6991edf
ci : move release workflow to a separate file (#13362) 2025-05-08 13:15:28 +02:00
Diego Devesa
814f795e06
docker : disable arm64 and intel images (#13356) 2025-05-07 16:36:33 +02:00
Concedo
b951310ca5 tryout smaller binaries 2025-05-07 14:56:34 +08:00
Diego Devesa
9f2da5871f
llama : build windows releases with dl backends (#13220) 2025-05-04 14:20:49 +02:00
Diego Devesa
1d36b3670b
llama : move end-user examples to tools directory (#13249)
* llama : move end-user examples to tools directory

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-05-02 20:27:13 +02:00
Concedo
bc452da452 improved comfyui compatibility, tweaked hf search 2025-05-02 16:18:31 +08:00
bandoti
d24d592808
ci: fix cross-compile sync issues (#12804) 2025-05-01 19:06:39 -03:00
bandoti
00137157fc
Disable CI cross-compile builds (#13022) 2025-04-19 18:05:03 +02:00
Concedo
4b0f63ed62 cleanup 2025-04-18 22:57:10 +08:00
hipudding
54a7272043
CANN: Add x86 build ci (#12950)
* CANN: Add x86 build ci

* CANN: fix code format
2025-04-15 12:08:55 +01:00
Concedo
c94aec1930 update workflows, update gemma default adapter sysprompt 2025-04-12 18:38:23 +08:00
Concedo
b42fa821d8 try allow build from commit hash 2025-04-12 13:37:10 +08:00
Concedo
7a7bdeab6d json to gbnf endpoint added 2025-04-12 11:41:11 +08:00
R0CKSTAR
8ac9f5d765
ci : Replace freediskspace to free_disk_space in docker.yml (#12861)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-04-11 09:26:17 +02:00
R0CKSTAR
d9a63b2f2e
musa: enable freediskspace for docker image build (#12839)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-04-09 11:22:30 +02:00
Chenguang Li
6e1c4cebdb
CANN: Support Opt CONV_TRANSPOSE_1D and ELU (#12786)
* [CANN] Support ELU and CONV_TRANSPOSE_1D

* [CANN]Modification review comments

* [CANN]Modification review comments

* [CANN]name adjustment

* [CANN]remove lambda used in template

* [CANN]Use std::func instead of template

* [CANN]Modify the code according to the review comments

---------

Signed-off-by: noemotiovon <noemotiovon@gmail.com>
2025-04-09 14:04:14 +08:00
Concedo
b99ee451f8 Merge commit '4ccea213bc' into concedo_experimental
# Conflicts:
#	.devops/cpu.Dockerfile
#	.devops/cuda.Dockerfile
#	.devops/intel.Dockerfile
#	.devops/musa.Dockerfile
#	.devops/rocm.Dockerfile
#	.github/workflows/bench.yml.disabled
#	.github/workflows/build.yml
#	.github/workflows/server.yml
#	CMakeLists.txt
#	build-xcframework.sh
#	ci/run.sh
#	common/CMakeLists.txt
#	examples/llama.android/llama/build.gradle.kts
#	examples/perplexity/perplexity.cpp
#	examples/run/CMakeLists.txt
#	examples/server/tests/README.md
#	examples/sycl/win-build-sycl.bat
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/aclnn_ops.h
#	ggml/src/ggml-cpu/CMakeLists.txt
#	ggml/src/ggml-cpu/ggml-cpu.c
#	licenses/LICENSE-linenoise
#	scripts/sync-ggml.last
#	tests/CMakeLists.txt
2025-04-08 21:26:23 +08:00
Concedo
822cf2430e Merge commit 'f1e3eb4249' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	README.md
#	docs/backend/SYCL.md
#	examples/llava/clip.cpp
#	ggml/src/ggml-sycl/CMakeLists.txt
#	ggml/src/ggml-vulkan/cmake/host-toolchain.cmake.in
2025-04-08 20:48:53 +08:00
Xuan-Son Nguyen
bd3f59f812
cmake : enable curl by default (#12761)
* cmake : enable curl by default

* no curl if no examples

* fix build

* fix build-linux-cross

* add windows-setup-curl

* fix

* shell

* fix path

* fix windows-latest-cmake*

* run: include_directories

* LLAMA_RUN_EXTRA_LIBS

* sycl: no llama_curl

* no test-arg-parser on windows

* clarification

* try riscv64 / arm64

* windows: include libcurl inside release binary

* add msg

* fix mac / ios / android build

* will this fix xcode?

* try clearing the cache

* add bunch of licenses

* revert clear cache

* fix xcode

* fix xcode (2)

* fix typo
2025-04-07 13:35:19 +02:00
Concedo
5edbacdd0e fix tools (+3 squashed commit)
Squashed commit:

[95a489ee] fix tools build

[1d3d3451] add accelerate

[2837705c] edit a line
2025-04-06 21:30:48 +08:00
Concedo
8415cac7ac add vk shaders source (+1 squashed commits)
Squashed commits:

[45359f49] add vk shaders source
2025-04-05 22:45:18 +08:00
Concedo
34ddd874fe try containerized ci (+3 squashed commit)
Squashed commit:

[f0600744] troubleshooting

[fe11073c] cap auto threads at 32 due to diminishing returns

[0c7f8a1d] troubleshooting
2025-04-05 01:51:03 +08:00
bandoti
1be76e4620
ci: add Linux cross-compile build (#12428) 2025-04-04 14:05:12 -03:00
Concedo
57e12b73af try containerized ci (+1 squashed commits)
Squashed commits:

[fc53c200] try containerized ci (+1 squashed commits)

Squashed commits:

[4b48b0d5] try containerized ci
2025-04-04 17:19:27 +08:00
0cc4m
a8a1f33567
Vulkan: Add DP4A MMQ and Q8_1 quantization shader (#12135)
* Vulkan: Add DP4A MMQ and Q8_1 quantization shader

* Add q4_0 x q8_1 matrix matrix multiplication support

* Vulkan: Add int8 coopmat MMQ support

* Vulkan: Add q4_1, q5_0 and q5_1 quants, improve integer dot code

* Add GL_EXT_integer_dot_product check

* Remove ggml changes, fix mmq pipeline picker

* Remove ggml changes, restore Intel coopmat behaviour

* Fix glsl compile attempt when integer vec dot is not supported

* Remove redundant code, use non-saturating integer dot, enable all matmul sizes for mmq

* Remove redundant comment

* Fix integer dot check

* Fix compile issue with unsupported int dot glslc

* Update Windows build Vulkan SDK version
2025-03-31 14:37:01 +02:00
Concedo
143b611274 updated workflows 2025-03-19 21:56:35 +08:00
Guus Waals
0fd8487b14
Fix visionOS build and add CI (#12415)
* ci: add visionOS build workflow

Add a new GitHub Actions workflow for building on visionOS with CMake and Xcode.

* ggml: Define _DARWIN_C_SOURCE for visionOS to fix missing u_xxx typedefs

* ci: remove define hacks for u_xxx system types

---------

Co-authored-by: Giovanni Petrantoni <7008900+sinkingsugar@users.noreply.github.com>
2025-03-19 11:15:23 +01:00
Daniel Bevenius
7b61bcc87c
ci : add --symlinks to xcframework zip command (#12409)
This commit adds the --symlinks option to the zip command used to create
the xcframework zip file. This is necessary to create symlinks in the
zip file. Without this option, the Versions symlink is stored as a
regular directory entry in the zip file rather than as a symlink,
which causes the following error in Xcode:
```console
Couldn't resolve framework symlink for '/Users/danbev/work/ai/llama.cpp/tmp_1/build-apple/llama.xcframework/macos-arm64_x86_64/llama.framework/Versions/Current': readlink(/Users/danbev/work/ai/llama.cpp/tmp_1/build-apple/llama.xcframework/macos-arm64_x86_64/llama.framework/Versions/Current): Invalid argument (22)
```

Refs: https://github.com/ggml-org/llama.cpp/pull/11996#issuecomment-2727026377
2025-03-16 18:22:05 +01:00
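
The symlink behaviour described in commit 7b61bcc87c can be illustrated with a minimal sketch. It uses Python's `zipfile` module, not the `zip` command the commit actually changes, and assumes the common Unix convention of recording a symlink as an entry whose mode bits carry `S_IFLNK` and whose content is the link target. The helper names `add_symlink` and `is_symlink` and the archive layout are hypothetical.

```python
import io
import stat
import zipfile


def add_symlink(zf: zipfile.ZipFile, name: str, target: str) -> None:
    """Store `name` as a symlink entry whose content is the link target."""
    info = zipfile.ZipInfo(name)
    info.create_system = 3  # 3 = Unix, so readers interpret the mode bits below
    # The upper 16 bits of external_attr hold the Unix file mode.
    info.external_attr = (stat.S_IFLNK | 0o777) << 16
    zf.writestr(info, target)


def is_symlink(info: zipfile.ZipInfo) -> bool:
    """Return True when the entry's Unix mode marks it as a symlink."""
    return stat.S_ISLNK(info.external_attr >> 16)


buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    # Hypothetical layout loosely mirroring a framework bundle.
    zf.writestr("llama.framework/Versions/A/llama", b"binary placeholder")
    add_symlink(zf, "llama.framework/Versions/Current", "A")

with zipfile.ZipFile(buffer) as zf:
    entries = {info.filename: is_symlink(info) for info in zf.infolist()}
```

After this runs, `entries` maps each archive member to whether it is recorded as a symlink, which is the distinction the `--symlinks` option preserves.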
Concedo
bdf2977372 fixed windows ci 2025-03-13 20:45:16 +08:00
Concedo
2c9ade61fe test automatic vk shader rebuilding 2025-03-13 19:34:15 +08:00
Oscar Barenys
f08f4b3187
Update build.yml for Windows Vulkan builder to use Vulkan 1.4.304 SDK for VK_NV_cooperative_matrix2 support (#12301) 2025-03-12 20:06:58 +01:00
David Huang
f1648e91cf
HIP: fix rocWMMA build flags under Windows (#12230) 2025-03-07 08:06:08 +01:00
David Huang
3ffbbd5ce1
HIP: rocWMMA documentation and enabling in workflow builds (#12179)
* Enable rocWMMA for Windows CI build

* Enable for Ubuntu

* GGML_HIP_ROCWMMA_FATTN documentation work
2025-03-06 14:14:11 +01:00
Daniel Bevenius
074c4fd39d
ci : add fetch-depth to xcframework upload (#12195)
This commit adds the fetch-depth: 0 option to the checkout action in the
build.yml workflow file (0 meaning that it fetches the complete
history). When not specified, the default value is 1, which only fetches
the latest commit.

This is necessary to ensure that `git rev-list --count HEAD` counts the
total number of commits in the history. Currently, because the default is
being used, the name of the xcframework artifact is always
llama-b1-xcframework.
2025-03-05 14:16:40 +01:00
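
The relationship between clone depth and artifact name in commit 074c4fd39d can be sketched as follows. This is an illustration only: `commit_count` wraps the `git rev-list --count HEAD` command quoted in the message, while `artifact_name` is a hypothetical helper modelled on the `llama-b1-xcframework` name the message mentions, not the workflow's actual code.

```python
import subprocess


def commit_count(repo_dir: str = ".") -> int:
    """Count commits reachable from HEAD using `git rev-list --count HEAD`."""
    result = subprocess.run(
        ["git", "rev-list", "--count", "HEAD"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())


def artifact_name(build_number: int) -> str:
    """Hypothetical naming helper based on the name quoted in the commit."""
    return f"llama-b{build_number}-xcframework"
```

In a clone made with a depth of 1, `commit_count()` would see a single commit, so `artifact_name(commit_count())` would yield `llama-b1-xcframework` regardless of the real history length.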
Daniel Bevenius
fa31c438e0
ci : fix xcframework artifact tag (#12191)
This commit adds the name parameter to the upload-artifact action to
ensure that the artifact is uploaded with the correct name.

The motivation for this is that currently the uploaded xcframework
is named as llama-b1-xcframework.zip. With this change the name of this
artifact should contain the build number like the other artifacts.
2025-03-05 10:22:29 +01:00
Daniel Bevenius
3ccbfe5a71
ci : remove xframework upload (#12190)
* ci : remove xframework upload

This commit removes the upload of the xcframework zip file as an
artifact.

The motivation for this change is that the xcframework zip file is
currently uploaded as part of the strategy matrix, so the upload is
attempted multiple times and fails the build.

The upload should be moved elsewhere in the build to avoid this.

* ci : add xcframework upload to macos-latest job
2025-03-05 08:34:02 +01:00
Daniel Bevenius
a057897ad4
llama : add xcframework build script (#11996)
* llama : add xcframework build script

This commit adds a script to build an XCFramework for the Apple
iOS, macOS, visionOS, and tvOS platforms.

The generated XCFramework can then be added to a project and used in
the same way as a regular framework. The llama.swiftui example project
has been updated to use the XCFramework and can be started using the
following command:
```console
$ open examples/llama.swiftui/llama.swiftui.xcodeproj/
```

Refs: https://github.com/ggml-org/llama.cpp/issues/10747

* examples : remove llama.cpp (source dir ref) from project.pbxproj

This commit removes the reference to llama.cpp from the project.pbxproj
file since Package.swift has been removed.

* ci : updated build.yml to use build-xcframework.sh

* ci : add xcframework build to github releases

This commit adds the ability to create a GitHub release with the
xcframework build artifact.

* scripts : add apple app validation scripts

This commit adds scripts that can validate the iOS, macOS, tvOS, and
visionOS applications. The scripts create a simple test app project,
copy the llama.xcframework to the test project, build and archive the
app, create an IPA from the archive, and validate the IPA using altool.

The motivation for this is to provide some basic validation and
hopefully avoid having to manually validate apps in Xcode.

* llama : remove Package.swift

This commit removes the Package.swift file, as we are now building an
XCFramework for the project.

* llama : remove Sources and spm-headers directories

* llama : use TargetConditionals.h for visionOS/tvOS
2025-03-05 06:30:31 +01:00
Daniel Bevenius
2679c3b55d
ci : set GITHUB_ACTION env var for server tests (#12162)
This commit tries to address/improve an issue with the server tests
which are failing with a timeout. Looking at the logs it seems like
they are timing out after 12 seconds:
```
FAILED unit/test_chat_completion.py::test_completion_with_json_schema[False-json_schema0-6-"42"] - TimeoutError: Server did not start within 12 seconds
```

This is somewhat strange as in utils.py we have the following values:
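```python
import os

DEFAULT_HTTP_TIMEOUT = 12

# Allow more time under sanitizers or when running in GitHub Actions.
if "LLAMA_SANITIZE" in os.environ or "GITHUB_ACTION" in os.environ:
    DEFAULT_HTTP_TIMEOUT = 30

# ... (excerpt continues elsewhere in utils.py)
    def start(self, timeout_seconds: int | None = DEFAULT_HTTP_TIMEOUT) -> None:
```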
A test running in a GitHub Action should therefore have a timeout of
30 seconds. However, this does not seem to be the case. Inspecting the
logs from the CI job, we can see the following environment variables:
```console
Run cd examples/server/tests
  cd examples/server/tests
  ./tests.sh
  shell: /usr/bin/bash -e {0}
  env:
    LLAMA_LOG_COLORS: 1
    LLAMA_LOG_PREFIX: 1
    LLAMA_LOG_TIMESTAMPS: 1
    LLAMA_LOG_VERBOSITY: 10
    pythonLocation: /opt/hostedtoolcache/Python/3.11.11/x64
```

This probably does not address the underlying issue, which is that the
servers providing the models to be downloaded occasionally take a
longer time to respond, but it might improve these situations in some
cases.
2025-03-03 16:17:36 +01:00
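
The timeout selection discussed in commit 2679c3b55d can be sketched as a small function. This is an illustration under stated assumptions, not the code in utils.py: the names `pick_timeout`, `BASE_TIMEOUT`, and `SLOW_TIMEOUT` are hypothetical, and only the two variable names and the 12 and 30 second values come from the commit message.

```python
import os
from collections.abc import Mapping

BASE_TIMEOUT = 12  # seconds, the default quoted in the commit message
SLOW_TIMEOUT = 30  # seconds, used under sanitizers or in GitHub Actions


def pick_timeout(env: Mapping[str, str] = os.environ) -> int:
    """Return the longer timeout when either marker variable is present."""
    if "LLAMA_SANITIZE" in env or "GITHUB_ACTION" in env:
        return SLOW_TIMEOUT
    return BASE_TIMEOUT


# Environment modelled on the variables listed in the CI log.
ci_env = {
    "LLAMA_LOG_COLORS": "1",
    "LLAMA_LOG_PREFIX": "1",
    "LLAMA_LOG_TIMESTAMPS": "1",
    "LLAMA_LOG_VERBOSITY": "10",
}
```

With only the variables listed in the CI log, `pick_timeout(ci_env)` falls back to the 12 second default, which is consistent with the failure in the log. Adding `GITHUB_ACTION` to the mapping selects the 30 second value.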
Georgi Gerganov
f3e64859ed
ci : fix arm upload artifacts (#12024)
* ci : fix arm upload artifacts

* cont : fix archive name to use matrix
2025-02-22 15:03:00 +02:00
Rohanjames1997
335eb04a91
ci : Build on Github-hosted arm64 runners (#12009) 2025-02-22 11:48:57 +01:00
Eve
f7b1116af1
update release requirements (#11897) 2025-02-17 12:20:23 +01:00