cuda use wmma flash attention for turing (+1 squashed commit)

Squashed commits:

[3c5112398] 117 (+10 squashed commits)

Squashed commit:

[4f01bb2d4] 117 graphs 80v

[7549034ea] 117 graphs

[dabf9cb99] checking if cuda 11.5.2 works

[ba7ccdb7a] another try cu11.7 only

[752cf2ae5] increase aria2c download log rate

[dc4f198fd] test send turing to wmma flash attention

[496a22e83] temp build test cu11.7.0

[ca759c424] temp build test cu11.7

[c46ada17c] test build: enable virtual80 for oldcpu

[3ccfd939a] test build: with cuda graphs for all
Concedo 2025-05-31 18:08:50 +08:00
parent b08dca65ed
commit f3bb947a13
5 changed files with 5 additions and 6 deletions

@@ -57,7 +57,6 @@ jobs:
id: make_build
run: |
make LLAMA_CLBLAST=1 LLAMA_VULKAN=1 LLAMA_PORTABLE=1 -j ${env:NUMBER_OF_PROCESSORS}
echo "Vulkan Shaders Rebuilt"
- uses: Jimver/cuda-toolkit@v0.2.15
id: cuda-toolkit