this should finally work (+21 squashed commit)
Squashed commit:
[5edac5b59] Revert "quick dbg"
This reverts commit fd62a997cc6684bb89242d5e7b0ae2aed83fd27f.
[fd62a997c] quick dbg
[bcccae7e6] sanity check 2
[568e2eb08] sanity check
[2f30d573a] please work 2
[cf8765221] please work
[c535e60d9] try a small trick
[d4ba79b80] 2022 test
[3f146b000] t2
[4a3b9a9b4] revert and test
[4bdc9a149] reverted test2
[5081cb4a3] reverted test
[ea9a826f3] broken test
[3c11ae389] compare 2019
[8ecec4fec] not for cu12
[0be964f3a] added vs2019 for the other runners
[5d24641cb] debugging 4
[1dee79207] debugging 3
[ab172f133] more debugging 2
[b1a895e84] more debugging
[5d21d8bd0] vs2019 setup
Squashed commits:
[3c5112398] 117 (+10 squashed commit)
Squashed commit:
[4f01bb2d4] 117 graphs 80v
[7549034ea] 117 graphs
[dabf9cb99] checking if cuda 11.5.2 works
[ba7ccdb7a] another try cu11.7 only
[752cf2ae5] increase aria2c download log rate
[dc4f198fd] test send turing to wmma flash attention
[496a22e83] temp build test cu11.7.0
[ca759c424] temp build test cu11.7
[c46ada17c] test build: enable virtual80 for oldcpu
[3ccfd939a] test build: with cuda graphs for all
* YR makefile upstream
* Create make_portable_rocm_libs.sh
* update makefile, support llama portable, ditch all unnecessary changes
* Delete make_portable_rocm_libs.sh should not be needed
* koboldcpp.sh updates
* Small rocm fixes
* ROCm is now a CUDA version, not a command
* Don't commit temp file
* Don't commit temp file
* 1200 has errors, removing it for now
* Only rebuild rocm with rebuild
* Update kcpp-build-release-linux.yaml
* Fix rocm filename
* ROCm Linux CI
* We need more disk space
* Workaround for lockfile getting stuck
Why do I have to do hacks like this....
* Update kcpp-build-release-linux-rocm.yaml
* Don't apt update rocm
You don't allow us to apt update? Better not break things, GitHub!
* Container maybe?
* Turns out we aren't root, so we use sudo
* Cleanup ROCm CI PR
* Build for Runpods GPU
* We also need rocblas
* More cleanup just in case
* Update kcpp-build-release-linux-rocm.yaml
---------
Co-authored-by: LostRuins Concedo <39025047+LostRuins@users.noreply.github.com>
* musa: fix build warning (unused parameter)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: upgrade MUSA SDK version to rc4.0.1
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: use mudnn::Unary::IDENTITY op to accelerate D2D memory copy
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Update ggml/src/ggml-cuda/cpy.cu
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* musa: remove MUDNN_CHECK_GEN and use CUDA_CHECK_GEN instead in MUDNN_CHECK
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* vulkan: scalar flash attention implementation
* vulkan: always use fp32 for scalar flash attention
* vulkan: use vector loads in scalar flash attention shader
* vulkan: remove PV matrix, helps with register usage
* vulkan: reduce register usage in scalar FA, but perf may be slightly worse
* vulkan: load each Q value once. optimize O reduction. more tuning
* vulkan: support q4_0/q8_0 KV in scalar FA
* CI: increase timeout to accommodate newly-supported tests
* vulkan: for scalar FA, select between 1 and 8 rows
* vulkan: avoid using Float16 capability in scalar FA
* [CANN] Support ELU and CONV_TRANSPOSE_1D
* [CANN] Address review comments
* [CANN] Address review comments
* [CANN] name adjustment
* [CANN] remove lambda used in template
* [CANN] Use std::func instead of template
* [CANN] Modify the code according to the review comments
---------
Signed-off-by: noemotiovon <noemotiovon@gmail.com>