Mirror of https://github.com/LostRuins/koboldcpp.git, synced 2026-05-22 11:16:08 +00:00
* vulkan: use coopmat for flash attention p*v matrix multiplication
* fix P loading issue
* fix barrier position
* remove reduction that is no longer needed
* move max thread reduction into loop
* remove osh padding
* add bounds checks and padding
* remove unused code
* fix shmem sizes, loop duration and accesses
* don't overwrite Qf, add new shared psh buffer instead
* add missing bounds checks
* use subgroup reductions
* optimize
* move bounds check, reduce barriers
* support other Bc values and other subgroup sizes
* remove D_split
* replace Of register array with shared memory Ofsh array
* parallelize HSV across the rowgroups
* go back to Of in registers, not shmem
* vectorize sfsh
* don't store entire K tile in shmem
* fixes
* load large k tiles to shmem on Nvidia
* adapt shared memory host check function to shader changes
* remove Bc 32 case
* remove unused variable
* fix missing mask reduction tmspsh barrier
* fix mask bounds check
* fix rowmax f16 under/overflow to inf
* fix flash_attn_cm2 BLOCK_SIZE preprocessor directives

| Name | | |
|---|---|---|
| ggml-blas | ||
| ggml-cann | ||
| ggml-cpu | ||
| ggml-cuda | ||
| ggml-hexagon | ||
| ggml-hip | ||
| ggml-metal | ||
| ggml-musa | ||
| ggml-opencl | ||
| ggml-rpc | ||
| ggml-sycl | ||
| ggml-virtgpu | ||
| ggml-vulkan | ||
| ggml-webgpu | ||
| ggml-zdnn | ||
| ggml-zendnn | ||
| CMakeLists.txt | ||
| ggml-alloc.c | ||
| ggml-backend-impl.h | ||
| ggml-backend-reg.cpp | ||
| ggml-backend.cpp | ||
| ggml-common.h | ||
| ggml-impl.h | ||
| ggml-opt.cpp | ||
| ggml-quants.c | ||
| ggml-quants.h | ||
| ggml-threading.cpp | ||
| ggml-threading.h | ||
| ggml.c | ||
| ggml.cpp | ||
| gguf.cpp | ||