koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-18 23:49:46 +00:00

Pascal 58e68df0f9 cuda: fuse snake activation (mul, sin, sqr, mul, add) (#22667)

* cuda: fuse snake activation (mul, sin, sqr, mul, add)

  Add ggml_cuda_op_snake_fused with F32 / F16 / BF16 templates. The matcher recognizes the naive 5-op decomposition emitted by audio decoders (BigVGAN, Vocos) for the snake activation y = x + sin(a*x)^2 * inv_b and rewrites it to a single elementwise kernel.

  Add test_snake_fuse comparing CPU naive vs CUDA fused across F32 / F16 / BF16.

* cuda: address review feedback from @am17an

  Use ggml_cuda_cast for F32/F16/BF16 conversions and rename kernel_snake to snake_kernel to match upstream conventions.

* cuda: snake fusion fastdiv on T_len

  Suggested-by: @am17an

* Update tests/test-backend-ops.cpp

  Co-authored-by: Aman Gupta <amangupta052@gmail.com>

* cuda: snake fusion check add->type matches x->type

  Address review feedback from @am17an

* cuda: snake fusion check add->type matches x->type

  Moved for readability (equivalent)

  Address review feedback from @am17an

---------

Co-authored-by: Aman Gupta <amangupta052@gmail.com>

2026-05-08 17:44:09 +08:00
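To illustrate what the fusion in the commit above buys, here is a minimal CPU-side sketch in C++ that contrasts the naive five-op chain (mul, sin, sqr, mul, add) with a single fused pass. It is not the ggml code: the function names snake_naive and snake_fused are hypothetical, and the per-channel a / inv_b parameters with a row-major [channel][T_len] layout are assumptions made for illustration only.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference, NOT the ggml implementation.
// Assumed layout: x is row-major [channels][t_len]; a and inv_b hold one value per channel.

// Naive form: five separate elementwise passes, each writing a temporary buffer.
std::vector<float> snake_naive(const std::vector<float>& x,
                               const std::vector<float>& a,
                               const std::vector<float>& inv_b,
                               std::size_t t_len) {
    const std::size_t n = x.size();
    std::vector<float> t0(n), t1(n), t2(n), t3(n), y(n);
    for (std::size_t i = 0; i < n; ++i) t0[i] = a[i / t_len] * x[i];      // mul
    for (std::size_t i = 0; i < n; ++i) t1[i] = std::sin(t0[i]);          // sin
    for (std::size_t i = 0; i < n; ++i) t2[i] = t1[i] * t1[i];            // sqr
    for (std::size_t i = 0; i < n; ++i) t3[i] = t2[i] * inv_b[i / t_len]; // mul
    for (std::size_t i = 0; i < n; ++i) y[i]  = x[i] + t3[i];             // add
    return y;
}

// Fused form: one pass, no temporaries; computes y = x + sin(a*x)^2 * inv_b.
std::vector<float> snake_fused(const std::vector<float>& x,
                               const std::vector<float>& a,
                               const std::vector<float>& inv_b,
                               std::size_t t_len) {
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        const std::size_t c = i / t_len;  // channel index of element i
        const float s = std::sin(a[c] * x[i]);
        y[i] = x[i] + s * s * inv_b[c];
    }
    return y;
}
```

Both functions should agree to within floating-point rounding. The fused version reads each input element once and writes each output once, which is the memory-traffic saving a single elementwise kernel is after. The integer division by t_len in the inner loop is presumably the kind of operation the "fastdiv on T_len" follow-up targets on the GPU side.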
cmake	ggml: backend-agnostic tensor parallelism (experimental) (#19378)	2026-04-09 16:42:19 +02:00
include	CUDA: lower-case PCI bus id, standardize for ggml (#22820)	2026-05-08 10:09:38 +02:00
src	cuda: fuse snake activation (mul, sin, sqr, mul, add) (#22667)	2026-05-08 17:44:09 +08:00
.gitignore	vulkan : cmake integration (#8119)	2024-07-13 18:12:39 +02:00
CMakeLists.txt	ggml : bump version to 0.11.0 (ggml/1478)	2026-05-05 13:15:59 +03:00