mirror of
https://github.com/LostRuins/koboldcpp.git
synced 2026-05-10 04:00:53 +00:00
The MoE models have a mul_mat_vec with very small m (32, 64, 128) right before the topk_moe selection. Running multiple rows per wg doesn't utilize the SMs well. I think even for larger m, f32 is so bandwidth-limited that running multiple rows doesn't help. |
||
|---|---|---|
| .. | ||
| cmake | ||
| include | ||
| src | ||
| .gitignore | ||
| CMakeLists.txt | ||