mirror of https://github.com/LostRuins/koboldcpp.git (synced 2026-05-20 17:54:31 +00:00)
* [SYCL] Fix reorder MMVQ assert on unaligned vocab sizes

  The reorder mul_mat_vec_q dispatchers for Q4_0, Q8_0, Q4_K, and Q6_K asserted that block_num_y was a multiple of 16 subgroups. Models with a vocab size not divisible by 16 (for example HY-MT at 120818) aborted on model load when the output projection tripped the assert.

  I replaced the assert with padding: block_num_y now rounds up to a whole number of subgroup-sized workgroups. The kernel already has the row bounds check (`if (row >= nrows) return;`), so the extra padded threads early-exit cleanly. Row values are uniform across a subgroup, so the collective reduce stays safe.

  For aligned vocab sizes the padded block_num_y equals the old value, so the kernel launch is identical and there is no regression.

  Thanks to @arthw for flagging the relationship to #21527. Fixes #22020.

  AI-assisted coding, tested on Intel B70 hardware.

* sycl: use WARP_SIZE for num_subgroups in reorder MMVQ launches

  Replaces the hardcoded 16 with WARP_SIZE in the four reorder_mul_mat_vec launch helpers (Q4_0, Q8_0, Q4_K, Q6_K). This is a compile-time no-op on the Intel target, where WARP_SIZE is 16, but it makes the relationship to subgroup size explicit.

  Per review by @NeoZhangJianyu on #22035.

  Assisted by Claude.
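
The padding described in the first commit can be sketched in plain C++ without a SYCL toolchain. This is a minimal illustration, not the code from the commit: `ceil_div`, `padded_block_num_y`, and `count_active_rows` are hypothetical helpers, and `WARP_SIZE` is defined locally as 16 to match the Intel target mentioned in the message. The loop only stands in for the launched work-items to show that the existing `row >= nrows` check makes the padded rows do nothing.

```cpp
#include <cassert>

// Local stand-in for the backend's WARP_SIZE (16 on the Intel target,
// per the commit message). Defined here only for illustration.
constexpr int WARP_SIZE = 16;

// Integer division rounding up. Hypothetical helper for this sketch.
constexpr int ceil_div(int a, int b) {
    return (a + b - 1) / b;
}

// Round the row count up to a whole number of groups of `num_subgroups`
// rows, instead of asserting that it already is a multiple.
// Hypothetical helper; the name mirrors the block_num_y variable in the message.
constexpr int padded_block_num_y(int nrows, int num_subgroups) {
    return ceil_div(nrows, num_subgroups) * num_subgroups;
}

// Simulate the launch: every padded row index is visited, but indices at or
// beyond nrows take the early exit, as the kernel's bounds check does.
int count_active_rows(int nrows, int num_subgroups) {
    const int block_num_y = padded_block_num_y(nrows, num_subgroups);
    int active = 0;
    for (int row = 0; row < block_num_y; ++row) {
        if (row >= nrows) {
            continue;  // stands in for `if (row >= nrows) return;`
        }
        ++active;
    }
    return active;
}
```

With the vocab size quoted in the message, `padded_block_num_y(120818, WARP_SIZE)` gives 120832, so 14 padded rows are launched and immediately exit. For an aligned size such as 4096 the result is unchanged, which matches the no-regression claim for aligned vocab sizes.
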

| Name | | |
|---|---|---|
| ggml-blas | ||
| ggml-cann | ||
| ggml-cpu | ||
| ggml-cuda | ||
| ggml-hexagon | ||
| ggml-hip | ||
| ggml-metal | ||
| ggml-musa | ||
| ggml-opencl | ||
| ggml-openvino | ||
| ggml-rpc | ||
| ggml-sycl | ||
| ggml-virtgpu | ||
| ggml-vulkan | ||
| ggml-webgpu | ||
| ggml-zdnn | ||
| ggml-zendnn | ||
| CMakeLists.txt | ||
| ggml-alloc.c | ||
| ggml-backend-dl.cpp | ||
| ggml-backend-dl.h | ||
| ggml-backend-impl.h | ||
| ggml-backend-meta.cpp | ||
| ggml-backend-reg.cpp | ||
| ggml-backend.cpp | ||
| ggml-common.h | ||
| ggml-impl.h | ||
| ggml-opt.cpp | ||
| ggml-quants.c | ||
| ggml-quants.h | ||
| ggml-threading.cpp | ||
| ggml-threading.h | ||
| ggml.c | ||
| ggml.cpp | ||
| gguf.cpp | ||