mirror of https://github.com/LostRuins/koboldcpp.git (synced 2026-05-10 04:00:53 +00:00)

Disabling OpenMP generally provides better inference performance (at
least in my testing) but the loading becomes slightly slower.

Benchmark results for `convert_B_packed_format()`:

Before this commit:

         N      K |  No OpenMP     OpenMP |    Diff | Speedup
    ------------------------------------------------------------
       512   2880 |    640.9us    263.5us |  -58.9% |   0.41x
      2880   4096 |     2.55ms    261.7us |  -89.8% |   0.10x
    201088   2880 |   256.44ms    21.61ms |  -91.6% |   0.08x
    ------------------------------------------------------------
    Total: 325.43ms vs 31.05ms

After:

         N      K |  No OpenMP     OpenMP |    Diff | Speedup
    ------------------------------------------------------------
       512   2880 |     1.49ms    263.5us |  -82.3% |   0.18x
      2880   4096 |     1.55ms    261.7us |  -83.1% |   0.17x
    201088   2880 |    24.03ms    21.61ms |  -10.1% |   0.90x
    ------------------------------------------------------------
    Total: 78.97ms vs 31.05ms

Tested with unsloth/gpt-oss-20b-GGUF:Q4_K_M.

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
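
For readers checking the benchmark tables, the Diff and Speedup columns appear to be derived from the two timing columns as shown in this minimal sketch. The formulas are inferred from the published numbers, not taken from the benchmark source, and the helper names (`diff_percent`, `time_ratio`, `print_row`) are hypothetical and not part of ggml.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>

// Hypothetical helpers for illustration only; they are not part of ggml
// or of this commit. All times are expressed in microseconds.

// "Diff" column (assumed formula): relative change of the OpenMP time
// versus the non-OpenMP time, in percent. Negative means OpenMP was faster.
double diff_percent(double no_openmp_us, double openmp_us) {
    assert(no_openmp_us > 0.0);
    return (openmp_us - no_openmp_us) / no_openmp_us * 100.0;
}

// "Speedup" column (assumed formula): OpenMP time divided by non-OpenMP
// time. Values below 1.0 mean the non-OpenMP path took longer.
double time_ratio(double no_openmp_us, double openmp_us) {
    assert(no_openmp_us > 0.0);
    return openmp_us / no_openmp_us;
}

// Print one row in roughly the same layout as the tables above.
void print_row(int n, int k, double no_openmp_us, double openmp_us) {
    std::printf("%6d %5d | %10.1fus %10.1fus | %+6.1f%% | %.2fx\n",
                n, k, no_openmp_us, openmp_us,
                diff_percent(no_openmp_us, openmp_us),
                time_ratio(no_openmp_us, openmp_us));
}
```

Calling `print_row(201088, 2880, 24030.0, 21610.0)` with the "After" timings for the largest matrix gives a ratio close to 0.90, which is the row where the non-OpenMP path nearly closes the gap.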

| Name |
|---|
| ggml-blas |
| ggml-cann |
| ggml-cpu |
| ggml-cuda |
| ggml-hexagon |
| ggml-hip |
| ggml-metal |
| ggml-musa |
| ggml-opencl |
| ggml-rpc |
| ggml-sycl |
| ggml-virtgpu |
| ggml-vulkan |
| ggml-webgpu |
| ggml-zdnn |
| ggml-zendnn |
| CMakeLists.txt |
| ggml-alloc.c |
| ggml-backend-dl.cpp |
| ggml-backend-dl.h |
| ggml-backend-impl.h |
| ggml-backend-reg.cpp |
| ggml-backend.cpp |
| ggml-common.h |
| ggml-impl.h |
| ggml-opt.cpp |
| ggml-quants.c |
| ggml-quants.h |
| ggml-threading.cpp |
| ggml-threading.h |
| ggml.c |
| ggml.cpp |
| gguf.cpp |