mirror of
https://github.com/LostRuins/koboldcpp.git
synced 2026-05-06 08:01:27 +00:00
* ggml: aarch64: implement smmla kernel for q8_0_q8_0 quantized gemm armv8.2-a and above supports MMLA instructions that have higher throughput than DOT. this commit adds mmla kernel for q8_0_q8_0 gemm. The feature is enabled if the platform supports "__ARM_FEATURE_MATMUL_INT8" On AWS Graviton3 processors this kernel resulted up to 1.5x improvement for prompt evaluation throughput compared to the default sdot kernel. * ggml: aarch64: implement smmla kernel for q4_0_q8_0 quantized gemm armv8.2-a and above supports MMLA instructions that have higher throughput than DOT. this commit adds mmla kernel for q4_0_q8_0 gemm. The feature is enabled if the platform supports "__ARM_FEATURE_MATMUL_INT8" On AWS Graviton3 processors this kernel resulted up to 1.5x improvement for prompt evaluation throughput compared to the default sdot kernel. * ggml: aarch64: implement smmla kernel for q4_1_q8_1 quantized gemm armv8.2-a and above supports MMLA instructions that have higher throughput than DOT. this commit adds mmla kernel for q4_1_q8_1 gemm. The feature is enabled if the platform supports "__ARM_FEATURE_MATMUL_INT8" On AWS Graviton3 processors this kernel resulted up to 1.5x improvement for prompt evaluation throughput compared to the default sdot kernel. * ggml: update unit tests for the new vec_dot interface * llama.cpp: add MATMUL_INT8 capability to system_info |
||
|---|---|---|
| .. | ||
| .gitignore | ||
| CMakeLists.txt | ||
| get-model.cpp | ||
| get-model.h | ||
| test-autorelease.cpp | ||
| test-backend-ops.cpp | ||
| test-c.c | ||
| test-double-float.cpp | ||
| test-grad0.cpp | ||
| test-grammar-parser.cpp | ||
| test-llama-grammar.cpp | ||
| test-model-load-cancel.cpp | ||
| test-opt.cpp | ||
| test-quantize-fns.cpp | ||
| test-quantize-perf.cpp | ||
| test-rope.cpp | ||
| test-sampling.cpp | ||
| test-tokenizer-0-falcon.cpp | ||
| test-tokenizer-0-falcon.py | ||
| test-tokenizer-0-llama.cpp | ||
| test-tokenizer-0-llama.py | ||
| test-tokenizer-1-bpe.cpp | ||
| test-tokenizer-1-llama.cpp | ||