diff --git a/README.md b/README.md index a0c14b9d7..71327e514 100644 --- a/README.md +++ b/README.md @@ -280,7 +280,7 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo | [Metal](docs/build.md#metal-build) | Apple Silicon | | [BLAS](docs/build.md#blas-build) | All | | [BLIS](docs/backend/BLIS.md) | All | -| [SYCL](docs/backend/SYCL.md) | Intel and Nvidia GPU | +| [SYCL](docs/backend/SYCL.md) | Intel GPU | | [OpenVINO [In Progress]](docs/backend/OPENVINO.md) | Intel CPUs, GPUs, and NPUs | | [MUSA](docs/build.md#musa) | Moore Threads GPU | | [CUDA](docs/build.md#cuda) | Nvidia GPU | diff --git a/docs/backend/SYCL.md b/docs/backend/SYCL.md index 155f933b8..0c4660b54 100644 --- a/docs/backend/SYCL.md +++ b/docs/backend/SYCL.md @@ -5,6 +5,7 @@ - [News](#news) - [OS](#os) - [Hardware](#hardware) +- [Performance Reference](#performance-reference) - [Docker](#docker) - [Linux](#linux) - [Windows](#windows) @@ -51,9 +52,8 @@ The packages for FP32 and FP16 would have different accuracy and performance on ## News -- 2026.04 - - - Optimize mul_mat by reorder feature for data type: Q4_K, Q5_K, Q_K, Q8_0. +- 2026.04-05 + - Optimize mul_mat by reorder feature for data type: Q4_K, Q5_K, Q6_K, Q8_0. - Fused MoE. - Upgrate CI and built package for oneAPI 2025.3.3, support Ubuntu 24.04 built package. @@ -150,6 +150,13 @@ On older Intel GPUs, you may try [OpenCL](/docs/backend/OPENCL.md) although the NA +## Performance Reference + + +To get the supported LLMs, GPUs, and performance reference, please check [Performance of llama.cpp on Intel GPU with SYCL backend](https://github.com/ggml-org/llama.cpp/discussions/23313). + +You could update your test result in it directly. + ## Docker The docker build option is currently limited to *Intel GPU* targets.