# Intel GPU Support for KTransformers (Beta)

## Introduction

### Overview
We are excited to introduce Intel GPU support in KTransformers (Beta). This implementation has been developed and tested on Intel Xeon Scalable processors and Intel Arc GPUs (such as the A770 and B580).
## Installation Guide

### 1. Install Intel GPU Driver
Begin by installing the GPU drivers for your Intel GPU. To verify that the kernel and compute drivers are installed and functional, run:
```bash
clinfo --list | grep Device
```

Expected output (example):

```
 `-- Device #0: 13th Gen Intel(R) Core(TM) i9-13900K
 `-- Device #0: Intel(R) Arc(TM) A770 Graphics
 `-- Device #0: Intel(R) UHD Graphics 770
```
> **Important:** Ensure that Resizable BAR is enabled in your system's BIOS before proceeding. This is essential for optimal GPU performance and helps avoid issues such as `Bus error (core dumped)`. For detailed steps, please refer to the official guidance here.
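
If you are unsure whether Resizable BAR is active, one indirect check (a common heuristic, not from the official guide) is to inspect the GPU's prefetchable BAR size with `lspci`:

```bash
# Heuristic check: with Resizable BAR enabled, the Arc GPU's prefetchable BAR
# should cover the card's full VRAM (e.g. [size=16G] on an A770) rather than a
# small window such as [size=256M]. The grep pattern is an assumption; adjust
# it to match how your card appears in lspci output.
sudo lspci -v | grep -A8 -i 'vga.*intel' | grep -i 'memory at'
```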
### 2. Set Up Conda Environment
We recommend using Miniconda3/Anaconda3 for environment management:
```bash
# Download and run the Miniconda installer
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

# Create and activate the environment
conda create --name ktransformers python=3.11
conda activate ktransformers

# Install required libraries
conda install -c conda-forge libstdcxx-ng

# Verify GLIBCXX version (the output should include 3.4.32)
strings ~/anaconda3/envs/ktransformers/lib/libstdc++.so.6 | grep GLIBCXX
```
> **Note:** Adjust the path if your conda installation directory differs from `~/anaconda3` (for example, Miniconda installs to `~/miniconda3` by default).
### 3. Install PyTorch and IPEX-LLM
Install PyTorch with XPU backend support and IPEX-LLM:
```bash
pip install ipex-llm[xpu_2.6]==2.3.0b20250518 --extra-index-url https://download.pytorch.org/whl/xpu
pip uninstall torch torchvision torchaudio
pip install torch==2.7+xpu torchvision torchaudio --index-url https://download.pytorch.org/whl/test/xpu  # install torch 2.7
pip uninstall intel-opencl-rt dpcpp-cpp-rt
```
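
After installation, a quick sanity check (our suggestion, not part of the upstream instructions) confirms that this PyTorch build can see the XPU device:

```bash
# Expect a version string ending in +xpu and "True" for XPU availability
python -c "import torch; print(torch.__version__, torch.xpu.is_available())"
```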
### 4. Build ktransformers
```bash
# Clone the repository
git clone https://github.com/kvcache-ai/ktransformers.git
cd ktransformers
git submodule update --init

# Install dependencies and build
bash install.sh --dev xpu
```
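
As a minimal smoke test (assuming the install script completed without errors), check that the package imports from the new environment:

```bash
python -c "import ktransformers; print('ktransformers imported OK')"
```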
## Running DeepSeek-R1 Models

### Configuration for GPUs with 16 GB of VRAM
Use our optimized configuration for VRAM-constrained GPUs:
```bash
export SYCL_CACHE_PERSISTENT=1
export ONEAPI_DEVICE_SELECTOR=level_zero:0
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1

python ktransformers/local_chat.py \
  --model_path deepseek-ai/DeepSeek-R1 \
  --gguf_path <path_to_gguf_files> \
  --optimize_config_path ktransformers/optimize/optimize_rules/xpu/DeepSeek-V3-Chat.yaml \
  --cpu_infer <cpu_cores + 1> \
  --device xpu \
  --max_new_tokens 200
```
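
For `--cpu_infer`, the placeholder `<cpu_cores + 1>` means the number of physical CPU cores plus one. A rough way to derive it, assuming a hyper-threaded CPU where physical cores are half of the logical CPUs reported by `nproc`:

```bash
# Physical cores + 1; drop the division by 2 if hyper-threading is disabled
echo $(( $(nproc) / 2 + 1 ))
```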
## Known Limitations

- The serving function is not yet supported on the Intel GPU platform.
## Troubleshooting

- Best Known Config (BKC) to obtain best performance

  To obtain the best performance on the Intel GPU platform, we recommend locking the GPU frequency and setting the CPU to performance mode with the settings below.
echo "performance" | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
echo 0 | sudo tee /sys/devices/system/cpu/cpu*/power/energy_perf_bias
# 2400 is max frequency for Arc A770
sudo xpu-smi config -d 0 -t 0 --frequencyrange 2400,2400
# 2850 is max frequency for Arc B580
# sudo xpu-smi config -d 0 -t 0 --frequencyrange 2850,2850
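
  To confirm the settings took effect (a quick check we suggest; `xpu-smi` output format may vary by version):

  ```bash
  cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor  # expect "performance"
  sudo xpu-smi stats -d 0 | grep -i frequency                # expect the locked range
  ```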
- Runtime error like `xpu/sycl/TensorCompareKernels.cpp:163: xxx. Aborted (core dumped)`

  This error is mostly related to the GPU driver. If you encounter it, you can update `intel-level-zero-gpu` to `1.3.29735.27-914~22.04` (a version verified by us) with the commands below.
  ```bash
  wget -qO - https://repositories.intel.com/gpu/intel-graphics.key | \
    sudo gpg --dearmor --output /usr/share/keyrings/intel-graphics.gpg
  echo "deb [arch=amd64,i386 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu jammy client" | \
    sudo tee /etc/apt/sources.list.d/intel-gpu-jammy.list
  sudo apt update
  # or: sudo apt update --allow-insecure-repositories
  sudo apt install intel-level-zero-gpu=1.3.29735.27-914~22.04
  ```
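
  Afterwards, you can confirm that the pinned version is installed:

  ```bash
  apt policy intel-level-zero-gpu
  ```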
- `ImportError: cannot import name 'intel' from 'triton._C.libtriton'`

  Installing Triton separately causes `pytorch-triton-xpu` to stop working. You can resolve the issue with the following commands:
  ```bash
  pip uninstall triton pytorch-triton-xpu
  # Reinstall the correct version of pytorch-triton-xpu
  pip install pytorch-triton-xpu==3.3.0 --index-url https://download.pytorch.org/whl/xpu
  ```
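
  To verify the fix, re-run the import from the error message:

  ```bash
  python -c "from triton._C.libtriton import intel; print('XPU Triton backend OK')"
  ```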
- `ValueError: Unsupported backend: CUDA_HOME ROCM_HOME MUSA_HOME are not set and XPU is not available.`

  Ensure you have permission to access `/dev/dri/renderD*`. This typically requires your user to be in the `render` group:
  ```bash
  sudo gpasswd -a ${USER} render
  newgrp render
  ```
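
  You can then confirm that the render nodes are accessible:

  ```bash
  groups | grep -o render    # "render" should be listed
  ls -l /dev/dri/renderD*    # device nodes should belong to group "render"
  ```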
## Additional Information

To run KTransformers on XPU with Docker, please refer to [Docker_xpu.md](./Docker_xpu.md).