mirror of https://github.com/kvcache-ai/ktransformers.git synced 2025-09-09 05:54:06 +00:00

Shaojun Liu 404ad39a04 docs: add Dockerfile.xpu and GPU driver setup instructions

- Add Dockerfile.xpu for oneAPI-based container
- Create Docker_xpu.md with usage instructions
- Update xpu.md to include Docker guide

2025-05-28 13:55:35 +08:00

2.3 KiB

Raw Permalink Blame History

Intel GPU Docker Guide (Beta)

Prerequisites

Docker must be installed and running on your system.
Create a folder to store big models & intermediate files (e.g., /mnt/models)
Before proceeding, ensure the Intel GPU driver is installed correctly on your host: Installation Guide

Building the Docker Image Locally

Clone the repository and navigate to the project directory:

git clone https://github.com/kvcache-ai/ktransformers.git
cd ktransformers

Build the Docker image using the XPU-specific Dockerfile.xpu:

sudo http_proxy=$HTTP_PROXY \
     https_proxy=$HTTPS_PROXY \
     docker build \
       --build-arg http_proxy=$HTTP_PROXY \
       --build-arg https_proxy=$HTTPS_PROXY \
       -t kt_xpu:0.3.1 \
       -f Dockerfile.xpu \
       .

Running the Container

1. Start the container

sudo docker run -td --privileged \
    --net=host \
    --device=/dev/dri \
    --shm-size="16g" \
    -v /path/to/models:/models \
    -e http_proxy=$HTTP_PROXY \
    -e https_proxy=$HTTPS_PROXY \
    --name ktransformers_xpu \
    kt_xpu:0.3.1

Note: Replace /path/to/models with your actual model directory path (e.g., /mnt/models).

2. Access the container

sudo docker exec -it ktransformers_xpu /bin/bash

3. Set required XPU environment variables (inside the container)

export SYCL_CACHE_PERSISTENT=1
export ONEAPI_DEVICE_SELECTOR=level_zero:0
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1

4. Run the sample script

python ktransformers/local_chat.py \
  --model_path deepseek-ai/DeepSeek-R1 \
  --gguf_path <path_to_gguf_files> \
  --optimize_config_path ktransformers/optimize/optimize_rules/xpu/DeepSeek-V3-Chat.yaml \
  --cpu_infer <cpu_cores + 1> \
  --device xpu \
  --max_new_tokens 200

Note:

Replace <path_to_gguf_files> with the path to your GGUF model files.
Replace <cpu_cores + 1> with the number of CPU cores you want to use plus one.

Additional Information

For more configuration options and usage details, refer to the project README. To run KTransformers natively on XPU (outside of Docker), please refer to xpu.md.

2.3 KiB Raw Permalink Blame History