# KTransformers Docker Packaging Guide
This directory contains scripts for building and distributing KTransformers Docker images with standardized naming conventions.
## Overview
The packaging system provides:
- **Automated version detection** from sglang, ktransformers, and LLaMA-Factory
- **Multi-CPU variant support** (AMX, AVX512, AVX2) with runtime auto-detection
- **Standardized naming convention** for easy identification and management
- **Two distribution methods**:
  - Local tar file export for offline distribution
  - DockerHub publishing for online distribution
## Naming Convention
Docker images follow this naming pattern:
```
sglang-v{sglang_version}_ktransformers-v{ktransformers_version}_{cpu_info}_{gpu_info}_{functionality}_{timestamp}
```
### Example Names
**Tar file:**
```
sglang-v0.5.6_ktransformers-v0.4.3_x86-intel-multi_cu128_sft_llamafactory-v0.9.3_20241212143022.tar
```
**DockerHub tags:**
```
Full tag:
kvcache/ktransformers:sglang-v0.5.6_ktransformers-v0.4.3_x86-intel-multi_cu128_sft_llamafactory-v0.9.3_20241212143022
Simplified tag:
kvcache/ktransformers:v0.4.3-cu128
```
### Name Components
| Component | Description | Example |
|-----------|-------------|---------|
| sglang version | SGLang package version | `v0.5.6` |
| ktransformers version | KTransformers version | `v0.4.3` |
| cpu info | CPU instruction set support | `x86-intel-multi` (includes AMX/AVX512/AVX2) |
| gpu info | CUDA version | `cu128` (CUDA 12.8) |
| functionality | Feature mode | `sft_llamafactory-v0.9.3` or `infer` |
| timestamp | Build time (Beijing/UTC+8) | `20241212143022` |
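As a concrete sketch, a name following this convention can be assembled in shell from its components. The version values below are placeholders for illustration, not extracted from a real build:

```shell
# Compose an image name per the convention above (placeholder versions).
SGLANG_VERSION="0.5.6"
KTRANSFORMERS_VERSION="0.4.3"
CPU_INFO="x86-intel-multi"
GPU_INFO="cu128"
FUNCTIONALITY="sft_llamafactory-v0.9.3"
# Timestamp in Beijing time (UTC+8), matching the convention.
TIMESTAMP=$(TZ=Asia/Shanghai date +%Y%m%d%H%M%S)
IMAGE_NAME="sglang-v${SGLANG_VERSION}_ktransformers-v${KTRANSFORMERS_VERSION}_${CPU_INFO}_${GPU_INFO}_${FUNCTIONALITY}_${TIMESTAMP}"
echo "$IMAGE_NAME"
```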
## Files
| File | Purpose |
|------|---------|
| `Dockerfile` | Main Dockerfile with multi-CPU build and version extraction |
| `docker-utils.sh` | Shared utility functions for both scripts |
| `build-docker-tar.sh` | Build and export Docker image to tar file |
| `push-to-dockerhub.sh` | Build and push Docker image to DockerHub |
## Prerequisites
- Docker installed and running
- For DockerHub push: Docker Hub account and login (`docker login`)
- Sufficient disk space (at least 20GB recommended)
- Internet access (or local mirrors configured)
## Quick Start
### Build Local Tar File
```bash
cd docker

# Basic build
./build-docker-tar.sh

# With specific CUDA version and mirror
./build-docker-tar.sh \
  --cuda-version 12.8.1 \
  --ubuntu-mirror 1

# With proxy
./build-docker-tar.sh \
  --cuda-version 12.8.1 \
  --ubuntu-mirror 1 \
  --http-proxy "http://127.0.0.1:16981" \
  --https-proxy "http://127.0.0.1:16981" \
  --output-dir /path/to/output
```
### Push to DockerHub
```bash
cd docker

# Basic push (requires --repository)
./push-to-dockerhub.sh \
  --repository kvcache/ktransformers

# With simplified tag
./push-to-dockerhub.sh \
  --cuda-version 12.8.1 \
  --repository kvcache/ktransformers \
  --also-push-simplified

# Skip build if image exists
./push-to-dockerhub.sh \
  --repository kvcache/ktransformers \
  --skip-build
```
## Script Options
### build-docker-tar.sh
```
Build Configuration:
  --cuda-version VERSION    CUDA version (default: 12.8.1)
  --ubuntu-mirror 0|1       Use Tsinghua mirror (default: 0)
  --http-proxy URL          HTTP proxy URL
  --https-proxy URL         HTTPS proxy URL
  --cpu-variant VARIANT     CPU variant (default: x86-intel-multi)
  --functionality TYPE      Mode: sft or infer (default: sft)

Paths:
  --dockerfile PATH         Path to Dockerfile (default: ./Dockerfile)
  --context-dir PATH        Build context directory (default: .)
  --output-dir PATH         Output directory for tar (default: .)

Options:
  --dry-run                 Preview without building
  --keep-image              Keep Docker image after export
  --build-arg KEY=VALUE     Additional build arguments
  -h, --help                Show help message
```
### push-to-dockerhub.sh
```
All options from build-docker-tar.sh, plus:

Registry Settings:
  --registry REGISTRY       Docker registry (default: docker.io)
  --repository REPO         Repository name (REQUIRED)

Options:
  --skip-build              Skip build if image exists
  --also-push-simplified    Also push simplified tag
  --max-retries N           Max push retries (default: 3)
  --retry-delay SECONDS     Delay between retries (default: 5)
```
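The `--max-retries`/`--retry-delay` behavior amounts to a retry loop around `docker push`. A minimal sketch of that pattern (not the script's actual code; `push_with_retries` is a hypothetical helper name):

```shell
# Hypothetical helper sketching the retry behavior of push-to-dockerhub.sh.
# Usage: push_with_retries <max-attempts> <delay-seconds> <command...>
push_with_retries() {
    max=$1; delay=$2; shift 2
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        # Run the command (e.g. docker push <tag>); stop on success.
        if "$@"; then
            return 0
        fi
        echo "Attempt $attempt/$max failed; retrying in ${delay}s..." >&2
        attempt=$((attempt + 1))
        if [ "$attempt" -le "$max" ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

For example, `push_with_retries 3 5 docker push kvcache/ktransformers:v0.4.3-cu128` would attempt the push up to three times, waiting five seconds between attempts.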
## Usage Examples
### Example 1: Local Development Build
For testing on your local machine:
```bash
./build-docker-tar.sh \
  --cuda-version 12.8.1 \
  --output-dir ./builds \
  --keep-image
```
This will:
1. Build the Docker image
2. Export to tar in `./builds/` directory
3. Keep the Docker image for local testing
### Example 2: Production Build for Distribution
For creating a production build with mirrors and proxy:
```bash
./build-docker-tar.sh \
  --cuda-version 12.8.1 \
  --ubuntu-mirror 1 \
  --http-proxy "http://127.0.0.1:16981" \
  --https-proxy "http://127.0.0.1:16981" \
  --output-dir /mnt/data/releases
```
### Example 3: Publish to DockerHub
For publishing to DockerHub:
```bash
# First, log in to Docker Hub
docker login

# Then push
./push-to-dockerhub.sh \
  --cuda-version 12.8.1 \
  --repository kvcache/ktransformers \
  --also-push-simplified
```
This creates two tags:
- Full: `kvcache/ktransformers:sglang-v0.5.6_ktransformers-v0.4.3_x86-intel-multi_cu128_sft_llamafactory-v0.9.3_20241212143022`
- Simplified: `kvcache/ktransformers:v0.4.3-cu128`
### Example 4: Dry Run
Preview the build without actually building:
```bash
./build-docker-tar.sh --cuda-version 12.8.1 --dry-run
```
### Example 5: Custom Build Arguments
Pass additional Docker build arguments:
```bash
./build-docker-tar.sh \
  --cuda-version 12.8.1 \
  --build-arg SGL_VERSION=0.5.7 \
  --build-arg FLASHINFER_VERSION=0.5.4
```
## Using the Built Images
### Load from Tar File
```bash
# Load the image
docker load -i sglang-v0.5.6_ktransformers-v0.4.3_x86-intel-multi_cu128_sft_llamafactory-v0.9.3_20241212143022.tar

# Run the container
docker run -it --rm \
  --gpus all \
  sglang-v0.5.6_ktransformers-v0.4.3_x86-intel-multi_cu128_sft_llamafactory-v0.9.3_20241212143022 \
  /bin/bash
```
### Pull from DockerHub
```bash
# Pull with full tag
docker pull kvcache/ktransformers:sglang-v0.5.6_ktransformers-v0.4.3_x86-intel-multi_cu128_sft_llamafactory-v0.9.3_20241212143022

# Or pull with simplified tag
docker pull kvcache/ktransformers:v0.4.3-cu128

# Run the container
docker run -it --rm \
  --gpus all \
  kvcache/ktransformers:v0.4.3-cu128 \
  /bin/bash
```
### Inside the Container
The image contains two conda environments:
```bash
# Activate serve environment (for inference with sglang)
conda activate serve
# or use the alias:
serve

# Activate fine-tune environment (for training with LLaMA-Factory)
conda activate fine-tune
# or use the alias:
finetune
```
## Multi-CPU Variant Support
The Docker image includes all three CPU variants:
- **AMX** - For Intel Sapphire Rapids and newer (4th Gen Xeon+)
- **AVX512** - For Intel Skylake-X, Ice Lake, Cascade Lake
- **AVX2** - Maximum compatibility for older CPUs
The runtime automatically detects your CPU and loads the appropriate variant. To override:
```bash
# Force use of the AVX2 variant
export KT_KERNEL_CPU_VARIANT=avx2
python your_script.py

# Enable debug output to see which variant is loaded
export KT_KERNEL_DEBUG=1
python your_script.py
```
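The auto-detection can be approximated by checking CPU feature flags, as reported by `/proc/cpuinfo` on Linux. A rough sketch under that assumption (`pick_variant` is illustrative, not the actual kt-kernel selection code):

```shell
# Illustrative sketch: map a CPU flags string (as found in /proc/cpuinfo)
# to the variant the runtime would likely select. Not the real kt-kernel logic.
pick_variant() {
    flags=$1
    if echo "$flags" | grep -qw amx_tile; then
        echo "amx"      # Sapphire Rapids and newer (4th Gen Xeon+)
    elif echo "$flags" | grep -qw avx512f; then
        echo "avx512"   # Skylake-X / Ice Lake / Cascade Lake
    else
        echo "avx2"     # fallback for older CPUs
    fi
}

pick_variant "fpu sse avx2 avx512f"   # prints: avx512
# On Linux, against the real CPU:
#   pick_variant "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
```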
## Version Extraction
Versions are automatically extracted during Docker build from:
- **SGLang**: From `sglang.__version__` in serve environment
- **KTransformers**: From `version.py` in ktransformers repository
- **LLaMA-Factory**: From `llamafactory.__version__` in fine-tune environment
The versions are saved to `/workspace/versions.env` in the image:
```bash
# View versions in running container
cat /workspace/versions.env
# Output:
SGLANG_VERSION=0.5.6
KTRANSFORMERS_VERSION=0.4.3
LLAMAFACTORY_VERSION=0.9.3
```
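For illustration, the extraction step amounts to querying each package for its version and writing the results to a file. A hedged sketch (the actual Dockerfile commands may differ; here each extraction falls back to `unknown` so the sketch is self-contained):

```shell
# Sketch: collect versions and write a versions.env file.
# In the image this file lives at /workspace/versions.env.
VERSIONS_ENV=versions.env
SGLANG_VERSION=$(python -c 'import sglang; print(sglang.__version__)' 2>/dev/null || echo unknown)
LLAMAFACTORY_VERSION=$(python -c 'import llamafactory; print(llamafactory.__version__)' 2>/dev/null || echo unknown)
# In the real build this is read from version.py in the ktransformers repo.
KTRANSFORMERS_VERSION=${KTRANSFORMERS_VERSION:-unknown}

cat > "$VERSIONS_ENV" <<EOF
SGLANG_VERSION=${SGLANG_VERSION}
KTRANSFORMERS_VERSION=${KTRANSFORMERS_VERSION}
LLAMAFACTORY_VERSION=${LLAMAFACTORY_VERSION}
EOF
```

Note that in the real image each `python` invocation runs inside the corresponding conda environment (serve for sglang, fine-tune for LLaMA-Factory).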
## Troubleshooting
### Build Fails with Out of Disk Space
Check available disk space:
```bash
df -h
```
The build requires approximately 15-20GB of disk space. Clean up Docker:
```bash
docker system prune -a
```
### Version Extraction Fails
If version extraction fails (shows "unknown"), check:
1. The cloned repositories have the correct branches
2. Python packages are properly installed in conda environments
3. Version files exist in expected locations
You can manually verify by running:
```bash
docker run --rm <image> /bin/bash -c "
  source /opt/miniconda3/etc/profile.d/conda.sh &&
  conda activate serve &&
  python -c 'import sglang; print(sglang.__version__)'
"
```
### Push to DockerHub Fails
1. **Check login**: `docker login`
2. **Check repository name**: Must include namespace (e.g., `kvcache/ktransformers`, not just `ktransformers`)
3. **Network issues**: Use `--max-retries` and `--retry-delay` options
4. **Rate limiting**: DockerHub has pull/push rate limits for free accounts
## Advanced Topics
### Custom Dockerfile Location
```bash
./build-docker-tar.sh \
  --dockerfile /path/to/custom/Dockerfile \
  --context-dir /path/to/build/context
```
### Building Only Inference Image (Future)
Currently, the image always includes both serve and fine-tune environments. To create an inference-only image, modify the Dockerfile to skip the fine-tune environment section.
### Customizing CPU Variants
To build only specific CPU variants, modify `kt-kernel/install.sh` or set environment variables in the Dockerfile.
### CI/CD Integration
The scripts are designed for manual execution but can be integrated into CI/CD pipelines:
```yaml
# Example GitHub Actions workflow step
- name: Build and push Docker image
  run: |
    cd docker
    ./push-to-dockerhub.sh \
      --cuda-version ${{ matrix.cuda_version }} \
      --repository ${{ secrets.DOCKER_REPOSITORY }} \
      --also-push-simplified
```
## Support
For issues and questions:
- File an issue at: https://github.com/kvcache-ai/ktransformers/issues
- Check documentation: https://github.com/kvcache-ai/ktransformers
## License
This packaging system is part of KTransformers and follows the same license.