# KTransformers Docker Packaging Guide This directory contains scripts for building and distributing KTransformers Docker images with standardized naming conventions. ## Overview The packaging system provides: - **Automated version detection** from sglang, ktransformers, and LLaMA-Factory - **Multi-CPU variant support** (AMX, AVX512, AVX2) with runtime auto-detection - **Standardized naming convention** for easy identification and management - **Two distribution methods**: - Local tar file export for offline distribution - DockerHub publishing for online distribution ## Naming Convention Docker images follow this naming pattern: ``` sglang-v{sglang版本}_ktransformers-v{ktransformers版本}_{cpu信息}_{gpu信息}_{功能模式}_{时间戳} ``` ### Example Names **Tar file:** ``` sglang-v0.5.6_ktransformers-v0.4.3_x86-intel-multi_cu128_sft_llamafactory-v0.9.3_20241212143022.tar ``` **DockerHub tags:** ``` Full tag: kvcache/ktransformers:sglang-v0.5.6_ktransformers-v0.4.3_x86-intel-multi_cu128_sft_llamafactory-v0.9.3_20241212143022 Simplified tag: kvcache/ktransformers:v0.4.3-cu128 ``` ### Name Components | Component | Description | Example | |-----------|-------------|---------| | sglang version | SGLang package version | `v0.5.6` | | ktransformers version | KTransformers version | `v0.4.3` | | cpu info | CPU instruction set support | `x86-intel-multi` (includes AMX/AVX512/AVX2) | | gpu info | CUDA version | `cu128` (CUDA 12.8) | | functionality | Feature mode | `sft_llamafactory-v0.9.3` or `infer` | | timestamp | Build time (Beijing/UTC+8) | `20241212143022` | ## Files | File | Purpose | |------|---------| | `Dockerfile` | Main Dockerfile with multi-CPU build and version extraction | | `docker-utils.sh` | Shared utility functions for both scripts | | `build-docker-tar.sh` | Build and export Docker image to tar file | | `push-to-dockerhub.sh` | Build and push Docker image to DockerHub | ## Prerequisites - Docker installed and running - For DockerHub push: Docker Hub account and login (`docker login`) - Sufficient disk space (at least 20GB recommended) - Internet access (or local mirrors configured) ## Quick Start ### Build Local Tar File ```bash cd docker # Basic build ./build-docker-tar.sh # With specific CUDA version and mirror ./build-docker-tar.sh \ --cuda-version 12.8.1 \ --ubuntu-mirror 1 # With proxy ./build-docker-tar.sh \ --cuda-version 12.8.1 \ --ubuntu-mirror 1 \ --http-proxy "http://127.0.0.1:16981" \ --https-proxy "http://127.0.0.1:16981" \ --output-dir /path/to/output ``` ### Push to DockerHub ```bash cd docker # Basic push (requires --repository) ./push-to-dockerhub.sh \ --repository kvcache/ktransformers # With simplified tag ./push-to-dockerhub.sh \ --cuda-version 12.8.1 \ --repository kvcache/ktransformers \ --also-push-simplified # Skip build if image exists ./push-to-dockerhub.sh \ --repository kvcache/ktransformers \ --skip-build ``` ## Script Options ### build-docker-tar.sh ``` Build Configuration: --cuda-version VERSION CUDA version (default: 12.8.1) --ubuntu-mirror 0|1 Use Tsinghua mirror (default: 0) --http-proxy URL HTTP proxy URL --https-proxy URL HTTPS proxy URL --cpu-variant VARIANT CPU variant (default: x86-intel-multi) --functionality TYPE Mode: sft or infer (default: sft) Paths: --dockerfile PATH Path to Dockerfile (default: ./Dockerfile) --context-dir PATH Build context directory (default: .) --output-dir PATH Output directory for tar (default: .) Options: --dry-run Preview without building --keep-image Keep Docker image after export --build-arg KEY=VALUE Additional build arguments -h, --help Show help message ``` ### push-to-dockerhub.sh ``` All options from build-docker-tar.sh, plus: Registry Settings: --registry REGISTRY Docker registry (default: docker.io) --repository REPO Repository name (REQUIRED) Options: --skip-build Skip build if image exists --also-push-simplified Also push simplified tag --max-retries N Max push retries (default: 3) --retry-delay SECONDS Delay between retries (default: 5) ``` ## Usage Examples ### Example 1: Local Development Build For testing on your local machine: ```bash ./build-docker-tar.sh \ --cuda-version 12.8.1 \ --output-dir ./builds \ --keep-image ``` This will: 1. Build the Docker image 2. Export to tar in `./builds/` directory 3. Keep the Docker image for local testing ### Example 2: Production Build for Distribution For creating a production build with mirrors and proxy: ```bash ./build-docker-tar.sh \ --cuda-version 12.8.1 \ --ubuntu-mirror 1 \ --http-proxy "http://127.0.0.1:16981" \ --https-proxy "http://127.0.0.1:16981" \ --output-dir /mnt/data/releases ``` ### Example 3: Publish to DockerHub For publishing to DockerHub: ```bash # First, login to Docker Hub docker login # Then push ./push-to-dockerhub.sh \ --cuda-version 12.8.1 \ --repository kvcache/ktransformers \ --also-push-simplified ``` This creates two tags: - Full: `kvcache/ktransformers:sglang-v0.5.6_ktransformers-v0.4.3_x86-intel-multi_cu128_sft_llamafactory-v0.9.3_20241212143022` - Simplified: `kvcache/ktransformers:v0.4.3-cu128` ### Example 4: Dry Run Preview the build without actually building: ```bash ./build-docker-tar.sh --cuda-version 12.8.1 --dry-run ``` ### Example 5: Custom Build Arguments Pass additional Docker build arguments: ```bash ./build-docker-tar.sh \ --cuda-version 12.8.1 \ --build-arg SGL_VERSION=0.5.7 \ --build-arg FLASHINFER_VERSION=0.5.4 ``` ## Using the Built Images ### Load from Tar File ```bash # Load the image docker load -i sglang-v0.5.6_ktransformers-v0.4.3_x86-intel-multi_cu128_sft_llamafactory-v0.9.3_20241212143022.tar # Run the container docker run -it --rm \ --gpus all \ sglang-v0.5.6_ktransformers-v0.4.3_x86-intel-multi_cu128_sft_llamafactory-v0.9.3_20241212143022 \ /bin/bash ``` ### Pull from DockerHub ```bash # Pull with full tag docker pull kvcache/ktransformers:sglang-v0.5.6_ktransformers-v0.4.3_x86-intel-multi_cu128_sft_llamafactory-v0.9.3_20241212143022 # Or pull with simplified tag docker pull kvcache/ktransformers:v0.4.3-cu128 # Run the container docker run -it --rm \ --gpus all \ kvcache/ktransformers:v0.4.3-cu128 \ /bin/bash ``` ### Inside the Container The image contains two conda environments: ```bash # Activate serve environment (for inference with sglang) conda activate serve # or use the alias: serve # Activate fine-tune environment (for training with LLaMA-Factory) conda activate fine-tune # or use the alias: finetune ``` ## Multi-CPU Variant Support The Docker image includes all three CPU variants: - **AMX** - For Intel Sapphire Rapids and newer (4th Gen Xeon+) - **AVX512** - For Intel Skylake-X, Ice Lake, Cascade Lake - **AVX2** - Maximum compatibility for older CPUs The runtime automatically detects your CPU and loads the appropriate variant. To override: ```bash # Force use of AVX2 variant export KT_KERNEL_CPU_VARIANT=avx2 python your_script.py # Enable debug output to see which variant is loaded export KT_KERNEL_DEBUG=1 python your_script.py ``` ## Version Extraction Versions are automatically extracted during Docker build from: - **SGLang**: From `sglang.__version__` in serve environment - **KTransformers**: From `version.py` in ktransformers repository - **LLaMA-Factory**: From `llamafactory.__version__` in fine-tune environment The versions are saved to `/workspace/versions.env` in the image: ```bash # View versions in running container cat /workspace/versions.env # Output: SGLANG_VERSION=0.5.6 KTRANSFORMERS_VERSION=0.4.3 LLAMAFACTORY_VERSION=0.9.3 ``` ## Troubleshooting ### Build Fails with Out of Disk Space Check available disk space: ```bash df -h ``` The build requires approximately 15-20GB of disk space. Clean up Docker: ```bash docker system prune -a ``` ### Version Extraction Fails If version extraction fails (shows "unknown"), check: 1. The cloned repositories have the correct branches 2. Python packages are properly installed in conda environments 3. Version files exist in expected locations You can manually verify by running: ```bash docker run --rm /bin/bash -c " source /opt/miniconda3/etc/profile.d/conda.sh && conda activate serve && python -c 'import sglang; print(sglang.__version__)' " ``` ### Push to DockerHub Fails 1. **Check login**: `docker login` 2. **Check repository name**: Must include namespace (e.g., `kvcache/ktransformers`, not just `ktransformers`) 3. **Network issues**: Use `--max-retries` and `--retry-delay` options 4. **Rate limiting**: DockerHub has pull/push rate limits for free accounts ## Advanced Topics ### Custom Dockerfile Location ```bash ./build-docker-tar.sh \ --dockerfile /path/to/custom/Dockerfile \ --context-dir /path/to/build/context ``` ### Building Only Inference Image (Future) Currently, the image always includes both serve and fine-tune environments. To create an inference-only image, modify the Dockerfile to skip the fine-tune environment section. ### Customizing CPU Variants To build only specific CPU variants, modify `kt-kernel/install.sh` or set environment variables in the Dockerfile. ### CI/CD Integration The scripts are designed for manual execution but can be integrated into CI/CD pipelines: ```yaml # Example GitHub Actions workflow - name: Build and push Docker image run: | cd docker ./push-to-dockerhub.sh \ --cuda-version ${{ matrix.cuda_version }} \ --repository ${{ secrets.DOCKER_REPOSITORY }} \ --also-push-simplified ``` ## Support For issues and questions: - File an issue at: https://github.com/kvcache-ai/ktransformers/issues - Check documentation: https://github.com/kvcache-ai/ktransformers ## License This packaging system is part of KTransformers and follows the same license.