[feature] add support for building docker image

chenxl 2024-08-01 04:01:00 +00:00
parent 112cb3c962
commit 86ba1336a9
4 changed files with 84 additions and 17 deletions

Dockerfile Normal file

@@ -0,0 +1,34 @@
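# Stage 1: clone the repo and build the Vue-based web UI, then drop node_modules to keep the copy small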
FROM node:20.16.0 AS web_compile
WORKDIR /home
RUN <<EOF
git clone https://github.com/kvcache-ai/ktransformers.git &&
cd ktransformers/ktransformers/website/ &&
npm install @vue/cli &&
npm run build &&
rm -rf node_modules
EOF
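# Stage 2: compile and install ktransformers (including the prebuilt web UI) on a CUDA-enabled PyTorch base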
FROM pytorch/pytorch:2.3.1-cuda12.1-cudnn8-devel AS compile_server
WORKDIR /workspace
COPY --from=web_compile /home/ktransformers /workspace/ktransformers
RUN <<EOF
apt update -y && apt install -y --no-install-recommends \
git \
wget \
vim \
gcc \
g++ \
cmake &&
rm -rf /var/lib/apt/lists/* &&
cd ktransformers &&
git submodule init &&
git submodule update &&
pip install ninja pyproject numpy &&
pip install flash-attn &&
CPU_INSTRUCT=NATIVE KTRANSFORMERS_FORCE_BUILD=TRUE TORCH_CUDA_ARCH_LIST="8.0;8.6;8.7;8.9" pip install . --no-build-isolation --verbose &&
pip cache purge
EOF
ENTRYPOINT [ "/opt/conda/bin/ktransformers" ]
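A minimal sketch of building this image locally; the tag is illustrative, and `DOCKER_BUILDKIT=1` is only needed on older Docker versions where BuildKit (required for the `RUN <<EOF` heredocs) is not the default:

```bash
# Build from the directory containing the Dockerfile; heredoc RUN blocks need BuildKit.
DOCKER_BUILDKIT=1 docker build -t ktransformers:local .
# The final image contains only the last stage (compile_server), not the node stage.
docker image ls ktransformers
```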

README.md

@@ -80,30 +80,32 @@ Some preparation:
 ```
 <h3>Installation</h3>
-You can install using Pypi:
-```
-pip install ktransformers --no-build-isolation
-```
-Or download source code and compile:
-- init source code
-```sh
-git clone https://github.com/kvcache-ai/ktransformers.git
-cd ktransformers
-git submodule init
-git submodule update
-```
-- [Optional] If you want to run with website, please [compile the website](./doc/en/api/server/website.md) before execute ```bash install.sh```
-- Compile and install
-```
-bash install.sh
-```
+1. Use a Docker image, see [documentation for Docker](./doc/en/docker.md)
+2. You can install using Pypi:
+```
+pip install ktransformers --no-build-isolation
+```
+3. Or you can download source code and compile:
+- init source code
+```sh
+git clone https://github.com/kvcache-ai/ktransformers.git
+cd ktransformers
+git submodule init
+git submodule update
+```
+- [Optional] If you want to run with website, please [compile the website](./doc/en/api/server/website.md) before execute ```bash install.sh```
+- Compile and install
+```
+bash install.sh
+```
 <h3>Local Chat</h3>
 We provide a simple command-line local chat Python script that you can run for testing.
-> Note that this is a very simple test tool only support one round chat without any memory about last input, if you want to try full ability of the model, you may go to [RESTful API and Web UI](#id_666). We use the DeepSeek-V2-Lite-Chat-GGUF model as an example here. But we alse support other models, you can replace it with any other model that you want to test.
+> Note that this is a very simple test tool only support one round chat without any memory about last input, if you want to try full ability of the model, you may go to [RESTful API and Web UI](#id_666). We use the DeepSeek-V2-Lite-Chat-GGUF model as an example here. But we also support other models, you can replace it with any other model that you want to test.
 <h4>Run Example</h4>
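As a quick pre-flight before the run example, a hedged check that the install above succeeded; the module name is assumed to match the PyPI package name:

```bash
# Hedged post-install check: an import error here means the build/install failed.
python -c "import ktransformers; print(ktransformers.__name__)"
```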

doc/en/Docker.md Normal file

@@ -0,0 +1,27 @@
# Docker
## Prerequisites
* Docker must be installed and running on your system.
* Create a folder to store large models and intermediate files (e.g. /mnt/models)
## Images
There are Docker images available for our project:
**Uploading**
## Building the Docker image locally
- Download the Dockerfile from [here](../../Dockerfile)
- In the directory containing the Dockerfile, execute:
```bash
docker build -t approachingai/ktransformers:v0.1.1 .
```
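Once the build finishes, you can confirm the image is present with a quick one-liner:

```bash
# List the freshly built image and its size.
docker image ls approachingai/ktransformers
```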
## Usage
Assuming you have installed the [nvidia-container-toolkit](https://github.com/NVIDIA/nvidia-container-toolkit) so that containers can use the GPU, you can run (a quick verification sketch follows the example below):
```bash
docker run --gpus all -v /path/to/models:/models -p 10002:10002 approachingai/ktransformers:v0.1.1 --port 10002 --gguf_path /models/path/to/gguf_path --model_path /models/path/to/model_path --web True
```
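If the container cannot see your GPU, a hedged way to check that the toolkit is wired up correctly; the CUDA base image tag is an assumption, any recent tag should do:

```bash
# Should print the same GPU table as running nvidia-smi on the host.
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
```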
More options can be found in the [readme](../../README.md).
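For longer sessions it can help to run the container detached and watch the logs; a sketch under the same path assumptions as the example above (the container name is arbitrary):

```bash
# Start detached so the shell stays free, then follow the server logs.
docker run -d --name ktransformers --gpus all \
  -v /path/to/models:/models -p 10002:10002 \
  approachingai/ktransformers:v0.1.1 \
  --port 10002 --gguf_path /models/path/to/gguf_path --model_path /models/path/to/model_path --web True
docker logs -f ktransformers
# Stop and remove the container when done.
docker stop ktransformers && docker rm ktransformers
```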


@@ -43,7 +43,11 @@ In the current version of KTransformers, we utilize Marlin for GPU kernels and l
 <img alt="CPUInfer Performance" src="../assets/cpuinfer.png" width=80%>
 </picture>
 </p>
+<p align="center">
+<picture>
+<img alt="marlin performance" src="https://github.com/IST-DASLab/marlin/blob/master/assets/sustained.png?raw=true" width=80%>
+</picture>
+</p>
 ### Arithmetic Intensity Guided Offloading