[feature] add support for building docker image

chenxl 2024-08-01 04:01:00 +00:00
parent 112cb3c962
commit 86ba1336a9
4 changed files with 84 additions and 17 deletions

Dockerfile Normal file

@@ -0,0 +1,34 @@
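# Stage 1: clone the repo and build the Vue-based web UI, then drop node_modules to keep the copy small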
FROM node:20.16.0 AS web_compile
WORKDIR /home
RUN <<EOF
git clone https://github.com/kvcache-ai/ktransformers.git &&
cd ktransformers/ktransformers/website/ &&
npm install @vue/cli &&
npm run build &&
rm -rf node_modules
EOF
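# Stage 2: compile and install ktransformers (including the prebuilt web UI) on a CUDA-enabled PyTorch base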
FROM pytorch/pytorch:2.3.1-cuda12.1-cudnn8-devel AS compile_server
WORKDIR /workspace
COPY --from=web_compile /home/ktransformers /workspace/ktransformers
RUN <<EOF
apt update -y && apt install -y --no-install-recommends \
git \
wget \
vim \
gcc \
g++ \
cmake &&
rm -rf /var/lib/apt/lists/* &&
cd ktransformers &&
git submodule init &&
git submodule update &&
pip install ninja pyproject numpy &&
pip install flash-attn &&
CPU_INSTRUCT=NATIVE KTRANSFORMERS_FORCE_BUILD=TRUE TORCH_CUDA_ARCH_LIST="8.0;8.6;8.7;8.9" pip install . --no-build-isolation --verbose &&
pip cache purge
EOF
ENTRYPOINT [ "/opt/conda/bin/ktransformers" ]
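A minimal sketch of building this image locally; the tag is illustrative, and `DOCKER_BUILDKIT=1` is only needed on older Docker versions where BuildKit (required for the `RUN <<EOF` heredocs) is not the default:

```bash
# Build from the directory containing the Dockerfile; heredoc RUN blocks need BuildKit.
DOCKER_BUILDKIT=1 docker build -t ktransformers:local .
# The final image contains only the last stage (compile_server), not the node stage.
docker image ls ktransformers
```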

README.md

@@ -80,30 +80,32 @@ Some preparation:
 ```
 <h3>Installation</h3>
-You can install using Pypi:
-```
-pip install ktransformers --no-build-isolation
-```
-Or download source code and compile:
-- init source code
-```sh
-git clone https://github.com/kvcache-ai/ktransformers.git
-cd ktransformers
-git submodule init
-git submodule update
-```
-- [Optional] If you want to run with website, please [compile the website](./doc/en/api/server/website.md) before execute ```bash install.sh```
-- Compile and install
-```
-bash install.sh
-```
+1. Use a Docker image, see [documentation for Docker](./doc/en/docker.md)
+2. You can install using Pypi:
+```
+pip install ktransformers --no-build-isolation
+```
+3. Or you can download source code and compile:
+- init source code
+```sh
+git clone https://github.com/kvcache-ai/ktransformers.git
+cd ktransformers
+git submodule init
+git submodule update
+```
+- [Optional] If you want to run with website, please [compile the website](./doc/en/api/server/website.md) before execute ```bash install.sh```
+- Compile and install
+```
+bash install.sh
+```
 <h3>Local Chat</h3>
 We provide a simple command-line local chat Python script that you can run for testing.
-> Note that this is a very simple test tool only support one round chat without any memory about last input, if you want to try full ability of the model, you may go to [RESTful API and Web UI](#id_666). We use the DeepSeek-V2-Lite-Chat-GGUF model as an example here. But we alse support other models, you can replace it with any other model that you want to test.
+> Note that this is a very simple test tool only support one round chat without any memory about last input, if you want to try full ability of the model, you may go to [RESTful API and Web UI](#id_666). We use the DeepSeek-V2-Lite-Chat-GGUF model as an example here. But we also support other models, you can replace it with any other model that you want to test.
 <h4>Run Example</h4>
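As a quick pre-flight before the run example, a hedged check that the install above succeeded; the module name is assumed to match the PyPI package name:

```bash
# Hedged post-install check: an import error here means the build/install failed.
python -c "import ktransformers; print(ktransformers.__name__)"
```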

doc/en/Docker.md Normal file

@@ -0,0 +1,27 @@
# Docker
## Prerequisites
* Docker must be installed and running on your system.
* Create a folder to store large models and intermediate files (e.g. /mnt/models)
## Images
There are Docker images available for our project:
**Uploading**
## Building the Docker image locally
- Download the Dockerfile from [here](../../Dockerfile)
- In the directory containing the Dockerfile, execute:
```bash
docker build -t approachingai/ktransformers:v0.1.1 .
```
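Once the build finishes, you can confirm the image is present with a quick one-liner:

```bash
# List the freshly built image and its size.
docker image ls approachingai/ktransformers
```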
## Usage
Assuming you have installed the [nvidia-container-toolkit](https://github.com/NVIDIA/nvidia-container-toolkit) so that containers can use the GPU, you can run (a quick verification sketch follows the example below):
```bash
docker run --gpus all -v /path/to/models:/models -p 10002:10002 approachingai/ktransformers:v0.1.1 --port 10002 --gguf_path /models/path/to/gguf_path --model_path /models/path/to/model_path --web True
```
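If the container cannot see your GPU, a hedged way to check that the toolkit is wired up correctly; the CUDA base image tag is an assumption, any recent tag should do:

```bash
# Should print the same GPU table as running nvidia-smi on the host.
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
```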
More options can be found in the [readme](../../README.md).
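For longer sessions it can help to run the container detached and watch the logs; a sketch under the same path assumptions as the example above (the container name is arbitrary):

```bash
# Start detached so the shell stays free, then follow the server logs.
docker run -d --name ktransformers --gpus all \
  -v /path/to/models:/models -p 10002:10002 \
  approachingai/ktransformers:v0.1.1 \
  --port 10002 --gguf_path /models/path/to/gguf_path --model_path /models/path/to/model_path --web True
docker logs -f ktransformers
# Stop and remove the container when done.
docker stop ktransformers && docker rm ktransformers
```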


@@ -43,7 +43,11 @@ In the current version of KTransformers, we utilize Marlin for GPU kernels and l
 <img alt="CPUInfer Performance" src="../assets/cpuinfer.png" width=80%>
 </picture>
 </p>
+<p align="center">
+<picture>
+<img alt="marlin performance" src="https://github.com/IST-DASLab/marlin/blob/master/assets/sustained.png?raw=true" width=80%>
+</picture>
+</p>
 ### Arithmetic Intensity Guided Offloading