mirror of https://github.com/kvcache-ai/ktransformers.git
synced 2025-09-06 04:30:03 +00:00

commit 86ba1336a9 (parent 112cb3c962)
[feature] add support for building docker image
4 changed files with 84 additions and 17 deletions
Dockerfile (new file, 34 lines)
@@ -0,0 +1,34 @@
+FROM node:20.16.0 as web_compile
+WORKDIR /home
+RUN <<EOF
+git clone https://github.com/kvcache-ai/ktransformers.git &&
+cd ktransformers/ktransformers/website/ &&
+npm install @vue/cli &&
+npm run build &&
+rm -rf node_modules
+EOF
+
+
+
+FROM pytorch/pytorch:2.3.1-cuda12.1-cudnn8-devel as compile_server
+WORKDIR /workspace
+COPY --from=web_compile /home/ktransformers /workspace/ktransformers
+RUN <<EOF
+apt update -y && apt install -y --no-install-recommends \
+    git \
+    wget \
+    vim \
+    gcc \
+    g++ \
+    cmake &&
+rm -rf /var/lib/apt/lists/* &&
+cd ktransformers &&
+git submodule init &&
+git submodule update &&
+pip install ninja pyproject numpy &&
+pip install flash-attn &&
+CPU_INSTRUCT=NATIVE KTRANSFORMERS_FORCE_BUILD=TRUE TORCH_CUDA_ARCH_LIST="8.0;8.6;8.7;8.9" pip install . --no-build-isolation --verbose &&
+pip cache purge
+EOF
+
+ENTRYPOINT [ "/opt/conda/bin/ktransformers" ]
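The `RUN <<EOF` heredoc form used in this Dockerfile requires BuildKit (enabled by default in recent Docker releases, or via `DOCKER_BUILDKIT=1`). The heredoc body executes as a single shell script, and the trailing `&&` on each line makes the chain short-circuit, so a failing step aborts the whole `RUN` layer. A minimal sketch of the same heredoc-plus-`&&` pattern in plain `sh` (the echoed step names are illustrative, not taken from the Dockerfile):

```shell
# The heredoc body runs as one script; because each line ends with '&&',
# the chain short-circuits: 'false' fails, so "step 3" is never printed.
sh <<'EOF'
echo "step 1" &&
echo "step 2" &&
false &&
echo "step 3"
EOF
```

In the Dockerfile this means a failed `git clone` or `npm run build` fails the layer instead of silently continuing to the next command.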
README.md (14 changes)
@@ -80,13 +80,15 @@ Some preparation:
 ```
 
 <h3>Installation</h3>
-You can install using Pypi:
+1. Use a Docker image, see [documentation for Docker](./doc/en/docker.md)
+
+2. You can install using Pypi:
 
 ```
 pip install ktransformers --no-build-isolation
 ```
 
-Or download source code and compile:
+3. Or you can download source code and compile:
 - init source code
 ```sh
 git clone https://github.com/kvcache-ai/ktransformers.git
@@ -103,7 +105,7 @@ Or download source code and compile:
 <h3>Local Chat</h3>
 We provide a simple command-line local chat Python script that you can run for testing.
 
-> Note that this is a very simple test tool only support one round chat without any memory about last input, if you want to try full ability of the model, you may go to [RESTful API and Web UI](#id_666). We use the DeepSeek-V2-Lite-Chat-GGUF model as an example here. But we alse support other models, you can replace it with any other model that you want to test.
+> Note that this is a very simple test tool that only supports one round of chat, without any memory of the previous input; if you want to try the full ability of the model, you may go to [RESTful API and Web UI](#id_666). We use the DeepSeek-V2-Lite-Chat-GGUF model as an example here, but we also support other models; you can replace it with any other model that you want to test.
 
 
 <h4>Run Example</h4>
doc/en/Docker.md (new file, 27 lines)
@@ -0,0 +1,27 @@
+# Docker
+
+## Prerequisites
+* Docker must be installed and running on your system.
+* Create a folder to store big models & intermediate files (e.g. /mnt/models)
+
+## Images
+There are Docker images available for our project:
+
+**Uploading**
+
+## Building docker image locally
+- Download the Dockerfile from [here](../../Dockerfile)
+
+- Then, in the directory containing the Dockerfile, execute:
+```bash
+docker build -t approachingai/ktransformers:v0.1.1 .
+```
+
+## Usage
+
+Assuming you have installed the [nvidia-container-toolkit](https://github.com/NVIDIA/nvidia-container-toolkit) so that the GPU can be used inside a Docker container:
+```
+docker run --gpus all -v /path/to/models:/models -p 10002:10002 approachingai/ktransformers:v0.1.1 --port 10002 --gguf_path /models/path/to/gguf_path --model_path /models/path/to/model_path --web True
+```
+
+More options can be found in the [readme](../../README.md)
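In the `docker run` line above, `-v /path/to/models:/models` bind-mounts the host folder into the container, so `--gguf_path` and `--model_path` must be given as container-side paths under `/models`. A minimal sketch of that path translation (all paths illustrative):

```shell
# A host file under the mounted folder appears inside the container at the
# same relative path under the mount point.
host_dir="/path/to/models"
container_dir="/models"
host_path="/path/to/models/DeepSeek/model.gguf"
# Strip the host prefix, then prepend the container mount point.
container_path="${container_dir}${host_path#"$host_dir"}"
echo "$container_path"   # → /models/DeepSeek/model.gguf
```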
@@ -43,7 +43,11 @@ In the current version of KTransformers, we utilize Marlin for GPU kernels and l
   <img alt="CPUInfer Performance" src="../assets/cpuinfer.png" width=80%>
 </picture>
 </p>
+<p align="center">
+<picture>
+<img alt="marlin performance" src="https://github.com/IST-DASLab/marlin/blob/master/assets/sustained.png?raw=true" width=80%>
+</picture>
+</p>
 
 ### Arithmetic Intensity Guided Offloading
 