Docker Optimization Recommendations
Optimized Dockerfile with Multi-Stage Build
Replace your current Dockerfile with this optimized version that uses multi-stage builds:
```dockerfile
# Stage 1: Build dependencies
FROM python:3.9-alpine AS builder

WORKDIR /build

# Install build dependencies
RUN apk add --no-cache --virtual .build-deps gcc musl-dev

# Copy requirements file
COPY src/requirements.txt .

# Install dependencies into a virtual environment
RUN python -m venv /venv && \
    /venv/bin/pip install --no-cache-dir --upgrade pip && \
    /venv/bin/pip install --no-cache-dir -r requirements.txt

# Stage 2: Runtime image
FROM python:3.9-alpine

WORKDIR /app

# Copy virtual environment from builder stage
COPY --from=builder /venv /venv

# Make sure we use the virtualenv
ENV PATH="/venv/bin:$PATH"
ENV PYTHONPATH=/app

# Create a non-root user for better security
RUN adduser -D appuser

# Create books directory owned by the app user (avoids a world-writable chmod 777)
RUN mkdir -p /books && chown appuser:appuser /books

# Create project directory structure
RUN mkdir -p src/api/static src/api/templates src/core tests/unit

# Copy only necessary files
COPY src/api/app.py src/api/
COPY src/api/static src/api/static
COPY src/api/templates src/api/templates
COPY src/core/index.py src/core/
COPY tests/unit/ tests/unit/

# Expose the API port
EXPOSE 5000

# Drop privileges before running the app
USER appuser

# Command to run the API
CMD ["python", "src/api/app.py"]
```
Benefits of this Multi-Stage Build
- Smaller Image Size: The final image contains only runtime dependencies, not build tools.
- Improved Security:
  - The container runs as a non-root user (appuser)
  - Fewer packages in the final image means a smaller attack surface
- Cleaner Build Process: Build dependencies are isolated in the first stage and never reach the runtime image
- Better Caching: Dependencies are installed in a layer separate from the application code, so code changes don't trigger a dependency reinstall
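The caching benefit is easy to demonstrate: because requirements.txt is copied and installed before the application code, a rebuild after a code-only change reuses the cached dependency layers. A quick sketch (paths match the Dockerfile above):

```sh
# First build: installs all Python dependencies (slow)
docker build -t booksearch_app .

# Change only application code and rebuild: the pip install layer
# is taken from cache, so this build finishes in seconds
touch src/api/app.py
docker build -t booksearch_app .
```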
Optimized docker-compose.yml
Here's an improved version of your docker-compose.yml file:
```yaml
version: '3.7'

services:
  booksearch_app:
    build:
      context: .
      dockerfile: Dockerfile  # Update to point to your new Dockerfile
    container_name: booksearch_app
    ports:
      - "8000:5000"
    environment:
      - ELASTICSEARCH_HOST=booksearch_elastic
      - BASE_URL=${BASE_URL}
      - CPU_LIMIT=${CPU_LIMIT}
      - ADMIN_USER=${ADMIN_USER}
      - ADMIN_PASSWORD=${ADMIN_PASSWORD}
      - SNIPPET_CHAR_LIMIT=${SNIPPET_CHAR_LIMIT}
      - ITEMS_PER_PAGE=${ITEMS_PER_PAGE}
      - INDEXING_BATCH_SIZE=${INDEXING_BATCH_SIZE}
      - INDEXING_BATCH_DELAY=${INDEXING_BATCH_DELAY}
    volumes:
      - ${SMB_SHARE_PATH}:/books:ro  # :ro mounts the share read-only for better security
    depends_on:
      booksearch_elastic:
        condition: service_healthy  # Ensures Elasticsearch is fully ready
    restart: unless-stopped
    deploy:
      resources:
        limits:
          cpus: ${CPU_LIMIT}
          memory: 4G
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:5000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 15s

  booksearch_elastic:
    container_name: booksearch_elastic
    image: bitnami/elasticsearch:latest
    ports:
      - "9200:9200"
      - "9300:9300"
    environment:
      - discovery.type=single-node
      - ELASTICSEARCH_USERNAME=${ELASTICSEARCH_USERNAME}
      - ELASTICSEARCH_PASSWORD=${ELASTICSEARCH_PASSWORD}
      - ELASTICSEARCH_PLUGINS=analysis-stempel
      - ELASTICSEARCH_HEAP_SIZE=6g  # Bitnami's heap variable; avoids duplicating -Xms/-Xmx in ES_JAVA_OPTS as well
      - bootstrap.memory_lock=true
      - ELASTICSEARCH_CLUSTER_SETTINGS=index.max_result_window=50000
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 8g
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - elasticsearch_data:/bitnami/elasticsearch/data  # Persist Elasticsearch data
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9200/_nodes/plugins?filter_path=nodes.*.plugins"]
      interval: 30s
      timeout: 10s
      retries: 5

volumes:
  elasticsearch_data:  # Named volume for Elasticsearch data
```
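Note that the app healthcheck assumes src/api/app.py actually serves a /health endpoint; if it doesn't, the wget probe keeps the container permanently unhealthy and the service_healthy condition blocks startup. With that in place, you can validate and launch the stack; a sketch assuming Docker Compose v2 (use docker-compose with v1):

```sh
# Render the final config to catch YAML and ${VAR} substitution errors
docker compose config

# Build and start both services in the background
docker compose up -d --build

# Watch the status move from "starting" to "healthy"
docker inspect --format '{{.State.Health.Status}}' booksearch_app
```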
Additional Docker Optimization Recommendations
- Add a .dockerignore file to exclude unnecessary files from the build context:

  ```
  .git
  .gitignore
  .env
  .env.example
  __pycache__/
  *.py[cod]
  *$py.class
  *.so
  .Python
  env/
  build/
  develop-eggs/
  dist/
  downloads/
  eggs/
  .eggs/
  lib/
  lib64/
  parts/
  sdist/
  var/
  *.egg-info/
  .installed.cfg
  *.egg
  ```
- Consider using Docker BuildKit for faster builds: set the `DOCKER_BUILDKIT=1` environment variable before building, or add `{ "features": { "buildkit": true } }` to your Docker daemon.json (shown in the example after this list)
- Implement proper health checks for all services to ensure they're ready before dependent services try to use them
- Use specific version tags for base images instead of `latest` (e.g. pin `bitnami/elasticsearch` to a concrete version) to ensure reproducible builds
- Scan your Docker images for vulnerabilities using tools like Trivy, Clair, or Docker Scout; the example after this list uses Trivy
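As an illustration of the BuildKit and scanning recommendations, here is a hedged sketch combining a BuildKit-enabled build of a pinned, versioned tag with a Trivy scan (the tag is illustrative; Trivy must be installed separately):

```sh
# One-off BuildKit build; add { "features": { "buildkit": true } } to
# /etc/docker/daemon.json to enable it permanently
DOCKER_BUILDKIT=1 docker build -t booksearch_app:1.0 .

# Scan the versioned image for known vulnerabilities
trivy image booksearch_app:1.0
```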