
Docker Optimization Recommendations

Optimized Dockerfile with Multi-Stage Build

Replace your current Dockerfile with this optimized version that uses multi-stage builds:

# Stage 1: Build dependencies
FROM python:3.9-alpine AS builder

WORKDIR /build

# Install build dependencies
RUN apk add --no-cache --virtual .build-deps gcc musl-dev

# Copy requirements file
COPY src/requirements.txt .

# Install dependencies into a virtual environment
RUN python -m venv /venv && \
    /venv/bin/pip install --no-cache-dir --upgrade pip && \
    /venv/bin/pip install --no-cache-dir -r requirements.txt

# Stage 2: Runtime image
FROM python:3.9-alpine

WORKDIR /app

# Copy virtual environment from builder stage
COPY --from=builder /venv /venv

# Make sure we use the virtualenv
ENV PATH="/venv/bin:$PATH"
ENV PYTHONPATH=/app

# Create books directory (mounted read-only at runtime, so no write access is needed)
RUN mkdir -p /books

# Create project directory structure
RUN mkdir -p src/api/static src/api/templates src/core tests/unit

# Copy only necessary files
COPY src/api/app.py src/api/
COPY src/api/static src/api/static
COPY src/api/templates src/api/templates
COPY src/core/index.py src/core/
COPY tests/unit/ tests/unit/

# Expose the API port
EXPOSE 5000

# Run as non-root user for better security
RUN adduser -D appuser
USER appuser

# Command to run the API
CMD ["python", "src/api/app.py"]
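Because the CMD launches app.py directly, a container that crashes on missing configuration will simply restart-loop under `restart: unless-stopped`. A fail-fast check at startup makes that failure obvious. This is a sketch only: `missing_config` is a hypothetical helper, and the variable names mirror the docker-compose environment section in this document — adjust them to whatever app.py actually reads.

```python
import os

# Hypothetical fail-fast config check for app.py startup; the names
# mirror the docker-compose environment section of this document.
REQUIRED_VARS = ["ELASTICSEARCH_HOST", "ADMIN_USER", "ADMIN_PASSWORD"]

def missing_config(environ=os.environ):
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]

# At the top of app.py you might do:
#     missing = missing_config()
#     if missing:
#         raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```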

Benefits of this Multi-Stage Build

  1. Smaller Image Size: The final image only contains runtime dependencies, not build tools.
  2. Improved Security:
    • Running as a non-root user (appuser)
    • Fewer packages installed in the final image means a smaller attack surface
  3. Cleaner Build Process: Build dependencies are isolated in the first stage.
  4. Better Caching: Dependencies are installed separately from application code, so code changes don't invalidate the cached dependency layer.

Optimized docker-compose.yml

Here's an improved version of your docker-compose.yml file:

version: '3.7'
services:
  booksearch_app:
    build:
      context: .
      dockerfile: Dockerfile  # Update to point to your new Dockerfile
    container_name: booksearch_app
    ports:
      - "8000:5000"
    environment:
      - ELASTICSEARCH_HOST=booksearch_elastic
      - BASE_URL=${BASE_URL}
      - CPU_LIMIT=${CPU_LIMIT}
      - ADMIN_USER=${ADMIN_USER}
      - ADMIN_PASSWORD=${ADMIN_PASSWORD}
      - SNIPPET_CHAR_LIMIT=${SNIPPET_CHAR_LIMIT}
      - ITEMS_PER_PAGE=${ITEMS_PER_PAGE}
      - INDEXING_BATCH_SIZE=${INDEXING_BATCH_SIZE}
      - INDEXING_BATCH_DELAY=${INDEXING_BATCH_DELAY}
    volumes:
      - ${SMB_SHARE_PATH}:/books:ro  # :ro mounts the share read-only for better security
    depends_on:
      booksearch_elastic:
        condition: service_healthy  # Ensures Elasticsearch is fully ready
    restart: unless-stopped
    deploy:
      resources:
        limits:
          cpus: ${CPU_LIMIT}
          memory: 4G
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:5000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 15s

  booksearch_elastic:
    container_name: booksearch_elastic
    image: bitnami/elasticsearch:latest  # Pin a specific version tag instead of 'latest' for reproducible builds
    ports:
      - "9200:9200"
      - "9300:9300"
    environment:
      - discovery.type=single-node
      - ELASTICSEARCH_USERNAME=${ELASTICSEARCH_USERNAME}
      - ELASTICSEARCH_PASSWORD=${ELASTICSEARCH_PASSWORD}
      - ELASTICSEARCH_PLUGINS=analysis-stempel
      - bootstrap.memory_lock=true
      - ELASTICSEARCH_HEAP_SIZE=6g  # Set the heap once via Bitnami's variable rather than duplicating -Xms/-Xmx across several settings
      - "ELASTICSEARCH_CLUSTER_SETTINGS=index.max_result_window=50000"
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 8G
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - elasticsearch_data:/bitnami/elasticsearch/data  # Persist Elasticsearch data
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9200/_nodes/plugins?filter_path=nodes.*.plugins"]
      interval: 30s
      timeout: 10s
      retries: 5

volumes:
  elasticsearch_data:  # Define a named volume for Elasticsearch data
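The app-container healthcheck above probes http://localhost:5000/health, which assumes app.py serves a `/health` route. If it doesn't yet, the sketch below shows the shape of such an endpoint using only the standard library — your actual web framework will have its own, more idiomatic way to register the route.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    """Answers the Docker healthcheck's GET /health with a 200."""

    def do_GET(self):
        if self.path == "/health":
            body = b'{"status": "ok"}'
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):
        pass  # keep container logs quiet

def make_server(host="0.0.0.0", port=5000):
    # Bind 0.0.0.0 so the in-container healthcheck can reach the port
    return HTTPServer((host, port), HealthHandler)

# In a standalone script you would call: make_server().serve_forever()
```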

Additional Docker Optimization Recommendations

  1. Add a .dockerignore file to exclude unnecessary files from the build context:
.git
.gitignore
.env
.env.example
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
*.egg-info/
.installed.cfg
*.egg
  2. Consider using Docker BuildKit for faster builds:

    • Set the DOCKER_BUILDKIT=1 environment variable before building
    • Or add { "features": { "buildkit": true } } to your Docker daemon.json
  3. Implement proper health checks for all services to ensure they're ready before dependent services try to use them.

  4. Use specific version tags for base images instead of 'latest' to ensure reproducible builds.

  5. Scan your Docker images for vulnerabilities using tools like Trivy, Clair, or Docker Scout.
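The .dockerignore patterns listed earlier can be sanity-checked with a quick script before a build. This is a rough approximation only: real .dockerignore matching follows Go's filepath.Match semantics plus `**` and `!` negation, which Python's fnmatch does not implement.

```python
import fnmatch

# A few of the patterns from the .dockerignore example above
# (trailing slashes dropped, since fnmatch compares bare names).
IGNORE_PATTERNS = [".git", ".env", "__pycache__", "*.py[cod]", "*.egg-info"]

def is_ignored(path):
    """True if any path component matches an ignore pattern (approximation)."""
    return any(
        fnmatch.fnmatch(part, pattern)
        for part in path.split("/")
        for pattern in IGNORE_PATTERNS
    )

# is_ignored("src/api/__pycache__/app.cpython-39.pyc")  -> True
# is_ignored("src/api/app.py")                          -> False
```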