open-notebook/Dockerfile.single
orihatav d0bbe4a921 fix: handle tiktoken network errors in offline environments (issue #264)
In air-gapped / offline Docker deployments, tiktoken.get_encoding() tries
to download the encoding file from openaipublic.blob.core.windows.net.
When that request fails it raises a URLError / OSError, not an ImportError,
so the previous `except ImportError` clause did not catch it and the crash
surfaced in the UI.

Widened `except ImportError` to `except Exception` so all failures —
"not installed" and "network unreachable" — fall through to the word-count
fallback (words × 1.3). Added a loguru WARNING so operators can see when
the fallback is active.
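
A minimal sketch of the widened handler described above, not the repository's
actual code. The function body, the logger name, and the int() truncation are
assumptions for illustration; the standard-library logging module stands in
for loguru so the sketch has no third-party dependencies.

```python
import logging
from urllib.error import URLError

# Stand-in logger; the commit uses loguru, swapped out here so the sketch
# runs without third-party packages.
logger = logging.getLogger("token_count_sketch")

# URLError derives from OSError, not ImportError, so `except ImportError`
# never catches a failed download.
assert issubclass(URLError, OSError)
assert not issubclass(URLError, ImportError)


def token_count(text: str, encoding_name: str = "o200k_base") -> int:
    """Count tokens with tiktoken, or estimate from words if it is unusable."""
    try:
        import tiktoken  # ImportError when the package is not installed

        # May try to download the encoding file when it is not cached.
        encoding = tiktoken.get_encoding(encoding_name)
        return len(encoding.encode(text))
    except Exception as exc:  # covers ImportError, URLError, OSError, ...
        logger.warning("tiktoken unavailable (%s); using word-count estimate", exc)
        # Assumed rounding: truncate words x 1.3 to an int.
        return int(len(text.split()) * 1.3)
```

With the handler this wide, "not installed" and "network unreachable" take the
same path, and the WARNING is the only signal that counts are estimates.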

TIKTOKEN_CACHE_DIR now reads from the environment with a blank-safe
fallback (`or` guard prevents os.makedirs("") on empty env var). This lets
Docker images redirect the cache to a path outside /app/data/ so user-data
volume mounts cannot shadow the pre-baked encoding.
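
The blank-safe lookup can be sketched as follows. The helper name
resolve_cache_dir and the default path are hypothetical; only the `or` guard
and the TIKTOKEN_CACHE_DIR variable come from this commit.

```python
import os
from typing import Mapping, Optional

# Hypothetical default; the real value lives in the project's config.py.
DEFAULT_CACHE_DIR = os.path.join("data", "tiktoken-cache")


def resolve_cache_dir(
    env: Optional[Mapping[str, str]] = None,
    default: str = DEFAULT_CACHE_DIR,
) -> str:
    """Return the tiktoken cache directory, treating an empty value as unset."""
    env = os.environ if env is None else env
    # env.get(name, default) would return "" when the variable is set but
    # empty, and os.makedirs("") raises FileNotFoundError. The `or` guard
    # maps both None and "" to the default.
    return env.get("TIKTOKEN_CACHE_DIR") or default
```

For example, resolve_cache_dir({"TIKTOKEN_CACHE_DIR": ""}) yields the default,
while a non-empty value such as /app/tiktoken-cache is returned unchanged.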

Both images now pre-download the o200k_base encoding during the builder
stage (internet is available at build time) and copy it into the runtime
image at /app/tiktoken-cache. ENV TIKTOKEN_CACHE_DIR=/app/tiktoken-cache
is set in the runtime stage so no network call is ever needed at runtime.

Added test_token_count_network_error_fallback in tests/test_utils.py:
patches tiktoken.get_encoding with a URLError and asserts token_count()
returns a positive int instead of raising.
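
A rough sketch of what such a test might look like. The token_count below is
a stand-in for the function under test, and the stub module is an assumption
that keeps the sketch runnable where tiktoken is not installed; the test in
tests/test_utils.py patches the real tiktoken.get_encoding instead.

```python
import sys
import types
from unittest.mock import Mock, patch
from urllib.error import URLError


def token_count(text: str) -> int:
    """Stand-in for the function under test; the real one lives in the repo."""
    try:
        import tiktoken

        return len(tiktoken.get_encoding("o200k_base").encode(text))
    except Exception:
        return int(len(text.split()) * 1.3)


def test_token_count_network_error_fallback() -> None:
    # Stub module whose get_encoding fails the way an offline download would.
    stub = types.ModuleType("tiktoken")
    stub.get_encoding = Mock(side_effect=URLError("network unreachable"))

    # patch.dict restores sys.modules when the block exits.
    with patch.dict(sys.modules, {"tiktoken": stub}):
        result = token_count("some text that needs counting")

    stub.get_encoding.assert_called_once_with("o200k_base")
    assert isinstance(result, int)
    assert result > 0
```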

Fixes #264

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-10 19:45:14 -05:00

# Stage 1: Frontend Builder
FROM node:20-slim AS frontend-builder
WORKDIR /app/frontend
# Copy dependency files first to leverage cache
COPY frontend/package.json frontend/package-lock.json ./
ARG NPM_REGISTRY=https://registry.npmjs.org/
RUN npm config set registry ${NPM_REGISTRY}
RUN npm ci
# Copy the rest of the frontend source
COPY frontend/ ./
# Build the frontend
RUN npm run build
# Stage 2: SurrealDB binary (pinned to v2 to match docker-compose.yml)
FROM surrealdb/surrealdb:v2 AS surreal-binary
# Stage 4: Backend Builder
FROM python:3.12-slim-bookworm AS backend-builder
# Install build dependencies
RUN apt-get update && apt-get upgrade -y && apt-get install -y --no-install-recommends build-essential && rm -rf /var/lib/apt/lists/*
# Install uv
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
WORKDIR /app
# Set build optimization environment variables
ENV UV_HTTP_TIMEOUT=120
# Copy dependency files first
COPY pyproject.toml uv.lock ./
COPY open_notebook/__init__.py ./open_notebook/__init__.py
# Install dependencies
RUN uv sync --frozen --no-dev
# Pre-download tiktoken encoding so the app works offline (issue #264).
# /app/tiktoken-cache is intentionally outside /app/data/ so that volume mounts
# of /app/data (for user data persistence) do not hide the pre-baked encoding.
# config.py reads TIKTOKEN_CACHE_DIR from the environment to pick up this path.
ENV TIKTOKEN_CACHE_DIR=/app/tiktoken-cache
RUN mkdir -p /app/tiktoken-cache && \
    .venv/bin/python -c "import tiktoken; tiktoken.get_encoding('o200k_base')"
# Stage 5: Runtime
FROM python:3.12-slim-bookworm AS runtime
# Install runtime dependencies
RUN apt-get update && apt-get upgrade -y && apt-get install -y \
    ffmpeg \
    supervisor \
    curl \
    && curl -fsSL https://deb.nodesource.com/setup_20.x | bash - \
    && apt-get install -y nodejs \
    && rm -rf /var/lib/apt/lists/*
# Install SurrealDB (copied from pinned v2 image to match docker-compose.yml)
COPY --from=surreal-binary /surreal /usr/local/bin/surreal
# Install uv (optional but helpful for some scripts)
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
WORKDIR /app
# Copy backend virtualenv and source code
COPY --from=backend-builder /app/.venv /app/.venv
COPY . /app/
# Copy pre-downloaded tiktoken encoding from builder (outside /app/data/, so volume mounts cannot hide it)
COPY --from=backend-builder /app/tiktoken-cache /app/tiktoken-cache
# Copy built frontend from standalone output
COPY --from=frontend-builder /app/frontend/.next/standalone /app/frontend/
COPY --from=frontend-builder /app/frontend/.next/static /app/frontend/.next/static
COPY --from=frontend-builder /app/frontend/public /app/frontend/public
# Bind Next.js to all interfaces (required for Docker networking and reverse proxies)
ENV HOSTNAME=0.0.0.0
# Point the app at the pre-baked tiktoken encoding (see open_notebook/config.py)
ENV TIKTOKEN_CACHE_DIR=/app/tiktoken-cache
# Setup directories and permissions
RUN mkdir -p /app/data /mydata
# Ensure wait-for-api script is executable
RUN chmod +x /app/scripts/wait-for-api.sh
# Copy supervisord configuration
COPY supervisord.single.conf /etc/supervisor/conf.d/supervisord.conf
# Create log directories
RUN mkdir -p /var/log/supervisor
# Expose ports
EXPOSE 8502 5055
# Set startup command
CMD ["/usr/bin/supervisord", "-c", "/etc/supervisor/conf.d/supervisord.conf"]