SurfSense/surfsense_web/content/docs/docker-installation.mdx
DESKTOP-RTLN3BA\$punk f47a41e3af
Some checks failed
pre-commit / pre-commit (push) Has been cancelled
docs: updated info for kokoro
2025-08-13 17:57:33 -07:00

253 lines
15 KiB
Text

---
title: Docker Installation
description: Setting up SurfSense using Docker
full: true
---
# Docker Installation
This guide explains how to run SurfSense using Docker Compose, which is the preferred and recommended method for deployment.
## Prerequisites
Before you begin, ensure you have:
- [Docker](https://docs.docker.com/get-docker/) and [Docker Compose](https://docs.docker.com/compose/install/) installed on your machine
- [Git](https://git-scm.com/downloads) (to clone the repository)
- Completed all the [prerequisite setup steps](/docs) including:
- PGVector setup
- **File Processing ETL Service** (choose one):
- Unstructured.io API key (Supports 34+ formats)
- LlamaIndex API key (enhanced parsing, supports 50+ formats)
- Docling (local processing, no API key required, supports PDF, Office docs, images, HTML, CSV)
- Other required API keys
## Installation Steps
1. **Configure Environment Variables**
Set up the necessary environment variables:
**Linux/macOS:**
```bash
# Copy example environment files
cp surfsense_backend/.env.example surfsense_backend/.env
cp surfsense_web/.env.example surfsense_web/.env
cp .env.example .env # For Docker-specific settings
```
**Windows (Command Prompt):**
```cmd
copy surfsense_backend\.env.example surfsense_backend\.env
copy surfsense_web\.env.example surfsense_web\.env
copy .env.example .env
```
**Windows (PowerShell):**
```powershell
Copy-Item -Path surfsense_backend\.env.example -Destination surfsense_backend\.env
Copy-Item -Path surfsense_web\.env.example -Destination surfsense_web\.env
Copy-Item -Path .env.example -Destination .env
```
Edit all `.env` files and fill in the required values:
### Docker-Specific Environment Variables
| ENV VARIABLE | DESCRIPTION | DEFAULT VALUE |
|----------------------------|-----------------------------------------------------------------------------|---------------------|
| FRONTEND_PORT | Port for the frontend service | 3000 |
| BACKEND_PORT | Port for the backend API service | 8000 |
| POSTGRES_PORT | Port for the PostgreSQL database | 5432 |
| PGADMIN_PORT | Port for pgAdmin web interface | 5050 |
| POSTGRES_USER | PostgreSQL username | postgres |
| POSTGRES_PASSWORD | PostgreSQL password | postgres |
| POSTGRES_DB | PostgreSQL database name | surfsense |
| PGADMIN_DEFAULT_EMAIL | Email for pgAdmin login | admin@surfsense.com |
| PGADMIN_DEFAULT_PASSWORD | Password for pgAdmin login | surfsense |
| NEXT_PUBLIC_API_URL | URL of the backend API (used by frontend) | http://backend:8000 |
**Backend Environment Variables:**
| ENV VARIABLE | DESCRIPTION |
| -------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| DATABASE_URL | PostgreSQL connection string (e.g., `postgresql+asyncpg://postgres:postgres@localhost:5432/surfsense`) |
| SECRET_KEY | JWT Secret key for authentication (should be a secure random string) |
| NEXT_FRONTEND_URL | URL where your frontend application is hosted (e.g., `http://localhost:3000`) |
| AUTH_TYPE | Authentication method: `GOOGLE` for OAuth with Google, `LOCAL` for email/password authentication |
| GOOGLE_OAUTH_CLIENT_ID | (Optional) Client ID from Google Cloud Console (required if AUTH_TYPE=GOOGLE) |
| GOOGLE_OAUTH_CLIENT_SECRET | (Optional) Client secret from Google Cloud Console (required if AUTH_TYPE=GOOGLE) |
| EMBEDDING_MODEL | Name of the embedding model (e.g., `mixedbread-ai/mxbai-embed-large-v1`) |
| RERANKERS_MODEL_NAME | Name of the reranker model (e.g., `ms-marco-MiniLM-L-12-v2`) |
| RERANKERS_MODEL_TYPE | Type of reranker model (e.g., `flashrank`) |
| TTS_SERVICE | Text-to-Speech API provider for Podcasts (e.g., `local/kokoro`, `openai/tts-1`). See [supported providers](https://docs.litellm.ai/docs/text_to_speech#supported-providers) |
| TTS_SERVICE_API_KEY | API key for the Text-to-Speech service |
| TTS_SERVICE_API_BASE | (Optional) Custom API base URL for the Text-to-Speech service |
| STT_SERVICE | Speech-to-Text API provider for Podcasts (e.g., `openai/whisper-1`). See [supported providers](https://docs.litellm.ai/docs/audio_transcription#supported-providers) |
| STT_SERVICE_API_KEY | API key for the Speech-to-Text service |
| STT_SERVICE_API_BASE | (Optional) Custom API base URL for the Speech-to-Text service |
| FIRECRAWL_API_KEY | API key for Firecrawl service for web crawling |
| ETL_SERVICE | Document parsing service: `UNSTRUCTURED` (supports 34+ formats), `LLAMACLOUD` (supports 50+ formats including legacy document types), or `DOCLING` (local processing, supports PDF, Office docs, images, HTML, CSV) |
| UNSTRUCTURED_API_KEY | API key for Unstructured.io service for document parsing (required if ETL_SERVICE=UNSTRUCTURED) |
| LLAMA_CLOUD_API_KEY | API key for LlamaCloud service for document parsing (required if ETL_SERVICE=LLAMACLOUD) |
**Optional Backend LangSmith Observability:**
| ENV VARIABLE | DESCRIPTION |
|--------------|-------------|
| LANGSMITH_TRACING | Enable LangSmith tracing (e.g., `true`) |
| LANGSMITH_ENDPOINT | LangSmith API endpoint (e.g., `https://api.smith.langchain.com`) |
| LANGSMITH_API_KEY | Your LangSmith API key |
| LANGSMITH_PROJECT | LangSmith project name (e.g., `surfsense`) |
**Backend Uvicorn Server Configuration:**
| ENV VARIABLE | DESCRIPTION | DEFAULT VALUE |
|------------------------------|---------------------------------------------|---------------|
| UVICORN_HOST | Host address to bind the server | 0.0.0.0 |
| UVICORN_PORT | Port to run the backend API | 8000 |
| UVICORN_LOG_LEVEL | Logging level (e.g., info, debug, warning) | info |
| UVICORN_PROXY_HEADERS | Enable/disable proxy headers | false |
| UVICORN_FORWARDED_ALLOW_IPS | Comma-separated list of allowed IPs | 127.0.0.1 |
| UVICORN_WORKERS | Number of worker processes | 1 |
| UVICORN_ACCESS_LOG | Enable/disable access log (true/false) | true |
| UVICORN_LOOP | Event loop implementation | auto |
| UVICORN_HTTP | HTTP protocol implementation | auto |
| UVICORN_WS | WebSocket protocol implementation | auto |
| UVICORN_LIFESPAN | Lifespan implementation | auto |
| UVICORN_LOG_CONFIG | Path to logging config file or empty string | |
| UVICORN_SERVER_HEADER | Enable/disable Server header | true |
| UVICORN_DATE_HEADER | Enable/disable Date header | true |
| UVICORN_LIMIT_CONCURRENCY | Max concurrent connections | |
| UVICORN_LIMIT_MAX_REQUESTS | Max requests before worker restart | |
| UVICORN_TIMEOUT_KEEP_ALIVE | Keep-alive timeout (seconds) | 5 |
| UVICORN_TIMEOUT_NOTIFY | Worker shutdown notification timeout (sec) | 30 |
| UVICORN_SSL_KEYFILE | Path to SSL key file | |
| UVICORN_SSL_CERTFILE | Path to SSL certificate file | |
| UVICORN_SSL_KEYFILE_PASSWORD | Password for SSL key file | |
| UVICORN_SSL_VERSION | SSL version | |
| UVICORN_SSL_CERT_REQS | SSL certificate requirements | |
| UVICORN_SSL_CA_CERTS | Path to CA certificates file | |
| UVICORN_SSL_CIPHERS | SSL ciphers | |
| UVICORN_HEADERS | Comma-separated list of headers | |
| UVICORN_USE_COLORS | Enable/disable colored logs | true |
| UVICORN_UDS | Unix domain socket path | |
| UVICORN_FD | File descriptor to bind to | |
| UVICORN_ROOT_PATH | Root path for the application | |
For more details, see the [Uvicorn documentation](https://www.uvicorn.org/#command-line-options).
### Frontend Environment Variables
| ENV VARIABLE | DESCRIPTION |
| ------------------------------- | ---------------------------------------------------------- |
| NEXT_PUBLIC_FASTAPI_BACKEND_URL | URL of the backend service (e.g., `http://localhost:8000`) |
| NEXT_PUBLIC_FASTAPI_BACKEND_AUTH_TYPE | Same value as set in backend AUTH_TYPE i.e `GOOGLE` for OAuth with Google, `LOCAL` for email/password authentication |
| NEXT_PUBLIC_ETL_SERVICE | Document parsing service (should match backend ETL_SERVICE): `UNSTRUCTURED`, `LLAMACLOUD`, or `DOCLING` - affects supported file formats in upload interface |
2. **Build and Start Containers**
Start the Docker containers:
**Linux/macOS/Windows:**
```bash
docker compose up --build
```
To run in detached mode (in the background):
**Linux/macOS/Windows:**
```bash
docker compose up -d
```
**Note for Windows users:** If you're using older Docker Desktop versions, you might need to use `docker compose` (with a space) instead of `docker compose`.
3. **Access the Applications**
Once the containers are running, you can access:
- Frontend: [http://localhost:3000](http://localhost:3000)
- Backend API: [http://localhost:8000](http://localhost:8000)
- API Documentation: [http://localhost:8000/docs](http://localhost:8000/docs)
- pgAdmin: [http://localhost:5050](http://localhost:5050)
## Using pgAdmin
pgAdmin is included in the Docker setup to help manage your PostgreSQL database. To connect:
1. Open pgAdmin at [http://localhost:5050](http://localhost:5050)
2. Login with the credentials from your `.env` file (default: admin@surfsense.com / surfsense)
3. Right-click "Servers" > "Create" > "Server"
4. In the "General" tab, name your connection (e.g., "SurfSense DB")
5. In the "Connection" tab:
- Host: `db`
- Port: `5432`
- Maintenance database: `surfsense`
- Username: `postgres` (or your custom POSTGRES_USER)
- Password: `postgres` (or your custom POSTGRES_PASSWORD)
6. Click "Save" to connect
## Useful Docker Commands
### Container Management
- **Stop containers:**
**Linux/macOS/Windows:**
```bash
docker compose down
```
- **View logs:**
**Linux/macOS/Windows:**
```bash
# All services
docker compose logs -f
# Specific service
docker compose logs -f backend
docker compose logs -f frontend
docker compose logs -f db
```
- **Restart a specific service:**
**Linux/macOS/Windows:**
```bash
docker compose restart backend
```
- **Execute commands in a running container:**
**Linux/macOS/Windows:**
```bash
# Backend
docker compose exec backend python -m pytest
# Frontend
docker compose exec frontend pnpm lint
```
## Troubleshooting
- **Linux/macOS:** If you encounter permission errors, you may need to run the docker commands with `sudo`.
- **Windows:** If you see access denied errors, make sure you're running Command Prompt or PowerShell as Administrator.
- If ports are already in use, modify the port mappings in the `docker-compose.yml` file.
- For backend dependency issues, check the `Dockerfile` in the backend directory.
- For frontend dependency issues, check the `Dockerfile` in the frontend directory.
- **Windows-specific:** If you encounter line ending issues (CRLF vs LF), configure Git to handle line endings properly with `git config --global core.autocrlf true` before cloning the repository.
## Next Steps
Once your installation is complete, you can start using SurfSense! Navigate to the frontend URL and log in using your Google account.