mirror of
https://github.com/lfnovo/open-notebook.git
synced 2026-04-28 19:40:50 +00:00
# Open Notebook - Root CLAUDE.md

This file provides architectural guidance for contributors working on Open Notebook at the project level.

## Project Overview

**Open Notebook** is an open-source, privacy-focused alternative to Google's NotebookLM. It is an AI-powered research assistant that lets users upload multi-modal content (PDFs, audio, video, web pages), generate intelligent notes, search semantically, chat with AI models, and produce professional podcasts, all with complete control over their data and choice of AI providers.

**Key Values**: Privacy-first, multi-provider AI support, fully self-hosted option, open-source transparency.

---

## Three-Tier Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                  Frontend (React/Next.js)                   │
│                    frontend/ @ port 3000                    │
├─────────────────────────────────────────────────────────────┤
│  - Notebooks, sources, notes, chat, podcasts, search UI     │
│  - Zustand state management, TanStack Query (React Query)   │
│  - Shadcn/ui component library with Tailwind CSS            │
└──────────────────────────────┬──────────────────────────────┘
                               │ HTTP REST
┌──────────────────────────────▼──────────────────────────────┐
│                        API (FastAPI)                        │
│                      api/ @ port 5055                       │
├─────────────────────────────────────────────────────────────┤
│  - REST endpoints for notebooks, sources, notes, chat       │
│  - LangGraph workflow orchestration                         │
│  - Job queue for async operations (podcasts)                │
│  - Multi-provider AI provisioning via Esperanto             │
└──────────────────────────────┬──────────────────────────────┘
                               │ SurrealQL
┌──────────────────────────────▼──────────────────────────────┐
│                    Database (SurrealDB)                     │
│                 Graph database @ port 8000                  │
├─────────────────────────────────────────────────────────────┤
│  - Records: Notebook, Source, Note, ChatSession, Credential │
│  - Relationships: source-to-notebook, note-to-source        │
│  - Vector embeddings for semantic search                    │
└─────────────────────────────────────────────────────────────┘
```

---

## Useful Sources

User documentation is at @docs/

## Tech Stack

### Frontend (`frontend/`)
- **Framework**: Next.js 16 (React 19)
- **Language**: TypeScript
- **State Management**: Zustand
- **Data Fetching**: TanStack Query (React Query)
- **Styling**: Tailwind CSS + Shadcn/ui
- **Build Tool**: Webpack (via Next.js)
- **i18n**: Every frontend change must also add or update the corresponding translation keys

### API Backend (`api/` + `open_notebook/`)
- **Framework**: FastAPI 0.104+
- **Language**: Python 3.11+
- **Workflows**: LangGraph state machines
- **Database**: SurrealDB async driver
- **AI Providers**: Esperanto library (8+ providers: OpenAI, Anthropic, Google, Groq, Ollama, Mistral, DeepSeek, xAI)
- **Job Queue**: Surreal-Commands for async jobs (podcasts)
- **Logging**: Loguru
- **Validation**: Pydantic v2
- **Testing**: Pytest

### Database
- **SurrealDB**: Graph database with built-in embedding storage and vector search
- **Schema Migrations**: Automatic on API startup via AsyncMigrationManager

### Additional Services
- **Content Processing**: content-core library (file/URL extraction)
- **Prompts**: AI-Prompter with Jinja2 templating
- **Podcast Generation**: podcast-creator library
- **Embeddings**: Multi-provider via Esperanto

---

## Architecture Highlights

### 1. Async-First Design
- All database queries, graph invocations, and API calls are async (`async`/`await`)
- SurrealDB async driver with connection pooling
- FastAPI handles concurrent requests efficiently

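
The practical payoff of the async-first design is that independent I/O waits can overlap. The following is a minimal sketch using only `asyncio`; `fetch_notebook`, `fetch_sources`, and `load_notebook_view` are hypothetical stand-ins (not functions from this codebase), and `asyncio.sleep` simulates database latency.

```python
import asyncio


async def fetch_notebook(notebook_id: str) -> dict:
    """Hypothetical stand-in for an awaited database query."""
    await asyncio.sleep(0.1)  # simulates I/O latency
    return {"id": notebook_id, "name": "Research"}


async def fetch_sources(notebook_id: str) -> list[str]:
    """Hypothetical stand-in for a second, independent query."""
    await asyncio.sleep(0.1)
    return [f"{notebook_id}:source-1", f"{notebook_id}:source-2"]


async def load_notebook_view(notebook_id: str) -> dict:
    # Both coroutines wait concurrently, so the two sleeps overlap
    # instead of running back to back.
    notebook, sources = await asyncio.gather(
        fetch_notebook(notebook_id),
        fetch_sources(notebook_id),
    )
    return {**notebook, "sources": sources}
```

Calling `asyncio.run(load_notebook_view("notebook:1"))` returns the merged dictionary; inside a FastAPI handler the coroutine would simply be awaited.
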
### 2. LangGraph Workflows
- **source.py**: Content ingestion (extract → embed → save)
- **chat.py**: Conversational agent with message history
- **ask.py**: Search + synthesis (retrieve relevant sources → LLM)
- **transformation.py**: Custom transformations on sources
- All use `provision_langchain_model()` for smart model selection

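
To make the extract → embed → save shape concrete, here is a plain-`asyncio` sketch of a state dictionary flowing through a sequence of node functions. It deliberately does not use LangGraph, and every name in it (`SourceState`, `extract`, `embed`, `save`, `run_pipeline`) is hypothetical; the real `source.py` graph may be structured differently.

```python
import asyncio
from typing import Awaitable, Callable, TypedDict


class SourceState(TypedDict, total=False):
    """Hypothetical state passed between steps; field names are illustrative."""
    raw: str
    text: str
    embedding: list[float]
    saved: bool


Node = Callable[[SourceState], Awaitable[SourceState]]


async def extract(state: SourceState) -> SourceState:
    # Whitespace normalisation stands in for real content extraction.
    return {"text": " ".join(state["raw"].split())}


async def embed(state: SourceState) -> SourceState:
    # Toy "embedding" (character and word counts), not a real vector model.
    text = state["text"]
    return {"embedding": [float(len(text)), float(len(text.split()))]}


async def save(state: SourceState) -> SourceState:
    # A real node would write to the database; here we only flag success.
    return {"saved": True}


async def run_pipeline(initial: SourceState, nodes: list[Node]) -> SourceState:
    state: SourceState = dict(initial)  # copy so the caller's dict is untouched
    for node in nodes:
        # Each node returns a partial update that is merged into the state.
        state.update(await node(state))
    return state
```

Each node returning a partial update, rather than mutating shared state, mirrors how graph-style workflows are commonly written and keeps individual steps easy to test in isolation.
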
### 3. Multi-Provider AI
- **Esperanto library**: Unified interface to 8+ AI providers
- **Credential system**: Individual encrypted credential records per provider; models link to credentials for direct config
- **ModelManager**: Factory pattern with fallback logic; uses credential config when available, env vars as fallback
- **Smart selection**: Detects large contexts, prefers long-context models
- **Override support**: Per-request model configuration

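
The resolution order described above (credential first, environment variable as fallback, per-request override on top of context-size selection) can be sketched as follows. This is an illustration under stated assumptions, not the actual `ModelManager` code: the `Credential` dataclass, the `<PROVIDER>_API_KEY` naming, and the 100,000-token threshold are all hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class Credential:
    """Hypothetical credential record; the real model is not shown here."""
    provider: str
    api_key: str


def resolve_api_key(provider: str, credential: Optional[Credential]) -> Optional[str]:
    # Prefer the linked credential; otherwise fall back to an env var.
    if credential is not None and credential.provider == provider:
        return credential.api_key
    # The "<PROVIDER>_API_KEY" naming is an assumption for this sketch.
    return os.environ.get(f"{provider.upper()}_API_KEY")


def pick_model(
    token_count: int,
    default: str,
    long_context: str,
    threshold: int = 100_000,  # illustrative value only
    override: Optional[str] = None,
) -> str:
    # A per-request override wins; otherwise large inputs are routed
    # to the long-context model.
    if override:
        return override
    return long_context if token_count > threshold else default
```
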
### 4. Database Schema
- **Automatic migrations**: AsyncMigrationManager runs on API startup
- **SurrealDB graph model**: Records with relationships and embeddings
- **Vector search**: Built-in semantic search across all content
- **Transactions**: Repo functions handle ACID operations

### 5. Error Handling
- **Custom exceptions** (`exceptions.py`): Hierarchy rooted at `OpenNotebookError` with typed subclasses (`AuthenticationError`, `ConfigurationError`, `RateLimitError`, `ExternalServiceError`, `NetworkError`, etc.)
- **Error classification** (`utils/error_classifier.py`): `classify_error()` maps raw LLM provider exceptions to typed exceptions with user-friendly messages via keyword matching
- **Global handlers**: FastAPI exception handlers in `api/main.py` convert typed exceptions to appropriate HTTP status codes (401, 422, 429, 502, etc.)

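
The keyword-matching approach can be illustrated with a small, self-contained sketch. The exception names come from the list above, but everything else is assumed for illustration: the keyword lists, the messages, the pairing of each exception with a status code, and the function names `classify_error_sketch` and `status_for`. The real `classify_error()` signature and rules may differ.

```python
class OpenNotebookError(Exception):
    """Root of the hierarchy, as named in the text above."""


class AuthenticationError(OpenNotebookError):
    pass


class RateLimitError(OpenNotebookError):
    pass


class ExternalServiceError(OpenNotebookError):
    pass


# Keyword lists and messages are assumptions; the real rules may differ.
_RULES = [
    (("invalid api key", "unauthorized", "401"), AuthenticationError,
     "Authentication failed. Check the provider credentials."),
    (("rate limit", "too many requests", "429"), RateLimitError,
     "The provider is rate limiting requests. Try again shortly."),
]

# Which exception maps to which status code is assumed for this sketch.
STATUS_CODES = {AuthenticationError: 401, RateLimitError: 429, ExternalServiceError: 502}


def classify_error_sketch(exc: Exception) -> OpenNotebookError:
    message = str(exc).lower()
    for keywords, error_type, friendly in _RULES:
        if any(keyword in message for keyword in keywords):
            return error_type(friendly)
    # Anything unrecognised is treated as an upstream failure.
    return ExternalServiceError("The AI provider returned an unexpected error.")


def status_for(error: OpenNotebookError) -> int:
    # Unknown typed errors default to a generic server error.
    return STATUS_CODES.get(type(error), 500)
```
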
### 6. Authentication
- **Current**: Simple password middleware (insecure, dev-only)
- **Production**: Replace with OAuth/JWT (see CONFIGURATION.md)

---

## Important Quirks & Gotchas

### API Startup
- **Migrations run automatically** on startup; check logs for errors
- **Must start API before UI**: UI depends on API for all data
- **SurrealDB must be running**: API fails without database connection

### Frontend-Backend Communication
- **Base API URL**: Configured in `.env.local` (default: http://localhost:5055)
- **CORS enabled**: Configured in `api/main.py` (allow all origins in dev)
- **Rate limiting**: Not built-in; add at proxy layer for production

### LangGraph Workflows
- **Blocking operations**: Chat and podcast workflows may run for minutes; no timeout is enforced
- **State persistence**: Uses SQLite checkpoint storage in `/data/sqlite-db/`
- **Model fallback**: If the primary model fails, the workflow falls back to a cheaper/smaller model

### Podcast Generation
- **Async job queue**: `podcast_service.py` submits jobs and returns without waiting for them to finish
- **Track status**: Poll the `/commands/{command_id}` endpoint for job status
- **Failure handling**: Failed jobs are marked as "failed" with error messages; retry via `POST /podcasts/episodes/{id}/retry`
- **No automatic retries**: Podcast jobs use `max_attempts: 1` to prevent duplicate episode records
- **TTS failures**: Fall back to silent audio if speech synthesis fails

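
Because submission does not wait for completion, a client has to poll for the result. The sketch below shows one way to structure such a loop. It is transport-agnostic: `fetch_status` is a hypothetical callable that would wrap a request to `/commands/{command_id}`, and the response shape (`{"status": ...}`) and the `"completed"` state name are assumptions; only `"failed"` is named in the text above.

```python
import asyncio
from typing import Awaitable, Callable

# Terminal states assumed for this sketch.
TERMINAL_STATES = {"completed", "failed"}


async def wait_for_command(
    fetch_status: Callable[[str], Awaitable[dict]],
    command_id: str,
    interval: float = 2.0,
    max_polls: int = 150,
) -> dict:
    """Poll until the job reaches a terminal state or the poll budget runs out."""
    for _ in range(max_polls):
        status = await fetch_status(command_id)
        if status.get("status") in TERMINAL_STATES:
            return status
        await asyncio.sleep(interval)
    raise TimeoutError(f"command {command_id} did not finish after {max_polls} polls")
```

A bounded poll count is used so that a job stuck in a non-terminal state cannot keep the caller waiting indefinitely.
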
### Content Processing
- **File extraction**: Uses content-core library; supports 50+ file types
- **URL handling**: Extracts text + metadata from web pages
- **Large files**: Content processing is sync; may block API briefly

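
Synchronous work inside an async handler blocks the event loop for its full duration. One common mitigation, shown here purely as an illustration and not as something the project currently does, is to run the blocking call in a worker thread with `asyncio.to_thread`. `extract_text_sync` and `extract_text` are hypothetical names, and `time.sleep` stands in for real extraction work.

```python
import asyncio
import time


def extract_text_sync(path: str) -> str:
    """Hypothetical blocking extractor; time.sleep stands in for real work."""
    time.sleep(0.05)
    return f"text extracted from {path}"


async def extract_text(path: str) -> str:
    # Runs the blocking call in a worker thread so the event loop
    # can keep serving other requests in the meantime.
    return await asyncio.to_thread(extract_text_sync, path)
```
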
---

## Component References

See dedicated CLAUDE.md files for detailed guidance:

- **[frontend/CLAUDE.md](../frontend/CLAUDE.md)**: React/Next.js architecture, state management, API integration
- **[api/CLAUDE.md](../api/CLAUDE.md)**: FastAPI structure, service pattern, endpoint development
- **[domain/CLAUDE.md](domain/CLAUDE.md)**: Data models, repository pattern, search functions
- **[ai/CLAUDE.md](ai/CLAUDE.md)**: ModelManager, AI provider integration, Esperanto usage
- **[graphs/CLAUDE.md](graphs/CLAUDE.md)**: LangGraph workflow design, state machines
- **[database/CLAUDE.md](database/CLAUDE.md)**: SurrealDB operations, migrations, async patterns

---

## Documentation Map

- **[README.md](../README.md)**: Project overview, features, quick start
- **[docs/index.md](../docs/index.md)**: Complete user & deployment documentation
- **[CONFIGURATION.md](../CONFIGURATION.md)**: Environment variables, model configuration
- **[CONTRIBUTING.md](../CONTRIBUTING.md)**: Contribution guidelines
- **[MAINTAINER_GUIDE.md](../MAINTAINER_GUIDE.md)**: Release & maintenance procedures

---

## Testing Strategy

- **Unit tests**: `tests/test_domain.py`, `tests/test_models_api.py`
- **Graph tests**: `tests/test_graphs.py` (workflow integration)
- **Utils tests**: `tests/test_utils.py`, `tests/test_chunking.py`, `tests/test_embedding.py`
- **Run all**: `uv run pytest tests/`
- **Coverage**: Check with `pytest --cov`

---

## Common Tasks

### Add a New API Endpoint
1. Create router in `api/routers/feature.py`
2. Create service in `api/feature_service.py`
3. Define schemas in `api/models.py`
4. Register router in `api/main.py`
5. Test via http://localhost:5055/docs

### Add a New LangGraph Workflow
1. Create `open_notebook/graphs/workflow_name.py`
2. Define StateDict and node functions
3. Build graph with `.add_node()` / `.add_edge()`
4. Invoke in service: `graph.ainvoke({"input": ...}, config={"..."})`
5. Test with sample data in `tests/`

### Add Database Migration
1. Create `migrations/XXX_description.surql`
2. Write SurrealQL schema changes
3. Create `migrations/XXX_description_down.surql` (optional rollback)
4. The API detects new migrations on startup and runs any that are newer than the recorded version

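
The "runs if newer than recorded version" rule can be sketched as a filter over migration filenames. This is an illustration only, not the `AsyncMigrationManager` implementation; it assumes that `XXX` is a numeric prefix and that files ending in `_down.surql` are rollbacks to be skipped. The function name `pending_migrations` is hypothetical.

```python
import re
from pathlib import Path
from typing import Iterable

# Assumes migrations are named "<number>_<description>.surql".
_NUMBERED = re.compile(r"^(\d+)_.+\.surql$")


def pending_migrations(filenames: Iterable[str], recorded_version: int) -> list[str]:
    """Return up-migration filenames newer than the recorded version, in order."""
    found = []
    for name in filenames:
        base = Path(name).name
        if base.endswith("_down.surql"):
            continue  # rollback files are never applied on startup
        match = _NUMBERED.match(base)
        if match and int(match.group(1)) > recorded_version:
            found.append((int(match.group(1)), name))
    # Sort numerically so "010" runs after "009" regardless of input order.
    return [name for _, name in sorted(found)]
```
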
### Deploy to Production
1. Review [CONFIGURATION.md](../CONFIGURATION.md) for security settings
2. Use `make docker-release` for multi-platform image
3. Push to Docker Hub / GitHub Container Registry
4. Deploy with `docker compose --profile multi up`
5. Verify migrations via API logs

---

## Support & Community

- **Documentation**: https://open-notebook.ai
- **Discord**: https://discord.gg/37XJPXfz2w
- **Issues**: https://github.com/lfnovo/open-notebook/issues
- **License**: MIT (see LICENSE)