mirror of https://github.com/lfnovo/open-notebook.git synced 2026-05-03 05:40:36 +00:00

New front-end
Launch Chat API
Manage Sources
Enable re-embedding of all contents
Sources can be added without a notebook now
Improved settings
Enable model selector on all chats
Background processing for better experience
Dark mode
Improved Notes

Improved Docs: 
- Remove all Streamlit references from documentation
- Update deployment guides with React frontend setup
- Fix Docker environment variables format (SURREAL_URL, SURREAL_PASSWORD)
- Update docker image tag from :latest to :v1-latest
- Change navigation references (Settings → Models to just Models)
- Update development setup to include frontend npm commands
- Add MIGRATION.md guide for users upgrading from Streamlit
- Update quick-start guide with correct environment variables
- Add port 5055 documentation for API access
- Update project structure to reflect frontend/ directory
- Remove outdated source-chat documentation files

2025-10-18 12:46:22 -03:00

12 KiB

Raw Blame History

Frequently Asked Questions

This document addresses common questions about Open Notebook usage, configuration, and best practices.

General Usage

What is Open Notebook?

Open Notebook is an open-source, privacy-focused alternative to Google's Notebook LM. It allows you to:

Create and manage research notebooks
Chat with your documents using AI
Generate podcasts from your content
Search across all your sources with semantic search
Transform and analyze your content

How is Open Notebook different from Google Notebook LM?

Privacy: Your data stays local by default. Only your chosen AI providers receive queries. Flexibility: Support for 15+ AI providers (OpenAI, Anthropic, Google, local models, etc.) Customization: Open source, so you can modify and extend functionality Control: You control your data, models, and processing

Do I need technical skills to use Open Notebook?

Basic usage: No technical skills required. The Docker installation is designed for non-technical users. Advanced features: Some technical knowledge helpful for:

Custom model configurations
API integrations
Source code modifications

Can I use Open Notebook offline?

Partially: The application runs locally, but requires internet for:

AI model API calls (unless using local models like Ollama)
Web content scraping
Some file processing features

Fully offline: Possible with local models (Ollama) for basic functionality.

What file types does Open Notebook support?

Documents:

PDF (text extraction)
Microsoft Word (DOC, DOCX)
Plain text (TXT)
Markdown (MD)

Web Content:

URLs (automatic web scraping)
YouTube videos (transcript extraction)
Web articles and blog posts

Media:

Images (PNG, JPG, GIF, WebP) with OCR
Audio files (MP3, WAV, M4A) with transcription
Video files (MP4, AVI, MOV) for transcript extraction

Other:

Direct text input
CSV data
Code files (with syntax highlighting)

How much does it cost to run Open Notebook?

Software: Free (open source) AI API costs: Pay-per-use to providers:

OpenAI: ~$0.50-5 per 1M tokens depending on model
Anthropic: ~$3-75 per 1M tokens depending on model
Google: Often free tier available
Local models: Free after initial setup

Typical monthly costs: $5-50 for moderate usage, depending on chosen models.

AI Models and Providers

Which AI provider should I choose?

For beginners: OpenAI (reliable, well-documented, good balance of cost/quality) For advanced users: Mix of providers based on specific needs For privacy: Local models (Ollama) or European providers (Mistral) For cost optimization: DeepSeek, Google (free tier), or OpenRouter

Can I use multiple AI providers simultaneously?

Yes: Open Notebook supports multiple providers. You can configure different providers for different tasks:

OpenAI for chat
Google for embeddings
ElevenLabs for text-to-speech
Anthropic for complex reasoning

What are the best model combinations?

Budget-friendly:

Language: gpt-5-mini (OpenAI) or deepseek-chat (DeepSeek)
Embedding: text-embedding-3-small (OpenAI)
TTS: gpt-4o-mini-tts (OpenAI)

High-quality:

Language: claude-3-7-sonnet (Anthropic) or gpt-4o (OpenAI)
Embedding: text-embedding-3-large (OpenAI)
TTS: eleven_turbo_v2_5 (ElevenLabs)

Privacy-focused:

Language: Local Ollama models
Embedding: Local embedding models
TTS: Local TTS solutions

How do I set up local models with Ollama?

Install Ollama: Download from https://ollama.ai
Start Ollama: ollama serve
Download models: ollama pull llama2
Configure Open Notebook:
```
OLLAMA_API_BASE=http://localhost:11434
```
Select models: In Models, choose Ollama models

Why are my AI requests failing?

Common causes:

Invalid API keys
Insufficient credits/billing
Model not available for your account
Rate limiting
Network connectivity issues

Solutions:

Verify API keys in provider dashboard
Check billing and usage limits
Try different models
Wait and retry for rate limits
Check internet connection

How do I optimize AI costs?

Model selection:

Use smaller models for simple tasks
Use larger models only for complex reasoning
Leverage free tiers when available

Usage optimization:

Process documents in batches
Use shorter prompts
Cache results when possible
Use local models for frequent tasks

Provider diversity:

Use OpenRouter for expensive models
Use free tier providers for testing
Mix providers based on strength

Data Management

Where is my data stored?

Local storage: By default, all data is stored locally:

Database: SurrealDB files in surreal_data/
Uploads: Files in notebook_data/
No external data transmission (except to chosen AI providers)

Cloud storage: Not implemented, but can be configured with external storage solutions.

How do I backup my data?

Manual backup:

# Create backup
tar -czf backup-$(date +%Y%m%d).tar.gz notebook_data/ surreal_data/

# Restore backup
tar -xzf backup-20240101.tar.gz

Automated backup: Set up cron jobs or use your preferred backup solution to backup the data directories.

Can I sync data between devices?

Currently: No built-in sync functionality. Workarounds:

Use shared network storage for data directories
Manual backup/restore between devices
Database replication (advanced)

How do I migrate data between installations?

Stop services: make stop-all

Copy data directories:

cp -r surreal_data/ new_installation/
cp -r notebook_data/ new_installation/

Start new installation
Verify data integrity

What happens to my data if I delete a notebook?

Soft deletion: Notebooks are marked as archived, not permanently deleted. Hard deletion: Currently not implemented in UI, but possible via API. Recovery: Archived notebooks can be restored from the database.

How do I clean up old data?

Manual cleanup:

Delete unused notebooks through UI
Remove old files from notebook_data/
Clear browser cache

Database cleanup: Advanced users can query the database directly to remove old records.

Best Practices

How should I organize my notebooks?

By topic: Create separate notebooks for different research areas By project: One notebook per project or course By source type: Separate notebooks for different content types By time period: Monthly or quarterly notebooks

What's the optimal notebook size?

Recommended: 20-100 sources per notebook Performance: Larger notebooks may have slower search Organization: Better to have focused notebooks than everything in one

How do I get the best search results?

Use descriptive queries: Instead of "data", use "data analysis methods" Combine keywords: Use multiple related terms Use natural language: Ask questions as you would to a human Refine iteratively: Start broad, then get more specific

How can I improve chat responses?

Provide context: Reference specific sources or topics Be specific: Ask detailed questions rather than general ones Use follow-up questions: Build on previous responses Include examples: Show what kind of response you want

What's the best way to process large documents?

Break into sections: Split large documents into smaller parts Use transformations: Apply summarization before adding to notebook Batch processing: Process multiple documents at once Use background jobs: For heavy processing tasks

How do I handle multiple languages?

Model selection: Choose models that support your languages Language-specific providers: Some providers are better for certain languages Separate notebooks: Consider separate notebooks for different languages Encoding: Ensure proper text encoding for non-English content

What are the security best practices?

API keys: Never share API keys publicly Password protection: Use strong passwords for public deployments Network security: Use HTTPS for production deployments Regular updates: Keep Docker images updated Backup encryption: Encrypt backups if they contain sensitive data

How do I optimize performance?

Hardware:

Use SSD storage for database
Allocate sufficient RAM (4GB+ recommended)
Use fast internet connection

Configuration:

Choose appropriate models for your needs
Optimize embedding dimensions
Use efficient file formats

Usage patterns:

Process documents in batches
Use background jobs for heavy tasks
Clear cache periodically

Technical Questions

Can I use Open Notebook programmatically?

Yes: Open Notebook provides a comprehensive REST API:

Full API documentation at /docs endpoint
Support for all UI functionality
Authentication via API keys
Webhook support for notifications

How do I extend Open Notebook?

Plugin system: Add custom transformations and processors API integration: Build custom applications using the API Source code: Modify the open-source codebase Custom models: Add support for new AI providers

Can I run Open Notebook in production?

Yes: Designed for production use with:

Docker deployment
Horizontal scaling capability
Security features
Monitoring and logging

Considerations:

Use production-grade database settings
Implement proper backup strategy
Configure monitoring and alerting
Use HTTPS and security best practices

How do I contribute to Open Notebook?

Ways to contribute:

Report bugs and issues
Suggest new features
Contribute code improvements
Improve documentation
Help other users in the community

Getting started:

Join Discord community
Check GitHub issues
Read contribution guidelines
Start with small improvements

What's the development roadmap?

Current focus:

Stability and performance improvements
Additional AI provider support
Enhanced podcast generation
Better mobile experience

Future plans:

Multi-user support
Advanced analytics
Integration with external tools
Cloud deployment options

Troubleshooting

My question isn't answered here. What should I do?

Check the troubleshooting guide: Common Issues
Search existing issues: GitHub repository issues
Ask the community: Discord server
Create a GitHub issue: For bugs or feature requests
Check the documentation: Other documentation sections

How do I report a bug?

Include:

Steps to reproduce
Expected vs actual behavior
Error messages and logs
System information
Configuration details (without API keys)

Submit to: GitHub Issues with bug report template

How do I request a new feature?

Process:

Check if feature already exists or is planned
Discuss in Discord to gauge interest
Create detailed GitHub issue
Consider contributing implementation

Where can I get help with installation?

Resources:

How do I stay updated with new releases?

Methods:

Watch GitHub repository
Join Discord for announcements
Follow release notes
Enable automatic Docker updates

This FAQ is updated regularly based on community questions and feedback. If you have a question that's not covered here, please ask in our Discord community or create a GitHub issue.

12 KiB Raw Blame History