
AI Models & Providers

Open Notebook supports 16+ AI providers, giving you complete flexibility in choosing the AI models that best fit your needs, budget, and privacy requirements. This comprehensive guide covers everything you need to know about selecting, configuring, and optimizing your AI models.

Quick Start

For immediate setup, use one of these configurations:

OpenAI Only (Simplest)

# Set environment variable
export OPENAI_API_KEY=your_key_here

# Configure these models in Settings:
# Chat: gpt-5-mini
# Tools: gpt-5  
# Transformations: gpt-5-mini
# Embedding: text-embedding-3-small
# Speech-to-Text: whisper-1
# Text-to-Speech: tts-1

Mixed Providers (Best Value)

# Environment variables
export OPENAI_API_KEY=your_key
export GEMINI_API_KEY=your_key
export OLLAMA_API_BASE=http://localhost:11434

# Recommended configuration in settings covered below

Understanding Model Types

Open Notebook uses four distinct types of AI models, each optimized for specific tasks:

🧠 Language Models

  • Purpose: Chat conversations, text generation, summaries, and tool calling
  • Key Features: Reasoning, instruction following, context understanding
  • Usage: Primary interface for AI interactions

🔍 Embedding Models

  • Purpose: Semantic search and content similarity matching
  • Key Features: Convert text to numerical vectors for similarity comparison
  • Usage: Power the search functionality across your content
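As an illustration, semantic search boils down to comparing embedding vectors with a similarity measure such as cosine similarity. A minimal sketch with toy 3-dimensional vectors (real embedding models produce hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    # 1.0 = same direction (very similar), near 0 = unrelated
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy 3-dimensional "embeddings"; real models produce hundreds of dimensions
query = [0.9, 0.1, 0.0]
docs = {
    "note_on_transformers": [0.8, 0.2, 0.1],
    "grocery_list": [0.0, 0.1, 0.9],
}

# Rank stored content by similarity to the query vector
ranked = sorted(docs, key=lambda name: cosine_similarity(query, docs[name]), reverse=True)
print(ranked[0])  # note_on_transformers
```

This is why the embedding model matters for search quality: better vectors put genuinely related content closer together.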

🎙️ Text-to-Speech (TTS)

  • Purpose: Generate podcasts and audio content
  • Key Features: Natural-sounding voice synthesis
  • Usage: Convert your notes and research into professional podcasts

🎧 Speech-to-Text (STT)

  • Purpose: Transcribe audio and video files
  • Key Features: Accurate transcription with speaker identification
  • Usage: Convert audio/video sources into searchable text

Provider Support Matrix

| Provider | Language | Embedding | STT | TTS |
|---|---|---|---|---|
| OpenAI | ✅ | ✅ | ✅ | ✅ |
| Anthropic | ✅ | — | — | — |
| Google (Gemini) | ✅ | ✅ | — | ✅ |
| Ollama | ✅ | ✅ | — | — |
| ElevenLabs | — | — | ✅ | ✅ |
| Mistral | ✅ | ✅ | — | — |
| DeepSeek | ✅ | — | — | — |
| xAI (Grok) | ✅ | — | — | — |
| Voyage AI | — | ✅ | — | — |
| Groq | ✅ | — | — | — |
| Vertex AI | ✅ | — | — | — |
| Azure OpenAI | ✅ | ✅ | — | — |
| OpenRouter | ✅ | — | — | — |
| Perplexity | ✅ | — | — | — |
| OpenAI Compatible | ✅ | ✅ | — | — |

Model Selection Guide

🎯 Selection Criteria

💰 Cost Considerations

  • Free: Ollama models (run locally)
  • Budget: OpenAI gpt-5-mini, Gemini Flash models
  • Premium: Claude 3.5 Sonnet, gpt-5, Grok-3

🎯 Quality Factors

  • Reasoning: Claude 3.5 Sonnet, Grok-3, DeepSeek-R1
  • Tool Calling: gpt-5, Claude 3.5 Sonnet, Grok-3
  • Large Context: Gemini models (up to 2M tokens)
  • Speed: Groq models, Ollama local models

🔧 Special Features

  • Reasoning Models: Show transparent thinking process
  • Multilingual: Gemini, Claude, GPT-4
  • Code Generation: Claude 3.5 Sonnet, gpt-5
  • Creative Writing: Claude, gpt-5, Grok

Provider Deep Dive

🟦 Google (Gemini)

Best for: Large context processing, cost-effective high-quality models

Environment Setup

export GEMINI_API_KEY=your_api_key_here

Recommended Models

  • Language: gemini-2.0-flash, gemini-2.5-pro-preview-06-05
  • TTS: gemini-2.5-flash-preview-tts, gemini-2.5-pro-preview-tts
  • Embedding: text-embedding-004

Strengths

  • Massive context windows (up to 2M tokens)
  • Excellent price-to-performance ratio
  • Strong multilingual capabilities
  • Integrated TTS with good quality

Considerations

  • No STT support
  • Newer models may have limited availability

🟢 OpenAI

Best for: Reliable performance, excellent tool calling, comprehensive ecosystem

Environment Setup

export OPENAI_API_KEY=your_api_key_here

Recommended Models

  • Language: gpt-5-mini, gpt-5
  • TTS: tts-1, gpt-4o-mini-tts
  • STT: whisper-1
  • Embedding: text-embedding-3-small

Strengths

  • Most mature ecosystem
  • Excellent tool calling capabilities
  • Industry-standard STT with Whisper
  • Consistent performance across models

Considerations

  • Higher costs for premium models
  • Data privacy concerns for sensitive content

🟣 Anthropic (Claude)

Best for: High-quality reasoning, safety, and nuanced understanding

Environment Setup

export ANTHROPIC_API_KEY=your_api_key_here

Recommended Models

  • Language: claude-3-5-sonnet-latest

Strengths

  • Exceptional reasoning capabilities
  • Strong safety and alignment
  • Excellent for complex analysis
  • Superior code generation

Considerations

  • Only language models available
  • Higher cost per token
  • Need additional providers for other model types

🦙 Ollama (Local/Free)

Best for: Privacy, offline use, zero ongoing costs

Environment Setup

# Install Ollama locally
curl -fsSL https://ollama.ai/install.sh | sh

# Set API base (if running remotely)
export OLLAMA_API_BASE=http://localhost:11434

Recommended Models

  • Language: qwen3, gemma3, phi4, deepseek-r1, llama4
  • Embedding: mxbai-embed-large

Strengths

  • Completely free after setup
  • Full data privacy (local processing)
  • No internet dependency
  • Support for reasoning models

Considerations

  • Requires local hardware resources
  • Limited model variety compared to cloud providers
  • No TTS/STT capabilities

📖 Need detailed Ollama setup help? Check our comprehensive Ollama Setup Guide for network configuration, Docker deployment, troubleshooting, and optimization tips.


🎤 ElevenLabs

Best for: Premium voice synthesis and transcription

Environment Setup

export ELEVENLABS_API_KEY=your_api_key_here

Recommended Models

  • TTS: eleven_turbo_v2_5, eleven-monolingual-v1
  • STT: scribe_v1, eleven-stt-v1

Strengths

  • Highest quality voice synthesis
  • Excellent transcription accuracy
  • Multiple voice options
  • Good pricing for audio services

Considerations

  • Audio-only provider
  • Requires separate language/embedding providers

🔵 DeepSeek

Best for: Cost-effective language models with advanced reasoning

Environment Setup

export DEEPSEEK_API_KEY=your_api_key_here

Recommended Models

  • Language: deepseek-chat, deepseek-reasoner

Strengths

  • Excellent quality-to-price ratio
  • Advanced reasoning capabilities
  • Large context windows (64k+)
  • Strong performance on technical tasks

Considerations

  • Limited to language models only
  • Relatively new provider

🟡 Mistral

Best for: European alternative with competitive pricing

Environment Setup

export MISTRAL_API_KEY=your_api_key_here

Recommended Models

  • Language: mistral-medium-latest, ministral-8b-latest, magistral
  • Embedding: mistral-embed

Strengths

  • European data governance
  • Competitive pricing
  • Good reasoning capabilities
  • Strong multilingual support

Considerations

  • Limited model variety
  • No TTS/STT capabilities

xAI (Grok)

Best for: Cutting-edge intelligence and unrestricted responses

Environment Setup

export XAI_API_KEY=your_api_key_here

Recommended Models

  • Language: grok-3, grok-3-mini

Strengths

  • State-of-the-art reasoning
  • Less restrictive than other providers
  • Excellent for creative and analytical tasks
  • Real-time information access

Considerations

  • Premium pricing
  • Limited to language models
  • Relatively new provider

🚢 Voyage AI

Best for: Specialized high-performance embeddings

Environment Setup

export VOYAGE_API_KEY=your_api_key_here

Recommended Models

  • Embedding: voyage-3.5-lite

Strengths

  • Specialized in embeddings
  • Competitive performance
  • Good pricing for embeddings

Considerations

  • Embedding-only provider
  • Requires other providers for language models

🔧 OpenAI Compatible (LM Studio & Others)

Best for: Using any OpenAI-compatible API endpoint, including LM Studio

Environment Setup

export OPENAI_COMPATIBLE_BASE_URL=http://localhost:1234/v1
# Optional - only if your endpoint requires authentication
export OPENAI_COMPATIBLE_API_KEY=your_key_here

Common Use Cases

  • LM Studio: Run models locally with a familiar UI
  • Text Generation WebUI: Alternative local inference
  • Custom Endpoints: Any OpenAI-compatible API

Strengths

  • Use any OpenAI-compatible endpoint
  • Perfect for LM Studio users
  • Flexibility in model deployment
  • Works with local and remote endpoints

Considerations

  • Performance depends on your hardware (for local)
  • Model availability varies by endpoint
  • Some endpoints may not support all features
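All of these endpoints speak the same wire protocol, which is why a single base URL setting is enough. A sketch of the request shape an OpenAI-compatible server expects (the helper name and model here are illustrative):

```python
import json

def build_chat_request(base_url, model, prompt, api_key=None):
    """Build the request any OpenAI-compatible server accepts (LM Studio, etc.)."""
    url = f"{base_url.rstrip('/')}/chat/completions"
    headers = {"Content-Type": "application/json"}
    if api_key:  # only needed when the endpoint requires authentication
        headers["Authorization"] = f"Bearer {api_key}"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, headers, json.dumps(payload)

url, headers, body = build_chat_request("http://localhost:1234/v1", "qwen3", "Hello")
print(url)  # http://localhost:1234/v1/chat/completions
```

Because the path and payload are standardized, switching between LM Studio, a remote proxy, or any other compatible server is purely a matter of changing `OPENAI_COMPATIBLE_BASE_URL`.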

🧠 Reasoning Models

Open Notebook fully supports reasoning models, which expose their thinking process. These models emit their internal reasoning inside `<think>` tags, and Open Notebook handles those tags automatically.
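The filtering involved can be sketched with a simple regular expression; the function name here is illustrative, not Open Notebook's actual implementation:

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text):
    """Separate <think> reasoning from the final answer (illustrative helper)."""
    reasoning = "\n".join(m.group(1).strip() for m in THINK_RE.finditer(text))
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer

raw = "<think>The user asked for 2+2, which is 4.</think>The answer is 4."
reasoning, answer = split_reasoning(raw)
print(answer)  # The answer is 4.
```

The reasoning half can be shown in a collapsible section while only the clean answer is stored or displayed, which is the behavior described below.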

How Reasoning Models Work

In Chat Interface

  • Reasoning content appears in a collapsible "🤔 AI Reasoning" section
  • Clean final answers are displayed prominently
  • Users can explore the AI's thought process

In Transformations

  • Clean output is stored in your notes
  • Reasoning is filtered out automatically
  • Professional results without internal monologue

In Search

  • Final answers remain clean and focused
  • Reasoning helps improve answer quality

Supported Reasoning Models

| Model | Provider | Access | Quality |
|---|---|---|---|
| deepseek-r1 | Ollama | Free | Exceptional |
| qwen3 | Ollama | Free | Very Good |
| magistral | Mistral | Paid | Good |
| deepseek-reasoner | DeepSeek | Paid | Excellent |

Benefits of Reasoning Models

  • Transparency: See exactly how AI reached conclusions
  • Trust: Understand the logic behind responses
  • Learning: Gain insights into AI problem-solving
  • Debugging: Identify where AI reasoning went wrong
  • Quality: Better answers through explicit reasoning

Recommended Configurations

🌟 Best Value (Mixed Providers)

Perfect balance of cost and performance

# Environment Variables
export OPENAI_API_KEY=your_key
export GEMINI_API_KEY=your_key
export OLLAMA_API_BASE=http://localhost:11434
| Model Default | Recommended Model | Provider |
|---|---|---|
| Chat Model | gpt-5-mini | OpenAI |
| Tools Model | gpt-5 | OpenAI |
| Transformations | ministral-8b-latest | Mistral |
| Large Context | gemini-2.0-flash | Google |
| Embedding | text-embedding-3-small | OpenAI |
| Text-to-Speech | gemini-2.5-flash-preview-tts | Google |
| Speech-to-Text | whisper-1 | OpenAI |

Monthly Cost Estimate: $20-50 for moderate usage


💰 Budget-Friendly (Mostly Free)

Great for getting started or keeping costs low

# Environment Variables
export OPENAI_API_KEY=your_key  # For STT/TTS only
export OLLAMA_API_BASE=http://localhost:11434
| Model Default | Recommended Model | Provider |
|---|---|---|
| Chat Model | qwen3 | Ollama |
| Tools Model | qwen3 | Ollama |
| Transformations | gemma3 | Ollama |
| Large Context | qwen3 | Ollama |
| Embedding | mxbai-embed-large | Ollama |
| Text-to-Speech | gpt-4o-mini-tts | OpenAI |
| Speech-to-Text | whisper-1 | OpenAI |

Monthly Cost Estimate: $5-15 (only for audio services)


🚀 High Performance (Premium)

When quality is your top priority

# Environment Variables
export ANTHROPIC_API_KEY=your_key
export XAI_API_KEY=your_key
export GEMINI_API_KEY=your_key
export VOYAGE_API_KEY=your_key
export ELEVENLABS_API_KEY=your_key
export OPENAI_API_KEY=your_key
| Model Default | Recommended Model | Provider |
|---|---|---|
| Chat Model | claude-3-5-sonnet-latest | Anthropic |
| Tools Model | grok-3 | xAI |
| Transformations | grok-3-mini | xAI |
| Large Context | gemini-2.5-pro-preview-06-05 | Google |
| Embedding | voyage-3.5-lite | Voyage |
| Text-to-Speech | eleven_turbo_v2_5 | ElevenLabs |
| Speech-to-Text | whisper-1 | OpenAI |

Monthly Cost Estimate: $100-300 for moderate usage


🏢 Single Provider (OpenAI)

Simplify billing and setup

# Environment Variables
export OPENAI_API_KEY=your_key
| Model Default | Recommended Model | Provider |
|---|---|---|
| Chat Model | gpt-5-mini | OpenAI |
| Tools Model | gpt-5 | OpenAI |
| Transformations | gpt-5-mini | OpenAI |
| Large Context | gpt-5 | OpenAI |
| Embedding | text-embedding-3-small | OpenAI |
| Text-to-Speech | gpt-4o-mini-tts | OpenAI |
| Speech-to-Text | whisper-1 | OpenAI |

Monthly Cost Estimate: $30-80 for moderate usage

Setup Instructions

1. Environment Variables

Set up your API keys using environment variables. Here's the complete list:

# Core Providers
export OPENAI_API_KEY=your_key
export ANTHROPIC_API_KEY=your_key
export GEMINI_API_KEY=your_key

# Additional Language Providers
export MISTRAL_API_KEY=your_key
export DEEPSEEK_API_KEY=your_key
export XAI_API_KEY=your_key
export GROQ_API_KEY=your_key
export OPENROUTER_API_KEY=your_key

# Audio Providers
export ELEVENLABS_API_KEY=your_key

# Embedding Providers
export VOYAGE_API_KEY=your_key

# Local/Cloud Infrastructure
export OLLAMA_API_BASE=http://localhost:11434

# Azure OpenAI
export AZURE_OPENAI_API_KEY=your_key
export AZURE_OPENAI_ENDPOINT=your_endpoint
export AZURE_OPENAI_API_VERSION=2024-12-01-preview
export AZURE_OPENAI_DEPLOYMENT_NAME=your_deployment

# Vertex AI
export VERTEX_PROJECT=your_project
export GOOGLE_APPLICATION_CREDENTIALS=./google-credentials.json
export VERTEX_LOCATION=us-east5

# OpenAI Compatible (LM Studio, etc.)
export OPENAI_COMPATIBLE_BASE_URL=http://localhost:1234/v1
export OPENAI_COMPATIBLE_API_KEY=your_key  # Optional

2. Using Docker

For Docker deployments, pass environment variables:

docker run -d \
  --name open-notebook \
  -p 8502:8502 -p 5055:5055 \
  -v ./notebook_data:/app/data \
  -v ./surreal_single_data:/mydata \
  -e OPENAI_API_KEY=your_key \
  -e GEMINI_API_KEY=your_key \
  -e ANTHROPIC_API_KEY=your_key \
  lfnovo/open_notebook:v1-latest-single

3. Model Configuration

After setting environment variables:

  1. Access Settings: Go to the Settings page in Open Notebook
  2. Create Models: Add your models for each provider
  3. Set Defaults: Configure default models for each task type
  4. Test Models: Use the Playground to test model performance

4. Provider-Specific Setup

OpenAI

export OPENAI_API_KEY=sk-your-key-here
  • Get your API key from OpenAI Platform
  • Supports all model types
  • Immediate activation

Anthropic

export ANTHROPIC_API_KEY=sk-ant-your-key-here
  • Get your API key from Anthropic Console
  • Only language models available
  • Requires separate providers for other types

Google (Gemini)

export GEMINI_API_KEY=your-key-here
  • Get your API key from Google AI Studio
  • Excellent for large context and TTS
  • Cost-effective option

Ollama (Local)

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull models
ollama pull qwen3
ollama pull mxbai-embed-large

# Set API base if remote
export OLLAMA_API_BASE=http://your-server:11434

ElevenLabs

export ELEVENLABS_API_KEY=your-key-here
  • Get your API key from ElevenLabs
  • Premium voice synthesis
  • Excellent for podcast generation

Advanced Configuration

Model Switching

You can switch models at runtime:

In Chat

  • Use the model selector dropdown
  • Changes apply to current conversation

In Transformations

  • Configure per-transformation defaults
  • Override on individual operations

In Settings

  • Change global defaults
  • Affects all new operations

Performance Optimization

For Speed

  • Use smaller models for simple tasks
  • Groq for fast inference
  • Local Ollama models for instant response

For Quality

  • Use premium models for complex reasoning
  • Claude 3.5 Sonnet for analysis
  • gpt-5 for tool calling

For Cost

  • Use cheaper models for transformations
  • Ollama for free processing
  • OpenAI mini models for everyday use

Context Management

Small Context (< 32k tokens)

  • Any modern language model
  • Faster processing
  • Lower costs

Medium Context (32k-128k tokens)

  • GPT-4o, Claude 3.5 Sonnet
  • Good balance of speed and capacity

Large Context (> 128k tokens)

  • Gemini models (up to 2M tokens)
  • Essential for large document processing
  • Higher costs but necessary for big content
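To decide which tier a document needs, a rough rule of thumb is about four characters per token for English prose (an approximation; exact counts require the model's own tokenizer). A sketch:

```python
def estimate_tokens(text):
    # Rough rule of thumb: ~4 characters per token for English prose
    return max(1, len(text) // 4)

def context_tier(token_count):
    if token_count < 32_000:
        return "small"    # any modern language model
    if token_count <= 128_000:
        return "medium"   # e.g. GPT-4o, Claude 3.5 Sonnet
    return "large"        # e.g. Gemini models (up to 2M tokens)

document = "word " * 120_000  # ~600k characters, roughly 150k tokens
print(context_tier(estimate_tokens(document)))  # large
```

Running content through a quick estimate like this before choosing a model avoids paying large-context prices for small documents.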

Cost Optimization Strategies

1. Tiered Model Strategy

Use different models for different complexity levels:

Simple Tasks (70% of usage):
- Chat: gpt-5-mini or qwen3 (Ollama)
- Transformations: ministral-8b-latest

Complex Tasks (25% of usage):
- Analysis: claude-3-5-sonnet-latest
- Tool calling: gpt-5

Specialized Tasks (5% of usage):
- Large context: gemini-2.0-flash
- Premium TTS: eleven_turbo_v2_5
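A tiering scheme like this can be expressed as a simple lookup from task type to model. The mapping below uses the models recommended in this guide; the helper itself is illustrative:

```python
# Task tiers mapped to the models recommended in this guide (adjust to taste)
MODEL_TIERS = {
    "chat": "gpt-5-mini",                      # simple tasks
    "transformation": "ministral-8b-latest",   # simple tasks
    "analysis": "claude-3-5-sonnet-latest",    # complex tasks
    "tool_calling": "gpt-5",                   # complex tasks
    "large_context": "gemini-2.0-flash",       # specialized tasks
}

def pick_model(task):
    # Unknown tasks fall back to the cheap default tier
    return MODEL_TIERS.get(task, MODEL_TIERS["chat"])

print(pick_model("analysis"))  # claude-3-5-sonnet-latest
```

Keeping the routing in one place makes it easy to swap a cheaper model into a tier once you know your real usage split.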

2. Smart Model Selection

For Transformations

  • Use smaller, cheaper models
  • Batch multiple operations
  • Cache results when possible

For Chat

  • Start with mini models
  • Escalate to premium for complex queries
  • Use reasoning models for transparency

For Embeddings

  • Use free Ollama models when possible
  • OpenAI for balanced performance
  • Voyage for specialized needs

3. Usage Monitoring

Track your usage patterns:

# Monitor API usage through provider dashboards
# Set up billing alerts
# Review monthly costs by model
# Optimize based on actual usage patterns

4. Free Tier Maximization

Ollama (Completely Free)

  • Language models for most tasks
  • Embeddings for search
  • No usage limits after setup

Free Tiers

  • OpenAI: $5 monthly credit for new users
  • Anthropic: Limited free tier
  • Google: Generous free tier for Gemini

5. Batch Processing

Process multiple items together:

  • Combine similar transformations
  • Use larger context windows efficiently
  • Reduce API call overhead

Troubleshooting

Common Issues

API Key Problems

# Check environment variables
echo $OPENAI_API_KEY

# Verify key format
# OpenAI: sk-...
# Anthropic: sk-ant-...
# Google: starts with alphanumeric
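Those prefix conventions can double as a quick sanity check before debugging further. This sketch covers only the two documented key formats and treats everything else as free-form:

```python
import re

# Prefix formats noted above; other providers vary, so only check non-emptiness
KEY_PATTERNS = {
    "OPENAI_API_KEY": re.compile(r"^sk-[A-Za-z0-9_-]+$"),
    "ANTHROPIC_API_KEY": re.compile(r"^sk-ant-[A-Za-z0-9_-]+$"),
}

def looks_valid(var_name, value):
    pattern = KEY_PATTERNS.get(var_name)
    if pattern is None:
        return bool(value)  # no documented format: just require a non-empty value
    return bool(pattern.match(value))

print(looks_valid("OPENAI_API_KEY", "sk-abc123"))  # True
```

A format check only catches pasted-in typos; a key that looks valid can still be revoked or lack access to a given model.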

Model Not Found

  • Verify model name spelling
  • Check provider availability
  • Ensure API key has access to model

Rate Limiting

  • Implement retry logic
  • Use different models for different tasks
  • Monitor API quotas
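Retry logic for rate-limited calls usually means exponential backoff with jitter. A provider-agnostic sketch (the flaky function here just simulates a call that returns 429 twice before succeeding):

```python
import random
import time

def with_retries(call, max_attempts=5, base_delay=1.0):
    """Retry a provider call with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            # 1s, 2s, 4s, ... plus jitter to avoid synchronized retries
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))

# Simulated flaky call: fails twice with a rate-limit error, then succeeds
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("429 Too Many Requests")
    return "ok"

print(with_retries(flaky, base_delay=0.01))  # ok
```

In practice you would catch only retryable errors (rate limits, timeouts) rather than every exception, but the backoff structure is the same.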

High Costs

  • Review model usage patterns
  • Switch to cheaper models for simple tasks
  • Use free Ollama models where possible

Provider-Specific Issues

OpenAI

  • Rate limits: Upgrade to paid tier
  • Model access: Check account tier
  • Usage limits: Monitor dashboard

Anthropic

  • Beta access: Some models require approval
  • Rate limits: Request increase if needed
  • Region restrictions: Check availability

Google (Gemini)

  • Quota limits: Monitor usage
  • Model availability: Some models are preview
  • API key restrictions: Check project settings

Ollama

  • Model download: Ensure sufficient disk space
  • Performance: Check hardware requirements
  • Network: Verify base URL configuration

Performance Issues

Slow Responses

  • Use smaller models
  • Reduce context size
  • Consider local Ollama models

Poor Quality

  • Upgrade to premium models
  • Improve prompting
  • Use reasoning models for complex tasks

High Latency

  • Check network connectivity
  • Use geographically closer providers
  • Consider local Ollama deployment

Best Practices

1. Model Selection

Match Models to Tasks

  • Simple chat: Mini models
  • Complex analysis: Premium models
  • Transformations: Efficient models
  • Large documents: High-context models

Consider Cost vs. Quality

  • Use premium models only when necessary
  • Free models for development and testing
  • Monitor and optimize usage patterns

2. Security & Privacy

Sensitive Data

  • Use local Ollama models
  • Avoid sending sensitive content to cloud providers
  • Consider on-premises deployment

API Key Management

  • Use environment variables
  • Rotate keys regularly
  • Monitor usage for anomalies

3. Reliability

Fallback Strategies

  • Configure multiple providers
  • Have backup models ready
  • Implement retry logic
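A fallback chain can be as simple as trying each configured provider in order. The provider functions below are hypothetical stand-ins for real clients:

```python
def ask_with_fallback(prompt, providers):
    """Try each (name, call) pair in order, falling through on failure."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:
            errors.append((name, exc))
    raise RuntimeError(f"all providers failed: {errors}")

# Hypothetical stand-ins for real provider clients
def primary(prompt):
    raise RuntimeError("provider unavailable")

def backup(prompt):
    return f"answer to: {prompt}"

name, answer = ask_with_fallback("hello", [("openai", primary), ("ollama", backup)])
print(name)  # ollama
```

Pairing a cloud provider with a local Ollama model as the last entry keeps the system usable even during an outage.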

Testing

  • Test new models in playground
  • Validate performance before deployment
  • Monitor quality metrics

4. Optimization

Performance Tuning

  • Profile model performance
  • Optimize context size
  • Use appropriate model for each task

Cost Management

  • Set up billing alerts
  • Regular usage reviews
  • Optimize model selection

Getting Help

Community Support

Documentation

Testing Your Setup

  • Use the Playground in Settings to test models
  • Try different model combinations
  • Monitor performance and costs

This comprehensive guide should help you make informed decisions about AI models for your Open Notebook deployment. Start with a simple configuration and gradually optimize based on your specific needs and usage patterns.