AI Context & RAG - How Open Notebook Uses Your Research
Open Notebook uses different approaches to make AI models aware of your research depending on the feature. This section explains RAG (used in Ask) and full-content context (used in Chat).
The Problem: Making AI Aware of Your Data
Traditional Approaches (and their problems)
Option 1: Fine-Tuning
- Train the model on your data
- Pro: Model becomes specialized
- Con: Expensive, slow, and hard to undo (the model can't easily unlearn what it was trained on)
Option 2: Send Everything to Cloud
- Upload all your data to ChatGPT/Claude API
- Pro: Works well, fast
- Con: Privacy nightmare, data leaves your control, expensive
Option 3: Ignore Your Data
- Just use the base model without your research
- Pro: Private, free
- Con: AI doesn't know anything about your specific topic
Open Notebook's Dual Approach
For Chat: Sends the entire selected content to the LLM
- Simple and transparent: You select sources, they're sent in full
- Maximum context: AI sees everything you choose
- You control which sources are included
For Ask (RAG): Retrieval-Augmented Generation
- The insight: search your content, find the relevant pieces, and send only those
- Automatic: the system retrieves what's relevant based on your question
How RAG Works: Three Stages
Stage 1: Content Preparation
When you upload a source, Open Notebook prepares it for retrieval:
1. EXTRACT TEXT
PDF → text
URL → webpage text
Audio → transcribed text
Video → subtitles + transcription
2. CHUNK INTO PIECES
Long documents → break into ~500-word chunks
Why? AI context has limits; smaller pieces are more precise
3. CREATE EMBEDDINGS
Each chunk → semantic vector (numbers representing meaning)
Why? Allows finding chunks by similarity, not just keywords
4. STORE IN DATABASE
Chunks + embeddings + metadata → searchable storage
Example:
Source: "AI Safety Research 2026" (50-page PDF)
↓
Extracted: 50 pages of text
↓
Chunked: 150 chunks (~500 words each)
↓
Embedded: Each chunk gets a vector (e.g. 1536 numbers with OpenAI's text-embedding-3-small)
↓
Stored: Ready for search
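The chunking step can be sketched in a few lines. This is a minimal illustration, not Open Notebook's actual splitter: the function name `chunk_words` and the plain whitespace word split are assumptions, and a real splitter would likely respect sentence boundaries and may overlap chunks.

```python
def chunk_words(text: str, chunk_size: int = 500) -> list[str]:
    """Split text into consecutive chunks of at most `chunk_size` words.

    Hypothetical helper: words are separated on whitespace only.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    return [
        " ".join(words[start:start + chunk_size])
        for start in range(0, len(words), chunk_size)
    ]


# Made-up document of 1,200 words, used only to show the chunk sizes.
document = " ".join(f"word{i}" for i in range(1200))
chunks = chunk_words(document)

print(len(chunks))              # 3
print(len(chunks[-1].split()))  # 200 (the last chunk holds the remainder)
```

Each chunk would then be passed to an embedding model and stored alongside its vector and source metadata.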
Stage 2: Query Time (What You Search For)
When you ask a question, the system finds relevant content:
1. YOU ASK A QUESTION
"What does the paper say about alignment?"
2. SYSTEM CONVERTS QUESTION TO EMBEDDING
Your question → vector (same way chunks are vectorized)
3. SIMILARITY SEARCH
Find chunks most similar to your question
(using vector math, not keyword matching)
4. RETURN TOP RESULTS
Usually top 5-10 most similar chunks
5. YOU GET BACK
✓ The relevant chunks
✓ Where they came from (sources + page numbers)
✓ Relevance scores
Example:
Q: "What does the paper say about alignment?"
↓
Q vector: [0.23, -0.51, 0.88, ..., 0.12]
↓
Search: Compare to all chunk vectors
↓
Results:
- Chunk 47 (alignment section): similarity 0.94
- Chunk 63 (safety approaches): similarity 0.88
- Chunk 12 (related work): similarity 0.71
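The similarity search can be illustrated with a small sketch. It assumes cosine similarity as the measure (a common choice; this document does not say which one Open Notebook uses) and uses made-up 3-dimensional vectors in place of real embeddings. The names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float],
          chunk_vecs: dict[str, list[float]],
          k: int = 5) -> list[tuple[str, float]]:
    """Return the k chunk ids most similar to the query, best first."""
    scored = [
        (chunk_id, cosine_similarity(query_vec, vec))
        for chunk_id, vec in chunk_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors (assumption): real embeddings have hundreds of dimensions.
chunk_vectors = {
    "chunk_47": [0.9, 0.1, 0.0],
    "chunk_63": [0.7, 0.7, 0.0],
    "chunk_12": [0.0, 0.2, 0.9],
}
query_vector = [1.0, 0.0, 0.0]

results = top_k(query_vector, chunk_vectors, k=2)
for chunk_id, score in results:
    print(chunk_id, round(score, 2))  # chunk_47 ranks first, then chunk_63
```

A production system would normally delegate this comparison to a vector index in the database rather than scanning every chunk in Python.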
Stage 3: Augmentation (How AI Uses It)
Now you have the relevant pieces. The AI uses them:
SYSTEM BUILDS A PROMPT:
"You are an AI research assistant.
The user has the following research materials:
[CHUNK 47 CONTENT]
[CHUNK 63 CONTENT]
User question: 'What does the paper say about alignment?'
Answer based on the above materials."
AI RESPONDS:
"Based on the research materials, the paper approaches
alignment through [pulls from chunks] and emphasizes
[pulls from chunks]..."
SYSTEM ADDS CITATIONS:
"- See research materials page 15 for approach details
- See research materials page 23 for emphasis on X"
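The prompt assembly described above can be sketched as follows. The `RetrievedChunk` type, the `build_prompt` function, and the numbered-reference layout are illustrative assumptions; the actual prompt template used by Open Notebook may differ.

```python
from dataclasses import dataclass


@dataclass
class RetrievedChunk:
    """Hypothetical container for one retrieved chunk and its origin."""
    source: str
    page: int
    text: str


def build_prompt(question: str, chunks: list[RetrievedChunk]) -> str:
    """Combine the retrieved chunks and the user question into one prompt."""
    # Number each chunk so the answer can cite it, e.g. "[1]".
    materials = "\n\n".join(
        f"[{i}] ({chunk.source}, page {chunk.page})\n{chunk.text}"
        for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "You are an AI research assistant.\n"
        "The user has the following research materials:\n\n"
        f"{materials}\n\n"
        f"User question: '{question}'\n"
        "Answer based on the above materials."
    )


# Placeholder chunk texts (assumption); page numbers follow the example above.
chunks = [
    RetrievedChunk("AI Safety Research 2026", 15, "Placeholder text of chunk 47."),
    RetrievedChunk("AI Safety Research 2026", 23, "Placeholder text of chunk 63."),
]
prompt = build_prompt("What does the paper say about alignment?", chunks)
print(prompt)
```

Keeping the source and page next to each chunk is what makes the citation step possible: the system can map a reference like "[1]" back to its origin.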
Two Search Modes: Exact vs. Semantic
Open Notebook provides two different search strategies for different goals.
1. Text Search (Keyword Matching)
How it works:
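- Uses BM25 ranking (a standard keyword-ranking function, the default in search engines such as Lucene and Elasticsearch)
- Finds chunks containing your keywords
- Ranks by relevance (how often keywords appear, how rare they are across chunks, and chunk length)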
When to use:
- "I remember the exact phrase 'X' and want to find it"
- "I'm looking for a specific name or number"
- "I need the exact quote"
Example:
Search: "transformer architecture"
Results:
1. Chunk with "transformer architecture" 3 times
2. Chunk with "transformer" and "architecture" separately
3. Chunk with "transformer-based models"
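To make the ranking concrete, here is a minimal sketch of BM25 scoring. It is not Open Notebook's implementation (in practice the database's full-text index would do this): the function name `bm25_scores`, the whitespace tokenizer, and the parameter defaults `k1=1.5` and `b=0.75` are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with the BM25 formula."""
    if not docs:
        return []
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs or 1.0

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            # Rare terms get a higher inverse document frequency.
            df = sum(1 for other in tokenized if term in other)
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            tf = counts[term]
            # Long documents are penalized via the length normalization.
            denom = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / denom
        scores.append(score)
    return scores


# Made-up chunks for illustration.
docs = [
    "the transformer architecture uses attention and the transformer architecture scales",
    "a transformer model with a new architecture",
    "convolutional networks for vision",
]
scores = bm25_scores("transformer architecture", docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [0, 1, 2]
```

The chunk that repeats both keywords ranks first, the chunk that mentions each once ranks second, and the chunk with neither keyword scores zero. With this naive tokenizer a word like "transformer-based" would not match "transformer"; real full-text engines typically handle that through their own tokenization and stemming.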
2. Vector Search (Semantic Similarity)
How it works:
- Converts your question to a vector (a numerical embedding)
- Finds chunks with similar vectors
- No keywords needed: finds conceptually similar content
When to use:
- "Find content about X (without saying exact words)"
- "I'm exploring a concept"
- "Find similar ideas even if worded differently"
Example:
Search: "what's the mechanism for model understanding?"
Results (no "understanding" in any chunk):
1. Chunk about interpretability and mechanistic analysis
2. Chunk about feature analysis
3. Chunk about attention mechanisms
Why? The vectors are semantically similar to your concept.
Context Management: Your Control Panel
Here's where Open Notebook is different: You decide what the AI sees.
The Three Levels
| Level | What's Shared | Example Cost | Privacy | Use Case |
|---|---|---|---|---|
| Full Content | Complete source text | 10,000 tokens | Low | Detailed analysis, close reading |
| Summary Only | AI-generated summary | 2,000 tokens | High | Background material, references |
| Not in Context | Nothing | 0 tokens | Max | Confidential, irrelevant, or archived |
How It Works
Full Content:
You: "What's the methodology in paper A?"
System:
- Loads paper A
- Includes the full paper content in context
- Sends to AI: "Here's paper A. Answer about methodology."
- AI analyzes complete content
- Result: Detailed, precise answer
Summary Only:
You: "I want to chat using paper A and B"
System:
- For Paper A: Sends AI-generated summary (not full text)
- For Paper B: Sends full content (detailed analysis)
- AI sees 2 sources but in different detail levels
- Result: Uses summaries for context, details for focused content
Not in Context:
You: "I have 10 sources but only want 5 in context"
System:
- Paper A-E: In context (sent to AI)
- Paper F-J: Not in context (AI can't see them, doesn't search them)
- AI never knows that Papers F-J exist
- Result: Tight, focused context
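The three levels can be thought of as a filter applied while the context is assembled. The sketch below is an assumption-laden illustration: the `ContextLevel` enum, the `Source` type, and `build_context` are hypothetical names, not Open Notebook's API.

```python
from dataclasses import dataclass
from enum import Enum


class ContextLevel(Enum):
    FULL = "full"
    SUMMARY = "summary"
    EXCLUDED = "excluded"


@dataclass
class Source:
    """Hypothetical source record with both full text and a summary."""
    title: str
    full_text: str
    summary: str
    level: ContextLevel


def build_context(sources: list[Source]) -> str:
    """Assemble only what each source's context level allows."""
    parts = []
    for source in sources:
        if source.level is ContextLevel.FULL:
            parts.append(f"{source.title} (full content)\n{source.full_text}")
        elif source.level is ContextLevel.SUMMARY:
            parts.append(f"{source.title} (summary)\n{source.summary}")
        # EXCLUDED sources are skipped: nothing about them is sent.
    return "\n\n".join(parts)


# Placeholder sources (assumption) mirroring the scenarios above.
sources = [
    Source("Paper A", "full text of A", "summary of A", ContextLevel.SUMMARY),
    Source("Paper B", "full text of B", "summary of B", ContextLevel.FULL),
    Source("Paper F", "full text of F", "summary of F", ContextLevel.EXCLUDED),
]
context = build_context(sources)
print(context)
```

Because excluded sources never enter the assembled string, not even their titles reach the model.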
Why This Matters
Privacy: You control what leaves your system
Scenario: Confidential company docs + public research
Control: Public research in context → Confidential docs excluded
Result: AI never sees confidential content
Cost: You control token usage
Scenario: 100 sources for background + 5 for detailed analysis
Control: Full content for 5 detailed, summaries for 95 background
Result: about 76% lower token cost than sending everything at the example costs above (roughly 240,000 tokens instead of 1,000,000)
Quality: You control what the AI focuses on
Scenario: 20 sources, question requires deep analysis
Control: Full content for relevant source, exclude others
Result: AI doesn't get distracted; gives better answer
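The Cost scenario can be worked through with the example figures from the table (10,000 tokens per full source, 2,000 per summary). These are illustrative numbers only; actual savings depend on how long your sources and summaries really are.

```python
# Example costs taken from the context-level table (illustrative only).
FULL_TOKENS = 10_000
SUMMARY_TOKENS = 2_000


def context_tokens(n_full: int, n_summary: int) -> int:
    """Total tokens for a mix of full-content and summary-only sources."""
    return n_full * FULL_TOKENS + n_summary * SUMMARY_TOKENS


everything = context_tokens(n_full=100, n_summary=0)  # all 100 in full
mixed = context_tokens(n_full=5, n_summary=95)        # 5 full, 95 summaries
savings = 1 - mixed / everything

print(everything)        # 1000000
print(mixed)             # 240000
print(f"{savings:.0%}")  # 76%
```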
The Difference: Chat vs. Ask
IMPORTANT: These use completely different approaches!
Chat: Full-Content Context (NO RAG)
How it works:
YOU:
1. Select which sources to include in context
2. Set context level (full/summary/excluded)
3. Ask question
SYSTEM:
- Takes ALL selected sources (respecting context levels)
- Sends the ENTIRE content to the LLM at once
- NO search, NO retrieval, NO chunking
- AI sees everything you selected
AI:
- Responds based on the full content you provided
- Can reference any part of selected sources
- Conversational: context stays for follow-ups
Use this when:
- You know which sources are relevant
- You want conversational back-and-forth
- You want AI to see the complete context
- You're doing close reading or analysis
Advantages:
- Simple and transparent
- AI sees everything (no missed content)
- Conversational flow
Limitations:
- Limited by LLM context window
- You must manually select relevant sources
- Sends more tokens (higher cost with many sources)
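Because Chat sends everything you select, it helps to check whether the selection fits the model's context window. The sketch below uses a rough rule of thumb of about 4 characters per token for English text; that ratio, the function names, and the reserved answer budget are all assumptions, and a real check would use the provider's own tokenizer.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate (heuristic, not a real tokenizer)."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_window(texts: list[str], context_window: int,
                   reserved_for_answer: int = 1_000) -> bool:
    """True if the selected texts plus an answer budget fit the window."""
    used = sum(estimate_tokens(text) for text in texts)
    return used + reserved_for_answer <= context_window


# Placeholder sources of 40,000 and 20,000 characters (assumption).
selected = ["a" * 40_000, "b" * 20_000]

print(fits_in_window(selected, context_window=128_000))  # True
print(fits_in_window(selected, context_window=8_000))    # False
```

When the selection does not fit, switching some sources to Summary Only or excluding them brings the total down.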
Ask: RAG - Automatic Retrieval
How it works:
YOU:
Ask one complex question
SYSTEM:
1. Analyzes your question
2. Searches across ALL your sources automatically
3. Finds relevant chunks using vector similarity
4. Retrieves only the most relevant pieces
5. Sends ONLY those chunks to the LLM
6. Synthesizes into comprehensive answer
AI:
- Sees ONLY the retrieved chunks (not full sources)
- Answers based on what was found to be relevant
- One-shot answer (not conversational)
Use this when:
- You have many sources and don't know which are relevant
- You want the AI to search automatically
- You need a comprehensive answer to a complex question
- You want to minimize tokens sent to LLM
Advantages:
- Automatic search (you don't pick sources)
- Works across many sources at once
- Cost-effective (sends only relevant chunks)
Limitations:
- Not conversational (single question/answer)
- AI only sees retrieved chunks (might miss context)
- Search quality depends on how well question matches content
What This Means: Privacy by Design
Open Notebook's RAG approach gives you something you don't get with ChatGPT or Claude directly:
You control the boundary between:
- What stays private (on your system)
- What goes to AI (explicitly chosen)
- What the AI can see (context levels)
The Audit Trail
Because everything is retrieved explicitly, you can ask:
- "Which sources did the AI use for this answer?" → See citations
- "What exactly did the AI see?" → Review the retrieved chunks and the context level of each source
- "Is the AI's claim actually in my sources?" → Verify citation
This does not eliminate hallucinations, but it makes them and other misrepresentations easier to catch, because every claim can be checked against its cited source.
How Embeddings Work (Simplified)
The magic of semantic search comes from embeddings. Here's the intuition:
The Idea
Alongside each piece of text, store a list of numbers (a vector) that represents its "meaning."
Chunk: "The transformer uses attention mechanisms"
Vector: [0.23, -0.51, 0.88, 0.12, ..., 0.34]
(e.g. 1536 numbers with OpenAI's text-embedding-3-small)
Another chunk: "Attention allows models to focus on relevant parts"
Vector: [0.24, -0.48, 0.87, 0.15, ..., 0.35]
(similar numbers = similar meaning!)
Why This Works
Words that are semantically similar produce similar vectors. So:
- "alignment" and "interpretability" have similar vectors
- "transformer" and "attention" have related vectors
- "cat" and "dog" are more similar than "cat" and "radiator"
How Search Works
Your question: "How do models understand their decisions?"
Question vector: [0.25, -0.50, 0.86, 0.14, ..., 0.33]
Compare to all stored vectors. Find the most similar:
- Chunk about interpretability: similarity 0.94
- Chunk about explainability: similarity 0.91
- Chunk about feature attribution: similarity 0.88
Return the top matches.
This is why semantic search finds conceptually similar content even when words are different.
Key Design Decisions
1. Search, Don't Train
Why? Fine-tuning is slow and permanent. Search is flexible and reversible.
2. Explicit Retrieval, Not Implicit Knowledge
Why? You can verify what the AI saw. You have audit trails. You control what leaves your system.
3. Multiple Search Types
Why? Different questions need different search strategies (keyword vs. semantic). Having both covers exact lookups as well as conceptual exploration.
4. Context as a Permission System
Why? Not everything you save needs to reach the AI. You set this per source.
Summary
Open Notebook gives you two ways to work with AI:
Chat (Full-Content)
- Sends entire selected sources to LLM
- Manual control: you pick sources
- Conversational: back-and-forth dialog
- Transparent: you know exactly what AI sees
- Best for: focused analysis, close reading
Ask (RAG)
- Searches and retrieves relevant chunks automatically
- Automatic: AI finds what's relevant
- One-shot: single comprehensive answer
- Efficient: sends only relevant pieces
- Best for: broad questions across many sources
Both approaches:
- Keep your data private (doesn't leave your system by default)
- Give you control (you choose which features to use)
- Create audit trails (citations show what was used)
- Support multiple AI providers
Coming Soon: The community is working on adding RAG capabilities to Chat as well, giving you the best of both worlds.