- Replace old docs structure with new comprehensive documentation - Organize into 8 major sections (0-START-HERE through 7-DEVELOPMENT) - Convert CONFIGURATION.md, CONTRIBUTING.md, MAINTAINER_GUIDE.md to redirects - Remove outdated MIGRATION.md and DESIGN_PRINCIPLES.md - Fix all internal documentation links and cross-references - Add progressive disclosure paths for different user types - Include 44 focused guides covering all features - Update README.md to remove v1.0 breaking changes notice
11 KiB
Podcasts Explained - Research as Audio Dialogue
Podcasts are Open Notebook's highest-level transformation: converting your research into audio dialogue for a different consumption pattern.
Why Podcasts Matter
The Problem
Research naturally accumulates as text: PDFs, articles, web pages, notes. This creates a friction point:
To consume research, you must:
- Sit down at a desk
- Focus intently
- Read actively
- Take notes
- Set aside dedicated time
But much of life is passive time:
- Commuting
- Exercising
- Doing dishes
- Driving
- Walking
- Idle moments
The Solution
Convert your research into audio dialogue so you can consume it passively.
Before (Text-based):
Research pile → Must schedule reading time → Requires focus
After (Podcast):
Research pile → Podcast → Can listen while commuting
→ Absorb while exercising
→ Understand while walking
→ Engage without screen time
What Makes It Special: Open Notebook vs. Competitors
Google Notebook LM Podcasts
- Fixed format: 2 hosts, always conversational
- Limited customization: You can't choose who the "hosts" are
- One TTS voice per speaker: Can't customize voices
- Only uses cloud services: No local options
Open Notebook Podcasts
- Customizable format: 1-4 speakers, you design them
- Rich speaker profiles: Create personas with backstories and expertise
- Multiple TTS options:
- OpenAI (natural, fast)
- Google TTS (high quality)
- ElevenLabs (beautiful voices, accents)
- Local TTS (privacy-first, no API calls)
- Async generation: Doesn't block your work
- Full control: Choose outline structure, tone, depth
How Podcast Generation Works
Stage 1: Content Selection
You choose what goes into the podcast:
Notebook content → Which sources? → Which notes?
→ Which topics to focus on?
→ Depth of coverage?
Stage 2: Episode Profile
You define how you want the podcast structured:
Episode Profile
├─ Topic: "AI Safety Approaches"
├─ Length: 20 minutes
├─ Tone: Academic but accessible
├─ Format: Debate (2 speakers with opposing views)
├─ Audience: Researchers new to the field
└─ Focus areas: Main approaches, pros/cons, open questions
Stage 3: Speaker Configuration
You create speaker personas (1-4 speakers):
Speaker 1: "Expert Alex"
├─ Expertise: "Deep knowledge of alignment research"
├─ Personality: "Rigorous, academic, patient with explanation"
├─ Accent: (Optional) "British English"
└─ TTS Voice: "OpenAI Onyx" (or ElevenLabs, Google, etc.)
Speaker 2: "Researcher Sam"
├─ Expertise: "Field observer, pragmatic perspective"
├─ Personality: "Curious, asks clarifying questions"
├─ Accent: "American English"
└─ TTS Voice: "ElevenLabs - thoughtful"
Stage 4: Outline Generation
System generates episode outline:
EPISODE: "AI Safety Approaches"
1. Introduction (2 min)
Alex: Introduces topic and speakers
Sam: What will we cover today?
2. Main Approaches (8 min)
Alex: Explains top 3 approaches
Sam: Asks about tradeoffs
3. Debate: Best approach? (6 min)
Alex: Advocates for approach A
Sam: Argues for approach B
4. Open Questions (3 min)
Both: What's unsolved?
5. Conclusion (1 min)
Recap and where to learn more
Stage 5: Dialogue Generation
System generates dialogue based on outline:
Alex: "Today we're exploring three major approaches to AI alignment..."
Sam: "That's a great start. Can you break down what we mean by alignment?"
Alex: "Good question. Alignment means ensuring AI systems pursue the goals
we actually want them to pursue, not just what we literally asked for.
There's a classic example of a paperclip maximizer..."
Sam: "Interesting. So it's about solving the intention problem?"
Alex: "Exactly. And that's where the three approaches come in..."
Stage 6: Text-to-Speech
System converts dialogue to audio:
Alex's text → OpenAI TTS → Alex's voice (audio file)
Sam's text → ElevenLabs TTS → Sam's voice (audio file)
Audio files → Mix together → Final podcast MP3
Key Architecture Decisions
1. Asynchronous Processing
Podcasts are generated in the background. You upload → system processes → you download when ready.
Why? Podcast generation takes time (10+ minutes for a 30-minute episode). Blocking would lock up your interface.
2. Multi-Speaker Support
Unlike Google Notebook LM (always 2 hosts), you choose 1-4 speakers.
Why? Different discussions work better with different formats:
- Expert monologue (1 speaker)
- Interview (2 speakers: host + expert)
- Debate (2 speakers: opposing views)
- Panel discussion (3-4 speakers: different expertise)
3. Speaker Customization
You create rich speaker profiles, not just "Host A" and "Host B".
Why? Makes podcasts more engaging and authentic. Different speakers bring different perspectives.
4. Multiple TTS Providers
You're not locked into one voice provider.
Why?
- Cost optimization (some providers cheaper)
- Quality preferences (some voices more natural)
- Privacy options (local TTS for sensitive content)
- Accessibility (different accents, genders, styles)
5. Local TTS Option
Can generate podcasts entirely offline with local text-to-speech.
Why? For sensitive research, never send audio to external APIs.
Use Cases Show Why This Matters
Academic Publishing
Traditional: Academic paper → PDF
Problem: Hard to consume, linear reading required
Open Notebook:
Research materials → Podcast (expert explaining methodology)
→ Podcast (debate format: different interpretations)
→ Different consumption for different audiences
Content Creation
Blog creator: Has research pile on a topic
Problem: Doesn't have time to write the article
Solution:
Add research → Create podcast → Transcribe → Becomes article
OR: Podcast BECOMES the content (upload to podcast platforms)
Educational Content
Educator: Has reading materials for a course
Problem: Students don't read the papers
Solution:
Create podcast with expert explaining papers
Students listen → Better engagement → Discussions can reference podcast
Market Research
Product manager: Has interviews with customers
Problem: Too many hours of audio to review
Solution:
Create podcast with debate format (customer perspective vs. team perspective)
Much more engaging than raw transcripts
Knowledge Transfer
Domain expert: Leaving the organization
Problem: How to preserve expertise?
Solution:
Create expert-mode podcast explaining frameworks, decision-making, context
New team member listens, gets context faster than reading 100 documents
The Difference: Active vs. Passive Learning
Text-Based Research (Active)
- Effort: High (must focus, read, synthesize)
- When: Dedicated study time
- Cost: Time is expensive (can't multitask)
- Best for: Deep dives, precise information
- Format: Whatever you write (notes, articles, books)
Audio Podcast (Passive)
- Effort: Low (just listen)
- When: Anywhere, anytime
- Cost: Low (can multitask)
- Best for: Overview, context, exploration
- Format: Dialogue (more engaging than narration)
They complement each other:
- First encounter: Listen to podcast (passive, get context)
- Deep dive: Read source materials (active, precise)
- Mastery: Both together (understand big picture + details)
How Podcasts Fit Into Your Workflow
1. Build notebook (add sources)
↓
2. Apply transformations (extract insights)
↓
3. Chat/Ask (explore content)
↓
4. Decide on podcast
├─→ Create speaker profiles
├─→ Define episode profile
├─→ Choose TTS provider
└─→ Generate podcast
↓
5. Listen while commuting/exercising
↓
6. Reference sources for deep dive
↓
7. Repeat for different formats/speakers/focus
Advanced: Multiple Podcasts from Same Research
You can create different podcasts from the same sources:
Example: AI Safety Research
Podcast 1: "Expert Monologue"
Speaker: Researcher explaining field
Format: Educational, comprehensive
Audience: Students new to field
Podcast 2: "Debate Format"
Speakers: Optimist vs. skeptic
Format: Discussion of tradeoffs
Audience: Advanced researchers
Podcast 3: "Interview Format"
Speakers: Journalist + expert
Format: Q&A about practical applications
Audience: Industry practitioners
Each tells the same story from different angles.
Privacy & Data Considerations
Where Your Data Goes
Option 1: Cloud TTS (Faster, Higher Quality)
Your outline → API call to TTS provider
→ Audio returned
→ Stored in your notebook
Provider sees: Your outlined script (not raw sources)
Privacy level: Medium (outline is shared, sources aren't)
Option 2: Local TTS (Slower, Maximum Privacy)
Your outline → Local TTS engine (runs on your machine)
→ Audio generated locally
→ Stored in your notebook
Provider sees: Nothing
Privacy level: Maximum (everything local)
Recommendation
- Sensitive research: Use local TTS, no API calls
- Less sensitive: Use ElevenLabs or Google (both handle audio data professionally)
- Mixed: Use local TTS for speakers reading sensitive content
Cost Considerations
Cloud TTS Costs
| Provider | Cost | Quality | Speed |
|---|---|---|---|
| OpenAI | ~$0.015 per minute | Good | Fast |
| ~$0.004 per minute | Excellent | Fast | |
| ElevenLabs | ~$0.10 per minute | Exceptional | Medium |
| Local TTS | Free | Basic | Slow |
A 30-minute podcast costs:
- OpenAI: ~$0.45
- Google: ~$0.12
- ElevenLabs: ~$3.00
- Local: Free (but slow)
Summary: Why Podcasts Are Special
Podcasts transform your research consumption:
| Aspect | Text | Podcast |
|---|---|---|
| How consumed? | Active reading | Passive listening |
| Where consumed? | Desk | Anywhere |
| Multitasking | Hard | Easy |
| Time commitment | Scheduled | Flexible |
| Format | Whatever | Natural dialogue |
| Engagement | Academic | Conversational |
| Accessibility | Text-based | Audio-based |
In Open Notebook specifically:
- Full customization — you create speakers and format
- Privacy options — local TTS for sensitive content
- Cost control — choose TTS provider based on budget
- Non-blocking — generates in background
- Multiple versions — create different podcasts from same research
This is why podcasts matter: they change when and how you can consume your research.