open-notebook/docs/2-CORE-CONCEPTS/podcasts-explained.md
Luis Novo eac837d555
2026-02-27 11:06:47 -03:00


# Podcasts Explained - Research as Audio Dialogue
Podcasts are Open Notebook's highest-level transformation: converting your research into audio dialogue for a different consumption pattern.

---
## Why Podcasts Matter
### The Problem
Research naturally accumulates as text: PDFs, articles, web pages, notes. This creates a friction point:
**To consume research, you must:**
- Sit down at a desk
- Focus intently
- Read actively
- Take notes
- Set aside dedicated time
**But much of life is passive time:**
- Commuting
- Exercising
- Doing dishes
- Driving
- Walking
- Idle moments
### The Solution
Convert your research into audio dialogue so you can consume it passively.
```
Before (Text-based):
Research pile → Must schedule reading time → Requires focus

After (Podcast):
Research pile → Podcast → Can listen while commuting
                        → Absorb while exercising
                        → Understand while walking
                        → Engage without screen time
```
---
## What Makes It Special: Open Notebook vs. Competitors
### Google NotebookLM Podcasts
- **Fixed format**: 2 hosts, always conversational
- **Limited customization**: You can't choose who the "hosts" are
- **One TTS voice per speaker**: Can't customize voices
- **Only uses cloud services**: No local options
### Open Notebook Podcasts
- **Customizable format**: 1-4 speakers, you design them
- **Rich speaker profiles**: Create personas with backstories and expertise
- **Multiple TTS options**:
  - OpenAI (natural, fast)
  - Google TTS (high quality)
  - ElevenLabs (beautiful voices, accents)
  - Local TTS (privacy-first, no API calls)
- **Async generation**: Doesn't block your work
- **Full control**: Choose outline structure, tone, depth
---
## How Podcast Generation Works
### Stage 1: Content Selection
You choose what goes into the podcast:
```
Notebook content → Which sources? → Which notes?
                 → Which topics to focus on?
                 → Depth of coverage?
```
### Stage 2: Episode Profile
You define how you want the podcast structured:
```
Episode Profile
├─ Topic: "AI Safety Approaches"
├─ Length: 20 minutes
├─ Tone: Academic but accessible
├─ Format: Debate (2 speakers with opposing views)
├─ Audience: Researchers new to the field
└─ Focus areas: Main approaches, pros/cons, open questions
```
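As a rough illustration, an episode profile can be thought of as a small structured record. The sketch below is hypothetical: the class name `EpisodeProfileSketch`, its field names, and the length check are assumptions made for this example and are not Open Notebook's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class EpisodeProfileSketch:
    """Hypothetical episode profile; field names are illustrative only."""

    topic: str
    length_minutes: int
    tone: str
    format: str
    audience: str
    focus_areas: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject obviously unusable lengths early.
        if self.length_minutes <= 0:
            raise ValueError("length_minutes must be positive")

    def briefing(self) -> str:
        """Render the profile as a plain-text brief for an outline step."""
        focus = ", ".join(self.focus_areas) or "none specified"
        return "\n".join(
            [
                f"Topic: {self.topic}",
                f"Length: {self.length_minutes} minutes",
                f"Tone: {self.tone}",
                f"Format: {self.format}",
                f"Audience: {self.audience}",
                f"Focus areas: {focus}",
            ]
        )


profile = EpisodeProfileSketch(
    topic="AI Safety Approaches",
    length_minutes=20,
    tone="Academic but accessible",
    format="Debate (2 speakers with opposing views)",
    audience="Researchers new to the field",
    focus_areas=["Main approaches", "pros/cons", "open questions"],
)
print(profile.briefing())
```

Keeping the profile as plain data makes it easy to reuse the same settings across several episodes.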
### Stage 3: Speaker Configuration
You create speaker personas (1-4 speakers):
```
Speaker 1: "Expert Alex"
├─ Expertise: "Deep knowledge of alignment research"
├─ Personality: "Rigorous, academic, patient with explanation"
├─ Accent: (Optional) "British English"
└─ Voice Model: Selected from model registry (e.g., OpenAI TTS)
   └─ Optional per-speaker override of the episode's default voice model

Speaker 2: "Researcher Sam"
├─ Expertise: "Field observer, pragmatic perspective"
├─ Personality: "Curious, asks clarifying questions"
├─ Accent: "American English"
└─ Voice Model: Selected from model registry (e.g., ElevenLabs TTS)
```
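The interaction between a default voice model and a per-speaker override can be sketched as follows. This is a minimal illustration, not Open Notebook's code: `SpeakerSketch`, `resolve_voice_models`, and the model labels `"openai-tts"` and `"elevenlabs-tts"` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeakerSketch:
    """Hypothetical speaker persona; not the real schema."""

    name: str
    expertise: str
    personality: str
    # Per-speaker override; None means "use the default voice model".
    voice_model: Optional[str] = None


def resolve_voice_models(
    speakers: list[SpeakerSketch], default_voice_model: str
) -> dict[str, str]:
    """Map each speaker name to the voice model that would be used."""
    if not 1 <= len(speakers) <= 4:
        raise ValueError("a podcast needs between 1 and 4 speakers")
    return {s.name: s.voice_model or default_voice_model for s in speakers}


speakers = [
    SpeakerSketch(
        name="Expert Alex",
        expertise="Deep knowledge of alignment research",
        personality="Rigorous, academic, patient with explanation",
    ),
    SpeakerSketch(
        name="Researcher Sam",
        expertise="Field observer, pragmatic perspective",
        personality="Curious, asks clarifying questions",
        voice_model="elevenlabs-tts",  # hypothetical override label
    ),
]
voices = resolve_voice_models(speakers, default_voice_model="openai-tts")
print(voices)
```

Here Alex falls back to the default while Sam keeps the override, which mirrors the two speakers in the example above.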
### Stage 4: Outline Generation
System generates episode outline:
```
EPISODE: "AI Safety Approaches"

1. Introduction (2 min)
   Alex: Introduces topic and speakers
   Sam: What will we cover today?

2. Main Approaches (8 min)
   Alex: Explains top 3 approaches
   Sam: Asks about tradeoffs

3. Debate: Best approach? (6 min)
   Alex: Advocates for approach A
   Sam: Argues for approach B

4. Open Questions (3 min)
   Both: What's unsolved?

5. Conclusion (1 min)
   Recap and where to learn more
```
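The segment times in this outline add up to the 20 minutes requested in the episode profile. The short check below works through that arithmetic; the list layout and helper names are assumptions for illustration only.

```python
# Segment titles and durations copied from the example outline above.
outline = [
    ("Introduction", 2),
    ("Main Approaches", 8),
    ("Debate: Best approach?", 6),
    ("Open Questions", 3),
    ("Conclusion", 1),
]
target_minutes = 20


def total_minutes(segments: list[tuple[str, int]]) -> int:
    """Sum the planned duration of every segment."""
    return sum(minutes for _, minutes in segments)


def share_of_episode(segments: list[tuple[str, int]]) -> dict[str, float]:
    """Fraction of the episode that each segment occupies."""
    total = total_minutes(segments)
    return {title: minutes / total for title, minutes in segments}


planned = total_minutes(outline)
shares = share_of_episode(outline)
print(f"planned {planned} min, target {target_minutes} min")
for title, share in shares.items():
    print(f"{title}: {share:.0%}")
```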
### Stage 5: Dialogue Generation
System generates dialogue based on outline:
```
Alex: "Today we're exploring three major approaches to AI alignment..."
Sam: "That's a great start. Can you break down what we mean by alignment?"
Alex: "Good question. Alignment means ensuring AI systems pursue the goals
       we actually want them to pursue, not just what we literally asked for.
       There's a classic example of a paperclip maximizer..."
Sam: "Interesting. So it's about solving the intention problem?"
Alex: "Exactly. And that's where the three approaches come in..."
```
### Stage 6: Text-to-Speech
System converts dialogue to audio using the voice models configured in the model registry. Credentials are automatically resolved from each model's configuration.
```
Alex's text → Voice model (from registry) → Alex's voice (audio file)
Sam's text → Voice model (from registry) → Sam's voice (audio file)
Audio files → Mix together → Final podcast MP3
```
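The routing in the diagram above can be sketched in a few lines: each line of dialogue goes to its speaker's voice, and the resulting clips are joined in order. This is only a conceptual sketch. `fake_synth` is a hypothetical placeholder that tags text instead of producing audio, and byte concatenation stands in for real audio mixing.

```python
from typing import Callable

# A "voice" is modelled as any function that turns text into bytes.
Synth = Callable[[str], bytes]


def fake_synth(label: str) -> Synth:
    """Build a placeholder voice that tags text instead of producing audio."""

    def synth(text: str) -> bytes:
        return f"[{label}] {text}\n".encode("utf-8")

    return synth


def render_episode(dialogue: list[tuple[str, str]], voices: dict[str, Synth]) -> bytes:
    """Send each line to its speaker's voice and join the clips in order."""
    clips = []
    for speaker, text in dialogue:
        if speaker not in voices:
            raise KeyError(f"no voice model configured for {speaker!r}")
        clips.append(voices[speaker](text))
    return b"".join(clips)


voices = {"Alex": fake_synth("voice-a"), "Sam": fake_synth("voice-b")}
dialogue = [
    ("Alex", "Hello."),
    ("Sam", "Hi there."),
    ("Alex", "Let's begin."),
]
episode = render_episode(dialogue, voices)
print(episode.decode("utf-8"))
```

A speaker without a configured voice raises an error here instead of being skipped, so a misconfiguration surfaces before any audio is assembled.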
---
## When Things Go Wrong: Failures & Retry
Podcast generation involves multiple steps (outline, transcript, TTS) and depends on external AI providers. Sometimes things fail.
### What Happens on Failure
When podcast generation fails (e.g., wrong model configured, API key expired, provider outage):
- The episode is marked as **Failed** with a red badge
- The **error message** from the AI provider is displayed so you can understand what went wrong
- No duplicate episodes are created — automatic retries are disabled to prevent confusion
### How to Retry a Failed Episode
1. Go to the podcast's **Episodes** tab
2. Find the failed episode — it shows a red "FAILED" badge and an error details box
3. Click the **Retry** button
4. The failed episode is deleted and a new generation job is submitted
5. The new episode appears with "pending" status
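The retry steps above amount to "delete the failed episode, then submit a fresh job". A minimal in-memory sketch of that flow follows; `EpisodeStoreSketch`, its methods, and the lowercase status labels are hypothetical and do not describe Open Notebook's internals.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class Episode:
    id: int
    name: str
    status: str  # "pending" or "failed" (assumed labels)
    error: Optional[str] = None


class EpisodeStoreSketch:
    """Hypothetical in-memory store used only to illustrate the retry flow."""

    def __init__(self) -> None:
        self._ids = count(1)
        self.episodes: dict[int, Episode] = {}

    def submit(self, name: str) -> Episode:
        episode = Episode(id=next(self._ids), name=name, status="pending")
        self.episodes[episode.id] = episode
        return episode

    def mark_failed(self, episode_id: int, error: str) -> None:
        episode = self.episodes[episode_id]
        episode.status = "failed"
        episode.error = error

    def retry(self, episode_id: int) -> Episode:
        failed = self.episodes[episode_id]
        if failed.status != "failed":
            raise ValueError("only failed episodes can be retried")
        # The failed record is removed, so no duplicate is left behind.
        del self.episodes[episode_id]
        return self.submit(failed.name)


store = EpisodeStoreSketch()
first = store.submit("AI Safety Approaches")
store.mark_failed(first.id, "Invalid API key")
second = store.retry(first.id)
print(second.status, len(store.episodes))
```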
### Common Failure Causes
| Error | What to Do |
|-------|-----------|
| Invalid API key | Check Settings -> Credentials for the TTS and language model providers |
| Model not found | Verify the model exists in the model registry and has valid credentials configured |
| Rate limit exceeded | Wait a few minutes and retry |
| Provider unavailable | Check provider status page; retry later |
---
## Key Architecture Decisions
### 1. Asynchronous Processing
Podcasts are generated in the background: you submit the request, the system processes it, and you listen or download when it is ready.
**Why?** Podcast generation takes time (10+ minutes for a 30-minute episode). Blocking would lock up your interface.
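The non-blocking idea can be illustrated with Python's `asyncio`. This is a conceptual sketch only and says nothing about how Open Notebook actually queues jobs: `generate_podcast` is a hypothetical stand-in, and a short sleep replaces the real outline, transcript, and TTS work.

```python
import asyncio


async def generate_podcast(title: str, seconds: float) -> str:
    """Stand-in for a long-running generation job."""
    await asyncio.sleep(seconds)  # replaces outline, transcript and TTS work
    return f"{title}.mp3"


async def main() -> list[str]:
    events = []
    # Start the job in the background instead of awaiting it immediately.
    job = asyncio.create_task(generate_podcast("ai-safety", 0.05))
    events.append("job submitted")

    # The caller stays free to do other things while the job runs.
    await asyncio.sleep(0)
    events.append(f"finished already? {job.done()}")

    # Collect the result only once it is needed.
    events.append(f"ready: {await job}")
    return events


events = asyncio.run(main())
for event in events:
    print(event)
```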
### 2. Multi-Speaker Support
Unlike Google NotebookLM (always 2 hosts), you choose 1-4 speakers.
**Why?** Different discussions work better with different formats:
- Expert monologue (1 speaker)
- Interview (2 speakers: host + expert)
- Debate (2 speakers: opposing views)
- Panel discussion (3-4 speakers: different expertise)
### 3. Speaker Customization
You create rich speaker profiles, not just "Host A" and "Host B".
**Why?** Makes podcasts more engaging and authentic. Different speakers bring different perspectives.
### 4. Multiple TTS Providers
You're not locked into one voice provider.
**Why?**
- Cost optimization (some providers are cheaper)
- Quality preferences (some voices sound more natural)
- Privacy options (local TTS for sensitive content)
- Accessibility (different accents, genders, styles)
### 5. Local TTS Option
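Podcasts can be voiced with a local text-to-speech engine instead of a cloud provider.
**Why?** For sensitive research, the script never has to be sent to an external TTS API. For a fully offline run, the language models used for the outline and transcript need to be local as well.

---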
## Use Cases Show Why This Matters
### Academic Publishing
```
Traditional: Academic paper → PDF
Problem: Hard to consume, linear reading required

Open Notebook:
Research materials → Podcast (expert explaining methodology)
                   → Podcast (debate format: different interpretations)
                   → Different consumption for different audiences
```
### Content Creation
```
Blog creator: Has research pile on a topic
Problem: Doesn't have time to write the article

Solution:
  Add research → Create podcast → Transcribe → Becomes article
  OR: Podcast BECOMES the content (upload to podcast platforms)
```
### Educational Content
```
Educator: Has reading materials for a course
Problem: Students don't read the papers

Solution:
  Create podcast with expert explaining papers
  Students listen → Better engagement → Discussions can reference podcast
```
### Market Research
```
Product manager: Has interviews with customers
Problem: Too many hours of audio to review

Solution:
  Create podcast with debate format (customer perspective vs. team perspective)
  Much more engaging than raw transcripts
```
### Knowledge Transfer
```
Domain expert: Leaving the organization
Problem: How to preserve expertise?

Solution:
  Create expert-mode podcast explaining frameworks, decision-making, context
  New team member listens, gets context faster than reading 100 documents
```
---
## The Difference: Active vs. Passive Learning
### Text-Based Research (Active)
- **Effort**: High (must focus, read, synthesize)
- **When**: Dedicated study time
- **Cost**: Time is expensive (can't multitask)
- **Best for**: Deep dives, precise information
- **Format**: Whatever you write (notes, articles, books)
### Audio Podcast (Passive)
- **Effort**: Low (just listen)
- **When**: Anywhere, anytime
- **Cost**: Low (can multitask)
- **Best for**: Overview, context, exploration
- **Format**: Dialogue (more engaging than narration)
**They complement each other:**
1. **First encounter**: Listen to podcast (passive, get context)
2. **Deep dive**: Read source materials (active, precise)
3. **Mastery**: Both together (understand big picture + details)
---
## How Podcasts Fit Into Your Workflow
```
1. Build notebook (add sources)
2. Apply transformations (extract insights)
3. Chat/Ask (explore content)
4. Decide on podcast
   ├─→ Create speaker profiles
   ├─→ Define episode profile
   ├─→ Configure voice models (from model registry)
   └─→ Generate podcast
5. Listen while commuting/exercising
6. Reference sources for deep dive
7. Repeat for different formats/speakers/focus
```
---
## Advanced: Multiple Podcasts from Same Research
You can create different podcasts from the same sources:
### Example: AI Safety Research
```
Podcast 1: "Expert Monologue"
  Speaker: Researcher explaining field
  Format: Educational, comprehensive
  Audience: Students new to field

Podcast 2: "Debate Format"
  Speakers: Optimist vs. skeptic
  Format: Discussion of tradeoffs
  Audience: Advanced researchers

Podcast 3: "Interview Format"
  Speakers: Journalist + expert
  Format: Q&A about practical applications
  Audience: Industry practitioners
```
Each tells the same story from different angles.

---
## Privacy & Data Considerations
### Where Your Data Goes
**Option 1: Cloud TTS (Faster, Higher Quality)**
```
Your script → API call to TTS provider
            → Audio returned
            → Stored in your notebook

Provider sees: Your generated script (not raw sources)
Privacy level: Medium (script is shared, sources aren't)
```
**Option 2: Local TTS (Slower, Maximum Privacy)**
```
Your script → Local TTS engine (runs on your machine)
            → Audio generated locally
            → Stored in your notebook

Provider sees: Nothing
Privacy level: Maximum (everything local)
```
### Recommendation
- **Sensitive research**: Use local TTS, no API calls
- **Less sensitive**: Use a cloud provider such as ElevenLabs or Google, after checking that its data-handling policy fits your content
- **Mixed**: Use local TTS for speakers reading sensitive content
---
## Cost Considerations
### Cloud TTS Costs
| Provider | Cost | Quality | Speed |
|----------|------|---------|-------|
| OpenAI | ~$0.015 per minute | Good | Fast |
| Google | ~$0.004 per minute | Excellent | Fast |
| ElevenLabs | ~$0.10 per minute | Exceptional | Medium |
| Local TTS | Free | Basic | Slow |

A 30-minute podcast costs:
- OpenAI: ~$0.45
- Google: ~$0.12
- ElevenLabs: ~$3.00
- Local: Free (but slow)
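These figures are simply the per-minute rate multiplied by the episode length. The snippet below reproduces the arithmetic using the approximate rates from the table above; the dictionary and function names are made up for this example, and actual provider pricing may differ.

```python
from decimal import Decimal

# Approximate per-minute rates in USD, copied from the table above.
PRICE_PER_MINUTE_USD = {
    "OpenAI": Decimal("0.015"),
    "Google": Decimal("0.004"),
    "ElevenLabs": Decimal("0.10"),
    "Local TTS": Decimal("0"),
}


def episode_cost(provider: str, minutes: int) -> Decimal:
    """Estimated TTS cost for one episode, rounded to whole cents."""
    cost = PRICE_PER_MINUTE_USD[provider] * minutes
    return cost.quantize(Decimal("0.01"))


costs = {provider: episode_cost(provider, 30) for provider in PRICE_PER_MINUTE_USD}
for provider, cost in costs.items():
    print(f"{provider}: ${cost}")
```

Using `Decimal` instead of floats keeps the currency values exact, so the results match the list above to the cent.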
---
## Summary: Why Podcasts Are Special
**Podcasts transform your research consumption:**
| Aspect | Text | Podcast |
|--------|------|---------|
| **How consumed?** | Active reading | Passive listening |
| **Where consumed?** | Desk | Anywhere |
| **Multitasking** | Hard | Easy |
| **Time commitment** | Scheduled | Flexible |
| **Format** | Whatever | Natural dialogue |
| **Engagement** | Academic | Conversational |
| **Accessibility** | Text-based | Audio-based |

**In Open Notebook specifically:**
- **Full customization** — you create speakers and format
- **Privacy options** — local TTS for sensitive content
- **Cost control** — choose TTS provider based on budget
- **Non-blocking** — generates in background
- **Multiple versions** — create different podcasts from same research
This is why podcasts matter: they change *when* and *how* you can consume your research.