

# Creating Podcasts - Turn Research into Audio
Podcasts let you consume your research passively. This guide covers the complete workflow from setup to download.

---
## Quick-Start: Your First Podcast (5 Minutes)
```
1. Go to your notebook
2. Click "Generate Podcast"
3. Select sources to include
4. Choose a speaker profile (or use default)
5. Click "Generate"
6. Wait 3-10 minutes (non-blocking)
7. Download MP3 when ready
8. Done!
```
That's the minimum. Let's make it better.

---
## Step-by-Step: The Complete Workflow
### Step 1: Prepare Your Notebook
```
Before generating, make sure:
✓ You have sources added
(At least 1-2 sources)
✓ Sources have been processed
(Green "Ready" status)
✓ Notes are organized
(If you want notes included)
✓ You know your message
(What's the main story?)
Typical preparation: 5-10 minutes
```
### Step 2: Choose Content
```
Click "Generate Podcast"
You'll see:
- List of all sources in notebook
- List of all notes
Select which to include:
☑ Paper A (primary source)
☑ Paper B (supporting source)
☐ Old note (not relevant)
☑ Analysis note (important)
What to include:
- Primary sources: Always include
- Supporting sources: Usually include
- Notes: Include your analysis/insights
- Everything: Can overload podcast
Recommended: 3-5 sources per podcast
```
### Step 3: Choose Episode Profile
An episode profile defines the structure and tone.
**Option A: Use Preset Profile**
```
Open Notebook provides templates:
Academic Presentation (Monologue)
├─ 1 speaker
├─ Tone: Educational
└─ Format: Expert explaining topic
Expert Interview (2-speaker)
├─ 2 speakers: Host + Expert
├─ Tone: Q&A, conversational
└─ Format: Interview with expert
Debate Format (2-speaker)
├─ 2 speakers: Pro vs. Con
├─ Tone: Discussion, disagreement
└─ Format: Debate about the topic
Panel Discussion (3-4 speaker)
├─ 3-4 speakers: Different perspectives
├─ Tone: Thoughtful discussion
└─ Format: Each brings different expertise
Solo Explanation (Monologue)
├─ 1 speaker
├─ Tone: Conversational, friendly
└─ Format: Personal explanation
```
**Pick based on your content:**
- One main idea → Academic Presentation
- You want to explain → Solo Explanation
- Two competing views → Debate Format
- Multiple perspectives → Panel Discussion
- Want to explore → Expert Interview
### Step 4: Customize Episode Profile (Optional)
If presets don't fit, customize:
```
Episode Profile
├─ Title: "AI Safety in 2026"
├─ Description: "Exploring current approaches"
├─ Length target: 20 minutes
├─ Tone: "Academic but accessible"
├─ Focus areas:
│ ├─ Main approaches to alignment
│ ├─ Pros and cons comparison
│ └─ Open questions
├─ Audience: "Researchers new to field"
└─ Format: "Debate between two perspectives"
How to set:
1. Click "Customize"
2. Edit each field
3. Click "Save Profile"
4. System uses your profile for outline generation
```
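One way to picture the profile fields above is as a small record. The sketch below is illustrative only: the class name `EpisodeProfile`, its field names, and the `outline_brief` helper are assumptions made for this example, not Open Notebook's actual schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class EpisodeProfile:
    """Hypothetical mirror of the Customize form; not the real schema."""

    title: str
    description: str
    length_target_minutes: int
    tone: str
    audience: str
    episode_format: str
    focus_areas: list[str] = field(default_factory=list)

    def outline_brief(self) -> str:
        """Flatten the profile into a plain-text brief, one field per line."""
        lines = [
            f"Title: {self.title}",
            f"Description: {self.description}",
            f"Length target: {self.length_target_minutes} minutes",
            f"Tone: {self.tone}",
            f"Audience: {self.audience}",
            f"Format: {self.episode_format}",
            "Focus areas:",
        ]
        lines += [f"- {area}" for area in self.focus_areas]
        return "\n".join(lines)


profile = EpisodeProfile(
    title="AI Safety in 2026",
    description="Exploring current approaches",
    length_target_minutes=20,
    tone="Academic but accessible",
    audience="Researchers new to field",
    episode_format="Debate between two perspectives",
    focus_areas=[
        "Main approaches to alignment",
        "Pros and cons comparison",
        "Open questions",
    ],
)
print(profile.outline_brief())
```

Writing the profile down this way makes it easy to see which fields steer the outline (focus areas, format) and which only label the episode (title, description).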
### Step 5: Create or Select Speakers
Speakers are the "voice" of your podcast.
**Option A: Use Preset Speakers**
```
Open Notebook provides templates:
"Expert Alex"
- Expertise: Deep knowledge
- Personality: Rigorous, patient
- TTS: OpenAI (clear voice)
"Curious Sam"
- Expertise: Curious newcomer
- Personality: Asks questions
- TTS: Google (natural voice)
"Skeptic Jordan"
- Expertise: Critical perspective
- Personality: Challenges assumptions
- TTS: ElevenLabs (warm voice)
For your first podcast: Use presets
For custom podcast: Create your own
```
**Option B: Create Custom Speakers**
```
Click "Add Speaker"
Fill in:
Name: "Dr. Research Expert"
Expertise:
"20 years in AI safety research,
deep knowledge of alignment approaches"
Personality:
"Rigorous, academic style,
explains clearly, asks good questions"
Voice Configuration:
- TTS Provider: OpenAI / Google / ElevenLabs / Local
- Voice selection: Choose from available voices
- Accent (optional): British / American / etc.
Example:
Name: Dr. Research Expert
Expertise: AI safety alignment research
Personality: Rigorous, academic but accessible
Voice: ElevenLabs - professional male voice
```
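The speaker form above can be checked for completeness before generating. This is a minimal sketch under stated assumptions: the `Speaker` class, its field names, and the `problems` helper are hypothetical; only the provider list comes from the Voice Configuration options in this guide.

```python
from dataclasses import dataclass
from typing import Optional

# Providers listed under Voice Configuration in this guide.
TTS_PROVIDERS = ("OpenAI", "Google", "ElevenLabs", "Local")


@dataclass
class Speaker:
    """Hypothetical speaker record; field names are assumptions."""

    name: str
    expertise: str
    personality: str
    tts_provider: str
    voice: str
    accent: Optional[str] = None  # optional, as in the form

    def problems(self) -> list[str]:
        """Return a list of issues; an empty list means nothing is missing."""
        issues = []
        for label in ("name", "expertise", "personality", "voice"):
            if not getattr(self, label).strip():
                issues.append(f"{label} is empty")
        if self.tts_provider not in TTS_PROVIDERS:
            issues.append(f"unknown TTS provider: {self.tts_provider}")
        return issues


expert = Speaker(
    name="Dr. Research Expert",
    expertise="AI safety alignment research",
    personality="Rigorous, academic but accessible",
    tts_provider="ElevenLabs",
    voice="professional male voice",
)
print(expert.problems())  # an empty list: every required field is filled in
```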
### Step 6: Generate Podcast
```
1. Review your setup:
Sources: ✓ Selected
Profile: ✓ Episode profile chosen
Speakers: ✓ Speakers configured
2. Click "Generate Podcast"
3. System begins:
- Analyzing your content
- Creating outline
- Writing dialogue
- Generating audio
- Mixing speakers
4. Status shows progress:
20% Outline generation
40% Dialogue writing
60% Audio synthesis
80% Mixing
100% Complete
Processing time:
- 5 minutes of content: 3-5 minutes
- 15 minutes of content: 5-10 minutes
- 30 minutes of content: 10-20 minutes
```
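The progress figures above can be read as thresholds. The sketch below assumes each label applies from its percentage until the next one, and that anything under 20% is the initial content analysis; both are interpretations for illustration, and `stage_for` is a hypothetical helper, not part of Open Notebook.

```python
from bisect import bisect_right

# (threshold percent, label); the 0% entry is an assumption based on the
# "Analyzing your content" step listed before outline creation.
MILESTONES = [
    (0, "Content analysis"),
    (20, "Outline generation"),
    (40, "Dialogue writing"),
    (60, "Audio synthesis"),
    (80, "Mixing"),
    (100, "Complete"),
]


def stage_for(percent: float) -> str:
    """Return the label of the last milestone at or below `percent`."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    thresholds = [threshold for threshold, _ in MILESTONES]
    index = bisect_right(thresholds, percent) - 1
    return MILESTONES[index][1]


for value in (5, 20, 59.9, 100):
    print(f"{value}% -> {stage_for(value)}")
```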
### Step 7: Review and Download
```
When complete:
Preview:
- Play audio sample
- Review transcript
- Check duration
Options:
✓ Download as MP3 - Save to computer
✓ Stream directly - Listen in browser
✓ Share link - Get shareable URL (if public)
✓ Regenerate - Try different speakers/profile
Download:
1. Click "Download as MP3"
2. Choose quality: 128kbps / 192kbps / 320kbps
3. Save file: podcast_[notebook]_[date].mp3
4. Listen!
```
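If you rename or batch-organize downloads, the `podcast_[notebook]_[date].mp3` pattern is easy to reproduce. The sketch below is an assumption-laden illustration: the slug rule (lowercase, hyphens) and the ISO date format are guesses, since the guide does not say how the notebook name and date are actually encoded.

```python
import re
from datetime import date


def podcast_filename(notebook: str, day: date) -> str:
    """Build a name following podcast_[notebook]_[date].mp3 (format assumed)."""
    # Collapse anything that is not a letter or digit into single hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", notebook.lower()).strip("-")
    return f"podcast_{slug or 'notebook'}_{day.isoformat()}.mp3"


print(podcast_filename("AI Safety: 2026 Review", date(2026, 1, 3)))
# podcast_ai-safety-2026-review_2026-01-03.mp3
```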
---
## Understanding What Happens Behind the Scenes
### The Generation Pipeline
```
Stage 1: CONTENT ANALYSIS (1 minute)
Your sources → What's the main story?
→ Key themes?
→ Debate points?
Stage 2: OUTLINE CREATION (2-3 minutes)
Themes → Episode structure
→ Section breakdown
→ Talking points
Stage 3: DIALOGUE WRITING (2-3 minutes)
Outline → Convert to natural dialogue
→ Add speaker personalities
→ Create flow and transitions
Stage 4: AUDIO SYNTHESIS (3-5 minutes per speaker)
Script + Speaker → Text-to-speech
→ Individual audio files
→ High quality audio
Stage 5: MIXING & MASTERING (1-2 minutes)
Multiple audio → Combine speakers
→ Level audio
→ Add polish
→ Final MP3
Total: 10-20 minutes for typical podcast
```
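The stage timings above add up as follows. This sketch assumes audio synthesis runs once per speaker, one after another, which is one reading of "3-5 minutes per speaker"; `estimate_minutes` is a hypothetical helper for the arithmetic only.

```python
# (low, high) minutes per stage, copied from the pipeline above.
FIXED_STAGES = {
    "content analysis": (1, 1),
    "outline creation": (2, 3),
    "dialogue writing": (2, 3),
    "mixing and mastering": (1, 2),
}
SYNTHESIS_PER_SPEAKER = (3, 5)


def estimate_minutes(speakers: int) -> tuple[int, int]:
    """Return the (low, high) total, assuming sequential synthesis per speaker."""
    if speakers < 1:
        raise ValueError("a podcast needs at least one speaker")
    low = sum(lo for lo, _ in FIXED_STAGES.values())
    high = sum(hi for _, hi in FIXED_STAGES.values())
    low += SYNTHESIS_PER_SPEAKER[0] * speakers
    high += SYNTHESIS_PER_SPEAKER[1] * speakers
    return low, high


for count in (1, 2, 4):
    low, high = estimate_minutes(count)
    print(f"{count} speaker(s): {low}-{high} minutes")
# 1 speaker(s): 9-14 minutes
# 2 speaker(s): 12-19 minutes
# 4 speaker(s): 18-29 minutes
```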
---
## Text-to-Speech Providers
Different providers, different qualities.
### OpenAI (Recommended)
```
Voices: 6 standard options (Alloy, Echo, Fable, Onyx, Nova, Shimmer)
Quality: Good, natural sounding
Speed: Fast
Cost: ~$0.015 per minute
Best for: General purpose, natural speech
Example: "I have to say, the research shows..."
```
### Google TTS
```
Voices: Many options, various accents
Quality: Excellent, very natural
Speed: Fast
Cost: ~$0.004 per minute
Best for: High quality output, accents
Example: "The research demonstrates that..."
```
### ElevenLabs
```
Voices: 100+ voices, highly customizable
Quality: Exceptional, very expressive
Speed: Slower (5-10 seconds per phrase)
Cost: ~$0.10 per minute
Best for: Premium quality, emotional range
Example: [Can convey emotion and tone]
```
### Local TTS (Free)
```
Voices: Limited, basic options
Quality: Basic, robotic
Speed: Depends on hardware (slow)
Cost: Free (local processing)
Best for: Privacy, testing, offline use
Example: "The research shows..."
Privacy: Everything stays on your computer
```
### Which Provider to Choose?
```
For your first podcast: Google (quality/cost balance)
For privacy-sensitive: Local TTS (free, private)
For premium quality: ElevenLabs (best voices)
For budget: Google (cheapest quality option)
For speed: OpenAI (fast generation)
```
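The recommendations above amount to a simple lookup. In this sketch the priority keys and the `recommend` function are hypothetical labels chosen for the example; only the provider choices come from this guide.

```python
# Provider choices copied from the list above; the keys are made-up labels.
RECOMMENDED = {
    "first podcast": "Google",
    "privacy": "Local TTS",
    "premium quality": "ElevenLabs",
    "budget": "Google",
    "speed": "OpenAI",
}


def recommend(priority: str) -> str:
    """Look up the suggested provider for a priority, ignoring case and padding."""
    key = priority.strip().lower()
    if key not in RECOMMENDED:
        options = ", ".join(sorted(RECOMMENDED))
        raise KeyError(f"unknown priority {priority!r}; choose one of: {options}")
    return RECOMMENDED[key]


print(recommend("Privacy"))  # Local TTS
print(recommend("speed"))    # OpenAI
```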
---
## Tips for Better Podcasts
### Choose Right Profile
```
Single source analysis → Academic Presentation
"Explaining one paper to someone new"
Comparing two approaches → Debate Format
"Pros and cons of different methods"
Multiple sources + insights → Panel Discussion
"Different experts discussing topic"
Narrative exploration → Expert Interview
"Host interviewing research expert"
Personal take → Solo Explanation
"You explaining your analysis"
```
### Create Good Speakers
```
Good Speaker:
✓ Clear expertise (know what they're talking about)
✓ Distinct personality (not generic)
✓ Good voice choice (matches personality)
✓ Realistic backstory (feels like real person)
Bad Speaker:
✗ Generic expertise ("good at research")
✗ No personality ("just reads")
✗ Mismatched voice (deep voice for young person)
✗ Contradicts personality (serious person uses casual voice)
```
### Focus Content
```
Better: Podcast on ONE specific topic
"How transformers work" (15 minutes, focused)
Worse: Podcast on everything
"All of AI 2025" (2 hours, unfocused)
Guideline:
- 5-10 minutes: One narrow topic
- 15-20 minutes: One broad topic
- 30+ minutes: Multiple related subtopics
Shorter is usually better for podcasts.
```
### Optimize Source Selection
```
Too much content:
"Here are all 20 papers"
→ Podcast becomes 2+ hours
→ Unfocused
→ Low quality
Right amount:
"Here are 3 key papers"
→ Podcast is 15-20 minutes
→ Focused
→ High quality
Rule: 3-5 sources per podcast
Remove long background papers
Keep focused on main topic
```
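The 3-5 source rule can be expressed as a quick check on your selection. This is a minimal sketch; `review_selection` and its messages are hypothetical and do not reflect any validation Open Notebook itself performs.

```python
def review_selection(selected: list[str], low: int = 3, high: int = 5) -> str:
    """Compare the number of selected sources with the recommended range."""
    count = len(selected)
    if count == 0:
        return "Select at least one source."
    if count < low:
        return f"{count} selected: workable, but the guide recommends {low}-{high}."
    if count > high:
        return f"{count} selected: trim to {high} or fewer to stay focused."
    return f"{count} selected: within the recommended {low}-{high}."


print(review_selection(["Paper A", "Paper B", "Paper C"]))
print(review_selection([f"Paper {n}" for n in range(1, 21)]))
```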
---
## Quality Troubleshooting
### Audio Sounds Robotic
**Problem**: TTS voice sounds unnatural

**Solutions**:
```
1. Switch provider: Try Google or ElevenLabs instead
2. Choose a different voice: some voices sound more natural
3. Shorter sentences: Very long sentences sound robotic
4. Adjust pacing: Ask for "natural, conversational pacing"
```
### Audio Sounds Unclear
**Problem**: Hard to understand what's being said

**Solutions**:
```
1. Re-generate with different speaker
2. Try different TTS provider
3. Use speakers with clear accents
4. Lower background noise (if any)
5. Increase speech rate (if too slow)
```
### Missing Content
**Problem**: Important information isn't in the podcast

**Solutions**:
```
1. Include that source in content selection
2. Review generated outline (check before generating)
3. Regenerate with clearer profile instructions
4. Try different model (more thorough model)
```
### Speakers Don't Match
**Problem**: Speakers sound like the same person

**Solutions**:
```
1. Choose different TTS providers (OpenAI + Google)
2. Choose very different voice options
3. Increase personality differences in profile
4. Try different speaker count (2 vs 3 vs 4)
```
### Generation Failed
**Problem**: "Podcast generation failed"

**Solutions**:
```
1. Check internet connection (especially TTS)
2. Try again (might be temporary issue)
3. Use local TTS (doesn't need internet)
4. Reduce source count (less to process)
5. Contact support if persistent
```
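The "try again, then fall back to local TTS" advice above follows a common retry-with-fallback pattern. The sketch below shows the pattern in isolation: `generate_with_fallback` and `fake_generate` are hypothetical stand-ins, and nothing here calls Open Notebook or any real TTS service.

```python
from typing import Callable, Optional, Sequence


def generate_with_fallback(
    generate: Callable[[str], str],
    providers: Sequence[str],
    attempts_per_provider: int = 2,
) -> str:
    """Try each provider in order, retrying on connection errors."""
    last_error: Optional[Exception] = None
    for provider in providers:
        for _ in range(attempts_per_provider):
            try:
                return generate(provider)
            except ConnectionError as error:
                last_error = error  # remember the failure and keep trying
    raise RuntimeError("generation failed for every provider") from last_error


def fake_generate(provider: str) -> str:
    """Stand-in that simulates being offline: only local TTS succeeds."""
    if provider != "Local":
        raise ConnectionError(f"{provider} unreachable")
    return f"episode synthesized with {provider} TTS"


print(generate_with_fallback(fake_generate, ["OpenAI", "Local"]))
# episode synthesized with Local TTS
```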
---
## Advanced: Multiple Podcasts from Same Research
You can generate different podcasts from one notebook:
```
Podcast 1: Overview
Profile: Academic Presentation
Sources: Papers A, B, C
Speakers: One expert
Length: 15 minutes
→ Use for "What's this about?" understanding
Podcast 2: Deep Dive
Profile: Expert Interview
Sources: Paper A (Full) + B, C (Summary)
Speakers: Expert + Interviewer
Length: 30 minutes
→ Use for detailed exploration
Podcast 3: Debate
Profile: Debate Format
Sources: Papers A vs B (different approaches)
Speakers: Pro-A speaker + Pro-B speaker
Length: 20 minutes
→ Use for comparing approaches
```
Each tells the same story from different angles.

---
## Exporting and Sharing
### Download MP3
```
1. Generation complete
2. Click "Download"
3. Choose quality:
- 128 kbps: Smallest file, lower quality
- 192 kbps: Balanced (recommended)
- 320 kbps: Highest quality, largest file
4. Save to computer
5. Use in podcast app, upload to platform, etc.
```
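To see what the quality choice means for file size, multiply bitrate by duration. The sketch below assumes constant-bitrate encoding and ignores metadata such as ID3 tags, so real files may differ slightly; `mp3_size_mb` is a helper written for this example.

```python
def mp3_size_mb(bitrate_kbps: int, minutes: float) -> float:
    """Approximate size in megabytes (10^6 bytes) of a constant-bitrate MP3."""
    bytes_per_second = bitrate_kbps * 1000 / 8  # kilobits per second to bytes
    return bytes_per_second * minutes * 60 / 1_000_000


for kbps in (128, 192, 320):
    print(f"{kbps} kbps, 15 min: {mp3_size_mb(kbps, 15):.1f} MB")
# 128 kbps, 15 min: 14.4 MB
# 192 kbps, 15 min: 21.6 MB
# 320 kbps, 15 min: 36.0 MB
```

For a 15-minute episode the recommended 192 kbps setting costs about 7 MB more than 128 kbps and saves about 14 MB compared with 320 kbps.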
### Export Transcript
```
1. Click "Export Transcript"
2. Get full dialogue as text
3. Useful for:
- Blog post content
- Show notes
- Searchable text version
- Accessibility
```
### Share Link
```
If podcast is public:
1. Click "Share"
2. Get shareable link
3. Others can listen/download
4. Useful for:
- Sharing with team
- Public distribution
- Embedding on website
```
### Publish to Podcast Platforms
```
Direct publishing is a future feature. Until then, distribute manually:
1. Download MP3
2. Upload to platform (Spotify, Apple Podcasts, etc.)
3. Add metadata (title, description, episode notes)
4. Your research becomes a published podcast!
```
---
## Best Practices
### Before Generation
- [ ] Sources are processed and ready
- [ ] You've chosen content to include
- [ ] You have a clear episode profile
- [ ] Speakers are well-defined
- [ ] Content is focused (3-5 sources max)
### During Generation
- Keep the browser open; generation runs in the background, so you can keep working in the app
- Check back in 5-15 minutes
- Review transcript when complete
- Listen to sample before downloading
### After Generation
- [ ] Download MP3 to computer
- [ ] Save in organized folder
- [ ] Add metadata (title, description, date)
- [ ] Test listening in podcast app
- [ ] Share with colleagues for feedback
---
## Use Cases
### Academic Researcher
```
Podcast: Explaining your dissertation
Speakers: You + colleague
Content: Your papers + supporting research
Use: Share with advisors, test explanations
```
### Content Creator
```
Podcast: Audio version of a researched article
Speakers: Narrator + expert
Content: Articles you've researched
Use: Transform article into podcast version
```
### Team Research
```
Podcast: Weekly research updates
Speakers: Multiple team members
Content: This week's papers
Use: Team updates, knowledge sharing
```
### Learning/Teaching
```
Podcast: Teaching material
Speakers: Teacher + inquisitive student
Content: Textbook + examples
Use: Students learn while commuting
```
---
## Cost Breakdown Example
### Generate 15-minute podcast with ElevenLabs
```
Generation (outline + dialogue):
No charge (included in service)
Text-to-speech:
2 speakers × 15 minutes = 30 minutes TTS
ElevenLabs: $0.10 per minute
Cost: 30 × $0.10 = $3.00
Processing:
Included (no additional cost)
Total: $3.00 per podcast
Cheaper options:
With Google TTS: ~$0.12
With OpenAI: ~$0.45
With Local TTS: ~$0.00
```
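The arithmetic above can be reproduced for any provider. This sketch follows the guide's own accounting (speakers × episode minutes × rate) and uses the approximate per-minute rates quoted in this guide; because speakers take turns, the audio actually synthesized may be closer to the episode length, so treat the result as an upper estimate. `tts_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Approximate per-minute rates quoted in this guide (USD).
RATE_PER_MINUTE = {
    "ElevenLabs": Decimal("0.10"),
    "OpenAI": Decimal("0.015"),
    "Google": Decimal("0.004"),
    "Local": Decimal("0"),
}


def tts_cost(provider: str, speakers: int, episode_minutes: int) -> Decimal:
    """Cost using the guide's accounting: speakers x minutes x rate."""
    billed_minutes = speakers * episode_minutes
    return RATE_PER_MINUTE[provider] * billed_minutes


for provider in RATE_PER_MINUTE:
    cost = tts_cost(provider, speakers=2, episode_minutes=15)
    print(f"{provider}: ${cost:.2f}")
# ElevenLabs: $3.00
# OpenAI: $0.45
# Google: $0.12
# Local: $0.00
```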
---
## Summary: Podcasts as Research Tool
Podcasts transform how you consume research:
```
Before: Reading papers takes time, focus
After: Listen while commuting, exercising, doing chores
Before: Can't share complex research easily
After: Share audio of your analysis
Before: Reading and listening each need their own material
After: Same research, multiple formats (read/listen)
```
Podcasts aren't just for entertainment. They're a tool for making research more accessible and easier to share.
That's why they matter in Open Notebook.