vrr/open-notebook


mirror of https://github.com/lfnovo/open-notebook.git synced 2026-04-28 11:30:00 +00:00

Luis Novo eac837d555

feat(podcasts): model registry integration, credential passthrough & new features (#632)

* feat(podcasts): integrate model registry for profiles and credential passthrough

Replace loose provider/model string fields with record<model> references
in podcast profiles, enabling credential passthrough to podcast-creator.

Backend:
- EpisodeProfile: outline_llm, transcript_llm (record<model>) replace
  outline_provider/outline_model strings. New language field (BCP 47).
- SpeakerProfile: voice_model (record<model>) replaces tts_provider/
  tts_model strings. Per-speaker voice_model override support.
- Migration 14: schema changes making legacy fields optional, adding new
  record<model> fields.
- Data migration (migration.py): auto-converts legacy profiles to model
  registry references on startup. Idempotent.
- podcast_commands.py: resolves credentials for ALL profiles before
  calling podcast-creator.
- New /api/languages endpoint (pycountry + babel) with BCP 47 locale
  codes (pt-BR, en-US, etc.).

Frontend:
- Episode/speaker profile forms use ModelSelector instead of manual
  provider/model dropdowns.
- Language dropdown with BCP 47 codes in episode profile form.
- Per-speaker TTS voice model override in speaker profile form.
- "Templates" tab renamed to "Profiles".
- Setup required badge on unconfigured profiles.
- i18n updated across all 8 locales.

Closes #486, closes #552

* fix(i18n): remove unused legacy podcast provider/model keys

Remove 10 orphaned i18n keys across all 8 locales that were left behind
after replacing manual provider/model dropdowns with ModelSelector.

* fix: address review violations in podcast model registry

- P1: Remove profiles with failed model resolution from dicts to prevent
  podcast-creator validation errors on unrelated profiles
- P2: Use centralized QUERY_KEYS.languages instead of inline key
- P3: Fix ISO 639-1 → BCP 47 in model field description and CLAUDE.md
- P3: Update "templates" → "profiles" in locale string values (all 8)

* chore: bump version to 1.8.0

2026-02-27 11:06:47 -03:00


Creating Podcasts - Turn Research into Audio

Podcasts let you consume your research passively. This guide covers the complete workflow from setup to download.

Quick-Start: Your First Podcast (5 Minutes)

1. Go to your notebook
2. Click "Generate Podcast"
3. Select sources to include
4. Choose a speaker profile (or use default)
5. Click "Generate"
6. Wait 3-10 minutes (non-blocking)
7. Download MP3 when ready
8. Done!

That's the minimum. Let's make it better.

Step-by-Step: The Complete Workflow

Step 1: Prepare Your Notebook

Before generating, make sure:

✓ You have sources added
  (At least 1-2 sources)

✓ Sources have been processed
  (Green "Ready" status)

✓ Notes are organized
  (If you want notes included)

✓ You know your message
  (What's the main story?)

Typical preparation: 5-10 minutes

Step 2: Choose Content

Click "Generate Podcast"

You'll see:
- List of all sources in notebook
- List of all notes

Select which to include:
☑ Paper A (primary source)
☑ Paper B (supporting source)
☐ Old note (not relevant)
☑ Analysis note (important)

What to include:
- Primary sources: Always include
- Supporting sources: Usually include
- Notes: Include your analysis/insights
- Everything: Can overload podcast

Recommended: 3-5 sources per podcast

Step 3: Choose Episode Profile

An episode profile defines the structure and tone.

Option A: Use Preset Profile

Open Notebook provides preset profiles:

Academic Presentation (Monologue)
├─ 1 speaker
├─ Tone: Educational
└─ Format: Expert explaining topic

Expert Interview (2-speaker)
├─ 2 speakers: Host + Expert
├─ Tone: Q&A, conversational
└─ Format: Interview with expert

Debate Format (2-speaker)
├─ 2 speakers: Pro vs. Con
├─ Tone: Discussion, disagreement
└─ Format: Debate about the topic

Panel Discussion (3-4 speaker)
├─ 3-4 speakers: Different perspectives
├─ Tone: Thoughtful discussion
└─ Format: Each brings different expertise

Solo Explanation (Monologue)
├─ 1 speaker
├─ Tone: Conversational, friendly
└─ Format: Personal explanation

Pick based on your content:

One main idea → Academic Presentation
You want to explain → Solo Explanation
Two competing views → Debate Format
Multiple perspectives → Panel Discussion
Want to explore → Expert Interview

Step 4: Customize Episode Profile (Optional)

If presets don't fit, customize:

Episode Profile
├─ Title: "AI Safety in 2026"
├─ Description: "Exploring current approaches"
├─ Length target: 20 minutes
├─ Tone: "Academic but accessible"
├─ Focus areas:
│  ├─ Main approaches to alignment
│  ├─ Pros and cons comparison
│  └─ Open questions
├─ Audience: "Researchers new to field"
└─ Format: "Debate between two perspectives"

How to set:
1. Click "Customize"
2. Edit each field
3. Click "Save Profile"
4. System uses your profile for outline generation
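
The fields above can be pictured as a small record that is flattened into a brief for outline generation. This is only a sketch: the class name, field names, and the `outline_brief` method are hypothetical and mirror the example in this guide, not Open Notebook's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class EpisodeProfileSketch:
    """Hypothetical stand-in for an episode profile (names follow the guide's example)."""

    title: str
    description: str
    length_target_minutes: int
    tone: str
    audience: str
    format: str
    focus_areas: list[str] = field(default_factory=list)

    def outline_brief(self) -> str:
        """Flatten the profile into plain text that an outline step could consume."""
        lines = [
            f"Title: {self.title}",
            f"Description: {self.description}",
            f"Length target: {self.length_target_minutes} minutes",
            f"Tone: {self.tone}",
            f"Audience: {self.audience}",
            f"Format: {self.format}",
            "Focus areas:",
        ]
        lines += [f"- {area}" for area in self.focus_areas]
        return "\n".join(lines)


profile = EpisodeProfileSketch(
    title="AI Safety in 2026",
    description="Exploring current approaches",
    length_target_minutes=20,
    tone="Academic but accessible",
    audience="Researchers new to field",
    format="Debate between two perspectives",
    focus_areas=[
        "Main approaches to alignment",
        "Pros and cons comparison",
        "Open questions",
    ],
)
print(profile.outline_brief())
```

Keeping the focus areas as a list rather than free text makes it easy to see at a glance whether the episode is trying to cover too much.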

Step 5: Create or Select Speakers

Speakers are the "voice" of your podcast.

Option A: Use Preset Speakers

Open Notebook provides preset speakers:

"Expert Alex"
- Expertise: Deep knowledge
- Personality: Rigorous, patient
- Voice Model: Selected from model registry

"Curious Sam"
- Expertise: Curious newcomer
- Personality: Asks questions
- Voice Model: Selected from model registry

"Skeptic Jordan"
- Expertise: Critical perspective
- Personality: Challenges assumptions
- Voice Model: Selected from model registry

For your first podcast: Use presets
For custom podcast: Create your own

Option B: Create Custom Speakers

Click "Add Speaker"

Fill in:

Name: "Dr. Research Expert"

Expertise:
"20 years in AI safety research,
 deep knowledge of alignment approaches"

Personality:
"Rigorous, academic style,
 explains clearly, asks good questions"

Voice Configuration:
- Voice Model: Select from model registry (e.g., OpenAI TTS, Google TTS, ElevenLabs)
- Voice: Choose from available voices for the selected model
- Per-speaker override: Each speaker can optionally use a different voice model

Credentials are automatically resolved from the model configuration.

Example:
Name: Dr. Research Expert
Expertise: AI safety alignment research
Personality: Rigorous, academic but accessible
Voice Model: ElevenLabs TTS (from registry), Voice: professional male
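
The per-speaker override described above amounts to a simple fallback rule: use the speaker's own voice model if one is set, otherwise use the default. A minimal sketch, assuming hypothetical names (`SpeakerSketch`, `resolve_voice_model`) and placeholder model identifiers that are not taken from the model registry:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeakerSketch:
    """Hypothetical speaker record; not Open Notebook's actual data model."""

    name: str
    voice: str
    voice_model: Optional[str] = None  # per-speaker override; None means "use the default"


def resolve_voice_model(speaker: SpeakerSketch, default_model: str) -> str:
    """Prefer the speaker's own voice model, else fall back to the default."""
    return speaker.voice_model if speaker.voice_model is not None else default_model


# Placeholder identifiers for illustration only.
speakers = [
    SpeakerSketch(name="Expert Alex", voice="voice-a"),
    SpeakerSketch(name="Curious Sam", voice="voice-b", voice_model="elevenlabs-tts"),
]
for speaker in speakers:
    print(speaker.name, "->", resolve_voice_model(speaker, default_model="openai-tts"))
```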

Step 6: Generate Podcast

1. Review your setup:
   Sources: ✓ Selected
   Profile: ✓ Episode profile chosen
   Speakers: ✓ Speakers configured

2. Click "Generate Podcast"

3. System begins:
   - Analyzing your content
   - Creating outline
   - Writing dialogue
   - Generating audio
   - Mixing speakers

4. Status shows progress:
   20% Outline generation
   40% Dialogue writing
   60% Audio synthesis
   80% Mixing
   100% Complete

Processing time:
- 5 minutes of content: 3-5 minutes
- 15 minutes of content: 5-10 minutes
- 30 minutes of content: 10-20 minutes
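
The processing-time figures above can be read as a lookup table. The sketch below assumes one interpretation that the guide does not state: a requested length is rounded up to the next listed length, and anything beyond 30 minutes has no estimate. The function name is hypothetical.

```python
from typing import Optional

# (content minutes, (min wait, max wait)) copied from the list above.
PROCESSING_ESTIMATES = [
    (5, (3, 5)),
    (15, (5, 10)),
    (30, (10, 20)),
]


def estimate_wait_minutes(content_minutes: float) -> Optional[tuple[int, int]]:
    """Return the wait range for the smallest listed length that covers the request.

    Returns None for lengths beyond the table, since no figure is given there.
    """
    for listed_minutes, wait_range in PROCESSING_ESTIMATES:
        if content_minutes <= listed_minutes:
            return wait_range
    return None


print(estimate_wait_minutes(12))
```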

Step 7: Review and Download

When complete:

Preview:
- Play audio sample
- Review transcript
- Check duration

Options:
✓ Download as MP3 - Save to computer
✓ Stream directly - Listen in browser
✓ Share link - Get shareable URL (if public)
✓ Regenerate - Try different speakers/profile

Download:
1. Click "Download as MP3"
2. Choose quality: 128kbps / 192kbps / 320kbps
3. Save file: podcast_[notebook]_[date].mp3
4. Listen!
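
The file name pattern `podcast_[notebook]_[date].mp3` can be reproduced with a short helper if you rename or organize downloads yourself. This is a sketch under assumptions: the guide does not specify the date format or how the notebook name is sanitized, so the ISO date and the lowercase hyphenated slug below are illustrative choices, and `podcast_filename` is a hypothetical name.

```python
import re
from datetime import date


def podcast_filename(notebook: str, on: date) -> str:
    """Build a name following podcast_[notebook]_[date].mp3.

    Assumptions: non-alphanumeric runs become hyphens, and the date is ISO 8601.
    """
    slug = re.sub(r"[^A-Za-z0-9]+", "-", notebook).strip("-").lower()
    return f"podcast_{slug}_{on.isoformat()}.mp3"


print(podcast_filename("AI Safety Notes", date(2026, 2, 27)))
```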

Understanding What Happens Behind the Scenes

The Generation Pipeline

Stage 1: CONTENT ANALYSIS (1 minute)
  Your sources → What's the main story?
               → Key themes?
               → Debate points?

Stage 2: OUTLINE CREATION (2-3 minutes)
  Themes → Episode structure
        → Section breakdown
        → Talking points

Stage 3: DIALOGUE WRITING (2-3 minutes)
  Outline → Convert to natural dialogue
         → Add speaker personalities
         → Create flow and transitions

Stage 4: AUDIO SYNTHESIS (3-5 minutes per speaker)
  Script + Speaker → Text-to-speech
                  → Individual audio files
                  → High quality audio

Stage 5: MIXING & MASTERING (1-2 minutes)
  Multiple audio → Combine speakers
               → Level audio
               → Add polish
               → Final MP3

Total: 10-20 minutes for typical podcast
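
To see how the stage estimates add up, the sketch below sums the per-stage ranges listed above. It assumes audio synthesis runs one speaker after another, which the guide implies with "per speaker" but does not state outright; the names are hypothetical.

```python
# Per-stage (min, max) minutes as listed in the pipeline above.
FIXED_STAGES = {
    "content analysis": (1, 1),
    "outline creation": (2, 3),
    "dialogue writing": (2, 3),
    "mixing and mastering": (1, 2),
}
SYNTHESIS_PER_SPEAKER = (3, 5)  # audio synthesis, minutes per speaker


def pipeline_minutes(speaker_count: int) -> tuple[int, int]:
    """Return the (min, max) total minutes for a given number of speakers."""
    if speaker_count < 1:
        raise ValueError("a podcast needs at least one speaker")
    low = sum(lo for lo, _ in FIXED_STAGES.values()) + SYNTHESIS_PER_SPEAKER[0] * speaker_count
    high = sum(hi for _, hi in FIXED_STAGES.values()) + SYNTHESIS_PER_SPEAKER[1] * speaker_count
    return low, high


for count in (1, 2):
    print(count, "speaker(s):", pipeline_minutes(count))
```

Under these assumptions a two-speaker episode comes to 12-19 minutes, in line with the 10-20 minute total quoted above.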

Text-to-Speech Providers

Different providers, different qualities.

OpenAI (Recommended)

Voices: 6 options (Alloy, Echo, Fable, Onyx, Nova, Shimmer)
Quality: Good, natural sounding
Speed: Fast
Cost: ~$0.015 per minute
Best for: General purpose, natural speech
Example: "I have to say, the research shows..."

Google TTS

Voices: Many options, various accents
Quality: Excellent, very natural
Speed: Fast
Cost: ~$0.004 per minute
Best for: High quality output, accents
Example: "The research demonstrates that..."

ElevenLabs

Voices: 100+ voices, highly customizable
Quality: Exceptional, very expressive
Speed: Slower (5-10 seconds per phrase)
Cost: ~$0.10 per minute
Best for: Premium quality, emotional range
Example: [Can convey emotion and tone]

Local TTS (Free)

Voices: Limited, basic options
Quality: Basic, robotic
Speed: Depends on hardware (slow)
Cost: Free (local processing)
Best for: Privacy, testing, offline use
Example: "The research shows..."
Privacy: Everything stays on your computer

Which Provider to Choose?

For your first podcast: Google (quality/cost balance)
For privacy-sensitive: Local TTS (free, private)
For premium quality: ElevenLabs (best voices)
For budget: Google (cheapest quality option)
For speed: OpenAI (fast generation)

Tips for Better Podcasts

Choose Right Profile

Single source analysis → Academic Presentation
  "Explaining one paper to someone new"

Comparing two approaches → Debate Format
  "Pros and cons of different methods"

Multiple sources + insights → Panel Discussion
  "Different experts discussing topic"

Narrative exploration → Expert Interview
  "Host interviewing research expert"

Personal take → Solo Explanation
  "You explaining your analysis"

Create Good Speakers

Good Speaker:
✓ Clear expertise (know what they're talking about)
✓ Distinct personality (not generic)
✓ Good voice choice (matches personality)
✓ Realistic backstory (feels like real person)

Bad Speaker:
✗ Generic expertise ("good at research")
✗ No personality ("just reads")
✗ Mismatched voice (deep voice for young person)
✗ Contradicts personality (serious person uses casual voice)

Focus Content

Better: Podcast on ONE specific topic
  "How transformers work" (15 minutes, focused)

Worse: Podcast on everything
  "All of AI 2025" (2 hours, unfocused)

Guideline:
- 5-10 minutes: One narrow topic
- 15-20 minutes: One broad topic
- 30+ minutes: Multiple related subtopics

Shorter is usually better for podcasts.

Optimize Source Selection

Too much content:
  "Here are all 20 papers"
  → Podcast becomes 2+ hours
  → Unfocused
  → Low quality

Right amount:
  "Here are 3 key papers"
  → Podcast is 15-20 minutes
  → Focused
  → High quality

Rule: 3-5 sources per podcast
     Remove long background papers
     Keep focused on main topic

Quality Troubleshooting

Audio Sounds Robotic

Problem: TTS voice sounds unnatural

Solutions:

1. Switch provider: Try Google or ElevenLabs instead
2. Choose a different voice: Some voices sound more natural
3. Use shorter sentences: Very long sentences sound robotic
4. Adjust pacing: Ask for "natural, conversational pacing"

Audio Sounds Unclear

Problem: Hard to understand what's being said

Solutions:

1. Re-generate with different speaker
2. Try different TTS provider
3. Use speakers with clear accents
4. Lower background noise (if any)
5. Increase speech rate (if too slow)

Missing Content

Problem: Important information isn't in podcast

Solutions:

1. Include that source in content selection
2. Review generated outline (check before generating)
3. Regenerate with clearer profile instructions
4. Try different model (more thorough model)

Speakers Don't Match

Problem: Speakers sound like the same person

Solutions:

1. Choose different voice models from the registry for each speaker
2. Choose very different voice options
3. Increase personality differences in profile
4. Try different speaker count (2 vs 3 vs 4)

Generation Failed

Problem: "Podcast generation failed"

Solutions:

1. Check internet connection (especially TTS)
2. Try again (might be temporary issue)
3. Use local TTS (doesn't need internet)
4. Reduce source count (less to process)
5. Contact support if persistent

Advanced: Multiple Podcasts from Same Research

You can generate different podcasts from one notebook:

Podcast 1: Overview
  Profile: Academic Presentation
  Sources: Papers A, B, C
  Speakers: One expert
  Length: 15 minutes

→ Use for "What's this about?" understanding

Podcast 2: Deep Dive
  Profile: Expert Interview
  Sources: Paper A (Full) + B, C (Summary)
  Speakers: Expert + Interviewer
  Length: 30 minutes

→ Use for detailed exploration

Podcast 3: Debate
  Profile: Debate Format
  Sources: Papers A vs B (different approaches)
  Speakers: Pro-A speaker + Pro-B speaker
  Length: 20 minutes

→ Use for comparing approaches

Each tells the same story from different angles.

Exporting and Sharing

Download MP3

1. Generation complete
2. Click "Download"
3. Choose quality:
   - 128 kbps: Smallest file, lower quality
   - 192 kbps: Balanced (recommended)
   - 320 kbps: Highest quality, largest file
4. Save to computer
5. Use in podcast app, upload to platform, etc.
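
The trade-off between the three quality settings is easy to quantify, because file size scales linearly with bitrate. The sketch below assumes constant-bitrate encoding and ignores metadata overhead, so real files will differ slightly; `mp3_size_mb` is a hypothetical helper name.

```python
def mp3_size_mb(bitrate_kbps: int, duration_minutes: float) -> float:
    """Approximate size in megabytes (10^6 bytes) for constant-bitrate audio."""
    bits = bitrate_kbps * 1000 * duration_minutes * 60
    return bits / 8 / 1_000_000


for kbps in (128, 192, 320):
    print(f"{kbps} kbps, 15 min: {mp3_size_mb(kbps, 15):.1f} MB")
```

For a 15-minute episode this gives roughly 14.4 MB, 21.6 MB, and 36.0 MB respectively.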

Export Transcript

1. Click "Export Transcript"
2. Get full dialogue as text
3. Useful for:
   - Blog post content
   - Show notes
   - Searchable text version
   - Accessibility

Share Link

If the podcast is public:
1. Click "Share"
2. Get shareable link
3. Others can listen/download
4. Useful for:
   - Sharing with team
   - Public distribution
   - Embedding on website

Publish to Podcast Platforms

If you want to distribute (future feature):
1. Download MP3
2. Upload to platform (Spotify, Apple Podcasts, etc.)
3. Add metadata (title, description, episode notes)
4. Your research becomes a published podcast!

Best Practices

Before Generation

Sources are processed and ready
You've chosen content to include
You have a clear episode profile
Speakers are well-defined
Content is focused (3-5 sources max)

During Generation

Don't close the browser (use background processing)
Check back in 5-15 minutes
Review transcript when complete
Listen to sample before downloading

After Generation

Download MP3 to computer
Save in organized folder
Add metadata (title, description, date)
Test listening in podcast app
Share with colleagues for feedback

Use Cases

Academic Researcher

Podcast: Explaining your dissertation
Speakers: You + colleague
Content: Your papers + supporting research
Use: Share with advisors, test explanations

Content Creator

Podcast: Audio version of a researched article
Speakers: Narrator + expert
Content: Articles you've researched
Use: Transform article into podcast version

Team Research

Podcast: Weekly research updates
Speakers: Multiple team members
Content: This week's papers
Use: Team updates, knowledge sharing

Learning/Teaching

Podcast: Teaching material
Speakers: Teacher + inquisitive student
Content: Textbook + examples
Use: Students learn while commuting

Cost Breakdown Example

Generate 15-minute podcast with ElevenLabs

Generation (outline + dialogue):
  No charge (included in service)

Text-to-speech:
  2 speakers × 15 minutes = 30 minutes TTS
  ElevenLabs: $0.10 per minute
  Cost: 30 × $0.10 = $3.00

Processing:
  Included (no additional cost)

Total: $3.00 per podcast

Cheaper options:
  With Google TTS: ~$0.12
  With OpenAI: ~$0.45
  With Local TTS: ~$0.00
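
The arithmetic above can be reproduced for any provider and episode length. The sketch follows this guide's own estimate, which counts the full episode length once per speaker, and uses the approximate per-minute rates quoted in this guide rather than any provider's published pricing. `tts_cost` is a hypothetical helper name.

```python
from decimal import Decimal

# Approximate per-minute TTS rates quoted in this guide (USD).
RATE_PER_MINUTE = {
    "ElevenLabs": Decimal("0.10"),
    "OpenAI": Decimal("0.015"),
    "Google TTS": Decimal("0.004"),
    "Local TTS": Decimal("0"),
}


def tts_cost(provider: str, speakers: int, episode_minutes: int) -> Decimal:
    """Estimate TTS cost, counting the full episode length once per speaker."""
    billed_minutes = speakers * episode_minutes
    return (RATE_PER_MINUTE[provider] * billed_minutes).quantize(Decimal("0.01"))


for provider in RATE_PER_MINUTE:
    print(f"{provider}: ${tts_cost(provider, speakers=2, episode_minutes=15)}")
```

`Decimal` is used instead of floats so that the rounded dollar amounts match the figures above exactly.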

Summary: Podcasts as Research Tool

Podcasts transform how you consume research:

Before: Reading papers takes time, focus
After: Listen while commuting, exercising, doing chores

Before: Can't share complex research easily
After: Share audio of your analysis

Before: Research available in only one format (reading)
After: Same research, multiple formats (read/listen)

Podcasts aren't just for entertainment—they're a tool for making research more accessible, shareable, and consumable.

That's why they're important for Open Notebook.
