Adding Sources - Getting Content Into Your Notebook
Sources are the raw materials of your research. This guide covers how to add different types of content.
Quick-Start: Add Your First Source
Option 1: Upload a File (PDF, Word, etc.)
1. In your notebook, click "Add Source"
2. Select "Upload File"
3. Choose a file from your computer
4. Click "Upload"
5. Wait 30-60 seconds for processing
6. Done! Source appears in your notebook
Option 2: Add a Web Link
1. Click "Add Source"
2. Select "Web Link"
3. Paste URL: https://example.com/article
4. Click "Add"
5. Wait for processing (usually faster than files)
6. Done!
Option 3: Paste Text
1. Click "Add Source"
2. Select "Text"
3. Paste or type your content
4. Click "Save"
5. Done! Immediately available
Supported File Types
Documents
- PDF (.pdf) — Best support, including scanned PDFs with OCR
- Word (.docx, .doc) — Full support
- PowerPoint (.pptx) — Slides converted to text
- Excel (.xlsx, .xls) — Spreadsheet data
- EPUB (.epub) — eBook files
- Markdown (.md, .txt) — Plain text formats
- HTML (.html, .htm) — Web page files
File size limits: Up to ~100MB (varies by system)
Processing time: 10 seconds - 2 minutes (depending on length and file type)
Audio & Video
- Audio: MP3, WAV, M4A, OGG, FLAC (~30 seconds - 3 minutes per hour)
- Video: MP4, AVI, MOV, MKV, WebM (~3-10 minutes per hour)
- YouTube: Direct URL support
- Podcasts: RSS feed URL
Automatic transcription: Audio/video is transcribed to text automatically. This requires enabling speech-to-text in settings.
Web Content
- Articles: Blog posts, news articles, Medium
- YouTube: Full videos or playlists
- PDFs online: Direct PDF links
- News: News site articles
Just paste the URL in the "Web Link" section.
What Doesn't Work
- Paywalled content (WSJ, FT, etc.) — Can't extract
- Password-protected PDFs — Can't open
- Pure image files (.jpg, .png) — Except scanned PDFs which have OCR
- Very large files (>100MB) — Timeout
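The supported formats and the size guideline above can be turned into a quick check to run before uploading. The sketch below is only an illustration: `preflight`, `SUPPORTED_EXTENSIONS`, and `MAX_SIZE_BYTES` are hypothetical names, the extension list is copied from this guide, and the application's own validation may differ.

```python
from pathlib import Path

# Extensions copied from the lists in this guide; the real list may differ.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".doc", ".pptx", ".xlsx", ".xls", ".epub",
    ".md", ".txt", ".html", ".htm",            # documents
    ".mp3", ".wav", ".m4a", ".ogg", ".flac",   # audio
    ".mp4", ".avi", ".mov", ".mkv", ".webm",   # video
}
MAX_SIZE_BYTES = 100 * 1024 * 1024  # ~100MB guideline; varies by system


def preflight(filename: str, size_bytes: int) -> list[str]:
    """Return a list of likely problems; an empty list means the file looks fine."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {ext or '(none)'}")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append("file is over ~100MB and may time out")
    return problems


print(preflight("paper.PDF", 5_000_000))  # []
print(preflight("photo.jpg", 1_000))      # ['unsupported file type: .jpg']
```

A check like this cannot detect paywalls or password protection; those only show up when the source is actually processed.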
What Happens When You Add a Source
The system automatically does four things:
1. EXTRACT TEXT
File/URL → Readable text
(PDFs get OCR if scanned)
(Videos get transcribed if enabled)
2. BREAK INTO CHUNKS
Long text → ~500-word pieces
(So search finds specific parts, not whole document)
3. CREATE EMBEDDINGS
Each chunk → Vector representation
(Enables semantic/concept search)
4. INDEX & STORE
Everything → Database
(Ready to search and retrieve)
Time to use: After the progress bar completes, the source is ready immediately. Embeddings are created in the background.
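To make step 2 concrete, here is a minimal sketch of fixed-size chunking by word count. It assumes the simplest possible strategy; the actual system may split on sentences or tokens, or overlap neighboring chunks. `chunk_words` is a hypothetical helper, not part of the application.

```python
def chunk_words(text: str, chunk_size: int = 500) -> list[str]:
    """Split text into consecutive pieces of at most chunk_size words."""
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


document = "word " * 1200               # stand-in for a 1,200-word source
chunks = chunk_words(document)
print(len(chunks))                       # 3
print([len(c.split()) for c in chunks])  # [500, 500, 200]
```

Each chunk would then get its own embedding in step 3, which is why search can point to a specific part of a document rather than the whole file.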
Step-by-Step for Different Types
PDFs
Best practices:
Clean PDFs:
1. Upload → Done
2. Processing time: ~30-60 seconds
Scanned/Image PDFs:
1. Upload same way
2. System auto-detects and uses OCR
3. Processing time: ~2-3 minutes (longer because of OCR overhead)
Large PDFs (50+ pages):
1. Consider splitting into smaller files
2. Or upload as-is (system handles it)
3. Processing time scales with size
Common issues:
- "Can't extract text" → PDF is corrupted or has copy protection
- Solution: Try opening the file in Adobe Acrobat. If it won't open there either, the PDF is likely corrupted or protected.
Web Links / Articles
Best practices:
1. Copy full URL from browser: https://example.com/article-title
2. Paste in "Web Link"
3. Click Add
4. Wait for extraction
Processing time: Usually 5-15 seconds
What works:
- Standard web articles
- Blog posts
- News articles
- Wikipedia pages
- Medium posts
- Substack articles
What doesn't work:
- Twitter threads (unreliable)
- Paywalled articles (can't access)
- JavaScript-heavy sites (content not extracted)
Pro tip: If it doesn't work, copy the article text and paste as "Text" instead.
Audio Files
Best practices:
1. Ensure speech-to-text is enabled in Settings
2. Upload MP3, WAV, or M4A file
3. System automatically transcribes to text
4. Processing time: ~1 minute per 5 minutes of audio
Example:
- 1-hour podcast → 12 minutes processing
- 10-minute recording → 2 minutes processing
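The examples above follow directly from the 1-to-5 rule of thumb. As a small sketch (the function name is hypothetical, and real times depend on audio quality and hardware):

```python
def estimate_transcription_minutes(audio_minutes: float) -> float:
    """Rule of thumb from this section: ~1 minute of processing per 5 minutes of audio."""
    return audio_minutes / 5


print(estimate_transcription_minutes(60))  # 12.0
print(estimate_transcription_minutes(10))  # 2.0
```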
Quality matters:
- Clear audio: Fast transcription
- Muffled/noisy audio: Slower, less accurate transcription
- Background noise: Try to minimize before uploading
Tip: If audio quality is poor, the AI might misinterpret content. You can manually correct transcription if needed.
YouTube Videos
Best practices:
Two ways to add:
Method 1: Direct URL
1. Copy YouTube URL: https://www.youtube.com/watch?v=...
2. Paste in "Web Link"
3. Click Add
4. System extracts captions if available; otherwise it transcribes the audio
Method 2: Playlist
1. Paste playlist URL
2. System adds all videos as separate sources
3. Each video processed separately
4. Takes longer (multiple videos)
What's extracted:
- Captions/subtitles (if available)
- Transcription (if captions aren't available)
- Basic metadata (title, channel, length)
Processing:
- 10-minute video: ~2-3 minutes
- 1-hour video: ~10-15 minutes
Text / Paste Content
Best practices:
1. Select "Text" when adding source
2. Paste or type content
3. System processes immediately
4. No wait time needed
Good for:
- Notes you want to reference
- Quotes from books
- Transcripts you have handy
- Quick research snippets
Managing Your Sources
Viewing Source Details
Click on a source to see:
- Original file name/title
- When it was added
- Size and format
- Processing status
- Number of chunks
Organizing with Metadata
You can add to each source:
- Title: Better name than original filename
- Tags: Category labels ("primary research", "background", "competitor analysis")
- Description: A few notes about what it contains
Why this matters:
- Makes sources easier to find
- Helps when contextualizing for Chat
- Useful for organizing large notebooks
Searching Within Sources
After sources are added, you can:
Text search: "Find exact phrase"
Vector search: "Find conceptually similar"
Both search across all sources in the notebook.
Results show:
- Which source
- Which section
- Relevance score
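The difference between the two search modes can be sketched in a few lines. This is a toy illustration only: word counts stand in for real embeddings (which come from a model and capture meaning far better), and `text_search`, `vector_search`, and `cosine` are hypothetical helpers rather than the application's code.

```python
import math
from collections import Counter


def text_search(query: str, chunks: list[str]) -> list[int]:
    """Exact phrase match (case-insensitive): indexes of chunks containing the query."""
    q = query.lower()
    return [i for i, chunk in enumerate(chunks) if q in chunk.lower()]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def vector_search(query: str, chunks: list[str]) -> list[tuple[float, int]]:
    """Rank chunks by similarity to the query; returns (score, chunk index) pairs."""
    qv = Counter(query.lower().split())
    scored = [
        (cosine(qv, Counter(c.lower().split())), i) for i, c in enumerate(chunks)
    ]
    return sorted(scored, reverse=True)


chunks = [
    "Solar panels convert sunlight into electricity",
    "The cat sat on the mat",
    "Wind and solar power are renewable energy sources",
]
print(text_search("solar energy", chunks))       # [] (no chunk has that exact phrase)
print(vector_search("solar energy", chunks)[0])  # (0.5, 2)
```

The text search finds nothing because the exact phrase never appears, while the vector search still ranks the third chunk first and attaches a relevance score to it.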
Context Management: How Sources Get Used
You control how AI accesses sources:
Three Levels (for Chat)
Full Content:
AI sees: Complete source text
Cost: 100% of tokens
Use when: Analyzing in detail, need precise citations
Example: "Analyze this methodology paper closely"
Summary Only:
AI sees: AI-generated summary (not full text)
Cost: ~10-20% of tokens
Use when: Background material, reference context
Example: "Use this as context but focus on the main source"
Not in Context:
AI sees: Nothing (excluded)
Cost: 0 tokens
Use when: Confidential, not relevant, or archived
Example: "Keep this in notebook but don't use in this conversation"
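The cost figures above can be combined into a rough token budget for a chat. The sketch below assumes you already know roughly how many tokens each source contains; `LEVEL_COST` and `estimate_context_tokens` are hypothetical names, and the 10-20% range for summaries is the approximate figure quoted in this guide, not a measured value.

```python
# Fraction of a source's tokens sent to the model at each level.
# Summary is a (low, high) range because the guide gives ~10-20%.
LEVEL_COST = {
    "full": (1.0, 1.0),
    "summary": (0.10, 0.20),
    "excluded": (0.0, 0.0),
}


def estimate_context_tokens(sources: list[tuple[int, str]]) -> tuple[int, int]:
    """Return a (low, high) estimate of tokens sent for (token_count, level) pairs."""
    low = sum(tokens * LEVEL_COST[level][0] for tokens, level in sources)
    high = sum(tokens * LEVEL_COST[level][1] for tokens, level in sources)
    return round(low), round(high)


sources = [
    (20_000, "full"),      # the paper being analyzed closely
    (50_000, "summary"),   # background material
    (30_000, "excluded"),  # kept in the notebook, not sent to the AI
]
print(estimate_context_tokens(sources))  # (25000, 30000)
```

In this example, sending all three sources as full content would cost 100,000 tokens, so choosing levels per source cuts usage to roughly a quarter.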
How to Set Context (in Chat)
1. Go to Chat
2. Click "Select Context Sources"
3. For each source:
- Toggle ON/OFF (include/exclude)
- Choose level (Full/Summary/Excluded)
4. Click "Save"
5. Now chat uses these settings
Common Mistakes
| Mistake | What Happens | How to Fix |
|---|---|---|
| Upload 200 sources at once | System gets slow, processing stalls | Add 10-20 at a time, wait for processing |
| Use full content for all sources | Token usage skyrockets, expensive | Use "Summary" or "Excluded" for background material |
| Add huge PDFs without splitting | Processing is slow, search results less precise | Consider splitting large PDFs into chapters |
| Forget source titles | Can't distinguish between similar sources | Rename sources with descriptive titles right after uploading |
| Don't tag sources | Hard to find and organize later | Add tags immediately: "primary", "background", etc. |
| Mix languages in one source | Transcription/embedding quality drops | Keep each language in separate sources |
| Add the same source multiple times | Takes up space, creates confusion | Add once; reuse in multiple chats/notebooks |
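If you have a large backlog of files, the first fix in the table (add 10-20 at a time) amounts to simple batching. A minimal sketch, with hypothetical file names and a hypothetical `batches` helper; the upload itself still happens through the "Add Source" dialog:

```python
from typing import Iterator


def batches(items: list[str], batch_size: int = 10) -> Iterator[list[str]]:
    """Yield consecutive groups of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


files = [f"paper_{n}.pdf" for n in range(1, 36)]  # 35 hypothetical files
for group in batches(files, batch_size=15):
    # Upload this group, then wait for processing to finish before the next one.
    print(len(group), group[0], "...", group[-1])
```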
Processing Status & Troubleshooting
What the Status Indicators Mean
🟡 Processing
→ Source is being extracted and embedded
→ Wait 30 seconds - 3 minutes depending on size
→ Don't use in Chat yet
🟢 Ready
→ Source is processed and searchable
→ Can use immediately in Chat
→ Can apply transformations
🔴 Error
→ Something went wrong
→ Common reasons:
- Unsupported file format
- File too large or corrupted
- Network timeout
⚪ Not in Context
→ Source added but excluded from Chat
→ Still searchable, not sent to AI
Common Errors & Solutions
"Unsupported file type"
- You tried to upload a format not in the list (e.g., a .webp image)
- Solution: Convert to a supported format (PDF for documents, MP3 for audio)
"Processing timeout"
- Very large file (>100MB) or very long audio
- Solution: Split into smaller pieces or try uploading again
"Transcription failed"
- Audio quality too poor or language not detected
- Solution: Re-record with better quality, or paste text transcript manually
"Web link won't extract"
- Website blocks automated access or uses JavaScript for content
- Solution: Copy the article text and paste as "Text" instead
Tips for Best Results
For PDFs
- Clean, digital PDFs work best
- Remove copy protection if present (legally)
- Scanned PDFs work but take longer
For Web Articles
- Use full URL including domain
- Avoid cookie/popup-laden sites
- If extraction fails, copy-paste text instead
For Audio
- Clear, well-recorded audio transcribes better
- Remove background noise if possible
- YouTube videos usually have good transcriptions built-in
For Large Documents
- Consider splitting into smaller sources
- Gives more precise search results
- Processing is faster for smaller pieces
For Organization
- Name sources clearly (not "document_2.pdf")
- Add tags immediately after uploading
- Use descriptions for complex documents
What Comes After: Using Your Sources
Once you've added sources, you can:
- Chat → Ask questions (see Chat Effectively)
- Search → Find specific content (see Search Effectively)
- Transformations → Extract structured insights (see Working with Notes)
- Ask → Get comprehensive answers (see Search Effectively)
- Podcasts → Turn into audio (see Creating Podcasts)
Summary Checklist
Before adding sources, confirm:
- File is in supported format
- File is under 100MB (split larger files first)
- Web links are full URLs (not shortened)
- Audio files have clear speech (if transcription-dependent)
- You've named the source clearly
- You've added tags for organization
- You understand context levels (Full/Summary/Excluded)
Done! Sources are now ready for Chat, Search, Transformations, and more.