feat: Add Speech-to-Text support

- Supports audio & video files.
- Useful for YouTube videos that don't have transcripts.
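
Roughly, the new flow opens the uploaded file, hands it to LiteLLM's async transcription API, and stores the result as a markdown document. A minimal sketch (illustrative only; `transcribe_upload` is not a function added by this commit, the model/endpoint come from the new `STT_SERVICE` / `STT_SERVICE_API_BASE` settings):

```python
from litellm import atranscription
from app.config import config as app_config

async def transcribe_upload(file_path: str, filename: str) -> str:
    """Transcribe an uploaded audio/video file and return it as markdown."""
    kwargs = {"model": app_config.STT_SERVICE}           # e.g. "openai/whisper-1"
    if app_config.STT_SERVICE_API_BASE:                  # optional custom endpoint
        kwargs["api_base"] = app_config.STT_SERVICE_API_BASE
    with open(file_path, "rb") as audio_file:
        response = await atranscription(file=audio_file, **kwargs)
    text = response.get("text", "")
    return f"# Transcription of {filename}\n\n{text}"
```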
DESKTOP-RTLN3BA\$punk 2025-05-13 21:13:53 -07:00
parent 57987ecc76
commit a8080d2dc7
8 changed files with 172 additions and 73 deletions

View file

@@ -27,28 +27,27 @@ https://github.com/user-attachments/assets/bf64a6ca-934b-47ac-9e1b-edac5fe972ec
## Key Features
### 1. Latest
### 💡 **Idea**:
Have your own highly customizable private NotebookLM and Perplexity integrated with external sources.
### 📁 **Multiple File Format Uploading Support**
Save content from your own personal files *(documents, images and videos; supports **34 file extensions**)* to your own personal knowledge base.
### 🔍 **Powerful Search**
Quickly research or find anything in your saved content.
### 💬 **Chat with your Saved Content**
Interact in natural language and get cited answers.
### 📄 **Cited Answers**
Get cited answers just like Perplexity.
### 🔔 **Privacy & Local LLM Support**
Works flawlessly with Ollama local LLMs.
### 🏠 **Self Hostable**
Open source and easy to deploy locally.
### 🎙️ Podcasts
- Blazingly fast podcast generation agent (creates a 3-minute podcast in under 20 seconds).
- Convert your chat conversations into engaging audio content.
- Support for multiple TTS providers (OpenAI, Azure, Google Vertex AI).
### 📊 **Advanced RAG Techniques**
- Supports 150+ LLMs.
- Supports 6,000+ embedding models.
- Supports all major rerankers (Pinecone, Cohere, Flashrank, etc.).
@@ -56,7 +55,7 @@ Open source and easy to deploy locally.
- Utilizes Hybrid Search (semantic + full-text search combined with Reciprocal Rank Fusion).
- RAG as a Service API backend.
### **External Sources**
- Search engines (Tavily, LinkUp)
- Slack
- Linear
@@ -65,7 +64,39 @@ Open source and easy to deploy locally.
- GitHub
- and more to come...
### 📄 **Supported File Extensions**
#### Document
`.doc`, `.docx`, `.odt`, `.rtf`, `.pdf`, `.xml`
#### Text & Markup
`.txt`, `.md`, `.markdown`, `.rst`, `.html`, `.org`
#### Spreadsheets & Tables
`.xls`, `.xlsx`, `.csv`, `.tsv`
#### Audio & Video
`.mp3`, `.mpga`, `.m4a`, `.wav`, `.mp4`, `.mpeg`, `.webm`
#### Images
`.jpg`, `.jpeg`, `.png`, `.bmp`, `.tiff`, `.heic`
#### Email & eBooks
`.eml`, `.msg`, `.epub`
#### PowerPoint Presentations & Other
`.ppt`, `.pptx`, `.p7s`
### 🔖 Cross Browser Extension
- The SurfSense extension can be used to save any webpage you like.
- Its main use case is saving webpages that sit behind authentication.
@@ -209,16 +240,8 @@ Before installation, make sure to complete the [prerequisite setup steps](https:
 ## Future Work
 - Add More Connectors.
 - Patch minor bugs.
-- Implement Canvas.
-- Complete Hybrid Search. **[Done]**
-- Add support for file uploads QA. **[Done]**
-- Shift to WebSockets for Streaming responses. **[Deprecated in favor of AI SDK Stream Protocol]**
-- Based on feedback, I will work on making it compatible with local models. **[Done]**
-- Cross Browser Extension **[Done]**
-- Critical Notifications **[Done | PAUSED]**
-- Saving Chats **[Done]**
-- Basic keyword search page for saved sessions **[Done]**
-- Multi & Single Document Chat **[Done]**
+- Document Chat **[REIMPLEMENT]**
+- Document Podcasts

View file

@@ -18,6 +18,9 @@ LONG_CONTEXT_LLM="gemini/gemini-2.0-flash"
#LiteLLM TTS Provider: https://docs.litellm.ai/docs/text_to_speech#supported-providers
TTS_SERVICE="openai/tts-1"
#LiteLLM STT Provider: https://docs.litellm.ai/docs/audio_transcription#supported-providers
STT_SERVICE="openai/whisper-1"
# Chosen LiteLLM Providers Keys
OPENAI_API_KEY="sk-proj-iA"
GEMINI_API_KEY="AIzaSyB6-1641124124124124124124124124124"
@@ -35,3 +38,5 @@ LANGSMITH_PROJECT="surfsense"
FAST_LLM_API_BASE=""
STRATEGIC_LLM_API_BASE=""
LONG_CONTEXT_LLM_API_BASE=""
TTS_SERVICE_API_BASE=""
STT_SERVICE_API_BASE=""

View file

@@ -135,14 +135,23 @@ async def create_merged_podcast_audio(state: State, config: RunnableConfig) -> D
filename = f"{temp_dir}/{session_id}_{index}.mp3"
try:
    # Generate speech using litellm
    if app_config.TTS_SERVICE_API_BASE:
        response = await aspeech(
            model=app_config.TTS_SERVICE,
            api_base=app_config.TTS_SERVICE_API_BASE,
            voice=voice,
            input=dialog,
            max_retries=2,
            timeout=600,
        )
    else:
        response = await aspeech(
            model=app_config.TTS_SERVICE,
            voice=voice,
            input=dialog,
            max_retries=2,
            timeout=600,
        )

    # Save the audio to a file - use proper streaming method
    with open(filename, 'wb') as f:
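
For reference, a standalone sketch of the same TTS call with the optional `api_base` override. The voice name and output path are made-up examples, and `stream_to_file` assumes the response mirrors OpenAI's speech response object:

```python
import asyncio
from litellm import aspeech
from app.config import config as app_config

async def speak(dialog: str, out_path: str = "dialog.mp3") -> None:
    kwargs = {}
    if app_config.TTS_SERVICE_API_BASE:          # only pass api_base when configured
        kwargs["api_base"] = app_config.TTS_SERVICE_API_BASE
    response = await aspeech(
        model=app_config.TTS_SERVICE,            # e.g. "openai/tts-1"
        voice="alloy",                           # example voice name
        input=dialog,
        max_retries=2,
        timeout=600,
        **kwargs,
    )
    response.stream_to_file(out_path)            # assumed OpenAI-style save helper

asyncio.run(speak("Welcome to the show."))
```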

View file

@@ -6,7 +6,7 @@ from chonkie import AutoEmbeddings, CodeChunker, RecursiveChunker
from dotenv import load_dotenv
from langchain_community.chat_models import ChatLiteLLM
from rerankers import Reranker
from litellm import speech
# Get the base directory of the project
BASE_DIR = Path(__file__).resolve().parent.parent.parent
@@ -97,6 +97,12 @@ class Config:
    # Litellm TTS Configuration
    TTS_SERVICE = os.getenv("TTS_SERVICE")
    TTS_SERVICE_API_BASE = os.getenv("TTS_SERVICE_API_BASE")

    # Litellm STT Configuration
    STT_SERVICE = os.getenv("STT_SERVICE")
    STT_SERVICE_API_BASE = os.getenv("STT_SERVICE_API_BASE")

    # Validation Checks
    # Check embedding dimension

View file

@@ -1,3 +1,4 @@
from litellm import atranscription
from fastapi import APIRouter, Depends, BackgroundTasks, UploadFile, Form, HTTPException
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.future import select
@@ -7,6 +8,7 @@ from app.schemas import DocumentsCreate, DocumentUpdate, DocumentRead
from app.users import current_active_user
from app.utils.check_ownership import check_ownership
from app.tasks.background_tasks import add_received_markdown_file_document, add_extension_received_document, add_received_file_document, add_crawled_url_document, add_youtube_video_document
from app.config import config as app_config
# Force asyncio to use standard event loop before unstructured imports
import asyncio
try:
@@ -17,9 +19,9 @@ import os
os.environ["UNSTRUCTURED_HAS_PATCHED_LOOP"] = "1"

router = APIRouter()

@router.post("/documents/")
async def create_documents(
    request: DocumentsCreate,
@@ -30,19 +32,19 @@ async def create_documents(
    try:
        # Check if the user owns the search space
        await check_ownership(session, SearchSpace, request.search_space_id, user)

        if request.document_type == DocumentType.EXTENSION:
            for individual_document in request.content:
                fastapi_background_tasks.add_task(
                    process_extension_document_with_new_session,
                    individual_document,
                    request.search_space_id
                )
        elif request.document_type == DocumentType.CRAWLED_URL:
            for url in request.content:
                fastapi_background_tasks.add_task(
                    process_crawled_url_with_new_session,
                    url,
                    request.search_space_id
                )
        elif request.document_type == DocumentType.YOUTUBE_VIDEO:
@@ -57,7 +59,7 @@ async def create_documents(
                status_code=400,
                detail="Invalid document type"
            )

        await session.commit()
        return {"message": "Documents processed successfully"}
    except HTTPException:
@@ -69,6 +71,7 @@ async def create_documents(
            detail=f"Failed to process documents: {str(e)}"
        )

@router.post("/documents/fileupload")
async def create_documents(
    files: list[UploadFile],
@@ -79,26 +82,26 @@ async def create_documents(
):
    try:
        await check_ownership(session, SearchSpace, search_space_id, user)

        if not files:
            raise HTTPException(status_code=400, detail="No files provided")

        for file in files:
            try:
                # Save file to a temporary location to avoid stream issues
                import tempfile
                import aiofiles
                import os

                # Create temp file
                with tempfile.NamedTemporaryFile(delete=False, suffix=os.path.splitext(file.filename)[1]) as temp_file:
                    temp_path = temp_file.name

                # Write uploaded file to temp file
                content = await file.read()
                with open(temp_path, "wb") as f:
                    f.write(content)

                # Process in background to avoid uvloop conflicts
                fastapi_background_tasks.add_task(
                    process_file_in_background_with_new_session,
@@ -111,7 +114,7 @@ async def create_documents(
                    status_code=422,
                    detail=f"Failed to process file {file.filename}: {str(e)}"
                )

        await session.commit()
        return {"message": "Files uploaded for processing"}
    except HTTPException:
@@ -136,14 +139,14 @@ async def process_file_in_background(
            # For markdown files, read the content directly
            with open(file_path, 'r', encoding='utf-8') as f:
                markdown_content = f.read()

            # Clean up the temp file
            import os
            try:
                os.unlink(file_path)
            except:
                pass

            # Process markdown directly through specialized function
            await add_received_markdown_file_document(
                session,
@@ -151,10 +154,46 @@ async def process_file_in_background(
                markdown_content,
                search_space_id
            )
        # Check if the file is an audio file
        elif filename.lower().endswith(('.mp3', '.mp4', '.mpeg', '.mpga', '.m4a', '.wav', '.webm')):
            # Open the audio file for transcription
            with open(file_path, "rb") as audio_file:
                # Use LiteLLM for audio transcription
                if app_config.STT_SERVICE_API_BASE:
                    transcription_response = await atranscription(
                        model=app_config.STT_SERVICE,
                        file=audio_file,
                        api_base=app_config.STT_SERVICE_API_BASE
                    )
                else:
                    transcription_response = await atranscription(
                        model=app_config.STT_SERVICE,
                        file=audio_file
                    )

                # Extract the transcribed text
                transcribed_text = transcription_response.get("text", "")

                # Add metadata about the transcription
                transcribed_text = f"# Transcription of {filename}\n\n{transcribed_text}"

            # Clean up the temp file
            try:
                os.unlink(file_path)
            except:
                pass

            # Process transcription as markdown document
            await add_received_markdown_file_document(
                session,
                filename,
                transcribed_text,
                search_space_id
            )
        else:
            # Use synchronous unstructured API to avoid event loop issues
            from langchain_unstructured import UnstructuredLoader

            # Process the file
            loader = UnstructuredLoader(
                file_path,
@@ -165,16 +204,16 @@ async def process_file_in_background(
                include_metadata=False,
                strategy="auto",
            )

            docs = await loader.aload()

            # Clean up the temp file
            import os
            try:
                os.unlink(file_path)
            except:
                pass

            # Pass the documents to the existing background task
            await add_received_file_document(
                session,
@@ -186,6 +225,7 @@ async def process_file_in_background(
        import logging
        logging.error(f"Error processing file in background: {str(e)}")

@router.get("/documents/", response_model=List[DocumentRead])
async def read_documents(
    skip: int = 0,
@@ -195,17 +235,18 @@ async def read_documents(
    user: User = Depends(current_active_user)
):
    try:
        query = select(Document).join(SearchSpace).filter(
            SearchSpace.user_id == user.id)

        # Filter by search_space_id if provided
        if search_space_id is not None:
            query = query.filter(Document.search_space_id == search_space_id)

        result = await session.execute(
            query.offset(skip).limit(limit)
        )
        db_documents = result.scalars().all()

        # Convert database objects to API-friendly format
        api_documents = []
        for doc in db_documents:
@@ -218,7 +259,7 @@ async def read_documents(
                created_at=doc.created_at,
                search_space_id=doc.search_space_id
            ))

        return api_documents
    except Exception as e:
        raise HTTPException(
@@ -226,6 +267,7 @@ async def read_documents(
            detail=f"Failed to fetch documents: {str(e)}"
        )

@router.get("/documents/{document_id}", response_model=DocumentRead)
async def read_document(
    document_id: int,
@@ -239,13 +281,13 @@ async def read_document(
            .filter(Document.id == document_id, SearchSpace.user_id == user.id)
        )
        document = result.scalars().first()

        if not document:
            raise HTTPException(
                status_code=404,
                detail=f"Document with id {document_id} not found"
            )

        # Convert database object to API-friendly format
        return DocumentRead(
            id=document.id,
@@ -262,6 +304,7 @@ async def read_document(
            detail=f"Failed to fetch document: {str(e)}"
        )

@router.put("/documents/{document_id}", response_model=DocumentRead)
async def update_document(
    document_id: int,
@@ -277,19 +320,19 @@ async def update_document(
            .filter(Document.id == document_id, SearchSpace.user_id == user.id)
        )
        db_document = result.scalars().first()

        if not db_document:
            raise HTTPException(
                status_code=404,
                detail=f"Document with id {document_id} not found"
            )

        update_data = document_update.model_dump(exclude_unset=True)
        for key, value in update_data.items():
            setattr(db_document, key, value)

        await session.commit()
        await session.refresh(db_document)

        # Convert to DocumentRead for response
        return DocumentRead(
            id=db_document.id,
@@ -309,6 +352,7 @@ async def update_document(
            detail=f"Failed to update document: {str(e)}"
        )

@router.delete("/documents/{document_id}", response_model=dict)
async def delete_document(
    document_id: int,
@@ -323,13 +367,13 @@ async def delete_document(
            .filter(Document.id == document_id, SearchSpace.user_id == user.id)
        )
        document = result.scalars().first()

        if not document:
            raise HTTPException(
                status_code=404,
                detail=f"Document with id {document_id} not found"
            )

        await session.delete(document)
        await session.commit()
        return {"message": "Document deleted successfully"}
@@ -340,16 +384,16 @@ async def delete_document(
        raise HTTPException(
            status_code=500,
            detail=f"Failed to delete document: {str(e)}"
        )

async def process_extension_document_with_new_session(
    individual_document,
    search_space_id: int
):
    """Create a new session and process extension document."""
    from app.db import async_session_maker

    async with async_session_maker() as session:
        try:
            await add_extension_received_document(session, individual_document, search_space_id)
@@ -357,13 +401,14 @@ async def process_extension_document_with_new_session(
            import logging
            logging.error(f"Error processing extension document: {str(e)}")

async def process_crawled_url_with_new_session(
    url: str,
    search_space_id: int
):
    """Create a new session and process crawled URL."""
    from app.db import async_session_maker

    async with async_session_maker() as session:
        try:
            await add_crawled_url_document(session, url, search_space_id)
@@ -371,6 +416,7 @@ async def process_crawled_url_with_new_session(
            import logging
            logging.error(f"Error processing crawled URL: {str(e)}")

async def process_file_in_background_with_new_session(
    file_path: str,
    filename: str,
@@ -378,21 +424,21 @@ async def process_file_in_background_with_new_session(
):
    """Create a new session and process file."""
    from app.db import async_session_maker

    async with async_session_maker() as session:
        await process_file_in_background(file_path, filename, search_space_id, session)

async def process_youtube_video_with_new_session(
    url: str,
    search_space_id: int
):
    """Create a new session and process YouTube video."""
    from app.db import async_session_maker

    async with async_session_maker() as session:
        try:
            await add_youtube_video_document(session, url, search_space_id)
        except Exception as e:
            import logging
            logging.error(f"Error processing YouTube video: {str(e)}")

View file

@@ -53,7 +53,7 @@ export default function FileUploader() {
        'text/html': ['.html'],
        'image/jpeg': ['.jpeg', '.jpg'],
        'image/png': ['.png'],
        'text/markdown': ['.md', '.markdown'],
        'application/vnd.ms-outlook': ['.msg'],
        'application/vnd.oasis.opendocument.text': ['.odt'],
        'text/x-org': ['.org'],
@@ -69,6 +69,10 @@ export default function FileUploader() {
        'application/vnd.ms-excel': ['.xls'],
        'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet': ['.xlsx'],
        'application/xml': ['.xml'],
        'audio/mpeg': ['.mp3', '.mpeg', '.mpga'],
        'audio/mp4': ['.mp4', '.m4a'],
        'audio/wav': ['.wav'],
        'audio/webm': ['.webm'],
    }

    const supportedExtensions = Array.from(new Set(Object.values(acceptedFileTypes).flat())).sort()

View file

@@ -94,6 +94,7 @@ Before you begin, ensure you have:
| UNSTRUCTURED_API_KEY | API key for Unstructured.io service for document parsing |
| FIRECRAWL_API_KEY | API key for Firecrawl service for web crawling |
| TTS_SERVICE | Text-to-Speech API provider for Podcasts (e.g., `openai/tts-1`, `azure/neural`, `vertex_ai/`). See [supported providers](https://docs.litellm.ai/docs/text_to_speech#supported-providers) |
| STT_SERVICE | Speech-to-Text API provider for transcribing uploaded audio/video files (e.g., `openai/whisper-1`). See [supported providers](https://docs.litellm.ai/docs/audio_transcription#supported-providers) |

Include API keys for the LLM providers you're using. For example:
@@ -114,6 +115,8 @@ Include API keys for the LLM providers you're using. For example:
| FAST_LLM_API_BASE | Custom API base URL for the fast LLM |
| STRATEGIC_LLM_API_BASE | Custom API base URL for the strategic LLM |
| LONG_CONTEXT_LLM_API_BASE | Custom API base URL for the long context LLM |
| TTS_SERVICE_API_BASE | Custom API base URL for the Text-to-Speech (TTS) service |
| STT_SERVICE_API_BASE | Custom API base URL for the Speech-to-Text (STT) service |
For other LLM providers, refer to the [LiteLLM documentation](https://docs.litellm.ai/docs/providers).
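
To sanity-check the STT settings before running the full stack, a short script along these lines (the audio filename is a placeholder) should print a transcript:

```python
import os
from litellm import transcription  # sync counterpart of the call the backend uses

kwargs = {"model": os.environ["STT_SERVICE"]}              # e.g. "openai/whisper-1"
if os.environ.get("STT_SERVICE_API_BASE"):
    kwargs["api_base"] = os.environ["STT_SERVICE_API_BASE"]

with open("sample.mp3", "rb") as audio_file:               # placeholder test file
    result = transcription(file=audio_file, **kwargs)

print(result.get("text", ""))
```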

View file

@@ -65,6 +65,7 @@ Edit the `.env` file and set the following variables:
| UNSTRUCTURED_API_KEY | API key for Unstructured.io service |
| FIRECRAWL_API_KEY | API key for Firecrawl service (if using crawler) |
| TTS_SERVICE | Text-to-Speech API provider for Podcasts (e.g., `openai/tts-1`, `azure/neural`, `vertex_ai/`). See [supported providers](https://docs.litellm.ai/docs/text_to_speech#supported-providers) |
| STT_SERVICE | Speech-to-Text API provider for transcribing uploaded audio/video files (e.g., `openai/whisper-1`). See [supported providers](https://docs.litellm.ai/docs/audio_transcription#supported-providers) |

**Important**: Since LLM calls are routed through LiteLLM, include API keys for the LLM providers you're using:
@@ -86,6 +87,8 @@ Edit the `.env` file and set the following variables:
| FAST_LLM_API_BASE | Custom API base URL for the fast LLM |
| STRATEGIC_LLM_API_BASE | Custom API base URL for the strategic LLM |
| LONG_CONTEXT_LLM_API_BASE | Custom API base URL for the long context LLM |
| TTS_SERVICE_API_BASE | Custom API base URL for the Text-to-Speech (TTS) service |
| STT_SERVICE_API_BASE | Custom API base URL for the Speech-to-Text (STT) service |
### 2. Install Dependencies