mirror of
https://github.com/MODSetter/SurfSense.git
synced 2025-09-02 02:29:08 +00:00
# SurfSense

While tools like NotebookLM and Perplexity are impressive and highly effective for conducting research on any topic/query, SurfSense elevates this capability by integrating with your personal knowledge base. It is a highly customizable AI research agent, connected to external sources such as search engines (Tavily, LinkUp), Slack, Linear, Notion, YouTube, GitHub and more to come.

<div align="center">
<a href="https://trendshift.io/repositories/13606" target="_blank"><img src="https://trendshift.io/api/badge/repositories/13606" alt="MODSetter%2FSurfSense | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
</div>

# Video

https://github.com/user-attachments/assets/48142909-6391-4084-b7e8-81da388bb1fc

# Podcasts

https://github.com/user-attachments/assets/d516982f-de00-4c41-9e4c-632a7d942f41

## Podcast Sample

https://github.com/user-attachments/assets/bf64a6ca-934b-47ac-9e1b-edac5fe972ec

## Key Features

### 💡 **Idea**:

Have your own highly customizable private NotebookLM and Perplexity integrated with external sources.

### 📁 **Multiple File Format Uploading Support**

Save content from your own personal files *(documents, images, videos, and more; **34 file extensions** supported)* to your personal knowledge base.

### 🔍 **Powerful Search**

Quickly research or find anything in your saved content.

### 💬 **Chat with your Saved Content**

Interact in natural language and get cited answers.

### 📄 **Cited Answers**

Get cited answers just like Perplexity.

### 🔔 **Privacy & Local LLM Support**

Works flawlessly with local LLMs through Ollama.

### 🏠 **Self Hostable**

Open source and easy to deploy locally.

### 🎙️ Podcasts

- Blazingly fast podcast generation agent (creates a 3-minute podcast in under 20 seconds).
- Convert your chat conversations into engaging audio content.
- Support for multiple TTS providers (OpenAI, Azure, Google Vertex AI).

### 📊 **Advanced RAG Techniques**

- Supports 150+ LLMs.
- Supports 6000+ embedding models.
- Supports all major rerankers (Pinecone, Cohere, Flashrank, etc.).
- Uses hierarchical indices (two-tiered RAG setup).
- Utilizes hybrid search (semantic + full-text search combined with Reciprocal Rank Fusion).
- RAG as a Service API backend.

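
To illustrate how Reciprocal Rank Fusion can merge a semantic ranking with a full-text ranking, here is a minimal Python sketch. It is not SurfSense's actual implementation: the function name `reciprocal_rank_fusion`, the document IDs, and the constant `k = 60` (a commonly used default) are assumptions made for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1. `k` is an assumed smoothing constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the two retrievers.
semantic_hits = ["doc_a", "doc_b", "doc_c"]
full_text_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([semantic_hits, full_text_hits])
print(fused)
```

Documents that rank well in both lists (`doc_a` and `doc_b` here) rise above documents that appear in only one, which is the main reason to fuse by rank rather than by raw score.
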
### ℹ️ **External Sources**

- Search engines (Tavily, LinkUp)
- Slack
- Linear
- Notion
- YouTube videos
- GitHub
- and more to come.

### 📄 **Supported File Extensions**

#### Document

`.doc`, `.docx`, `.odt`, `.rtf`, `.pdf`, `.xml`

#### Text & Markup

`.txt`, `.md`, `.markdown`, `.rst`, `.html`, `.org`

#### Spreadsheets & Tables

`.xls`, `.xlsx`, `.csv`, `.tsv`

#### Audio & Video

`.mp3`, `.mpga`, `.m4a`, `.wav`, `.mp4`, `.mpeg`, `.webm`

#### Images

`.jpg`, `.jpeg`, `.png`, `.bmp`, `.tiff`, `.heic`

#### Email & eBooks

`.eml`, `.msg`, `.epub`

#### PowerPoint Presentations & Other

`.ppt`, `.pptx`, `.p7s`

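
As a rough illustration of how the categories above could be used to check a file before upload, here is a small Python sketch. The mapping is copied from the lists above. The names `SUPPORTED_EXTENSIONS` and `classify_file` are hypothetical and not part of the SurfSense codebase.

```python
from pathlib import Path

# Extension groups copied from the lists above.
SUPPORTED_EXTENSIONS = {
    "Document": {".doc", ".docx", ".odt", ".rtf", ".pdf", ".xml"},
    "Text & Markup": {".txt", ".md", ".markdown", ".rst", ".html", ".org"},
    "Spreadsheets & Tables": {".xls", ".xlsx", ".csv", ".tsv"},
    "Audio & Video": {".mp3", ".mpga", ".m4a", ".wav", ".mp4", ".mpeg", ".webm"},
    "Images": {".jpg", ".jpeg", ".png", ".bmp", ".tiff", ".heic"},
    "Email & eBooks": {".eml", ".msg", ".epub"},
    "PowerPoint Presentations & Other": {".ppt", ".pptx", ".p7s"},
}


def classify_file(filename):
    """Return the category for a filename, or None if it is unsupported."""
    suffix = Path(filename).suffix.lower()  # case-insensitive match
    for category, extensions in SUPPORTED_EXTENSIONS.items():
        if suffix in extensions:
            return category
    return None


print(classify_file("lecture.MP4"))
print(classify_file("archive.zip"))
```
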
### 🔖 Cross Browser Extension

- The SurfSense extension can be used to save any webpage you like.
- Its main use case is saving webpages that sit behind authentication.

## FEATURE REQUESTS AND FUTURE

**SurfSense is actively being developed.** While it's not yet production-ready, you can help us speed up the process.

Join the [SurfSense Discord](https://discord.gg/ejRNvftDp9) and help shape the future of SurfSense!

## How to get started?

### Installation Options

SurfSense provides two installation methods:

1. **[Docker Installation](https://www.surfsense.net/docs/docker-installation)** - The easiest way to get SurfSense up and running with all dependencies containerized.
   - Includes pgAdmin for database management through a web UI
   - Supports environment variable customization via `.env` file
   - See [Docker Setup Guide](DOCKER_SETUP.md) for detailed instructions

2. **[Manual Installation (Recommended)](https://www.surfsense.net/docs/manual-installation)** - For users who prefer more control over their setup or need to customize their deployment.

Both installation guides include detailed OS-specific instructions for Windows, macOS, and Linux.

Before installation, make sure to complete the [prerequisite setup steps](https://www.surfsense.net/docs/) including:
- PGVector setup
- Google OAuth configuration
- Unstructured.io API key
- Other required API keys

## Screenshots

**Search Spaces**

**Manage Documents**

**Research Agent**

**Podcast Agent**

**Agent Chat**

**Browser Extension**

## Tech Stack

### **BackEnd**

- **FastAPI**: Modern, fast web framework for building APIs with Python

- **PostgreSQL with pgvector**: Database with vector search capabilities for similarity searches

- **SQLAlchemy**: SQL toolkit and ORM (Object-Relational Mapping) for database interactions

- **Alembic**: A database migrations tool for SQLAlchemy.

- **FastAPI Users**: Authentication and user management with JWT and OAuth support

- **LangGraph**: Framework for developing AI-agents.

- **LangChain**: Framework for developing AI-powered applications.

- **LLM Integration**: Integration with LLM models through LiteLLM

- **Rerankers**: Advanced result ranking for improved search relevance

- **Hybrid Search**: Combines vector similarity and full-text search for optimal results using Reciprocal Rank Fusion (RRF)

- **Vector Embeddings**: Document and text embeddings for semantic search

- **pgvector**: PostgreSQL extension for efficient vector similarity operations

- **Chonkie**: Advanced document chunking and embedding library
  - Uses `AutoEmbeddings` for flexible embedding model selection
  - `LateChunker` for optimized document chunking based on embedding model's max sequence length

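
To show why the embedding model's maximum sequence length matters for chunk size, here is a deliberately simplified Python sketch. It uses plain fixed-size window splitting, not Chonkie's `LateChunker`, and whitespace-separated words stand in for the model's real tokenizer. The function name `split_by_max_length` and the sample sentence are assumptions made for this example.

```python
def split_by_max_length(tokens, max_seq_len):
    """Split a token list into consecutive chunks of at most max_seq_len tokens.

    Simplified stand-in for a real chunker: it only guarantees that no chunk
    exceeds the (assumed) maximum sequence length of the embedding model.
    """
    if max_seq_len <= 0:
        raise ValueError("max_seq_len must be positive")
    return [
        tokens[start:start + max_seq_len]
        for start in range(0, len(tokens), max_seq_len)
    ]


# Whitespace "tokens" are an assumption; real models use subword tokenizers.
tokens = "SurfSense stores each document as chunks that fit the embedding model".split()
chunks = split_by_max_length(tokens, max_seq_len=4)

for chunk in chunks:
    print(" ".join(chunk))
```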
---

### **FrontEnd**

- **Next.js 15.2.3**: React framework featuring App Router, server components, automatic code-splitting, and optimized rendering.

- **React 19.0.0**: JavaScript library for building user interfaces.

- **TypeScript**: Static type-checking for JavaScript, enhancing code quality and developer experience.

- **Vercel AI SDK Kit UI Stream Protocol**: To create scalable chat UI.

- **Tailwind CSS 4.x**: Utility-first CSS framework for building custom UI designs.

- **Shadcn**: Headless components library.

- **Lucide React**: Icon set implemented as React components.

- **Framer Motion**: Animation library for React.

- **Sonner**: Toast notification library.

- **Geist**: Font family from Vercel.

- **React Hook Form**: Form state management and validation.

- **Zod**: TypeScript-first schema validation with static type inference.

- **@hookform/resolvers**: Resolvers for using validation libraries with React Hook Form.

- **@tanstack/react-table**: Headless UI for building powerful tables & datagrids.

### **DevOps**

- **Docker**: Container platform for consistent deployment across environments

- **Docker Compose**: Tool for defining and running multi-container Docker applications

- **pgAdmin**: Web-based PostgreSQL administration tool included in Docker setup

### **Extension**

Manifest v3 on Plasmo

## Future Work

- Add more connectors.
- Patch minor bugs.
- Document Chat **[REIMPLEMENT]**
- Document Podcasts

## Contribute

Contributions are very welcome! A contribution can be as small as a ⭐ or as simple as finding and reporting issues.
Improvements to the backend are always welcome.

## Star History

<a href="https://www.star-history.com/#MODSetter/SurfSense&Date">
 <picture>
   <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=MODSetter/SurfSense&type=Date&theme=dark" />
   <source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=MODSetter/SurfSense&type=Date" />
   <img alt="Star History Chart" src="https://api.star-history.com/svg?repos=MODSetter/SurfSense&type=Date" />
 </picture>
</a>