SurfSense Backend
Technology Stack Overview
This application is a modern AI-powered search and knowledge management platform built with the following technology stack:
Core Framework and Environment
- Python 3.12+: The application requires Python 3.12 or newer
- FastAPI: Modern, fast web framework for building APIs with Python
- Uvicorn: ASGI server implementation, running the FastAPI application
- PostgreSQL with pgvector: Database with vector search capabilities for similarity searches
- SQLAlchemy: SQL toolkit and ORM (Object-Relational Mapping) for database interactions
- FastAPI Users: Authentication and user management with JWT and OAuth support
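As a rough illustration of how these pieces fit together, the sketch below wires FastAPI to PostgreSQL through SQLAlchemy's async engine and the asyncpg driver. The connection URL, app title, and health route are placeholders, not the project's actual configuration.

```python
# Minimal sketch: FastAPI + async SQLAlchemy over asyncpg (connection URL is a placeholder).
from fastapi import FastAPI
from sqlalchemy.ext.asyncio import async_sessionmaker, create_async_engine

DATABASE_URL = "postgresql+asyncpg://user:password@localhost:5432/surfsense"  # placeholder

engine = create_async_engine(DATABASE_URL, echo=False)
async_session = async_sessionmaker(engine, expire_on_commit=False)

app = FastAPI(title="SurfSense Backend")

@app.get("/health")
async def health() -> dict[str, str]:
    # Trivial endpoint to confirm the ASGI app is up; real routes live under /api/v1/*.
    return {"status": "ok"}
```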
Key Features and Components
Authentication and User Management
- JWT-based authentication
- OAuth integration (Google)
- User registration, login, and password reset flows
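A minimal sketch of a JWT authentication backend in the style of the FastAPI Users documentation; the secret, token URL, and token lifetime below are placeholders rather than the project's actual settings.

```python
# Sketch of a JWT auth backend using FastAPI Users (values are placeholders).
from fastapi_users.authentication import AuthenticationBackend, BearerTransport, JWTStrategy

SECRET = "change-me"  # in practice this would come from environment variables

bearer_transport = BearerTransport(tokenUrl="auth/jwt/login")

def get_jwt_strategy() -> JWTStrategy:
    # Issue JWTs that expire after one hour (lifetime is an arbitrary example value).
    return JWTStrategy(secret=SECRET, lifetime_seconds=3600)

auth_backend = AuthenticationBackend(
    name="jwt",
    transport=bearer_transport,
    get_strategy=get_jwt_strategy,
)
```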
Search and Retrieval System
- Hybrid Search: Combines vector similarity and full-text search using Reciprocal Rank Fusion (RRF) for optimal results (see the RRF sketch after this list)
- Vector Embeddings: Document and text embeddings for semantic search
- pgvector: PostgreSQL extension for efficient vector similarity operations
- Chonkie: Advanced document chunking and embedding library
- Uses AutoEmbeddings for flexible embedding model selection
- Uses LateChunker for optimized document chunking based on the embedding model's max sequence length
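To make the fusion step concrete, here is a small, self-contained Reciprocal Rank Fusion sketch: each document's fused score is the sum of 1/(k + rank) over the result lists it appears in. The k constant of 60 is the commonly used default, not necessarily the value this project uses, and the document IDs are hypothetical.

```python
# Reciprocal Rank Fusion (RRF) sketch: merge ranked result lists from vector and
# full-text search. k=60 is the conventional constant, not necessarily SurfSense's value.
def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc3", "doc1", "doc7"]    # hypothetical vector-similarity ranking
fulltext_hits = ["doc1", "doc5", "doc3"]  # hypothetical full-text ranking
print(reciprocal_rank_fusion([vector_hits, fulltext_hits]))  # doc1 and doc3 float to the top
```

And a loose sketch of the Chonkie usage described above. The class and method names follow Chonkie's documented API, but exact signatures vary between versions, so treat this as an assumption rather than the project's actual code.

```python
# Hedged sketch of Chonkie chunking; exact signatures may differ across Chonkie versions.
from chonkie import AutoEmbeddings, LateChunker

# AutoEmbeddings resolves an embedding backend from a model identifier (assumed usage).
embeddings = AutoEmbeddings.get_embeddings("sentence-transformers/all-MiniLM-L6-v2")

# LateChunker splits text with the embedding model's max sequence length in mind.
chunker = LateChunker(embedding_model="sentence-transformers/all-MiniLM-L6-v2", chunk_size=512)
for chunk in chunker.chunk("A long document body to be split into retrievable chunks..."):
    print(chunk.text[:40])
```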
AI and NLP Capabilities
- LangChain: Framework for developing AI-powered applications
- Used for document processing, research, and response generation
- Integration with various LLM models through LiteLLM
- Document conversion utilities for standardized processing
- GPT Integration: Access to various LLM models through LiteLLM
- Multiple LLM configurations for different use cases (see the sketches after this list):
- Fast LLM: Quick responses (default: gpt-4o-mini)
- Smart LLM: More comprehensive analysis (default: gpt-4o-mini)
- Strategic LLM: Complex reasoning (default: gpt-4o-mini)
- Long Context LLM: For processing large documents (default: gemini-2.0-flash-thinking)
- Rerankers with FlashRank: Advanced result ranking for improved search relevance
- Configurable reranking models (default: ms-marco-MiniLM-L-12-v2)
- Supports multiple reranking backends (FlashRank, Cohere, etc.)
- Improves search result quality by reordering based on semantic relevance
- GPT-Researcher: Advanced research capabilities
- Multiple research modes (GENERAL, DEEP, DEEPER)
- Customizable report formats with proper citations
- Streaming research results for real-time updates
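A hedged sketch of how the multiple LLM roles above can be expressed with LiteLLM: one completion entry point, a different model identifier per role. The role dictionary and `ask` helper are illustrative, not the project's actual code, and the Gemini provider prefix is an assumption.

```python
# Sketch: routing the LLM roles listed above through LiteLLM (helper is illustrative).
from litellm import completion

LLM_ROLES = {
    "fast": "gpt-4o-mini",
    "smart": "gpt-4o-mini",
    "strategic": "gpt-4o-mini",
    "long_context": "gemini/gemini-2.0-flash-thinking-exp",  # provider prefix/model id are assumptions
}

def ask(role: str, prompt: str) -> str:
    response = completion(
        model=LLM_ROLES[role],
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(ask("fast", "Summarize hybrid search in one sentence."))
```

Similarly, FlashRank-based reranking could look roughly like the following, assuming the rerankers library's Reranker wrapper; the model name and argument shapes are taken from its documented usage and may differ by version.

```python
# Sketch of reranking with the rerankers library and a FlashRank model (hedged).
from rerankers import Reranker

ranker = Reranker("ms-marco-MiniLM-L-12-v2", model_type="flashrank")
results = ranker.rank(
    query="How does SurfSense combine vector and full-text search?",
    docs=["RRF merges ranked lists.", "Podcasts are generated from chats."],
)
print(results)  # ranked results, ordered by relevance score
```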
External Integrations
- Slack Connector: Integration with Slack for data retrieval and notifications
- Notion Connector: Integration with Notion for document retrieval
- Search APIs: Integration with Tavily and Serper API for web search
- Firecrawl: Web crawling and data extraction capabilities
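As one example of the web-search integrations, a Tavily query through its Python client could look roughly like this; the API key and query are placeholders, and this is not lifted from the project's connector code.

```python
# Sketch of a Tavily web search call (API key and query are placeholders).
from tavily import TavilyClient

client = TavilyClient(api_key="tvly-your-key")
response = client.search("latest pgvector hybrid search techniques")
for item in response["results"]:
    print(item["title"], item["url"])
```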
Data Processing
- Unstructured: Tools for processing unstructured data
- Markdownify: Converting HTML to Markdown
- Playwright: Web automation and scraping capabilities
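For instance, HTML-to-Markdown conversion with Markdownify is a one-liner; the HTML below is just an example input, not data from the application.

```python
# Convert scraped HTML into Markdown with markdownify (input is an example snippet).
from markdownify import markdownify as md

html = "<h1>SurfSense</h1><p>An <strong>AI-powered</strong> search platform.</p>"
print(md(html, heading_style="ATX"))
# => # SurfSense
#    An **AI-powered** search platform.
```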
Main Modules
- Search Spaces: Isolated search environments for different contexts or projects
- Documents: Storage and retrieval of various document types
- Chunks: Document fragments for more precise retrieval
- Chats: Conversation management with different depth levels (GENERAL, DEEP)
- Podcasts: Audio content management with generation capabilities
- Search Source Connectors: Integration with various data sources
Development Tools
- uv: Python dependency management (indicated by pyproject.toml and uv.lock)
- CORS support: Cross-Origin Resource Sharing enabled for API access
- Environment Variables: Configuration through .env files
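The CORS setup mentioned above typically looks like the standard FastAPI middleware registration below; the ALLOWED_ORIGINS environment variable name is hypothetical, standing in for whatever the .env configuration actually provides.

```python
# Standard FastAPI CORS registration (env var name and default origins are placeholders).
import os
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=os.environ.get("ALLOWED_ORIGINS", "*").split(","),
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
```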
Database Schema
The application uses a relational database with the following main entities:
- Users: Authentication and user management
- SearchSpaces: Isolated search environments owned by users
- Documents: Various document types with content and embeddings
- Chunks: Smaller pieces of documents for granular retrieval
- Chats: Conversation tracking with different depth levels
- Podcasts: Audio content with generation capabilities
- SearchSourceConnectors: External data source integrations
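A hedged sketch of what the Documents/Chunks side of this schema can look like with SQLAlchemy 2.0 and pgvector; the column names, the 1536 embedding dimension, and the foreign key are illustrative assumptions, not the actual models.

```python
# Illustrative SQLAlchemy 2.0 models with pgvector columns (names and dimension are assumptions).
from pgvector.sqlalchemy import Vector
from sqlalchemy import ForeignKey, Text
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class Document(Base):
    __tablename__ = "documents"
    id: Mapped[int] = mapped_column(primary_key=True)
    content: Mapped[str] = mapped_column(Text)
    embedding: Mapped[list[float]] = mapped_column(Vector(1536))  # dimension is an assumption

class Chunk(Base):
    __tablename__ = "chunks"
    id: Mapped[int] = mapped_column(primary_key=True)
    document_id: Mapped[int] = mapped_column(ForeignKey("documents.id"))
    content: Mapped[str] = mapped_column(Text)
    embedding: Mapped[list[float]] = mapped_column(Vector(1536))

# A nearest-neighbour query over chunks then looks like:
#   select(Chunk).order_by(Chunk.embedding.cosine_distance(query_vector)).limit(5)
```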
API Endpoints
The API is structured with the following main route groups:
- /auth/*: Authentication endpoints (JWT, OAuth)
- /users/*: User management
- /api/v1/search-spaces/*: Search space management
- /api/v1/documents/*: Document management
- /api/v1/podcasts/*: Podcast functionality
- /api/v1/chats/*: Chat and conversation endpoints
- /api/v1/search-source-connectors/*: External data source management
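This grouping maps naturally onto FastAPI routers; the module and router names in the sketch below are hypothetical, chosen only to show how the prefixes above can be wired up.

```python
# Hypothetical router wiring for the route groups above (router names are illustrative).
from fastapi import APIRouter, FastAPI

app = FastAPI()

documents_router = APIRouter()

@documents_router.get("/")
async def list_documents() -> list[dict]:
    return []  # placeholder handler

app.include_router(documents_router, prefix="/api/v1/documents", tags=["documents"])
# ...and similarly for /api/v1/search-spaces, /api/v1/chats, /api/v1/podcasts,
# /api/v1/search-source-connectors, plus the /auth and /users routers from FastAPI Users.
```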
Deployment
The application is configured to run with Uvicorn and can be deployed with:
python main.py
This will start the server on all interfaces (0.0.0.0) with info-level logging.
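A minimal main.py matching that description would look roughly as follows; the "app.app:app" import string and port 8000 are assumptions about where the FastAPI instance lives and which port it serves, not verified project settings.

```python
# Sketch of main.py: serve on all interfaces with info-level logging
# (the "app.app:app" import string and port are assumptions, not the verified settings).
import uvicorn

if __name__ == "__main__":
    uvicorn.run("app.app:app", host="0.0.0.0", port=8000, log_level="info")
```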
Requirements
See pyproject.toml for detailed dependency information. Key dependencies include:
- asyncpg: Asynchronous PostgreSQL client
- chonkie: Document chunking and embedding library
- fastapi and related packages
- fastapi-users: Authentication and user management
- firecrawl-py: Web crawling capabilities
- langchain components for AI workflows
- litellm: LLM model integration
- pgvector: Vector similarity search in PostgreSQL
- rerankers with FlashRank: Advanced result ranking
- Various AI and NLP libraries
- Integration clients for Slack, Notion, etc.