mirror of
https://github.com/MODSetter/SurfSense.git
synced 2025-09-01 10:09:08 +00:00
130 lines
No EOL
4.7 KiB
Text
130 lines
No EOL
4.7 KiB
Text
---
|
|
title: Prerequisites
|
|
description: Required setup's before setting up SurfSense
|
|
full: true
|
|
---
|
|
|
|
## PGVector installation Guide
|
|
|
|
SurfSense requires the pgvector extension for PostgreSQL:
|
|
|
|
### Linux and Mac
|
|
|
|
Compile and install the extension (supports Postgres 13+)
|
|
|
|
```sh
|
|
cd /tmp
|
|
git clone --branch v0.8.0 https://github.com/pgvector/pgvector.git
|
|
cd pgvector
|
|
make
|
|
make install # may need sudo
|
|
```
|
|
|
|
See the [installation notes](https://github.com/pgvector/pgvector/tree/master#installation-notes---linux-and-mac) if you run into issues
|
|
|
|
### Windows
|
|
|
|
Ensure [C++ support in Visual Studio](https://learn.microsoft.com/en-us/cpp/build/building-on-the-command-line?view=msvc-170#download-and-install-the-tools) is installed, and run:
|
|
|
|
```cmd
|
|
call "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvars64.bat"
|
|
```
|
|
|
|
Note: The exact path will vary depending on your Visual Studio version and edition
|
|
|
|
Then use `nmake` to build:
|
|
|
|
```cmd
|
|
set "PGROOT=C:\Program Files\PostgreSQL\16"
|
|
cd %TEMP%
|
|
git clone --branch v0.8.0 https://github.com/pgvector/pgvector.git
|
|
cd pgvector
|
|
nmake /F Makefile.win
|
|
nmake /F Makefile.win install
|
|
```
|
|
|
|
See the [installation notes](https://github.com/pgvector/pgvector/tree/master#installation-notes---windows) if you run into issues
|
|
|
|
---
|
|
|
|
## Google OAuth Setup (Optional)
|
|
|
|
SurfSense supports both Google OAuth and local email/password authentication. Google OAuth is optional - if you prefer local authentication, you can skip this section.
|
|
|
|
**Note**: Google OAuth setup is **required** in your `.env` files if you want to use the Gmail and Google Calendar connectors in SurfSense.
|
|
|
|
To set up Google OAuth:
|
|
|
|
1. Login to your [Google Developer Console](https://console.cloud.google.com/)
|
|
2. Enable the required APIs:
|
|
- **People API** (required for basic Google OAuth)
|
|
- **Gmail API** (required if you want to use the Gmail connector)
|
|
- **Google Calendar API** (required if you want to use the Google Calendar connector)
|
|

|
|
3. Set up OAuth consent screen.
|
|

|
|
4. Create OAuth client ID and secret.
|
|

|
|
5. It should look like this.
|
|

|
|
|
|
---
|
|
|
|
## File Upload's
|
|
|
|
SurfSense supports three ETL (Extract, Transform, Load) services for converting files to LLM-friendly formats:
|
|
|
|
### Option 1: Unstructured
|
|
|
|
Files are converted using [Unstructured](https://github.com/Unstructured-IO/unstructured)
|
|
|
|
1. Get an Unstructured.io API key from [Unstructured Platform](https://platform.unstructured.io/)
|
|
2. You should be able to generate API keys once registered
|
|

|
|
|
|
### Option 2: LlamaIndex (LlamaCloud)
|
|
|
|
Files are converted using [LlamaIndex](https://www.llamaindex.ai/) which offers 50+ file format support.
|
|
|
|
1. Get a LlamaIndex API key from [LlamaCloud](https://cloud.llamaindex.ai/)
|
|
2. Sign up for a LlamaCloud account to access their parsing services
|
|
3. LlamaCloud provides enhanced parsing capabilities for complex documents
|
|
|
|
### Option 3: Docling (Recommended for Privacy)
|
|
|
|
Files are processed locally using [Docling](https://github.com/DS4SD/docling) - IBM's open-source document parsing library.
|
|
|
|
1. **No API key required** - all processing happens locally
|
|
2. **Privacy-focused** - documents never leave your system
|
|
3. **Supported formats**: PDF, Office documents (Word, Excel, PowerPoint), images (PNG, JPEG, TIFF, BMP, WebP), HTML, CSV, AsciiDoc
|
|
4. **Enhanced features**: Advanced table detection, image extraction, and structured document parsing
|
|
5. **GPU acceleration** support for faster processing (when available)
|
|
|
|
**Note**: You only need to set up one of these services.
|
|
|
|
---
|
|
|
|
## LLM Observability (Optional)
|
|
|
|
This is not required for SurfSense to work. But it is always a good idea to monitor LLM interactions. So we do not have those WTH moments.
|
|
|
|
1. Get a LangSmith API key from [smith.langchain.com](https://smith.langchain.com/)
|
|
2. This helps in observing SurfSense Researcher Agent.
|
|

|
|
|
|
---
|
|
|
|
## Crawler
|
|
|
|
SurfSense have 2 options for saving webpages:
|
|
- [SurfSense Extension](https://github.com/MODSetter/SurfSense/tree/main/surfsense_browser_extension) (Overall better experience & ability to save private webpages, recommended)
|
|
- Crawler (If you want to save public webpages)
|
|
|
|
**NOTE:** SurfSense currently uses [Firecrawl.py](https://www.firecrawl.dev/) for web crawling. If you plan on using the crawler, you will need to create a Firecrawl account and get an API key.
|
|
|
|
|
|
---
|
|
|
|
## Next Steps
|
|
|
|
Once you have all prerequisites in place, proceed to the [installation guide](/docs/installation) to set up SurfSense. |