mirror of
https://github.com/ruvnet/RuVector.git
synced 2026-05-27 00:25:10 +00:00
- Add agentic-synth actor with TRM/SONA self-learning - Integrate 13 popular Apify scrapers for data grounding - Add 6 use case templates (lead-intelligence, competitor-monitor, etc.) - Include MCP server for AI agent integration - Add comprehensive README with tutorials and SEO optimization - Support generate/integrate/template modes - Add webhook and embedding generation support 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
491 lines
14 KiB
Markdown
491 lines
14 KiB
Markdown
# Self-Learning Postgres DB - Vector Database for AI Agents
|
|
|
|
A distributed vector database that **truly learns**. Store embeddings, query with semantic search, and let the index improve itself through TRM (Tiny Recursive Models), SONA (Self-Optimizing Neural Architecture), and Graph Neural Networks.
|
|
|
|
[](https://apify.com/ruv/self-learning-postgres-db)
|
|
[](https://www.postgresql.org/)
|
|
[](https://opensource.org/licenses/MIT)
|
|
[](https://github.com/ruvnet/ruvector)
|
|
|
|
## Key AI Features
|
|
|
|
| Feature | Description |
|
|
|---------|-------------|
|
|
| **TRM** | 7M parameter recursive reasoning (83% on GSM8K) |
|
|
| **SONA** | 3-tier learning (Instant/Background/Deep) |
|
|
| **EWC++** | Anti-forgetting protection (λ=2000) |
|
|
| **GNN** | Graph Neural Network index optimization |
|
|
| **Trajectory Tracking** | Learn from query patterns |
|
|
|
|
---
|
|
|
|
## Features
|
|
|
|
**30+ Operations** for complete vector database management:
|
|
|
|
- **Semantic Search** - Find documents by meaning, not just keywords
|
|
- **Batch Operations** - Insert and search thousands of documents efficiently
|
|
- **Hybrid Search** - Combine vector similarity with keyword matching
|
|
- **RAG Support** - Built-in Retrieval-Augmented Generation queries
|
|
- **Self-Learning** - GNN training for index optimization
|
|
- **Clustering** - K-means document clustering
|
|
- **Deduplication** - Find and remove duplicate content
|
|
- **Export/Import** - JSON and CSV data migration
|
|
|
|
**Zero Setup Required:**
|
|
- Embedded PostgreSQL with ruvector extension
|
|
- Local AI embeddings (no OpenAI API key needed)
|
|
- Automatic table and index creation
|
|
|
|
---
|
|
|
|
## Quick Start (30 Seconds)
|
|
|
|
### Full Demo
|
|
|
|
```json
|
|
{
|
|
"action": "full_workflow",
|
|
"query": "How does machine learning work?",
|
|
"documents": [
|
|
{"content": "Machine learning is AI that learns patterns from data.", "metadata": {"category": "AI"}},
|
|
{"content": "PostgreSQL is a powerful relational database.", "metadata": {"category": "Database"}},
|
|
{"content": "Neural networks consist of layers of nodes.", "metadata": {"category": "AI"}},
|
|
{"content": "Vector databases store embeddings for similarity search.", "metadata": {"category": "Database"}}
|
|
]
|
|
}
|
|
```
|
|
|
|
**Result:** Documents ranked by semantic relevance to your query.
|
|
|
|
---
|
|
|
|
## All 38 Actions
|
|
|
|
### Document Operations
|
|
| Action | Description |
|
|
|--------|-------------|
|
|
| `insert` | Add documents with auto-generated embeddings |
|
|
| `batch_insert` | Efficiently insert large document sets |
|
|
| `get` | Retrieve single document by ID |
|
|
| `list` | List documents with filtering |
|
|
| `update` | Modify existing document content/metadata |
|
|
| `delete` | Remove documents by ID, IDs, or filter |
|
|
| `upsert` | Insert or update (smart merge) |
|
|
|
|
### Search Operations
|
|
| Action | Description |
|
|
|--------|-------------|
|
|
| `search` | Semantic similarity search |
|
|
| `batch_search` | Multiple queries in one call |
|
|
| `hybrid_search` | Vector + BM25 keyword combined |
|
|
| `multi_query_search` | Aggregate results from multiple queries |
|
|
| `mmr_search` | Maximal Marginal Relevance (diverse results) |
|
|
| `graph_search` | Graph-based similarity traversal |
|
|
| `range_search` | All results within distance threshold |
|
|
|
|
### Table Operations
|
|
| Action | Description |
|
|
|--------|-------------|
|
|
| `create_table` | Create new vector collection |
|
|
| `drop_table` | Delete collection |
|
|
| `list_tables` | Show all vector tables |
|
|
| `table_stats` | Collection statistics and metrics |
|
|
| `create_index` | Add HNSW or IVFFlat index |
|
|
| `reindex` | Rebuild indexes |
|
|
|
|
### Self-Learning / GNN / SONA
|
|
| Action | Description |
|
|
|--------|-------------|
|
|
| `train_gnn` | Train Graph Neural Network on data |
|
|
| `optimize_index` | Auto-tune HNSW parameters |
|
|
| `analyze_patterns` | Analyze data distribution |
|
|
| `sona_learn` | Trigger TRM/SONA background learning cycle |
|
|
| `sona_status` | Check SONA learning status and capabilities |
|
|
|
|
### Clustering & Deduplication
|
|
| Action | Description |
|
|
|--------|-------------|
|
|
| `cluster` | K-means document clustering |
|
|
| `find_duplicates` | Detect similar document pairs |
|
|
| `deduplicate` | Remove duplicate documents |
|
|
|
|
### Data Operations
|
|
| Action | Description |
|
|
|--------|-------------|
|
|
| `export` | Export to JSON or CSV |
|
|
| `import` | Import from JSON data |
|
|
|
|
### AI / RAG
|
|
| Action | Description |
|
|
|--------|-------------|
|
|
| `rag_query` | Build RAG context from search results |
|
|
| `summarize` | Document statistics and previews |
|
|
|
|
### Utility
|
|
| Action | Description |
|
|
|--------|-------------|
|
|
| `ping` | Test database connection |
|
|
| `version` | Get version and feature info |
|
|
| `embedding_models` | List available models |
|
|
| `generate_embedding` | Create embeddings without storing |
|
|
| `similarity` | Compare similarity of two texts |
|
|
|
|
---
|
|
|
|
## Use Cases
|
|
|
|
### 1. AI Agent Memory
|
|
|
|
```json
|
|
{
|
|
"action": "insert",
|
|
"tableName": "agent_memory",
|
|
"documents": [
|
|
{"content": "User prefers dark mode", "metadata": {"user_id": "123", "type": "preference"}},
|
|
{"content": "User asked about Python tutorials", "metadata": {"user_id": "123", "type": "history"}}
|
|
]
|
|
}
|
|
```
|
|
|
|
Retrieve memories:
|
|
```json
|
|
{
|
|
"action": "search",
|
|
"tableName": "agent_memory",
|
|
"query": "What does this user like?",
|
|
"filter": "metadata->>'user_id' = '123'"
|
|
}
|
|
```
|
|
|
|
### 2. RAG Pipeline
|
|
|
|
```json
|
|
{
|
|
"action": "rag_query",
|
|
"query": "How do I return a product?",
|
|
"topK": 5,
|
|
"ragMaxTokens": 2000
|
|
}
|
|
```
|
|
|
|
Returns context ready to feed to your LLM.
|
|
|
|
### 3. Batch Document Processing
|
|
|
|
```json
|
|
{
|
|
"action": "batch_insert",
|
|
"batchSize": 100,
|
|
"documents": [
|
|
// ... thousands of documents
|
|
]
|
|
}
|
|
```
|
|
|
|
### 4. Find & Remove Duplicates
|
|
|
|
```json
|
|
{
|
|
"action": "find_duplicates",
|
|
"similarityThreshold": 0.95
|
|
}
|
|
```
|
|
|
|
Then:
|
|
```json
|
|
{
|
|
"action": "deduplicate",
|
|
"similarityThreshold": 0.95
|
|
}
|
|
```
|
|
|
|
### 5. Document Clustering
|
|
|
|
```json
|
|
{
|
|
"action": "cluster",
|
|
"numClusters": 10,
|
|
"clusteringAlgorithm": "kmeans"
|
|
}
|
|
```
|
|
|
|
### 6. Index Optimization
|
|
|
|
```json
|
|
{
|
|
"action": "optimize_index",
|
|
"enableLearning": true
|
|
}
|
|
```
|
|
|
|
### 7. SONA Self-Learning
|
|
|
|
Check learning status:
|
|
```json
|
|
{
|
|
"action": "sona_status"
|
|
}
|
|
```
|
|
|
|
Trigger learning cycle:
|
|
```json
|
|
{
|
|
"action": "sona_learn",
|
|
"ewcLambda": 2000,
|
|
"patternThreshold": 0.7
|
|
}
|
|
```
|
|
|
|
---
|
|
|
|
## Parameters Reference
|
|
|
|
### Core Parameters
|
|
| Parameter | Type | Default | Description |
|
|
|-----------|------|---------|-------------|
|
|
| `action` | string | `search` | Operation to perform |
|
|
| `connectionString` | string | embedded | PostgreSQL URL for persistence |
|
|
| `tableName` | string | `documents` | Table/collection name |
|
|
|
|
### Search Parameters
|
|
| Parameter | Type | Default | Description |
|
|
|-----------|------|---------|-------------|
|
|
| `query` | string | - | Natural language search query |
|
|
| `queryVector` | array | - | Pre-computed embedding vector |
|
|
| `topK` | integer | 10 | Number of results |
|
|
| `distanceMetric` | string | `cosine` | cosine, l2, inner_product, manhattan |
|
|
| `filter` | string | - | SQL WHERE clause |
|
|
| `minScore` | number | 0 | Minimum similarity score (0-1) |
|
|
| `maxDistance` | number | - | Maximum distance threshold |
|
|
|
|
### Embedding Parameters
|
|
| Parameter | Type | Default | Description |
|
|
|-----------|------|---------|-------------|
|
|
| `embeddingModel` | string | `all-MiniLM-L6-v2` | AI embedding model |
|
|
| `generateEmbeddings` | boolean | true | Auto-generate embeddings |
|
|
| `dimensions` | integer | 384 | Vector dimensions |
|
|
|
|
### Index Parameters
|
|
| Parameter | Type | Default | Description |
|
|
|-----------|------|---------|-------------|
|
|
| `indexType` | string | `hnsw` | hnsw, ivfflat, none |
|
|
| `hnswM` | integer | 16 | HNSW max connections |
|
|
| `hnswEfConstruction` | integer | 64 | HNSW build quality |
|
|
| `hnswEfSearch` | integer | 100 | HNSW search quality |
|
|
| `ivfLists` | integer | 100 | IVFFlat partitions |
|
|
|
|
### GNN Parameters
|
|
| Parameter | Type | Default | Description |
|
|
|-----------|------|---------|-------------|
|
|
| `enableLearning` | boolean | false | Enable self-learning |
|
|
| `learningRate` | number | 0.01 | GNN learning rate |
|
|
| `gnnLayers` | integer | 2 | GNN layer count |
|
|
| `trainEpochs` | integer | 10 | Training epochs |
|
|
|
|
### SONA / TRM Parameters
|
|
| Parameter | Type | Default | Description |
|
|
|-----------|------|---------|-------------|
|
|
| `sonaEnabled` | boolean | true | Enable TRM/SONA self-learning |
|
|
| `ewcLambda` | number | 2000 | EWC++ anti-forgetting strength |
|
|
| `patternThreshold` | number | 0.7 | Pattern recognition confidence |
|
|
| `maxTrajectories` | integer | 100 | Max trajectory steps to track |
|
|
| `sonaLearningTiers` | array | ["instant", "background"] | Learning tiers to enable |
|
|
|
|
### Clustering Parameters
|
|
| Parameter | Type | Default | Description |
|
|
|-----------|------|---------|-------------|
|
|
| `numClusters` | integer | 10 | K-means clusters |
|
|
| `similarityThreshold` | number | 0.95 | Duplicate detection threshold |
|
|
|
|
---
|
|
|
|
## Embedding Models
|
|
|
|
| Model | Dimensions | Speed | Quality | Best For |
|
|
|-------|------------|-------|---------|----------|
|
|
| `all-MiniLM-L6-v2` | 384 | Fast | Good | Prototyping |
|
|
| `bge-small-en-v1.5` | 384 | Fast | Excellent | Production |
|
|
| `bge-base-en-v1.5` | 768 | Medium | Better | High accuracy |
|
|
| `nomic-embed-text-v1` | 768 | Medium | Good | Long documents (8K) |
|
|
| `gte-small` | 384 | Fast | Good | General use |
|
|
| `e5-small-v2` | 384 | Fast | Good | Multilingual |
|
|
|
|
---
|
|
|
|
## Persistent Storage
|
|
|
|
### Hybrid Persistence Architecture
|
|
|
|
```
|
|
┌─────────────────────────────────────────────────────────┐
|
|
│ Actor Run │
|
|
│ ┌──────────────┐ ┌──────────────┐ ┌───────────┐ │
|
|
│ │ Key-Value │───▶│ Embedded │───▶│ Key-Value │ │
|
|
│ │ Store (load) │ │ PostgreSQL │ │ (save) │ │
|
|
│ └──────────────┘ └──────────────┘ └───────────┘ │
|
|
│ START WORK END │
|
|
└─────────────────────────────────────────────────────────┘
|
|
```
|
|
|
|
**Flow:**
|
|
1. **On Start** → Load documents from Key-Value Store into embedded PostgreSQL
|
|
2. **During Run** → Full vector search capabilities (HNSW, cosine, etc.)
|
|
3. **On End** → Export documents back to Key-Value Store
|
|
|
|
### Storage Options Comparison
|
|
|
|
| Feature | External PostgreSQL | Apify Key-Value Store |
|
|
|---------|---------------------|----------------------|
|
|
| Setup required | Yes | No |
|
|
| Cost | Separate billing | Included in Apify |
|
|
| Max size | Unlimited | ~9GB per store |
|
|
| Cold start | Fast | Slower (load data) |
|
|
| Best for | Large/production | Small-medium datasets |
|
|
|
|
### External PostgreSQL
|
|
|
|
For persistent storage with external database:
|
|
|
|
```json
|
|
{
|
|
"connectionString": "postgresql://user:password@host:5432/database",
|
|
"action": "search",
|
|
"query": "Your query"
|
|
}
|
|
```
|
|
|
|
**Supported:**
|
|
- PostgreSQL 14+ with ruvector extension
|
|
- PostgreSQL with pgvector (compatibility mode)
|
|
- Supabase, Neon, AWS RDS, etc.
|
|
|
|
---
|
|
|
|
## API Integration
|
|
|
|
### Python
|
|
```python
|
|
from apify_client import ApifyClient
|
|
|
|
client = ApifyClient("your-api-token")
|
|
run = client.actor("ruv/self-learning-postgres-db").call(run_input={
|
|
"action": "search",
|
|
"query": "machine learning basics",
|
|
"topK": 5
|
|
})
|
|
results = client.dataset(run["defaultDatasetId"]).list_items().items
|
|
```
|
|
|
|
### JavaScript
|
|
```javascript
|
|
import { ApifyClient } from 'apify-client';
|
|
|
|
const client = new ApifyClient({ token: 'your-api-token' });
|
|
const run = await client.actor('ruv/self-learning-postgres-db').call({
|
|
action: 'search',
|
|
query: 'machine learning basics',
|
|
topK: 5
|
|
});
|
|
const results = await client.dataset(run.defaultDatasetId).listItems();
|
|
```
|
|
|
|
### cURL
|
|
```bash
|
|
curl -X POST "https://api.apify.com/v2/acts/ruv~self-learning-postgres-db/runs" \
|
|
-H "Authorization: Bearer YOUR_TOKEN" \
|
|
-H "Content-Type: application/json" \
|
|
-d '{
|
|
"action": "search",
|
|
"query": "machine learning",
|
|
"topK": 10
|
|
}'
|
|
```
|
|
|
|
---
|
|
|
|
## Performance
|
|
|
|
Built on PostgreSQL 17.7 with AVX-512 SIMD acceleration:
|
|
|
|
| Dataset Size | Search Time | Accuracy |
|
|
|--------------|-------------|----------|
|
|
| 10,000 docs | ~0.3ms | 99%+ |
|
|
| 100,000 docs | ~0.5ms | 99%+ |
|
|
| 1,000,000 docs | ~1.2ms | 98%+ |
|
|
|
|
---
|
|
|
|
## Pricing (Apify Pay-per-event)
|
|
|
|
### Core Events
|
|
| Event | Price | Description |
|
|
|-------|-------|-------------|
|
|
| Actor Start | $0.001 | Per GB memory used |
|
|
| Document Insert | $0.001 | Per document stored |
|
|
| Vector Search | $0.001 | Per search query |
|
|
| Result | $0.0005 | Per result returned |
|
|
|
|
### Advanced Operations
|
|
| Event | Price | Description |
|
|
|-------|-------|-------------|
|
|
| Batch Operation | $0.002 | Per batch insert/search |
|
|
| RAG Query | $0.002 | Per RAG context build |
|
|
| GNN Training | $0.01 | Per training session |
|
|
| Clustering | $0.005 | Per cluster operation |
|
|
| Deduplication | $0.003 | Per dedupe run |
|
|
| Data Export | $0.002 | Per export |
|
|
| Data Import | $0.002 | Per import |
|
|
| Table Operation | $0.001 | Create/drop table |
|
|
| Index Operation | $0.002 | Create/optimize index |
|
|
| Similarity Check | $0.001 | Per comparison |
|
|
| Embedding Generation | $0.001 | Per embedding |
|
|
|
|
**Volume Discounts:**
|
|
- Bronze: -14% off results
|
|
- Silver: -26% off results
|
|
- Gold: -40% off results
|
|
|
|
---
|
|
|
|
## Development
|
|
|
|
### Local Testing
|
|
|
|
```bash
|
|
# Start ruvector-postgres
|
|
docker run -d --name ruvector-pg -e POSTGRES_PASSWORD=secret -p 5432:5432 ruvnet/ruvector-postgres:latest
|
|
|
|
# Run tests
|
|
DATABASE_URL="postgresql://postgres:secret@localhost:5432/postgres" npm test
|
|
```
|
|
|
|
### Deployment
|
|
|
|
```bash
|
|
# Set your API token in root .env
|
|
echo "APIFY_API_TOKEN=your_token" >> ../../../.env
|
|
|
|
# Deploy
|
|
npm run deploy
|
|
```
|
|
|
|
---
|
|
|
|
## Links
|
|
|
|
- [GitHub Repository](https://github.com/ruvnet/ruvector)
|
|
- [Apify Store](https://apify.com/ruv/self-learning-postgres-db)
|
|
- [Docker Image](https://hub.docker.com/r/ruvnet/ruvector-postgres)
|
|
- [RuVector Documentation](https://github.com/ruvnet/ruvector/tree/main/crates/ruvector-postgres)
|
|
|
|
---
|
|
|
|
## Support
|
|
|
|
- [Open an Issue](https://github.com/ruvnet/ruvector/issues)
|
|
- [Apify Community](https://discord.gg/apify)
|
|
|
|
---
|
|
|
|
**Built with RuVector** - High-performance vector search with TRM/SONA self-learning for the AI era.
|