Self-Learning Postgres DB - Vector Database for AI Agents
A distributed vector database that truly learns. Store embeddings, query with semantic search, and let the index improve itself through TRM (Tiny Recursive Models), SONA (Self-Optimizing Neural Architecture), and Graph Neural Networks.

Key AI Features
| Feature | Description |
|---|---|
| TRM | 7M parameter recursive reasoning (83% on GSM8K) |
| SONA | 3-tier learning (Instant/Background/Deep) |
| EWC++ | Anti-forgetting protection (λ=2000) |
| GNN | Graph Neural Network index optimization |
| Trajectory Tracking | Learn from query patterns |
Features
30+ Operations for complete vector database management:
- Semantic Search - Find documents by meaning, not just keywords
- Batch Operations - Insert and search thousands of documents efficiently
- Hybrid Search - Combine vector similarity with keyword matching
- RAG Support - Built-in Retrieval-Augmented Generation queries
- Self-Learning - GNN training for index optimization
- Clustering - K-means document clustering
- Deduplication - Find and remove duplicate content
- Export/Import - JSON and CSV data migration
Zero Setup Required:
- Embedded PostgreSQL with ruvector extension
- Local AI embeddings (no OpenAI API key needed)
- Automatic table and index creation
Quick Start (30 Seconds)
Full Demo
{
"action": "full_workflow",
"query": "How does machine learning work?",
"documents": [
{"content": "Machine learning is AI that learns patterns from data.", "metadata": {"category": "AI"}},
{"content": "PostgreSQL is a powerful relational database.", "metadata": {"category": "Database"}},
{"content": "Neural networks consist of layers of nodes.", "metadata": {"category": "AI"}},
{"content": "Vector databases store embeddings for similarity search.", "metadata": {"category": "Database"}}
]
}
Result: Documents ranked by semantic relevance to your query.
All 38 Actions
Document Operations
| Action | Description |
|---|---|
| insert | Add documents with auto-generated embeddings |
| batch_insert | Efficiently insert large document sets |
| get | Retrieve single document by ID |
| list | List documents with filtering |
| update | Modify existing document content/metadata |
| delete | Remove documents by ID, IDs, or filter |
| upsert | Insert or update (smart merge) |
Search Operations
| Action | Description |
|---|---|
| search | Semantic similarity search |
| batch_search | Multiple queries in one call |
| hybrid_search | Vector + BM25 keyword combined |
| multi_query_search | Aggregate results from multiple queries |
| mmr_search | Maximal Marginal Relevance (diverse results) |
| graph_search | Graph-based similarity traversal |
| range_search | All results within distance threshold |
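How the actor implements `mmr_search` is not documented here. As a rough illustration of what Maximal Marginal Relevance does, the following is a minimal sketch of the standard MMR selection rule in plain Python. The `lambda_mult` trade-off parameter and the toy 2-D vectors are assumptions for the example, not actor parameters.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mmr(query_vec, doc_vecs, top_k=3, lambda_mult=0.5):
    """Return indices of documents chosen by Maximal Marginal Relevance.

    lambda_mult=1.0 ranks purely by relevance; lower values penalize
    candidates that are similar to documents already selected.
    """
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < top_k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Hypothetical data: two near-identical documents and one that differs.
query = [1.0, 0.0]
docs = [[1.0, 0.1], [1.0, 0.12], [1.0, -0.5]]
diverse = mmr(query, docs, top_k=2, lambda_mult=0.5)
relevance_only = mmr(query, docs, top_k=2, lambda_mult=1.0)
```

With a balanced `lambda_mult`, the second pick skips the near-duplicate in favor of the less similar document; with `lambda_mult=1.0` the result matches a plain similarity ranking.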
Table Operations
| Action | Description |
|---|---|
| create_table | Create new vector collection |
| drop_table | Delete collection |
| list_tables | Show all vector tables |
| table_stats | Collection statistics and metrics |
| create_index | Add HNSW or IVFFlat index |
| reindex | Rebuild indexes |
Self-Learning / GNN / SONA
| Action | Description |
|---|---|
| train_gnn | Train Graph Neural Network on data |
| optimize_index | Auto-tune HNSW parameters |
| analyze_patterns | Analyze data distribution |
| sona_learn | Trigger TRM/SONA background learning cycle |
| sona_status | Check SONA learning status and capabilities |
Clustering & Deduplication
| Action | Description |
|---|---|
| cluster | K-means document clustering |
| find_duplicates | Detect similar document pairs |
| deduplicate | Remove duplicate documents |
Data Operations
| Action | Description |
|---|---|
| export | Export to JSON or CSV |
| import | Import from JSON data |
AI / RAG
| Action | Description |
|---|---|
| rag_query | Build RAG context from search results |
| summarize | Document statistics and previews |
Utility
| Action | Description |
|---|---|
| ping | Test database connection |
| version | Get version and feature info |
| embedding_models | List available models |
| generate_embedding | Create embeddings without storing |
| similarity | Compare similarity of two texts |
Use Cases
1. AI Agent Memory
{
"action": "insert",
"tableName": "agent_memory",
"documents": [
{"content": "User prefers dark mode", "metadata": {"user_id": "123", "type": "preference"}},
{"content": "User asked about Python tutorials", "metadata": {"user_id": "123", "type": "history"}}
]
}
Retrieve memories:
{
"action": "search",
"tableName": "agent_memory",
"query": "What does this user like?",
"filter": "metadata->>'user_id' = '123'"
}
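Because `filter` is passed as a SQL WHERE clause, values that come from users should be escaped before they are placed in the string. The sketch below builds the same search input in Python. The helper names `metadata_filter` and `memory_search_input` are hypothetical, and whether the actor applies any escaping of its own is not stated here, so this assumes it does not.

```python
def metadata_filter(key, value):
    """Build a `metadata->>'key' = 'value'` clause with basic escaping.

    Keys are restricted to letters, digits, and underscores; single quotes
    in the value are doubled, which is standard SQL string-literal escaping.
    """
    if not key or not key.replace("_", "").isalnum():
        raise ValueError(f"unsupported metadata key: {key!r}")
    escaped = str(value).replace("'", "''")
    return f"metadata->>'{key}' = '{escaped}'"

def memory_search_input(user_id, query, table="agent_memory", top_k=5):
    """Assemble the actor input for a per-user memory lookup."""
    return {
        "action": "search",
        "tableName": table,
        "query": query,
        "topK": top_k,
        "filter": metadata_filter("user_id", user_id),
    }

payload = memory_search_input("123", "What does this user like?")
```

The resulting `payload` can be passed as `run_input` to the Python client call shown under API Integration.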
2. RAG Pipeline
{
"action": "rag_query",
"query": "How do I return a product?",
"topK": 5,
"ragMaxTokens": 2000
}
Returns context ready to feed to your LLM.
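To give a feel for what a token budget such as `ragMaxTokens` implies, here is a minimal sketch that concatenates ranked results until the budget is reached. It is an illustration only: the result shape (`{"content": ...}`) and the whitespace-based token count are assumptions, and the actor's own context builder may count tokens and format context differently.

```python
def build_context(results, max_tokens=2000):
    """Join result contents in rank order without exceeding max_tokens.

    Tokens are approximated by whitespace-separated words (an assumption;
    real tokenizers usually produce more tokens than words).
    """
    parts = []
    used = 0
    for result in results:
        text = result["content"]
        cost = len(text.split())
        if used + cost > max_tokens:
            break  # stop at the first result that would overflow the budget
        parts.append(text)
        used += cost
    return "\n\n".join(parts)

# Hypothetical search results, already sorted by relevance.
ranked = [
    {"content": "Returns are accepted within 30 days."},
    {"content": "Refunds go to the original payment method."},
    {"content": "Gift cards are not refundable."},
]
context = build_context(ranked, max_tokens=13)
```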
3. Batch Document Processing
{
"action": "batch_insert",
"batchSize": 100,
"documents": [
// ... thousands of documents
]
}
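For very large collections it can help to split the input on the client side so that each run receives a manageable payload. The sketch below does that in Python. `batch_insert_inputs` is a hypothetical helper, and sending one run per chunk is an assumption about usage, not a requirement of the actor.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]

def batch_insert_inputs(documents, batch_size=100, table="documents"):
    """Build one `batch_insert` actor input per chunk of documents."""
    return [
        {
            "action": "batch_insert",
            "tableName": table,
            "batchSize": batch_size,
            "documents": chunk,
        }
        for chunk in chunked(documents, batch_size)
    ]

# Hypothetical corpus of 250 small documents.
corpus = [{"content": f"Document {i}", "metadata": {"n": i}} for i in range(250)]
inputs = batch_insert_inputs(corpus, batch_size=100)
sizes = [len(item["documents"]) for item in inputs]
```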
4. Find & Remove Duplicates
{
"action": "find_duplicates",
"similarityThreshold": 0.95
}
Then:
{
"action": "deduplicate",
"similarityThreshold": 0.95
}
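The role of `similarityThreshold` can be illustrated with a small pairwise comparison. This sketch assumes the threshold is applied to cosine similarity between embeddings, which this README does not confirm, and it uses a brute-force O(n²) scan that is only suitable for small examples.

```python
import math
from itertools import combinations

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def find_duplicate_pairs(embeddings, threshold=0.95):
    """Return index pairs (i, j), i < j, whose similarity meets the threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(embeddings), 2)
        if cosine_similarity(a, b) >= threshold
    ]

# Toy embeddings: the first two point in almost the same direction.
vectors = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
pairs = find_duplicate_pairs(vectors, threshold=0.95)
```

Raising the threshold toward 1.0 keeps only near-exact duplicates; lowering it also flags paraphrases, so it is worth reviewing `find_duplicates` output before running `deduplicate`.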
5. Document Clustering
{
"action": "cluster",
"numClusters": 10,
"clusteringAlgorithm": "kmeans"
}
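For readers unfamiliar with k-means, the sketch below shows the basic assign-and-update loop on toy 2-D points. It is a generic illustration of the algorithm named by `clusteringAlgorithm`, not the actor's implementation; the deterministic "first k points" initialization is a simplification chosen so the example is reproducible.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    """Cluster `points` into `k` groups; returns (assignments, centroids)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]  # simplistic deterministic init
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, new_assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
    return assignments, centroids

# Two well-separated groups of toy points.
points = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0]]
labels, centers = kmeans(points, k=2)
```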
6. Index Optimization
{
"action": "optimize_index",
"enableLearning": true
}
7. SONA Self-Learning
Check learning status:
{
"action": "sona_status"
}
Trigger learning cycle:
{
"action": "sona_learn",
"ewcLambda": 2000,
"patternThreshold": 0.7
}
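As background on what `ewcLambda` controls, the sketch below computes the quadratic penalty from standard Elastic Weight Consolidation: changes to parameters are penalized in proportion to their estimated importance. EWC++ is a variant, and this README does not specify its exact form, so treat this as the textbook EWC term under that assumption. The parameter names `theta`, `theta_star`, and `fisher` are illustrative.

```python
def ewc_penalty(theta, theta_star, fisher, ewc_lambda=2000.0):
    """Standard EWC penalty: (lambda / 2) * sum_i F_i * (theta_i - theta*_i)^2.

    theta       current parameter values
    theta_star  parameter values after the previously learned task
    fisher      per-parameter importance (diagonal Fisher information)
    """
    if not (len(theta) == len(theta_star) == len(fisher)):
        raise ValueError("all inputs must have the same length")
    return 0.5 * ewc_lambda * sum(
        f * (t - s) ** 2 for t, s, f in zip(theta, theta_star, fisher)
    )

# Only the first parameter moved, and it has importance 0.5.
penalty = ewc_penalty([1.0, 2.0], [0.0, 2.0], [0.5, 1.0], ewc_lambda=2000.0)
```

A larger `ewcLambda` makes the penalty grow faster, which biases learning toward preserving earlier behavior at the cost of slower adaptation.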
Parameters Reference
Core Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| action | string | search | Operation to perform |
| connectionString | string | embedded | PostgreSQL URL for persistence |
| tableName | string | documents | Table/collection name |
Search Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | string | - | Natural language search query |
| queryVector | array | - | Pre-computed embedding vector |
| topK | integer | 10 | Number of results |
| distanceMetric | string | cosine | cosine, l2, inner_product, manhattan |
| filter | string | - | SQL WHERE clause |
| minScore | number | 0 | Minimum similarity score (0-1) |
| maxDistance | number | - | Maximum distance threshold |
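The four `distanceMetric` options correspond to well-known vector measures. The sketch below computes each one in plain Python on toy vectors so the differences are easy to see. The function names are illustrative, and the sign and scaling conventions the actor uses when reporting scores (for example, whether inner product is negated) are not specified here.

```python
import math

def inner_product(a, b):
    """Dot product; larger means more similar for normalized vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    norm = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - inner_product(a, b) / norm

def l2_distance(a, b):
    """Euclidean (straight-line) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Two orthogonal unit vectors.
a, b = [1.0, 0.0], [0.0, 1.0]
metrics = {
    "cosine": cosine_distance(a, b),
    "l2": l2_distance(a, b),
    "inner_product": inner_product(a, b),
    "manhattan": manhattan_distance(a, b),
}
```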
Embedding Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| embeddingModel | string | all-MiniLM-L6-v2 | AI embedding model |
| generateEmbeddings | boolean | true | Auto-generate embeddings |
| dimensions | integer | 384 | Vector dimensions |
Index Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| indexType | string | hnsw | hnsw, ivfflat, none |
| hnswM | integer | 16 | HNSW max connections |
| hnswEfConstruction | integer | 64 | HNSW build quality |
| hnswEfSearch | integer | 100 | HNSW search quality |
| ivfLists | integer | 100 | IVFFlat partitions |
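When running against an external database in pgvector compatibility mode, these parameters map naturally onto pgvector's index options. The sketch below generates the corresponding pgvector statements as strings. It assumes a vector column named `embedding` and pgvector's syntax; the ruvector extension and the actor's own `create_index` action may use different names, so treat this as a guide to what the parameters mean rather than what the actor executes.

```python
# pgvector operator classes for three of the listed distance metrics.
OPCLASSES = {
    "cosine": "vector_cosine_ops",
    "l2": "vector_l2_ops",
    "inner_product": "vector_ip_ops",
}

def index_ddl(table="documents", column="embedding", index_type="hnsw",
              metric="cosine", hnsw_m=16, hnsw_ef_construction=64,
              ivf_lists=100):
    """Return a pgvector CREATE INDEX statement for the given settings."""
    opclass = OPCLASSES[metric]
    if index_type == "hnsw":
        options = f"m = {int(hnsw_m)}, ef_construction = {int(hnsw_ef_construction)}"
    elif index_type == "ivfflat":
        options = f"lists = {int(ivf_lists)}"
    else:
        raise ValueError(f"unsupported index type: {index_type!r}")
    return f"CREATE INDEX ON {table} USING {index_type} ({column} {opclass}) WITH ({options});"

def ef_search_sql(ef_search=100):
    """Return the session setting that controls HNSW search breadth."""
    return f"SET hnsw.ef_search = {int(ef_search)};"

hnsw_sql = index_ddl()
ivf_sql = index_ddl(index_type="ivfflat", metric="l2")
```

Higher `hnswM` and `hnswEfConstruction` generally improve recall at the cost of build time and memory, while `hnswEfSearch` trades query latency for recall at search time.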
GNN Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| enableLearning | boolean | false | Enable self-learning |
| learningRate | number | 0.01 | GNN learning rate |
| gnnLayers | integer | 2 | GNN layer count |
| trainEpochs | integer | 10 | Training epochs |
SONA / TRM Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| sonaEnabled | boolean | true | Enable TRM/SONA self-learning |
| ewcLambda | number | 2000 | EWC++ anti-forgetting strength |
| patternThreshold | number | 0.7 | Pattern recognition confidence |
| maxTrajectories | integer | 100 | Max trajectory steps to track |
| sonaLearningTiers | array | ["instant", "background"] | Learning tiers to enable |
Clustering Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| numClusters | integer | 10 | K-means clusters |
| similarityThreshold | number | 0.95 | Duplicate detection threshold |
Embedding Models
| Model | Dimensions | Speed | Quality | Best For |
|---|---|---|---|---|
| all-MiniLM-L6-v2 | 384 | Fast | Good | Prototyping |
| bge-small-en-v1.5 | 384 | Fast | Excellent | Production |
| bge-base-en-v1.5 | 768 | Medium | Better | High accuracy |
| nomic-embed-text-v1 | 768 | Medium | Good | Long documents (8K) |
| gte-small | 384 | Fast | Good | General use |
| e5-small-v2 | 384 | Fast | Good | Multilingual |
Persistent Storage
Hybrid Persistence Architecture
┌─────────────────────────────────────────────────────────┐
│ Actor Run │
│ ┌──────────────┐ ┌──────────────┐ ┌───────────┐ │
│ │ Key-Value │───▶│ Embedded │───▶│ Key-Value │ │
│ │ Store (load) │ │ PostgreSQL │ │ (save) │ │
│ └──────────────┘ └──────────────┘ └───────────┘ │
│ START WORK END │
└─────────────────────────────────────────────────────────┘
Flow:
- On Start → Load documents from Key-Value Store into embedded PostgreSQL
- During Run → Full vector search capabilities (HNSW, cosine, etc.)
- On End → Export documents back to Key-Value Store
Storage Options Comparison
| Feature | External PostgreSQL | Apify Key-Value Store |
|---|---|---|
| Setup required | Yes | No |
| Cost | Separate billing | Included in Apify |
| Max size | Unlimited | ~9GB per store |
| Cold start | Fast | Slower (load data) |
| Best for | Large/production | Small-medium datasets |
External PostgreSQL
For persistent storage with an external database, pass a connection string:
{
"connectionString": "postgresql://user:password@host:5432/database",
"action": "search",
"query": "Your query"
}
Supported:
- PostgreSQL 14+ with ruvector extension
- PostgreSQL with pgvector (compatibility mode)
- Supabase, Neon, AWS RDS, etc.
API Integration
Python
from apify_client import ApifyClient
client = ApifyClient("your-api-token")
run = client.actor("ruv/self-learning-postgres-db").call(run_input={
"action": "search",
"query": "machine learning basics",
"topK": 5
})
results = client.dataset(run["defaultDatasetId"]).list_items().items
JavaScript
import { ApifyClient } from 'apify-client';
const client = new ApifyClient({ token: 'your-api-token' });
const run = await client.actor('ruv/self-learning-postgres-db').call({
action: 'search',
query: 'machine learning basics',
topK: 5
});
// listItems() resolves to a paginated list; the documents are in `items`
const { items: results } = await client.dataset(run.defaultDatasetId).listItems();
cURL
curl -X POST "https://api.apify.com/v2/acts/ruv~self-learning-postgres-db/runs" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"action": "search",
"query": "machine learning",
"topK": 10
}'
Performance
Built on PostgreSQL 17.7 with AVX-512 SIMD acceleration:
| Dataset Size | Search Time | Accuracy |
|---|---|---|
| 10,000 docs | ~0.3ms | 99%+ |
| 100,000 docs | ~0.5ms | 99%+ |
| 1,000,000 docs | ~1.2ms | 98%+ |
Pricing (Apify Pay-per-event)
Core Events
| Event | Price | Description |
|---|---|---|
| Actor Start | $0.001 | Per GB memory used |
| Document Insert | $0.001 | Per document stored |
| Vector Search | $0.001 | Per search query |
| Result | $0.0005 | Per result returned |
Advanced Operations
| Event | Price | Description |
|---|---|---|
| Batch Operation | $0.002 | Per batch insert/search |
| RAG Query | $0.002 | Per RAG context build |
| GNN Training | $0.01 | Per training session |
| Clustering | $0.005 | Per cluster operation |
| Deduplication | $0.003 | Per dedupe run |
| Data Export | $0.002 | Per export |
| Data Import | $0.002 | Per import |
| Table Operation | $0.001 | Create/drop table |
| Index Operation | $0.002 | Create/optimize index |
| Similarity Check | $0.001 | Per comparison |
| Embedding Generation | $0.001 | Per embedding |
Volume Discounts:
- Bronze: 14% off results
- Silver: 26% off results
- Gold: 40% off results
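To budget a workload, the core event prices can be combined into a quick estimate. The sketch below uses the four Core Events prices listed above and assumes the volume discount applies only to the per-result event, which is how "off results" reads; actual billing is determined by Apify and may differ. The function and tier names are illustrative.

```python
from decimal import Decimal

# Core event prices in USD, copied from the Core Events table.
PRICES = {
    "actor_start_per_gb": Decimal("0.001"),
    "document_insert": Decimal("0.001"),
    "vector_search": Decimal("0.001"),
    "result": Decimal("0.0005"),
}

# Assumed to apply to the per-result price only.
RESULT_DISCOUNTS = {
    "none": Decimal("0"),
    "bronze": Decimal("0.14"),
    "silver": Decimal("0.26"),
    "gold": Decimal("0.40"),
}

def estimate_cost(memory_gb, inserts, searches, results, tier="none"):
    """Estimate the USD cost of one run from core event counts."""
    result_price = PRICES["result"] * (1 - RESULT_DISCOUNTS[tier])
    return (
        PRICES["actor_start_per_gb"] * memory_gb
        + PRICES["document_insert"] * inserts
        + PRICES["vector_search"] * searches
        + result_price * results
    )

# Example: 1 GB run, 1,000 inserts, 100 searches returning 10 results each.
base = estimate_cost(memory_gb=1, inserts=1000, searches=100, results=1000)
gold = estimate_cost(memory_gb=1, inserts=1000, searches=100, results=1000, tier="gold")
```

Advanced operations such as RAG queries or GNN training are billed as separate events and would need to be added to this estimate.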
Development
Local Testing
# Start ruvector-postgres
docker run -d --name ruvector-pg -e POSTGRES_PASSWORD=secret -p 5432:5432 ruvnet/ruvector-postgres:latest
# Run tests
DATABASE_URL="postgresql://postgres:secret@localhost:5432/postgres" npm test
Deployment
# Set your API token in root .env
echo "APIFY_API_TOKEN=your_token" >> ../../../.env
# Deploy
npm run deploy
Built with RuVector - High-performance vector search with TRM/SONA self-learning for the AI era.