ruvector/examples/apify/ai-memory-engine
rUv 2288e79bd8 feat(ai-memory-engine): Integrate persistence backends + add tests
Integration:
- Added storageBackend input parameter parsing
- Integrated createPersistence() into main actor flow
- Load session uses new persistence with legacy fallback
- Save session uses new persistence with legacy fallback
- Added persistence result to output metadata
- Proper connection cleanup on exit

Test Suite (tests/persistence-benchmark.js):
- Hyperbolic geometry: projection, distance, Möbius add, exp/log maps
- Binary persistence: 4.8x smaller than JSON, perfect fidelity
- Session persistence: data survives across simulated runs
- Benchmarks: 2M ops/sec Euclidean, 0.4M ops/sec hyperbolic

Results:
- Binary format: 4.8x size reduction (7.06 MB → 1.46 MB)
- Serialization: 39x faster (54ms → 1.4ms)
- All hyperbolic geometry functions verified correct

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 05:15:53 +00:00
..
.actor feat(ai-memory-engine): Add 8 vector storage backends + hyperbolic geometry 2025-12-14 04:12:06 +00:00
src feat(ai-memory-engine): Integrate persistence backends + add tests 2025-12-14 05:15:53 +00:00
tests feat(ai-memory-engine): Integrate persistence backends + add tests 2025-12-14 05:15:53 +00:00
package.json feat(apify): Add AI Synthetic Data Generator with MCP & Actor Integration 2025-12-13 16:27:54 +00:00
README.md feat(ai-memory-engine): Add 8 vector storage backends + hyperbolic geometry 2025-12-14 04:12:06 +00:00

AI Memory Engine - Smart Database That Learns & Remembers

Give your AI persistent memory. Store conversations, search semantically, build knowledge graphs, and watch your AI get smarter with every interaction. The ultimate memory solution for AI agents, chatbots, and intelligent automation.

🧠 Self-Learning · 🔍 Semantic Search · 🕸️ Knowledge Graphs · Sub-millisecond · 🔗 LLM Agnostic · 🔌 One-Click Integrations · 📊 8 Vector DB Backends · 🌀 Hyperbolic Embeddings

Apify Actor License: MIT


Why AI Memory Engine?

The Problem: Every AI application needs memory, but building it is complex. You need embeddings, vector databases, search algorithms, and persistence - all expensive and time-consuming to set up.

The Solution: AI Memory Engine gives you production-ready AI memory in one click. Built on RuvLLM with native SIMD acceleration, HNSW indexing, and SONA self-learning - no external dependencies required.

What Makes This Different

Feature AI Memory Engine Traditional Solutions
Setup Time 1 minute Hours to days
Cost Pay per use ($0.001/operation) $50-500/month subscriptions
Memory Persistence 8 storage backends (Supabase, Qdrant, Pinecone, etc.) Single DB only
Self-Learning SONA neural architecture Manual tuning
Apify Integration One-click from any actor Custom code required
Hyperbolic Geometry Poincaré ball for hierarchical data Euclidean only
Export Options 6 vector DB formats Limited

Key Capabilities

27 Actions including:

  • Core: store, search, get, list, update, delete, clear
  • Advanced Search: batch_search, hybrid_search, find_duplicates, deduplicate
  • AI Features: chat, recommend, analyze, build_knowledge, learn
  • Integration: integrate_actor, integrate_synthetic, integrate_scraper, template
  • Utilities: natural language commands, cluster, export_vectordb, feedback

8 Storage Backends:

  • Apify Binary - Default, 4x smaller than JSON with HNSW re-indexing
  • PostgreSQL/Supabase/Neon - pgvector for production deployments
  • Qdrant - High-throughput cloud vector DB
  • Pinecone - Serverless managed infrastructure
  • Weaviate - GraphQL-based hybrid search
  • LanceDB - Local embedded database
  • Hyperbolic - Poincaré ball for hierarchical data

One-Click Actor Integration with 10+ popular scrapers:

  • Google Maps, Instagram, TikTok, YouTube, Twitter, Amazon, TripAdvisor, LinkedIn, and more
  • Automatically memorize any scraper results with semantic search

6 Pre-Built Templates for instant deployment:

  • Lead Intelligence, Customer Support, Research Assistant
  • Competitor Intelligence, Content Library, Product Catalog

What Does This Do?

Without AI Memory With AI Memory Engine
AI forgets everything between sessions Remembers all conversations
Same questions, same generic answers Personalized, context-aware responses
No learning from interactions Gets smarter with every use
Expensive vector DB subscriptions Built-in, no external dependencies
Complex RAG setup One-click semantic search
Manual integration with scrapers One-click memory from any actor
Learn new tools for each vector DB Export to Pinecone, Weaviate, ChromaDB

Use Cases

💬 Chatbots with Memory

{
  "action": "chat",
  "chatMessage": "What products did I look at last week?",
  "sessionId": "customer-123"
}

Your chatbot remembers every conversation and provides personalized responses.

📚 RAG (Retrieval Augmented Generation)

{
  "action": "store",
  "memories": [
    {"text": "Product X requires 220V power supply", "metadata": {"type": "specs"}},
    {"text": "Product X comes with 2-year warranty", "metadata": {"type": "warranty"}}
  ]
}

Then search with natural language:

{
  "action": "search",
  "query": "What's the warranty on Product X?"
}

🛍️ Recommendation Engine

{
  "action": "recommend",
  "query": "customer interested in home automation"
}

Get personalized recommendations based on learned patterns.

🧠 Knowledge Graph

{
  "action": "build_knowledge",
  "memories": [
    "John works at TechCorp as a Senior Engineer",
    "TechCorp is located in San Francisco",
    "John manages the AI team"
  ]
}

Automatically extract entities and relationships.


Quick Start (1 Minute)

Try the Demo

{
  "action": "demo"
}

This will:

  1. Store sample memories (customer preferences, product info, support tickets)
  2. Run semantic searches
  3. Generate recommendations
  4. Show you what's possible

Output:

{
  "demo": true,
  "memoriesStored": 8,
  "sampleSearch": {
    "query": "What does the customer prefer?",
    "results": [
      {
        "text": "Customer prefers eco-friendly products and fast shipping",
        "similarity": 0.89
      }
    ]
  }
}

Core Features

1. Store Memories

Add any text to your AI's memory:

{
  "action": "store",
  "memories": [
    {"text": "Customer John prefers phone support over email", "metadata": {"customerId": "C001", "type": "preference"}},
    {"text": "Issue resolved by offering free shipping upgrade", "metadata": {"type": "resolution", "success": true}},
    {"text": "User mentioned they have a small apartment", "metadata": {"type": "context", "userId": "U001"}}
  ]
}

Output:

{
  "stored": 3,
  "totalMemories": 3,
  "memories": [
    {"id": "mem_1702489200_0", "text": "Customer John prefers phone support..."}
  ]
}

Find relevant memories using natural language:

{
  "action": "search",
  "query": "How do we usually resolve customer complaints?",
  "topK": 5,
  "similarityThreshold": 0.6
}

Output:

{
  "query": "How do we usually resolve customer complaints?",
  "resultsFound": 2,
  "results": [
    {
      "text": "Issue resolved by offering free shipping upgrade",
      "similarity": 0.82,
      "metadata": {"type": "resolution", "success": true}
    },
    {
      "text": "Customer complaint resolved by offering 20% discount",
      "similarity": 0.78
    }
  ]
}

3. Chat with Memory

Have conversations where AI remembers everything:

{
  "action": "chat",
  "chatMessage": "What do we know about customer John?",
  "chatHistory": [
    {"role": "user", "content": "I need to call John today"},
    {"role": "assistant", "content": "I can help you prepare for the call."}
  ],
  "sessionId": "support-session-1",
  "provider": "gemini",
  "apiKey": "your-gemini-key"
}

Output:

{
  "message": "What do we know about customer John?",
  "response": "Based on our records, John prefers phone support over email. He mentioned having a small apartment, which might be relevant for product recommendations.",
  "contextUsed": 2,
  "relevantMemories": [
    {"text": "Customer John prefers phone support over email", "similarity": 0.91}
  ]
}

4. Build Knowledge Graphs

Automatically extract entities and relationships:

{
  "action": "build_knowledge",
  "memories": [
    "Apple Inc was founded by Steve Jobs in California",
    "Steve Jobs was CEO of Apple until 2011",
    "Tim Cook became CEO of Apple in 2011",
    "Apple headquarters is in Cupertino, California"
  ]
}

Output:

{
  "nodesCreated": 6,
  "edgesCreated": 8,
  "topEntities": [
    {"label": "Apple", "mentions": 4},
    {"label": "Steve Jobs", "mentions": 2},
    {"label": "California", "mentions": 2}
  ]
}

5. Analyze Patterns

Get insights from your stored memories:

{
  "action": "analyze"
}

Output:

{
  "totalMemories": 150,
  "insights": [
    "You have 150 memories stored",
    "42 searches performed with 128 results returned",
    "Most common metadata keys: type, customerId, category",
    "Top keyword: 'shipping' (23 occurrences)"
  ],
  "topKeywords": [
    {"word": "shipping", "count": 23},
    {"word": "customer", "count": 18}
  ]
}

Integrations

🔗 Integrate with Synthetic Data Generator

Generate test data and automatically memorize it:

{
  "action": "integrate_synthetic",
  "integrationConfig": {
    "syntheticDataActor": "ruv/ai-synthetic-data-generator",
    "dataType": "ecommerce",
    "count": 1000,
    "memorizeFields": ["title", "description", "category"]
  }
}

This calls the AI Synthetic Data Generator and stores the results as searchable memories.

Supported data types:

  • ecommerce - Product catalogs
  • jobs - Job listings
  • real_estate - Property listings
  • social - Social media posts
  • stock_trading - Market data
  • medical - Healthcare records
  • company - Corporate data

🌐 Integrate with Web Scraper

Scrape websites and build a searchable knowledge base:

{
  "action": "integrate_scraper",
  "scraperConfig": {
    "urls": [
      "https://docs.example.com/getting-started",
      "https://docs.example.com/api-reference"
    ],
    "selector": "article",
    "maxPages": 50
  }
}

Perfect for:

  • Documentation search
  • Competitor analysis
  • Content aggregation
  • Research databases

🔌 One-Click Actor Integration

Instantly memorize results from any Apify actor:

{
  "action": "integrate_actor",
  "actorId": "apify/google-maps-scraper",
  "actorConfig": {
    "runId": "latest",
    "memorizeFields": ["title", "address", "totalScore", "categoryName"],
    "limit": 500
  }
}

Supported actors (one-click ready):

Actor What Gets Memorized
apify/google-maps-scraper Business name, address, rating, reviews
apify/instagram-scraper Captions, hashtags, engagement
apify/tiktok-scraper Video text, author, play count
apify/youtube-scraper Titles, descriptions, view counts
apify/twitter-scraper Tweets, authors, engagement
apify/amazon-scraper Products, prices, ratings
apify/tripadvisor-scraper Venues, ratings, locations
apify/linkedin-scraper Profiles, headlines, companies
apify/web-scraper Any website content
apify/website-content-crawler Full page content

Example: Build a local business database:

{
  "action": "integrate_actor",
  "actorId": "apify/google-maps-scraper",
  "actorConfig": {
    "actorInput": {
      "searchStringsArray": ["restaurants near San Francisco"],
      "maxCrawledPlaces": 100
    }
  }
}

Then search naturally:

{
  "action": "search",
  "query": "highly rated Italian restaurants with parking"
}

Pre-Built Templates

Get started instantly with industry templates:

{
  "action": "template",
  "template": "customer-support"
}

Available templates:

Template Use Case
lead-intelligence Sales lead tracking, CRM enrichment
customer-support FAQ knowledge base, ticket resolution
research-assistant Academic & market research
competitor-intelligence Market tracking, competitive analysis
content-library Content ideas, editorial planning
product-catalog E-commerce knowledge base

Each template includes:

  • Sample memories for immediate use
  • Suggested search queries
  • Optimized metadata structure

Natural Language Commands

Talk to your memory database in plain English:

{
  "action": "natural",
  "command": "remember that John prefers email communication"
}

Supported commands:

Command Action
remember [text] Store a new memory
forget about [topic] Remove related memories
what do you know about [topic] Semantic search
find [query] Search memories
how many memories Get stats
list memories Show all memories
clear everything Reset database
analyze Get insights
find duplicates Detect duplicates
export Download data
help Show commands

Examples:

{"action": "natural", "command": "what do you know about customer preferences"}
{"action": "natural", "command": "forget about old pricing"}
{"action": "natural", "command": "analyze"}

Memory Clustering

Automatically group memories by similarity:

{
  "action": "cluster",
  "numClusters": 5
}

Output:

{
  "totalMemories": 150,
  "numClusters": 5,
  "clusters": [
    {
      "id": 0,
      "label": "customer, preferences, shipping",
      "keywords": ["customer", "preferences", "shipping", "support", "email"],
      "size": 34,
      "sampleMemories": [...]
    },
    {
      "id": 1,
      "label": "product, pricing, features",
      "keywords": ["product", "pricing", "features", "specifications"],
      "size": 28,
      "sampleMemories": [...]
    }
  ]
}

Use cases:

  • Discover topics in large memory collections
  • Identify content gaps
  • Organize knowledge automatically
  • Find related memories without a query

Vector Storage Backends

8 storage backends for different scalability and persistence needs:

{
  "action": "store",
  "memories": [...],
  "storageBackend": "supabase",
  "postgresUrl": "postgres://user:pass@db.supabase.co:5432/postgres"
}

Available Backends

Backend Best For Features
apify-binary Default, simple persistence 4x smaller than JSON, automatic HNSW re-indexing
postgres / supabase / neon Production deployments pgvector extension, SQL queries, ACID compliance
qdrant High-throughput search Built-in filtering, clustering, 100M+ vectors
pinecone Serverless, zero ops Managed infrastructure, auto-scaling
weaviate GraphQL queries Hybrid search, graph traversal
lancedb Local/embedded No server needed, ML-pipeline friendly
hyperbolic Hierarchical data Poincaré ball embeddings, tree structures

PostgreSQL / Supabase / Neon

Use pgvector for production-grade vector search:

{
  "action": "store",
  "memories": [{"text": "Product X specs..."}],
  "storageBackend": "supabase",
  "postgresUrl": "postgres://user:pass@db.supabase.co:5432/postgres",
  "sessionId": "my-app"
}

Features:

  • IVFFlat and HNSW indexing
  • SQL filtering with vector search
  • Automatic schema creation
  • Incremental updates (upsert)

Qdrant Cloud

High-performance vector search at scale:

{
  "action": "store",
  "storageBackend": "qdrant",
  "qdrantUrl": "https://your-cluster.qdrant.io",
  "qdrantApiKey": "your-api-key"
}

Pinecone Serverless

Zero-ops managed vector database:

{
  "action": "store",
  "storageBackend": "pinecone",
  "pineconeApiKey": "your-api-key",
  "pineconeIndex": "my-memories"
}

Hyperbolic Embeddings

NEW: Poincaré ball embeddings for hierarchical and tree-structured data:

{
  "action": "store",
  "memories": [
    {"text": "CEO leads the company", "metadata": {"level": 1}},
    {"text": "VP Engineering reports to CEO", "metadata": {"level": 2}},
    {"text": "Senior Engineer reports to VP", "metadata": {"level": 3}}
  ],
  "storageBackend": "hyperbolic",
  "curvature": 1.0
}

Why Hyperbolic?

Traditional Euclidean embeddings struggle with:

  • Taxonomies (products → categories → subcategories)
  • Organization charts (CEO → VPs → Managers → Staff)
  • Knowledge graphs (concepts → related concepts)
  • File systems (directories → subdirectories → files)

Hyperbolic space naturally represents hierarchical relationships because:

  • Distance from origin = depth in hierarchy
  • Points near boundary = leaf nodes
  • Points near center = root nodes

Hyperbolic Features

Feature Description
poincareDistance True geodesic distance in curved space
mobiusAdd Addition operation preserving geometry
frechetMean Hyperbolic centroid computation
hyperbolicKMeans Clustering respecting hierarchy
expMap / logMap Tangent space operations

Hyperbolic Clustering

Group hierarchical data naturally:

{
  "action": "cluster",
  "numClusters": 5,
  "hyperbolicClustering": true,
  "curvature": 1.0
}

Vector DB Export

Export to any vector database:

{
  "action": "export_vectordb",
  "vectorDbFormat": "pinecone"
}

Supported formats:

Format Database Output Structure
pinecone Pinecone {id, values, metadata}
weaviate Weaviate {id, vector, properties}
chromadb ChromaDB {ids, embeddings, documents, metadatas}
qdrant Qdrant {id, vector, payload}
langchain LangChain {pageContent, metadata, embeddings}
openai OpenAI-compatible {input, embedding}

Example: Migrate to Pinecone

{
  "action": "export_vectordb",
  "vectorDbFormat": "pinecone"
}

Output:

{
  "format": "pinecone",
  "vectors": [
    {
      "id": "mem_123",
      "values": [0.123, 0.456, ...],
      "metadata": {"text": "Customer prefers...", "type": "preference"}
    }
  ],
  "namespace": "default"
}

Session Persistence

Keep memories across multiple runs:

{
  "action": "store",
  "memories": [{"text": "New customer preference discovered"}],
  "sessionId": "my-project-memory"
}

Later, in another run:

{
  "action": "search",
  "query": "customer preferences",
  "sessionId": "my-project-memory"
}

All memories from the session are automatically restored.


Export & Import

Export Memories

{
  "action": "export",
  "exportFormat": "json"
}

Formats available:

  • json - Full export with metadata and embeddings
  • csv - Spreadsheet compatible
  • embeddings - Raw vectors for ML pipelines

Import Memories

{
  "action": "import",
  "importData": {
    "memories": [
      {"text": "Imported memory 1", "metadata": {}, "embedding": [...]}
    ]
  }
}

Configuration Options

Embedding Models

Model Dimensions Speed Quality API Required
local-384 384 Fastest Good No
local-768 768 Fast Better No
gemini 768 Normal Best Yes (free tier)
openai 1536 Normal Best Yes

Distance Metrics

Metric Best For
cosine Text similarity (default)
euclidean Numerical data
dot_product Normalized vectors
manhattan Outlier-resistant

AI Providers

Provider Models Available
local No LLM (search only)
gemini Gemini 2.0 Flash, 1.5 Pro
openrouter GPT-4o, Claude 3.5, Llama 3.3, 100+ models

API Integration

Python

from apify_client import ApifyClient

client = ApifyClient("your-api-token")

# Store memories
client.actor("ruv/ai-memory-engine").call(run_input={
    "action": "store",
    "memories": [{"text": "Customer feedback: Great product!", "metadata": {"type": "feedback"}}],
    "sessionId": "my-app"
})

# Search memories
result = client.actor("ruv/ai-memory-engine").call(run_input={
    "action": "search",
    "query": "What feedback have we received?",
    "sessionId": "my-app"
})

print(result["defaultDatasetId"])

JavaScript

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'your-api-token' });

// Chat with memory
const run = await client.actor('ruv/ai-memory-engine').call({
    action: 'chat',
    chatMessage: 'What do customers like about our product?',
    sessionId: 'support-bot',
    provider: 'gemini',
    apiKey: process.env.GEMINI_API_KEY
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items[0].result.response);

cURL

curl -X POST "https://api.apify.com/v2/acts/ruv~ai-memory-engine/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "action": "search",
    "query": "shipping preferences",
    "sessionId": "customer-data"
  }'

Pricing

This actor uses Pay-Per-Event pricing:

Event Price Description
apify-actor-start $0.00005 Actor startup (per GB memory)
apify-default-dataset-item $0.00001 Result in dataset
memory-store $0.001 Store memory with RuvLLM embeddings
memory-search $0.001 Semantic search with HNSW
chat-interaction $0.003 AI chat with memory context
knowledge-graph-build $0.005 Build knowledge graph
recommendation $0.002 Get recommendations (per batch)
pattern-analysis $0.003 Analyze patterns with SONA
memory-export $0.002 Export database
memory-import $0.002 Import data
synthetic-integration $0.005 Integrate with synthetic data
scraper-integration $0.01 Integrate with web scraper
learning-cycle $0.005 Force SONA learning cycle

Example costs:

  • Store 1,000 memories: $1.00
  • 1,000 searches: $1.00
  • 100 chat interactions: $0.30
  • Build knowledge graph: $0.005

Performance

Operation Latency Throughput
Store memory ~2ms 500/sec
Search (1000 memories) ~5ms 200/sec
Search (10000 memories) ~20ms 50/sec
Chat with context ~100ms 10/sec

FAQ

Q: How long are memories stored? A: Memories persist as long as you use a sessionId. Without a session, memories exist only for the run.

Q: Can I use this with my own LLM? A: Yes! Use OpenRouter to access 100+ models including GPT-4, Claude, Llama, Mistral, and more.

Q: Is there a limit on memories? A: No hard limit. Performance is optimized for up to 100,000 memories per session.

Q: Can I use this for production? A: Absolutely! The actor is designed for production workloads with session persistence and high throughput.

Q: Does it work without an API key? A: Yes! Local embeddings and search work without any API. LLM features require Gemini or OpenRouter key.


MCP Integration (Model Context Protocol)

AI Memory Engine is fully compatible with Apify's MCP server, allowing AI agents (Claude, GPT-4, etc.) to directly interact with your memory database.

Quick Setup

Add to Claude Code (one command):

claude mcp add ai-memory --transport sse --url "https://mcp.apify.com/sse?token=YOUR_APIFY_TOKEN&actors=ruv/ai-memory-engine"

Replace YOUR_APIFY_TOKEN with your Apify API token.

For Claude Desktop / VS Code / Cursor (config file):

{
  "mcpServers": {
    "ai-memory": {
      "transport": "sse",
      "url": "https://mcp.apify.com/sse?token=YOUR_APIFY_TOKEN&actors=ruv/ai-memory-engine"
    }
  }
}

Alternative: Local MCP Server

{
  "mcpServers": {
    "ai-memory": {
      "command": "npx",
      "args": ["-y", "@apify/actors-mcp-server", "--actors", "ruv/ai-memory-engine"],
      "env": {
        "APIFY_TOKEN": "your-apify-token"
      }
    }
  }
}

What AI Agents Can Do

Once connected, AI agents can autonomously:

  • Store memories from conversations
  • Search semantically through stored knowledge
  • Build knowledge graphs from unstructured data
  • Get recommendations based on patterns
  • Manage sessions across conversations

Example AI Agent Workflow

User: "Remember that I prefer dark mode and fast responses"
Agent: [Calls AI Memory Engine store action]
       [Stores: {"text": "User prefers dark mode and fast responses", "metadata": {"type": "preference"}}]
       "Got it! I'll remember your preference for dark mode and fast responses."

User: "What are my preferences?"
Agent: [Calls AI Memory Engine search action with query "user preferences"]
       [Returns stored memory with similarity 0.94]
       "Based on our conversation, you prefer dark mode and fast responses."

MCP Resources




Built with RuVector - High-performance vector database for AI applications.