ruvector/crates/ruvector-postgres/src/graph
rUv 0d24f43e2b feat(postgres): Add 53 SQL function definitions for all advanced modules (#46)
Co-authored-by: Claude <noreply@anthropic.com>
2025-12-02 22:49:29 -05:00
Files: cypher/, mod.rs, operators.rs, README.md, storage.rs, traversal.rs

Graph Operations & Cypher Module

This module provides graph database capabilities for the ruvector-postgres extension, including graph storage, traversal algorithms, and Cypher query support.

Features

  • Concurrent Graph Storage: Thread-safe graph storage using DashMap
  • Node & Edge Management: Node and edge storage with labels, edge types, and JSON properties
  • Label Indexing: Fast node lookups by label
  • Adjacency Lists: O(1) access to a node's adjacency list; visiting its neighbors takes O(d) for degree d
  • Graph Traversal: BFS, DFS, and Dijkstra's shortest path algorithms
  • Cypher Support: Simplified Cypher query language for graph operations
  • PostgreSQL Integration: Native pgrx-based PostgreSQL functions

Architecture

Storage Layer (storage.rs)

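use std::collections::HashMap;
use serde_json::Value as JsonValue; // assumption: JsonValue aliases serde_json::Value

// Node with labels and properties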
pub struct Node {
    pub id: u64,
    pub labels: Vec<String>,
    pub properties: HashMap<String, JsonValue>,
}

// Edge with type and properties
pub struct Edge {
    pub id: u64,
    pub source: u64,
    pub target: u64,
    pub edge_type: String,
    pub properties: HashMap<String, JsonValue>,
}

// Concurrent storage with indexing
pub struct GraphStore {
    pub nodes: NodeStore,  // DashMap-based
    pub edges: EdgeStore,  // DashMap-based
}
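The excerpt above leaves NodeStore and EdgeStore opaque. The following is a rough, single-threaded sketch of how a per-node list of outgoing edge IDs gives O(1) access to a node's adjacency list. It is not the crate's storage.rs. It uses std HashMap in place of DashMap and stores properties as strings instead of JsonValue. The names SimpleGraph, SimpleNode, and SimpleEdge are hypothetical.

```rust
use std::collections::HashMap;

// Hypothetical simplified types; the real storage.rs uses DashMap and JsonValue.
#[derive(Debug, Clone)]
pub struct SimpleNode {
    pub id: u64,
    pub labels: Vec<String>,
    pub properties: HashMap<String, String>,
}

#[derive(Debug, Clone)]
pub struct SimpleEdge {
    pub id: u64,
    pub source: u64,
    pub target: u64,
    pub edge_type: String,
}

/// Single-threaded store keeping an outgoing adjacency list per node.
#[derive(Default)]
pub struct SimpleGraph {
    nodes: HashMap<u64, SimpleNode>,
    edges: HashMap<u64, SimpleEdge>,
    outgoing: HashMap<u64, Vec<u64>>, // node id -> ids of edges leaving it
    next_node_id: u64,
    next_edge_id: u64,
}

impl SimpleGraph {
    pub fn add_node(&mut self, labels: &[&str], props: &[(&str, &str)]) -> u64 {
        self.next_node_id += 1; // IDs start at 1
        let id = self.next_node_id;
        let node = SimpleNode {
            id,
            labels: labels.iter().map(|s| s.to_string()).collect(),
            properties: props.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect(),
        };
        self.nodes.insert(id, node);
        self.outgoing.entry(id).or_default();
        id
    }

    /// Returns None if either endpoint does not exist.
    pub fn add_edge(&mut self, source: u64, target: u64, edge_type: &str) -> Option<u64> {
        if !self.nodes.contains_key(&source) || !self.nodes.contains_key(&target) {
            return None;
        }
        self.next_edge_id += 1;
        let id = self.next_edge_id;
        let edge = SimpleEdge { id, source, target, edge_type: edge_type.to_string() };
        self.edges.insert(id, edge);
        self.outgoing.entry(source).or_default().push(id);
        Some(id)
    }

    pub fn get_node(&self, id: u64) -> Option<&SimpleNode> {
        self.nodes.get(&id)
    }

    /// Finding the list is O(1) on average; walking it is O(d).
    pub fn neighbors(&self, id: u64) -> Vec<u64> {
        self.outgoing
            .get(&id)
            .map(|edge_ids| edge_ids.iter().map(|e| self.edges[e].target).collect())
            .unwrap_or_default()
    }
}
```

Swapping each HashMap for a DashMap is what would let these methods take &self and run from several threads at once.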

Traversal Layer (traversal.rs)

Implements common graph algorithms:

  • BFS: Breadth-first search for shortest path by hop count
  • DFS: Depth-first search with visitor pattern
  • Dijkstra: Weighted shortest path with custom edge weights
  • All Paths: Find multiple paths between nodes
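BFS finds a path with the fewest hops because it visits nodes in order of distance from the start. Below is a minimal sketch with these assumptions:

- Outgoing neighbors come as a HashMap<u64, Vec<u64>>.
- max_hops caps the number of edges, mirroring the max_hops argument of ruvector_shortest_path.
- The function name is hypothetical; this is not the code in traversal.rs.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical helper: fewest-hop path from `start` to `goal` (both inclusive),
/// or None if `goal` is not reachable within `max_hops` edges.
pub fn bfs_shortest_path(
    adj: &HashMap<u64, Vec<u64>>,
    start: u64,
    goal: u64,
    max_hops: usize,
) -> Option<Vec<u64>> {
    if start == goal {
        return Some(vec![start]);
    }
    let mut parent: HashMap<u64, u64> = HashMap::new();
    let mut visited: HashSet<u64> = HashSet::from([start]);
    // Queue entries are (node, hops from start).
    let mut queue: VecDeque<(u64, usize)> = VecDeque::from([(start, 0)]);

    while let Some((node, hops)) = queue.pop_front() {
        if hops == max_hops {
            continue; // expanding would exceed the hop limit
        }
        for &next in adj.get(&node).map(|v| v.as_slice()).unwrap_or(&[]) {
            if !visited.insert(next) {
                continue; // already discovered at an equal or smaller depth
            }
            parent.insert(next, node);
            if next == goal {
                // Follow parent pointers back to the start, then reverse.
                let mut path = vec![goal];
                let mut cur = goal;
                while let Some(&p) = parent.get(&cur) {
                    path.push(p);
                    cur = p;
                }
                path.reverse();
                return Some(path);
            }
            queue.push_back((next, hops + 1));
        }
    }
    None
}
```

Each node is marked visited when first discovered, so it enters the queue at most once. That is what gives the O(V + E) bound listed under Performance Characteristics.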

Cypher Layer (cypher/)

Simplified Cypher query language support:

  • AST (ast.rs): Complete abstract syntax tree for Cypher
  • Parser (parser.rs): Basic parser for common Cypher patterns
  • Executor (executor.rs): Query execution engine

Supported Cypher clauses:

  • CREATE: Create nodes and relationships
  • MATCH: Pattern matching
  • WHERE: Filtering
  • RETURN: Result projection
  • SET, DELETE, WITH: Basic support

PostgreSQL Functions

Graph Management

-- Create a new graph
SELECT ruvector_create_graph('my_graph');

-- List all graphs
SELECT ruvector_list_graphs();

-- Delete a graph
SELECT ruvector_delete_graph('my_graph');

-- Get graph statistics
SELECT ruvector_graph_stats('my_graph');
-- Returns: {"name": "my_graph", "node_count": 100, "edge_count": 250, ...}

Node Operations

-- Add a node
SELECT ruvector_add_node(
    'my_graph',
    ARRAY['Person', 'Employee'],  -- Labels
    '{"name": "Alice", "age": 30, "department": "Engineering"}'::jsonb
);
-- Returns: node_id (bigint)

-- Get a node by ID
SELECT ruvector_get_node('my_graph', 1);
-- Returns: {"id": 1, "labels": ["Person"], "properties": {...}}

-- Find nodes by label
SELECT ruvector_find_nodes_by_label('my_graph', 'Person');
-- Returns: array of nodes

Edge Operations

-- Add an edge
SELECT ruvector_add_edge(
    'my_graph',
    1,  -- source_id
    2,  -- target_id
    'KNOWS',  -- edge_type
    '{"since": 2020, "weight": 0.8}'::jsonb
);
-- Returns: edge_id (bigint)

-- Get an edge by ID
SELECT ruvector_get_edge('my_graph', 1);

-- Get neighbors of a node
SELECT ruvector_get_neighbors('my_graph', 1);
-- Returns: array of node IDs

Graph Traversal

-- Find shortest path (unweighted)
SELECT ruvector_shortest_path(
    'my_graph',
    1,    -- start_id
    10,   -- end_id
    5     -- max_hops
);
-- Returns: {"nodes": [1, 3, 7, 10], "edges": [12, 45, 89], "length": 4, "cost": 0}

-- Find weighted shortest path
SELECT ruvector_shortest_path_weighted(
    'my_graph',
    1,    -- start_id
    10,   -- end_id
    'weight'  -- property name for edge weights
);
-- Returns: {"nodes": [...], "edges": [...], "length": 4, "cost": 2.5}
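The weighted variant corresponds to Dijkstra's algorithm, per the Traversal Layer description. Below is a minimal sketch using std's BinaryHeap, with these assumptions:

- Each edge's weight has already been read from the named property (for example 'weight') as a finite, non-negative f64. Dijkstra's algorithm is only correct for non-negative weights.
- The adjacency format and function name are hypothetical.

```rust
use std::cmp::Ordering;
use std::collections::{BinaryHeap, HashMap};

/// Heap entry ordered so that BinaryHeap (a max-heap) pops the lowest cost first.
#[derive(Copy, Clone, PartialEq)]
struct State {
    cost: f64,
    node: u64,
}

impl Eq for State {}

impl Ord for State {
    fn cmp(&self, other: &Self) -> Ordering {
        // Reversed cost comparison; weights are assumed finite and non-negative.
        other.cost.total_cmp(&self.cost).then_with(|| self.node.cmp(&other.node))
    }
}

impl PartialOrd for State {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// Hypothetical helper: `adj` maps a node to (neighbor, weight) pairs.
/// Returns the cheapest node path and its total cost, or None if unreachable.
pub fn dijkstra(
    adj: &HashMap<u64, Vec<(u64, f64)>>,
    start: u64,
    goal: u64,
) -> Option<(Vec<u64>, f64)> {
    let mut dist: HashMap<u64, f64> = HashMap::from([(start, 0.0)]);
    let mut parent: HashMap<u64, u64> = HashMap::new();
    let mut heap = BinaryHeap::from([State { cost: 0.0, node: start }]);

    while let Some(State { cost, node }) = heap.pop() {
        if node == goal {
            let mut path = vec![goal];
            let mut cur = goal;
            while let Some(&p) = parent.get(&cur) {
                path.push(p);
                cur = p;
            }
            path.reverse();
            return Some((path, cost));
        }
        // Skip stale entries superseded by a cheaper distance.
        if cost > dist[&node] {
            continue;
        }
        for &(next, w) in adj.get(&node).map(|v| v.as_slice()).unwrap_or(&[]) {
            let candidate = cost + w;
            if dist.get(&next).map_or(true, |&d| candidate < d) {
                dist.insert(next, candidate);
                parent.insert(next, node);
                heap.push(State { cost: candidate, node: next });
            }
        }
    }
    None
}
```

Stale heap entries are skipped rather than updated in place. This keeps the heap simple and still fits the O((V + E) log V) bound given under Performance Characteristics.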

Cypher Queries

-- Create nodes
SELECT ruvector_cypher(
    'my_graph',
    'CREATE (n:Person {name: ''Alice'', age: 30}) RETURN n',
    NULL
);

-- Match and filter
SELECT ruvector_cypher(
    'my_graph',
    'MATCH (n:Person) WHERE n.age > 25 RETURN n.name, n.age',
    NULL
);

-- Parameterized queries
SELECT ruvector_cypher(
    'my_graph',
    'MATCH (n:Person) WHERE n.name = $name RETURN n',
    '{"name": "Alice"}'::jsonb
);

-- Create relationships
SELECT ruvector_cypher(
    'my_graph',
    'CREATE (a:Person {name: ''Alice''})-[:KNOWS {since: 2020}]->(b:Person {name: ''Bob''}) RETURN a, b',
    NULL
);

Usage Examples

Social Network

-- Create graph
SELECT ruvector_create_graph('social_network');

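-- Add users (the edges below assume node IDs are assigned sequentially
-- from 1 in a fresh graph: Alice = 1, Bob = 2, Charlie = 3, Diana = 4)
SELECT ruvector_add_node('social_network', ARRAY['Person'],
    jsonb_build_object('name', name, 'age', age))
FROM (VALUES
    ('Alice', 30),
    ('Bob', 25),
    ('Charlie', 35),
    ('Diana', 28)
) AS t(name, age);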

-- Create friendships
SELECT ruvector_add_edge('social_network', 1, 2, 'FRIENDS',
    '{"since": "2020-01-15"}'::jsonb);
SELECT ruvector_add_edge('social_network', 2, 3, 'FRIENDS',
    '{"since": "2019-06-20"}'::jsonb);
SELECT ruvector_add_edge('social_network', 1, 4, 'FRIENDS',
    '{"since": "2021-03-10"}'::jsonb);

-- Find connection between Alice and Charlie
SELECT ruvector_shortest_path('social_network', 1, 3, 10);

-- Cypher: Find all friends of friends
SELECT ruvector_cypher(
    'social_network',
    'MATCH (a:Person)-[:FRIENDS]->(b:Person)-[:FRIENDS]->(c:Person)
     WHERE a.name = ''Alice'' RETURN c.name',
    NULL
);
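Matching (a:Person)-[:FRIENDS]->(b:Person)-[:FRIENDS]->(c:Person) amounts to following FRIENDS edges exactly two hops from each start node that passes the WHERE filter. Below is a minimal sketch of that expansion, with these assumptions:

- Outgoing edges are stored as (target, type) pairs.
- Label checks are omitted.
- The Adjacency alias and two_hop function are hypothetical, not the executor's code.

```rust
use std::collections::HashMap;

/// Hypothetical: outgoing edges per node as (target id, relationship type).
type Adjacency = HashMap<u64, Vec<(u64, String)>>;

/// Follow `rel` edges exactly two hops from `start`; one entry per matching
/// path, like MATCH (a)-[:REL]->(b)-[:REL]->(c) RETURN c.
pub fn two_hop(adj: &Adjacency, start: u64, rel: &str) -> Vec<u64> {
    // One hop: targets of outgoing edges whose type equals `rel`.
    let step = |node: u64| -> Vec<u64> {
        adj.get(&node)
            .into_iter()
            .flatten()
            .filter(|(_, t)| t == rel)
            .map(|(target, _)| *target)
            .collect()
    };
    step(start).into_iter().flat_map(|b| step(b)).collect()
}
```

Assume Alice is node 1 and Charlie is node 3. The three FRIENDS edges above then give two_hop(&adj, 1, "FRIENDS") == [3], reached through Bob.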

Knowledge Graph

-- Create knowledge graph
SELECT ruvector_create_graph('knowledge');

-- Add concepts
SELECT ruvector_add_node('knowledge', ARRAY['Concept'],
    '{"name": "Machine Learning", "category": "AI"}'::jsonb);
SELECT ruvector_add_node('knowledge', ARRAY['Concept'],
    '{"name": "Neural Networks", "category": "AI"}'::jsonb);
SELECT ruvector_add_node('knowledge', ARRAY['Concept'],
    '{"name": "Deep Learning", "category": "AI"}'::jsonb);

-- Create relationships
SELECT ruvector_add_edge('knowledge', 1, 2, 'INCLUDES',
    '{"strength": 0.9}'::jsonb);
SELECT ruvector_add_edge('knowledge', 2, 3, 'SPECIALIZES_IN',
    '{"strength": 0.95}'::jsonb);

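-- Find weighted path. The weighted search minimizes the sum of the named
-- property, so 'strength' is treated as a cost here (stronger = longer)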
SELECT ruvector_shortest_path_weighted('knowledge', 1, 3, 'strength');

Recommendation System

-- Create graph
SELECT ruvector_create_graph('recommendations');

-- Add users and items
SELECT ruvector_cypher('recommendations',
    'CREATE (u:User {name: ''Alice''})
     CREATE (m1:Movie {title: ''Inception''})
     CREATE (m2:Movie {title: ''Interstellar''})
     CREATE (u)-[:WATCHED {rating: 5}]->(m1)
     CREATE (u)-[:WATCHED {rating: 4}]->(m2)
     RETURN u, m1, m2',
    NULL
);

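-- Find users who watched the same movies as Alice
-- (COUNT and ORDER BY are listed under Limitations as not yet supported
-- by the parser; this query shows the intended Cypher)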
SELECT ruvector_cypher('recommendations',
    'MATCH (u1:User)-[:WATCHED]->(m:Movie)<-[:WATCHED]-(u2:User)
     WHERE u1.name = ''Alice''
     RETURN u2.name, COUNT(m) AS common_movies
     ORDER BY common_movies DESC',
    NULL
);
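The recommendation query counts the movies each other user shares with Alice, then sorts by that count. Cypher groups by the non-aggregated return item u2.name. Below is a minimal sketch of the same computation, with these assumptions:

- WATCHED relationships are available as (user, movie) pairs.
- The starting user is excluded and ties are broken by name. Both are choices made here for determinism, not behavior taken from the executor.
- The function name is hypothetical.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical helper: for every other user, count movies shared with `user`,
/// sorted by count descending (ORDER BY common_movies DESC), then by name.
pub fn common_movies(watched: &[(&str, &str)], user: &str) -> Vec<(String, usize)> {
    // Movies the starting user watched.
    let seen: HashSet<&str> = watched
        .iter()
        .filter(|(u, _)| *u == user)
        .map(|(_, m)| *m)
        .collect();

    // Group by the other user and count shared movies (COUNT(m)).
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for &(u, m) in watched {
        if u != user && seen.contains(m) {
            *counts.entry(u).or_insert(0) += 1;
        }
    }

    let mut ranked: Vec<(String, usize)> =
        counts.into_iter().map(|(u, n)| (u.to_string(), n)).collect();
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked
}
```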

Performance Characteristics

Storage

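  • Node Lookup: O(1) expected by ID (hash lookup), O(k) by label (k = nodes with label)
  • Edge Lookup: O(1) expected by ID, O(d) for neighbors (d = degree)
  • Concurrent Access: DashMap splits its data into RwLock-protected shards, so reads proceed in parallel and writes only contend with operations on the same shard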

Traversal

  • BFS: O(V + E) time, O(V) space
  • DFS: O(V + E) time, O(V) space for the visited set (stack depth O(h), h = max depth)
  • Dijkstra: O((V + E) log V) time with binary heap

Scalability

  • Thread-safe concurrent operations
  • Memory-efficient adjacency lists
  • Label and type indexing for fast filtering

Implementation Details

Concurrent Storage

Uses DashMap, a concurrent hash map built from RwLock-protected shards (fine-grained locking rather than lock-free), for concurrent access:

use dashmap::DashMap;
use std::collections::HashSet;
use std::sync::atomic::AtomicU64;

pub struct NodeStore {
    nodes: DashMap<u64, Node>,
    label_index: DashMap<String, HashSet<u64>>,
    next_id: AtomicU64,
}
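DashMap is an external crate, so this std-only sketch approximates the same design:

- an ID-keyed node map,
- a label index,
- atomic ID allocation through AtomicU64::fetch_add.

Each map is split into RwLock-protected shards, loosely mirroring how DashMap is organized. The shard count, ShardedMap, LabeledNode, and ConcurrentNodeStore are hypothetical, and properties are omitted.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{HashMap, HashSet};
use std::hash::{Hash, Hasher};
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::RwLock;

const SHARDS: usize = 8; // hypothetical shard count

/// Each key lives in exactly one RwLock-protected shard, so a writer only
/// blocks readers and writers whose keys hash to the same shard.
struct ShardedMap<K, V> {
    shards: Vec<RwLock<HashMap<K, V>>>,
}

impl<K: Hash + Eq, V: Clone + Default> ShardedMap<K, V> {
    fn new() -> Self {
        Self { shards: (0..SHARDS).map(|_| RwLock::new(HashMap::new())).collect() }
    }

    fn shard(&self, key: &K) -> &RwLock<HashMap<K, V>> {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        &self.shards[(hasher.finish() as usize) % SHARDS]
    }

    /// Returns a clone, so no lock is held after the call.
    fn get(&self, key: &K) -> Option<V> {
        self.shard(key).read().unwrap().get(key).cloned()
    }

    /// Runs `f` on the entry for `key` (default-initialized if absent)
    /// while holding only that shard's write lock.
    fn update(&self, key: K, f: impl FnOnce(&mut V)) {
        let mut shard = self.shard(&key).write().unwrap();
        f(shard.entry(key).or_default());
    }
}

/// Hypothetical simplified node: labels only.
#[derive(Clone, Default)]
pub struct LabeledNode {
    pub id: u64,
    pub labels: Vec<String>,
}

pub struct ConcurrentNodeStore {
    nodes: ShardedMap<u64, LabeledNode>,
    label_index: ShardedMap<String, HashSet<u64>>,
    next_id: AtomicU64,
}

impl ConcurrentNodeStore {
    pub fn new() -> Self {
        Self {
            nodes: ShardedMap::new(),
            label_index: ShardedMap::new(),
            next_id: AtomicU64::new(1),
        }
    }

    /// Takes &self, so many threads can insert at once.
    pub fn insert(&self, labels: &[&str]) -> u64 {
        // fetch_add hands every caller a distinct id without taking a lock.
        let id = self.next_id.fetch_add(1, Ordering::Relaxed);
        let labels: Vec<String> = labels.iter().map(|s| s.to_string()).collect();
        // Store the node before indexing it, so an id found via the index resolves.
        self.nodes.update(id, |n| *n = LabeledNode { id, labels: labels.clone() });
        for label in labels {
            self.label_index.update(label, |ids| {
                ids.insert(id);
            });
        }
        id
    }

    pub fn get(&self, id: u64) -> Option<LabeledNode> {
        self.nodes.get(&id)
    }

    /// Sorted ids of nodes carrying `label`: O(k) for k matching nodes.
    pub fn find_by_label(&self, label: &str) -> Vec<u64> {
        let mut ids = self
            .label_index
            .get(&label.to_string())
            .map(|set| set.into_iter().collect::<Vec<u64>>())
            .unwrap_or_default();
        ids.sort_unstable();
        ids
    }
}
```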

Graph Registry

Global registry for named graphs:

static GRAPH_REGISTRY: Lazy<DashMap<String, Arc<GraphStore>>> = ...
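The registry's initializer is elided above. Purely to illustrate the pattern, here is a std-only sketch of a process-wide registry of named graphs. It uses OnceLock and a Mutex<HashMap> instead of Lazy and DashMap. GraphStub and the function names are hypothetical stand-ins, not the extension's API.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical stand-in for GraphStore.
#[derive(Default)]
pub struct GraphStub {
    pub node_count: usize,
}

/// Process-wide registry, initialized on first use.
fn registry() -> &'static Mutex<HashMap<String, Arc<GraphStub>>> {
    static REGISTRY: OnceLock<Mutex<HashMap<String, Arc<GraphStub>>>> = OnceLock::new();
    REGISTRY.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Returns false if a graph with that name already exists.
pub fn create_graph(name: &str) -> bool {
    let mut map = registry().lock().unwrap();
    if map.contains_key(name) {
        return false;
    }
    map.insert(name.to_string(), Arc::new(GraphStub::default()));
    true
}

/// Cloning the Arc lets callers use the graph without holding the registry lock.
pub fn get_graph(name: &str) -> Option<Arc<GraphStub>> {
    registry().lock().unwrap().get(name).cloned()
}

pub fn delete_graph(name: &str) -> bool {
    registry().lock().unwrap().remove(name).is_some()
}

pub fn list_graphs() -> Vec<String> {
    let mut names: Vec<String> = registry().lock().unwrap().keys().cloned().collect();
    names.sort();
    names
}
```

PostgreSQL serves each connection from its own backend process. A plain static like this therefore lives in a single backend's memory. Unless the extension places its registry in shared memory, graphs created in one session would presumably not be visible to other sessions or survive a restart. This fits the missing persistence layer listed under Production Recommendations.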

Cypher Parser

Basic recursive descent parser:

  • Handles common patterns: (n:Label {prop: value})
  • Relationship patterns: -[:TYPE]->, <-[:TYPE]-
  • WHERE conditions, RETURN projections
  • Property extraction and type inference
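Below is a minimal recursive-descent sketch for just the node-pattern part of the grammar, (n:Label {prop: value}). It includes a simple form of type inference: integers versus single-quoted strings. The grammar comments, type names, and error strings are assumptions for illustration, not the crate's parser.rs.

```rust
/// Literal values with simple type inference: integers or single-quoted strings.
#[derive(Debug, PartialEq)]
pub enum Literal {
    Int(i64),
    Str(String),
}

/// Parsed form of a node pattern such as (n:Person {name: 'Alice'}).
#[derive(Debug, Default, PartialEq)]
pub struct NodePattern {
    pub variable: Option<String>,
    pub labels: Vec<String>,
    pub properties: Vec<(String, Literal)>,
}

/// Recursive-descent parser: one method per grammar rule; `pos` indexes `chars`.
struct Parser {
    chars: Vec<char>,
    pos: usize,
}

impl Parser {
    fn peek(&self) -> Option<char> {
        self.chars.get(self.pos).copied()
    }
    fn advance_while(&mut self, pred: impl Fn(char) -> bool) {
        while self.peek().map_or(false, &pred) {
            self.pos += 1;
        }
    }
    /// Skips whitespace, then consumes `c` or reports an error without consuming.
    fn expect(&mut self, c: char) -> Result<(), String> {
        self.advance_while(char::is_whitespace);
        if self.peek() != Some(c) {
            return Err(format!("expected '{}' at position {}", c, self.pos));
        }
        self.pos += 1;
        Ok(())
    }
    // ident := [A-Za-z_][A-Za-z0-9_]*
    fn ident(&mut self) -> Option<String> {
        self.advance_while(char::is_whitespace);
        let start = self.pos;
        if self.peek().map_or(false, |c| c == '_' || c.is_ascii_alphabetic()) {
            self.advance_while(|c| c == '_' || c.is_ascii_alphanumeric());
        }
        (self.pos > start).then(|| self.chars[start..self.pos].iter().collect())
    }
    // literal := '-'? digit+ | "'" non-quote chars "'"
    fn literal(&mut self) -> Result<Literal, String> {
        self.advance_while(char::is_whitespace);
        let start = self.pos;
        match self.peek() {
            Some('\'') => {
                self.pos += 1;
                self.advance_while(|c| c != '\'');
                let text: String = self.chars[start + 1..self.pos].iter().collect();
                self.expect('\'')?; // errors if the string is unterminated
                Ok(Literal::Str(text))
            }
            Some(c) if c == '-' || c.is_ascii_digit() => {
                self.pos += 1;
                self.advance_while(|c| c.is_ascii_digit());
                let text: String = self.chars[start..self.pos].iter().collect();
                text.parse::<i64>().map(Literal::Int)
                    .map_err(|e| format!("bad integer '{}': {}", text, e))
            }
            _ => Err(format!("expected literal at position {}", self.pos)),
        }
    }
    // node := '(' ident? (':' ident)* ('{' ident ':' literal (',' ident ':' literal)* '}')? ')'
    fn node(&mut self) -> Result<NodePattern, String> {
        let mut node = NodePattern::default();
        self.expect('(')?;
        node.variable = self.ident();
        while self.expect(':').is_ok() {
            node.labels.push(self.ident().ok_or("expected label after ':'")?);
        }
        if self.expect('{').is_ok() {
            loop {
                let key = self.ident().ok_or("expected property name")?;
                self.expect(':')?;
                node.properties.push((key, self.literal()?));
                if self.expect(',').is_err() {
                    break;
                }
            }
            self.expect('}')?;
        }
        self.expect(')')?;
        Ok(node)
    }
}

/// Parses exactly one node pattern and rejects trailing input.
pub fn parse_node_pattern(input: &str) -> Result<NodePattern, String> {
    let mut p = Parser { chars: input.chars().collect(), pos: 0 };
    let node = p.node()?;
    p.advance_while(char::is_whitespace);
    if p.pos != p.chars.len() {
        return Err(format!("unexpected input at position {}", p.pos));
    }
    Ok(node)
}
```

Relationship patterns such as -[:TYPE]-> would extend the same structure with one more method per grammar rule.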

Limitations

Current Parser Limitations

The Cypher parser is simplified for demonstration:

  • No support for complex WHERE conditions (AND/OR)
  • Limited expression support (basic comparisons only)
  • No aggregation functions (COUNT, SUM, etc.)
  • No ORDER BY or GROUP BY clauses
  • Basic pattern matching only

Production Recommendations

For production use, consider:

  • Using a proper parser library (nom, pest, lalrpop)
  • Adding comprehensive error messages
  • Implementing the full Cypher specification
  • Query optimization and planning
  • Transaction support
  • Persistence layer

Testing

Comprehensive test suite included:

# Run all tests
cargo pgrx test

# Run specific test
cargo pgrx test test_create_graph

Test coverage:

  • Node and edge CRUD operations
  • Graph traversal algorithms
  • Cypher query execution
  • PostgreSQL function integration
  • Concurrent access patterns

Future Enhancements

  • Graph analytics (PageRank, community detection)
  • Temporal graphs (time-aware edges)
  • Property graph constraints
  • Full-text search on properties
  • Persistent storage backend
  • Query optimization
  • Distributed graph support
  • GraphQL interface

References