ruvector/crates/ruvector-postgres/docs/examples/self-learning-usage.sql
rUv 84f8b685c1 feat(postgres): Add 53 SQL function definitions for all advanced modules (#46)
* feat(postgres): Add 7 advanced AI modules to ruvector-postgres

Comprehensive implementation of advanced AI capabilities:

## New Modules (23,541 lines of code)

### 1. Self-Learning / ReasoningBank (`src/learning/`)
- Trajectory tracking for query optimization
- Pattern extraction using K-means clustering
- ReasoningBank for pattern storage and matching
- Adaptive search parameter optimization

### 2. Attention Mechanisms (`src/attention/`)
- Scaled dot-product attention (core)
- Multi-head attention with parallel heads
- Flash Attention v2 (memory-efficient)
- 10 attention types with PostgresEnum support

### 3. GNN Layers (`src/gnn/`)
- Message passing framework
- GCN (Graph Convolutional Network)
- GraphSAGE with mean/max aggregation
- Configurable aggregation methods

### 4. Hyperbolic Embeddings (`src/hyperbolic/`)
- Poincaré ball model
- Lorentz hyperboloid model
- Hyperbolic distance metrics
- Möbius operations

### 5. Sparse Vectors (`src/sparse/`)
- COO format sparse vector type
- Efficient sparse-sparse distance functions
- BM25/SPLADE compatible
- Top-k pruning operations

### 6. Graph Operations & Cypher (`src/graph/`)
- Property graph storage (nodes/edges)
- BFS, DFS, Dijkstra traversal
- Cypher query parser (AST-based)
- Query executor with pattern matching

### 7. Tiny Dancer Routing (`src/routing/`)
- FastGRNN neural network
- Agent registry with capabilities
- Multi-objective routing optimization
- Cost/latency/quality balancing

## Docker Infrastructure
- Dockerfile with pgrx 0.12.6 and PostgreSQL 16
- docker-compose.yml with test runner
- Initialization SQL with test tables
- Shell scripts for dev/test/benchmark

## Feature Flags
- `learning`, `attention`, `gnn`, `hyperbolic`
- `sparse`, `graph`, `routing`
- `ai-complete` and `graph-complete` bundles

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(docker): Copy entire workspace for pgrx build

* fix(docker): Build standalone crate without workspace

* docs: Update README to enhance clarity and structure

* fix(postgres): Resolve compilation errors and Docker build issues

- Fix simsimd Option/Result type mismatch in scaled_dot.rs
- Fix f32/f64 type conversions in poincare.rs and lorentz.rs
- Fix AVX512 missing wrapper functions by using AVX2 fallback
- Fix Vec<Vec<f32>> to JsonB for pgrx pg_extern compatibility
- Fix DashMap get() to get_mut() for mutable access
- Fix router.rs dereference for best_score comparison
- Update Dockerfile to copy pre-written SQL file for pgrx
- Simplify init.sql to use correct function names
- Add postgres-cli npm package for CLI tooling

All changes tested successfully in Docker with:
- Extension loads with AVX2 SIMD support (8 floats/op)
- Distance functions verified working
- PostgreSQL 16 container runs successfully

* feat: Add ruvLLM examples and enhanced postgres-cli

Added from claude/ruvector-lfm2-llm-01YS5Tc7i64PyYCLecT9L1dN branch:
- examples/ruvLLM: Complete LLM inference system with SIMD optimization
  - Pretraining, benchmarking, and optimization system
  - Real SIMD-optimized CPU inference engine
  - Comprehensive SOTA benchmark suite
  - Attention mechanisms, memory management, router

Enhanced postgres-cli with full ruvector-postgres integration:
- Sparse vector operations (BM25, top-k, prune, conversions)
- Hyperbolic geometry (Poincare, Lorentz, Mobius operations)
- Agent routing (Tiny Dancer system)
- Vector quantization (binary, scalar, product)
- Enhanced graph and learning commands

* fix(postgres-cli): Use native ruvector type instead of pgvector

- Change createVectorTable to use ruvector type (native RuVector extension)
- Add dimensions column for metadata since ruvector is variable-length
- Update index creation to use simple btree (HNSW/IVFFlat TBD)
- Tested against Docker container with ruvector extension

* feat(postgres): Add 53 SQL function definitions for all advanced modules

Enable all advanced PostgreSQL extension functions by adding their SQL
definitions to the extension file. This exposes all Rust #[pg_extern]
functions to PostgreSQL.

## New SQL Functions (53 total)

### Hyperbolic Geometry (8 functions)
- ruvector_poincare_distance, ruvector_lorentz_distance
- ruvector_mobius_add, ruvector_exp_map, ruvector_log_map
- ruvector_poincare_to_lorentz, ruvector_lorentz_to_poincare
- ruvector_minkowski_dot

### Sparse Vectors (14 functions)
- ruvector_sparse_create, ruvector_sparse_from_dense
- ruvector_sparse_dot, ruvector_sparse_cosine, ruvector_sparse_l2_distance
- ruvector_sparse_add, ruvector_sparse_scale, ruvector_sparse_to_dense
- ruvector_sparse_nnz, ruvector_sparse_dim
- ruvector_bm25_score, ruvector_tf_idf, ruvector_sparse_normalize
- ruvector_sparse_topk

### GNN - Graph Neural Networks (5 functions)
- ruvector_gnn_gcn_layer, ruvector_gnn_graphsage_layer
- ruvector_gnn_gat_layer, ruvector_gnn_message_pass
- ruvector_gnn_aggregate

### Routing/Agents - "Tiny Dancer" (11 functions)
- ruvector_route_query, ruvector_route_with_context
- ruvector_calculate_agent_affinity, ruvector_select_best_agent
- ruvector_multi_agent_route, ruvector_create_agent_embedding
- ruvector_get_routing_stats, ruvector_register_agent
- ruvector_update_agent_performance, ruvector_adaptive_route
- ruvector_fastgrnn_forward

### Learning/ReasoningBank (7 functions)
- ruvector_record_trajectory, ruvector_get_verdict
- ruvector_distill_memory, ruvector_adaptive_search
- ruvector_learning_feedback, ruvector_get_learning_patterns
- ruvector_optimize_search_params

### Graph/Cypher (8 functions)
- ruvector_graph_create_node, ruvector_graph_create_edge
- ruvector_graph_get_neighbors, ruvector_graph_shortest_path
- ruvector_graph_pagerank, ruvector_cypher_query
- ruvector_graph_traverse, ruvector_graph_similarity_search

## CLI Updates
- Enabled hyperbolic geometry commands in postgres-cli
- Added vector distance and normalize commands
- Enhanced client with connection pooling and retry logic

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-12-02 22:49:29 -05:00

-- =============================================================================
-- RuVector Self-Learning Module Usage Examples
-- =============================================================================
-- This file demonstrates how to use the self-learning and ReasoningBank
-- features for adaptive query optimization.
-- -----------------------------------------------------------------------------
-- 1. Basic Setup: Enable Learning
-- -----------------------------------------------------------------------------
-- Enable learning for a table with default configuration
SELECT ruvector_enable_learning('my_vectors');
-- Enable with custom configuration
SELECT ruvector_enable_learning(
    'my_vectors',
    '{"max_trajectories": 2000, "num_clusters": 15}'::jsonb
);
-- -----------------------------------------------------------------------------
-- 2. Recording Query Trajectories
-- -----------------------------------------------------------------------------
-- Trajectories are typically recorded automatically by search functions,
-- but you can also record them manually for testing or custom workflows.
-- Record a query trajectory
SELECT ruvector_record_trajectory(
    'my_vectors',                    -- table name
    ARRAY[0.1, 0.2, 0.3, 0.4],       -- query vector
    ARRAY[1, 2, 3, 4, 5]::bigint[],  -- result IDs
    1500,                            -- latency in microseconds
    50,                              -- ef_search used
    10                               -- probes used
);
-- -----------------------------------------------------------------------------
-- 3. Providing Relevance Feedback
-- -----------------------------------------------------------------------------
-- After seeing query results, users can provide feedback about which
-- results were actually relevant
SELECT ruvector_record_feedback(
    'my_vectors',                -- table name
    ARRAY[0.1, 0.2, 0.3, 0.4],   -- query vector
    ARRAY[1, 2, 5]::bigint[],    -- relevant IDs
    ARRAY[3, 4]::bigint[]        -- irrelevant IDs
);
-- -----------------------------------------------------------------------------
-- 4. Extracting and Managing Patterns
-- -----------------------------------------------------------------------------
-- Extract patterns from recorded trajectories using k-means clustering
SELECT ruvector_extract_patterns(
    'my_vectors',  -- table name
    10             -- number of clusters
);
-- Get current learning statistics
SELECT ruvector_learning_stats('my_vectors');
-- Example output:
-- {
--   "trajectories": {
--     "total": 150,
--     "with_feedback": 45,
--     "avg_latency_us": 1234.5,
--     "avg_precision": 0.85,
--     "avg_recall": 0.78
--   },
--   "patterns": {
--     "total": 10,
--     "total_samples": 150,
--     "avg_confidence": 0.87,
--     "total_usage": 523
--   }
-- }
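-- A minimal sketch of pulling individual metrics out of the statistics.
-- It assumes ruvector_learning_stats returns jsonb shaped like the example
-- output above; the column aliases are illustrative only.
SELECT
    (s->'trajectories'->>'total')::int          AS trajectories_total,
    (s->'trajectories'->>'with_feedback')::int  AS trajectories_with_feedback,
    round(
        (s->'trajectories'->>'with_feedback')::numeric
        / NULLIF((s->'trajectories'->>'total')::numeric, 0),
        2
    )                                           AS feedback_ratio,
    (s->'patterns'->>'avg_confidence')::float   AS avg_pattern_confidence
FROM ruvector_learning_stats('my_vectors') AS s;
-- With the example output above, feedback_ratio is 45 / 150 = 0.30.
-- NULLIF avoids a division-by-zero error before any trajectory is recorded.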
-- -----------------------------------------------------------------------------
-- 5. Auto-Tuning Search Parameters
-- -----------------------------------------------------------------------------
-- Auto-tune for balanced performance (default)
SELECT ruvector_auto_tune('my_vectors');
-- Auto-tune optimizing for speed
SELECT ruvector_auto_tune('my_vectors', 'speed');
-- Auto-tune optimizing for accuracy
SELECT ruvector_auto_tune('my_vectors', 'accuracy');
-- Auto-tune with sample queries
SELECT ruvector_auto_tune(
    'my_vectors',
    'balanced',
    ARRAY[
        ARRAY[0.1, 0.2, 0.3],
        ARRAY[0.4, 0.5, 0.6],
        ARRAY[0.7, 0.8, 0.9]
    ]
);
-- -----------------------------------------------------------------------------
-- 6. Getting Optimized Search Parameters
-- -----------------------------------------------------------------------------
-- Get optimized search parameters for a specific query
SELECT ruvector_get_search_params(
    'my_vectors',
    ARRAY[0.1, 0.2, 0.3, 0.4]
);
-- Example output:
-- {
--   "ef_search": 52,
--   "probes": 12,
--   "confidence": 0.89
-- }
-- Use these parameters in your search:
-- SET ruvector.ef_search = 52;
-- SET ruvector.probes = 12;
-- SELECT * FROM my_vectors ORDER BY embedding <-> '[0.1, 0.2, 0.3, 0.4]' LIMIT 10;
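-- A sketch of applying the learned parameters for a single transaction only,
-- so they do not leak into later queries on the same connection.
-- Assumptions: ruvector_get_search_params returns jsonb shaped like the
-- example output above, ruvector.ef_search and ruvector.probes are settable
-- as shown in the comments above, and my_vectors has an "embedding" column.
-- set_config(name, value, true) scopes each setting to the current transaction.
BEGIN;
SELECT
    set_config('ruvector.ef_search', p->>'ef_search', true) AS ef_search,
    set_config('ruvector.probes', p->>'probes', true) AS probes
FROM ruvector_get_search_params('my_vectors', ARRAY[0.1, 0.2, 0.3, 0.4]) AS p;
SELECT * FROM my_vectors ORDER BY embedding <-> '[0.1, 0.2, 0.3, 0.4]' LIMIT 10;
COMMIT;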
-- -----------------------------------------------------------------------------
-- 7. Pattern Consolidation and Pruning
-- -----------------------------------------------------------------------------
-- Consolidate similar patterns to reduce memory usage
-- Patterns with similarity >= 0.95 will be merged
SELECT ruvector_consolidate_patterns('my_vectors', 0.95);
-- Prune low-quality patterns
-- Remove patterns with usage < 5 or confidence < 0.5
SELECT ruvector_prune_patterns(
    'my_vectors',
    5,    -- min_usage
    0.5   -- min_confidence
);
-- -----------------------------------------------------------------------------
-- 8. Complete Workflow Example
-- -----------------------------------------------------------------------------
-- Create a table with vectors
CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    title TEXT,
    embedding vector(384)
);
-- Insert some sample data
INSERT INTO documents (title, embedding)
SELECT
    'Document ' || i,
    ruvector_random(384)
FROM generate_series(1, 1000) i;
-- Create an HNSW index
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
-- Enable learning for adaptive optimization
SELECT ruvector_enable_learning('documents');
-- Simulate user queries and collect trajectories
DO $$
DECLARE
    query_vec vector(384);
    results bigint[];
    start_time bigint;
    end_time bigint;
BEGIN
    FOR i IN 1..50 LOOP
        -- Generate random query
        query_vec := ruvector_random(384);
        -- Execute search and measure time
        start_time := EXTRACT(EPOCH FROM clock_timestamp()) * 1000000;
        SELECT array_agg(id) INTO results
        FROM (
            SELECT id FROM documents
            ORDER BY embedding <=> query_vec
            LIMIT 10
        ) t;
        end_time := EXTRACT(EPOCH FROM clock_timestamp()) * 1000000;
        -- Record trajectory
        PERFORM ruvector_record_trajectory(
            'documents',
            query_vec::float4[],
            results,
            (end_time - start_time)::bigint,
            50,  -- current ef_search
            10   -- current probes
        );
        -- Occasionally provide feedback
        IF i % 5 = 0 THEN
            PERFORM ruvector_record_feedback(
                'documents',
                query_vec::float4[],
                results[1:3],   -- first 3 were relevant
                results[8:10]   -- last 3 were not relevant
            );
        END IF;
    END LOOP;
END $$;
-- Extract patterns from collected data
SELECT ruvector_extract_patterns('documents', 10);
-- View learning statistics
SELECT ruvector_learning_stats('documents');
-- Auto-tune for optimal performance
SELECT ruvector_auto_tune('documents', 'balanced');
-- Get optimized parameters for a new query
WITH query AS (
    SELECT ruvector_random(384) AS vec
),
params AS (
    SELECT ruvector_get_search_params('documents', (SELECT vec::float4[] FROM query)) AS p
)
SELECT
    (p->>'ef_search')::int AS ef_search,
    (p->>'probes')::int AS probes,
    (p->>'confidence')::float AS confidence
FROM params;
-- -----------------------------------------------------------------------------
-- 9. Monitoring and Maintenance
-- -----------------------------------------------------------------------------
-- Regularly consolidate patterns (can be run in a cron job)
SELECT ruvector_consolidate_patterns('documents', 0.92);
-- Prune low-quality patterns monthly
SELECT ruvector_prune_patterns('documents', 10, 0.6);
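-- A sketch of scheduling the two maintenance calls above from inside the
-- database. It assumes the pg_cron extension is installed, which is separate
-- from RuVector; the job names and schedules are placeholders.
SELECT cron.schedule(
    'ruvector-consolidate-patterns',
    '0 3 * * *',   -- daily at 03:00
    $$SELECT ruvector_consolidate_patterns('documents', 0.92)$$
);
SELECT cron.schedule(
    'ruvector-prune-patterns',
    '0 4 1 * *',   -- 04:00 on the first day of each month
    $$SELECT ruvector_prune_patterns('documents', 10, 0.6)$$
);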
-- Clear all learning data if needed
SELECT ruvector_clear_learning('documents');
-- -----------------------------------------------------------------------------
-- 10. Advanced: Integration with Application Code
-- -----------------------------------------------------------------------------
-- Example: Python application using learned parameters
/*
import psycopg2
from psycopg2 import sql


def search_with_learning(conn, table, query_vector, limit=10):
    """Search using learned optimal parameters."""
    # Get optimized parameters (psycopg2 decodes a jsonb result into a dict)
    with conn.cursor() as cur:
        cur.execute(
            "SELECT ruvector_get_search_params(%s, %s::float4[])",
            (table, query_vector),
        )
        params = cur.fetchone()[0]

    # Apply parameters and search
    with conn.cursor() as cur:
        # Pass the values as query parameters instead of formatting them into the SQL
        cur.execute("SET ruvector.ef_search = %s", (int(params['ef_search']),))
        cur.execute("SET ruvector.probes = %s", (int(params['probes']),))
        # Quote the table name as an identifier rather than interpolating it raw
        cur.execute(
            sql.SQL("""
                SELECT id, title, embedding <=> %s::vector AS distance
                FROM {}
                ORDER BY embedding <=> %s::vector
                LIMIT %s
            """).format(sql.Identifier(table)),
            (query_vector, query_vector, limit),
        )
        results = cur.fetchall()

    return results, params


# Use it
conn = psycopg2.connect("dbname=mydb")
results, params = search_with_learning(
    conn,
    'documents',
    [0.1, 0.2, 0.3, ...],
    limit=10
)
print(f"Search completed with ef_search={params['ef_search']}, "
      f"confidence={params['confidence']:.2f}")
*/
-- -----------------------------------------------------------------------------
-- 11. Best Practices
-- -----------------------------------------------------------------------------
-- 1. Collect enough trajectories before extracting patterns (50+ recommended)
-- 2. Provide relevance feedback when possible for better learning
-- 3. Consolidate patterns regularly to manage memory
-- 4. Prune low-quality patterns periodically
-- 5. Monitor learning statistics to track improvement
-- 6. Start with balanced optimization, adjust based on needs
-- 7. Re-extract patterns when query patterns change significantly
-- Example monitoring query:
-- The statistics are fetched once in the subquery and reused for both columns.
SELECT
    jsonb_pretty(t.stats) AS stats,
    CASE
        WHEN (t.stats->'trajectories'->>'total')::int < 50
            THEN 'Collecting data - need more trajectories'
        WHEN (t.stats->'patterns'->>'total')::int = 0
            THEN 'Ready to extract patterns'
        WHEN (t.stats->'patterns'->>'avg_confidence')::float < 0.7
            THEN 'Low confidence - collect more feedback'
        ELSE 'System is learning well'
    END AS recommendation
FROM (
    SELECT ruvector_learning_stats('documents') AS stats
) t;