Router Core
High-performance vector database and neural routing inference engine built in Rust.
Router Core is the engine behind Ruvector's intelligent request distribution, model selection, and sub-millisecond vector similarity search. It combines HNSW indexing with SIMD-optimized distance calculations.
🎯 Overview
Router Core is the foundation of Ruvector's vector database capabilities, providing:
- Neural Routing: Intelligent request distribution across multiple models and endpoints
- Vector Database: High-performance storage and retrieval with HNSW indexing
- Model Selection: Adaptive routing strategies for multi-model AI systems
- SIMD Acceleration: Hardware-optimized vector operations via simsimd
- Memory Efficiency: Advanced quantization techniques (4-32x compression)
- Minimal Dependencies: Pure Rust implementation with a small set of external dependencies
⚡ Key Features
Core Capabilities
- Sub-Millisecond Search: <0.5ms p50 latency with HNSW indexing
- HNSW Indexing: Hierarchical Navigable Small World for fast approximate nearest neighbor search
- Multiple Distance Metrics: Euclidean, Cosine, Dot Product, Manhattan
- Advanced Quantization: Scalar (4x), Product (8-16x), Binary (32x) compression
- SIMD Optimizations: Hardware-accelerated distance calculations
- Zero-Copy I/O: Memory-mapped files for efficient data access
- Thread-Safe: Concurrent read/write operations with minimal locking
- Persistent Storage: Durable vector storage with redb backend
Neural Routing Features
- Intelligent Request Distribution: Route queries to optimal model endpoints
- Load Balancing: Distribute workload across multiple inference servers
- Model Selection: Automatically select best model based on query characteristics
- Adaptive Strategies: Learn and optimize routing decisions over time
- Latency Optimization: Minimize end-to-end inference time
- Failover Support: Automatic fallback to backup endpoints
📦 Installation
Add to your Cargo.toml:
[dependencies]
router-core = "0.1.0"
Or use the full ruvector package:
[dependencies]
ruvector-core = "0.1.0"
🚀 Quick Start
Basic Vector Database
use router_core::{VectorDB, VectorEntry, SearchQuery, DistanceMetric};
use std::collections::HashMap;
// Create database with builder pattern
let db = VectorDB::builder()
    .dimensions(384)                 // Vector dimensions
    .distance_metric(DistanceMetric::Cosine)
    .hnsw_m(32)                      // HNSW connections per node
    .hnsw_ef_construction(200)       // Construction accuracy
    .storage_path("./vectors.db")
    .build()?;
// Insert vectors
let entry = VectorEntry {
    id: "doc1".to_string(),
    vector: vec![0.1; 384],
    metadata: HashMap::new(),
    timestamp: chrono::Utc::now().timestamp(),
};
db.insert(entry)?;
// Search for similar vectors
let query = SearchQuery {
    vector: vec![0.1; 384],
    k: 10,                           // Top 10 results
    filters: None,
    threshold: Some(0.8),            // Minimum similarity
    ef_search: Some(100),            // Search accuracy
};
let results = db.search(query)?;
for result in results {
    println!("{}: {}", result.id, result.score);
}
Batch Operations
use router_core::{VectorDB, VectorEntry};
use std::collections::HashMap;
// Assumes `db` was built as in the Basic Vector Database example
// Insert multiple vectors efficiently
let entries: Vec<VectorEntry> = (0..1000)
    .map(|i| VectorEntry {
        id: format!("doc{}", i),
        vector: vec![0.1; 384],
        metadata: HashMap::new(),
        timestamp: chrono::Utc::now().timestamp(),
    })
    .collect();
// Batch insert (much faster than individual inserts)
db.insert_batch(entries)?;
// Check statistics
let stats = db.stats();
println!("Total vectors: {}", stats.total_vectors);
println!("Avg latency: {:.2}μs", stats.avg_query_latency_us);
Advanced Configuration
use router_core::{VectorDB, DistanceMetric, QuantizationType};
let db = VectorDB::builder()
.dimensions(768) // Larger embeddings
.max_elements(10_000_000) // 10M vectors
.distance_metric(DistanceMetric::Cosine) // Cosine similarity
.hnsw_m(64) // More connections = higher recall
.hnsw_ef_construction(400) // Higher accuracy during build
.hnsw_ef_search(200) // Search-time accuracy
.quantization(QuantizationType::Scalar) // 4x memory compression
.mmap_vectors(true) // Memory-mapped storage
.storage_path("./large_db.redb")
.build()?;
🧠 Neural Routing Strategies
Router Core supports multiple routing strategies for intelligent request distribution:
1. Round-Robin Routing
Simple load balancing across endpoints:
use router_core::routing::{Router, RoundRobinStrategy};
let router = Router::new(RoundRobinStrategy::new(vec![
"http://model1:8080",
"http://model2:8080",
"http://model3:8080",
]));
let endpoint = router.select_endpoint(&query)?;
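To make the rotation concrete, here is a minimal sketch of how a round-robin selector can work. The `RoundRobin` type and its `select` method are hypothetical names for illustration and are not router-core's `RoundRobinStrategy`; the atomic counter is one possible way to keep selection lock-free.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for a round-robin strategy (not the crate's type).
pub struct RoundRobin {
    endpoints: Vec<String>,
    next: AtomicUsize,
}

impl RoundRobin {
    pub fn new(endpoints: Vec<String>) -> Self {
        Self { endpoints, next: AtomicUsize::new(0) }
    }

    /// Returns the next endpoint in rotation, or None if the list is empty.
    pub fn select(&self) -> Option<&str> {
        if self.endpoints.is_empty() {
            return None;
        }
        // fetch_add returns the previous counter value, so the first call yields index 0.
        let i = self.next.fetch_add(1, Ordering::Relaxed) % self.endpoints.len();
        Some(self.endpoints[i].as_str())
    }
}
```

Because `select` takes `&self`, a selector like this can be shared across threads without a mutex.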
2. Latency-Based Routing
Route to fastest available endpoint:
use router_core::routing::{Router, LatencyBasedStrategy};
let router = Router::new(LatencyBasedStrategy::new(vec![
("http://model1:8080", 50), // 50ms avg latency
("http://model2:8080", 30), // 30ms avg latency (preferred)
("http://model3:8080", 100), // 100ms avg latency
]));
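The selection rule behind latency-based routing can be sketched in a few lines. `fastest_endpoint` is a hypothetical helper, not part of router-core, and it assumes latencies are average milliseconds as in the example above.

```rust
/// Picks the endpoint with the lowest recorded average latency (ms).
/// Returns None for an empty list; on ties the earliest entry wins.
pub fn fastest_endpoint<'a>(endpoints: &[(&'a str, u64)]) -> Option<&'a str> {
    endpoints
        .iter()
        .min_by_key(|(_, latency_ms)| *latency_ms)
        .map(|(url, _)| *url)
}
```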
3. Semantic Routing
Route based on query similarity to model specializations:
use router_core::routing::{Router, SemanticStrategy};
// Define model specializations with example vectors
let models = vec![
("general-model", vec![0.1; 384]), // General queries
("code-model", vec![0.8, 0.2, ...]), // Code-related queries
("math-model", vec![0.3, 0.9, ...]), // Math queries
];
let router = Router::new(SemanticStrategy::new(models));
// Routes to most appropriate model based on query vector
let endpoint = router.select_endpoint(&query_vector)?;
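One plausible way to implement the selection step is to pick the model whose specialization vector has the highest cosine similarity to the query. The sketch below assumes exactly that; `route_semantic` and `cosine_similarity` are illustrative names, not router-core functions.

```rust
/// Cosine similarity; returns 0.0 if either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Returns the name of the model whose specialization vector is most
/// similar to the query, or None if no models are registered.
pub fn route_semantic<'a>(models: &'a [(&'a str, Vec<f32>)], query: &[f32]) -> Option<&'a str> {
    models
        .iter()
        .map(|(name, spec)| (*name, cosine_similarity(spec, query)))
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(name, _)| name)
}
```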
4. Adaptive Routing
Learn optimal routing decisions over time:
use router_core::routing::{Router, AdaptiveStrategy};
let mut router = Router::new(AdaptiveStrategy::new());
// Router learns from feedback
router.record_request(&query, &endpoint, latency, success)?;
// Routing improves with more data
let best_endpoint = router.select_endpoint(&query)?;
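How the learning works is not specified here. As one possible approach, the sketch below keeps an exponentially weighted moving average (EWMA) of observed latency per endpoint and charges failed requests a fixed penalty. `EwmaRouter`, `alpha`, and `failure_penalty_ms` are assumptions made for illustration, not the behavior of `AdaptiveStrategy`.

```rust
use std::collections::HashMap;

/// Hypothetical adaptive selector based on per-endpoint EWMA latency.
pub struct EwmaRouter {
    alpha: f64,              // weight of the newest observation, in (0, 1]
    failure_penalty_ms: f64, // latency charged for a failed request
    scores: HashMap<String, f64>,
}

impl EwmaRouter {
    pub fn new(alpha: f64, failure_penalty_ms: f64) -> Self {
        Self { alpha, failure_penalty_ms, scores: HashMap::new() }
    }

    /// Folds one observation into the endpoint's running score.
    pub fn record(&mut self, endpoint: &str, latency_ms: f64, success: bool) {
        let observed = if success { latency_ms } else { self.failure_penalty_ms };
        let alpha = self.alpha;
        self.scores
            .entry(endpoint.to_string())
            .and_modify(|s| *s = alpha * observed + (1.0 - alpha) * *s)
            .or_insert(observed);
    }

    /// Returns the endpoint with the lowest score, if any has been observed.
    pub fn select(&self) -> Option<&str> {
        self.scores
            .iter()
            .min_by(|a, b| a.1.total_cmp(b.1))
            .map(|(name, _)| name.as_str())
    }
}
```

A higher `alpha` reacts faster to recent measurements; a lower one smooths out noise.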
🎨 Distance Metrics
Router Core supports multiple distance metrics with SIMD optimization:
Cosine Similarity
Best for normalized embeddings (recommended for most AI applications):
use router_core::{DistanceMetric, distance::calculate_distance};
let a = vec![1.0, 0.0, 0.0];
let b = vec![0.9, 0.1, 0.0];
let dist = calculate_distance(&a, &b, DistanceMetric::Cosine)?;
// Returns 1 - cosine_similarity (0 = identical, 2 = opposite)
Euclidean Distance (L2)
Measures absolute geometric distance:
let dist = calculate_distance(&a, &b, DistanceMetric::Euclidean)?;
// Returns sqrt(sum((a[i] - b[i])^2))
Dot Product
Fast similarity for pre-normalized vectors:
let dist = calculate_distance(&a, &b, DistanceMetric::DotProduct)?;
// Returns -sum(a[i] * b[i]) (negated for distance)
Manhattan Distance (L1)
Sum of absolute differences:
let dist = calculate_distance(&a, &b, DistanceMetric::Manhattan)?;
// Returns sum(|a[i] - b[i]|)
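For reference, the four formulas in the comments above can be written as scalar Rust. This is a plain, non-SIMD sketch with hypothetical function names; it assumes equal-length inputs and treats a zero-norm vector as orthogonal (cosine distance 1.0), which may differ from what `calculate_distance` does.

```rust
/// 1 - cosine similarity: 0 = identical direction, 2 = opposite.
pub fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 1.0; // assumption: treat zero vectors as orthogonal
    }
    1.0 - dot / (norm_a * norm_b)
}

/// sqrt(sum((a[i] - b[i])^2))
pub fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

/// -sum(a[i] * b[i]); negated so that smaller means more similar.
pub fn negated_dot(a: &[f32], b: &[f32]) -> f32 {
    -a.iter().zip(b).map(|(x, y)| x * y).sum::<f32>()
}

/// sum(|a[i] - b[i]|)
pub fn manhattan(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
}
```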
🗜️ Quantization Techniques
Reduce memory usage with minimal accuracy loss:
Scalar Quantization (4x compression)
Compress float32 to int8:
use router_core::{QuantizationType, VectorDB};
let db = VectorDB::builder()
.dimensions(384)
.quantization(QuantizationType::Scalar)
.build()?;
// Automatic quantization on insert
// 384 dims × 4 bytes = 1536 bytes → 384 bytes + overhead
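The 4x figure follows from storing one byte per dimension instead of four. A minimal sketch of one common scheme, min-max scaling to an unsigned byte, is shown below. `ScalarQuantized`, `quantize_scalar`, and `dequantize_scalar` are hypothetical names, and the crate's actual encoding (for example signed int8 or a different scaling) may differ.

```rust
/// Hypothetical container (not the crate's QuantizedVector).
pub struct ScalarQuantized {
    pub codes: Vec<u8>, // one byte per dimension
    pub min: f32,       // value represented by code 0
    pub scale: f32,     // value step between adjacent codes
}

pub fn quantize_scalar(v: &[f32]) -> ScalarQuantized {
    let min = v.iter().copied().fold(f32::INFINITY, f32::min);
    let max = v.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    // A constant or empty vector has no range; use scale 1.0 to avoid dividing by zero.
    let scale = if max > min { (max - min) / 255.0 } else { 1.0 };
    let codes = v.iter().map(|&x| ((x - min) / scale).round() as u8).collect();
    ScalarQuantized { codes, min, scale }
}

pub fn dequantize_scalar(q: &ScalarQuantized) -> Vec<f32> {
    q.codes.iter().map(|&c| q.min + c as f32 * q.scale).collect()
}
```

The round trip is lossy: each reconstructed value can be off by up to half of `scale`.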
Product Quantization (8-16x compression)
Divide vector into subspaces and quantize independently:
let db = VectorDB::builder()
.dimensions(384)
.quantization(QuantizationType::Product {
subspaces: 8, // Divide into 8 subspaces
k: 256, // 256 centroids per subspace
})
.build()?;
// 384 dims × 4 bytes = 1536 bytes → 8 bytes of codes (1 byte per subspace) + codebook overhead
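With 256 centroids per subspace, each subspace index fits in one byte, which is where the 8-byte code for 8 subspaces comes from. The sketch below shows only the encoding step, assuming the codebooks have already been trained (typically with k-means, omitted here). `pq_encode` and the codebook layout are illustrative assumptions, not router-core's API.

```rust
/// Squared Euclidean distance between two equal-length slices.
fn sq_dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Encodes `v` as one byte per subspace.
/// `codebooks[s]` holds the centroids of subspace `s` (at most 256 each);
/// every centroid has `v.len() / codebooks.len()` values.
pub fn pq_encode(v: &[f32], codebooks: &[Vec<Vec<f32>>]) -> Vec<u8> {
    assert!(!v.is_empty() && !codebooks.is_empty());
    assert!(v.len() % codebooks.len() == 0, "dims must divide evenly into subspaces");
    let sub_dim = v.len() / codebooks.len();
    v.chunks_exact(sub_dim)
        .zip(codebooks)
        .map(|(chunk, centroids)| {
            // Index of the nearest centroid for this sub-vector.
            centroids
                .iter()
                .enumerate()
                .map(|(i, c)| (i, sq_dist(chunk, c)))
                .min_by(|a, b| a.1.total_cmp(&b.1))
                .map(|(i, _)| i as u8)
                .unwrap_or(0)
        })
        .collect()
}
```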
Binary Quantization (32x compression)
Compress to 1 bit per dimension:
let db = VectorDB::builder()
.dimensions(384)
.quantization(QuantizationType::Binary)
.build()?;
// 384 dims × 4 bytes = 1536 bytes → 48 bytes + overhead
// Fast Hamming distance for similarity
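The 48-byte figure is 384 bits packed eight to a byte. A minimal sketch of sign-based binarization and Hamming distance follows; thresholding at zero is an assumption (other schemes threshold at the mean or median), and both function names are hypothetical.

```rust
/// Packs one bit per dimension: 1 if the component is positive, else 0.
pub fn quantize_binary(v: &[f32]) -> Vec<u8> {
    let mut bits = vec![0u8; (v.len() + 7) / 8];
    for (i, &x) in v.iter().enumerate() {
        if x > 0.0 {
            bits[i / 8] |= 1 << (i % 8);
        }
    }
    bits
}

/// Number of differing bits between two packed vectors of equal length.
pub fn hamming(a: &[u8], b: &[u8]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}
```

XOR followed by a population count is why Hamming distance is cheap: it compares eight dimensions per byte with two instructions.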
Compression Ratio Comparison
use router_core::quantization::calculate_compression_ratio;
use router_core::QuantizationType;
let dims = 384;
let none_ratio = calculate_compression_ratio(dims, QuantizationType::None);
// 1x - no compression
let scalar_ratio = calculate_compression_ratio(dims, QuantizationType::Scalar);
// ~4x compression
let product_ratio = calculate_compression_ratio(
dims,
QuantizationType::Product { subspaces: 8, k: 256 }
);
// ~8-16x compression
let binary_ratio = calculate_compression_ratio(dims, QuantizationType::Binary);
// ~32x compression
📊 HNSW Index Configuration
Tune the HNSW index for your performance/accuracy requirements:
M Parameter (Connections per Node)
Controls graph connectivity and search accuracy:
// Low M = faster build, less memory, lower recall
let db_fast = VectorDB::builder()
.hnsw_m(16) // Minimal connections
.build()?;
// Medium M = balanced (default)
let db_balanced = VectorDB::builder()
.hnsw_m(32) // Default setting
.build()?;
// High M = slower build, more memory, higher recall
let db_accurate = VectorDB::builder()
.hnsw_m(64) // Maximum accuracy
.build()?;
ef_construction (Build-Time Accuracy)
Controls accuracy during index construction:
// Fast build, lower recall
let db_fast = VectorDB::builder()
.hnsw_ef_construction(100)
.build()?;
// Balanced (default)
let db_balanced = VectorDB::builder()
.hnsw_ef_construction(200)
.build()?;
// Slow build, maximum recall
let db_accurate = VectorDB::builder()
.hnsw_ef_construction(400)
.build()?;
ef_search (Query-Time Accuracy)
Can be adjusted per query for dynamic performance/accuracy tradeoff:
// Fast search, lower recall
let query_fast = SearchQuery {
    vector: query_vec.clone(), // Clone so query_vec can be reused below
    k: 10,
    ef_search: Some(50), // Override default
    ..Default::default()
};
// Accurate search
let query_accurate = SearchQuery {
    vector: query_vec,
    k: 10,
    ef_search: Some(200), // Higher accuracy
    ..Default::default()
};
🎯 Use Cases
Multi-Model AI Systems
Route queries to specialized models based on content:
// Route code questions to code model, math to math model, etc.
let router = SemanticRouter::new(vec![
("gpt-4-code", code_specialization_vector),
("gpt-4-math", math_specialization_vector),
("gpt-4-general", general_specialization_vector),
]);
let best_model = router.route(&user_query_embedding)?;
Load Balancing
Distribute inference load across multiple servers:
// Balance load across 10 GPU servers
let router = LoadBalancer::new(vec![
"gpu-0.internal:8080",
"gpu-1.internal:8080",
// ... gpu-9
]);
let endpoint = router.next_endpoint()?;
RAG (Retrieval-Augmented Generation)
Fast context retrieval for LLMs:
// Store document embeddings
for doc in documents {
let embedding = embed_model.encode(&doc.text)?;
db.insert(VectorEntry {
id: doc.id,
vector: embedding,
metadata: doc.metadata,
timestamp: now(),
})?;
}
// Retrieve relevant context for query
let query_embedding = embed_model.encode(&user_query)?;
let context_docs = db.search(SearchQuery {
vector: query_embedding,
k: 5, // Top 5 most relevant
threshold: Some(0.7),
..Default::default()
})?;
Semantic Search
Build intelligent search engines:
// Index product catalog
for product in catalog {
let embedding = encode_product(&product)?;
db.insert(VectorEntry {
id: product.sku,
vector: embedding,
metadata: product.to_metadata(),
timestamp: now(),
})?;
}
// Search by natural language
let search_embedding = encode_query("comfortable running shoes")?;
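let results = db.search(SearchQuery {
    vector: search_embedding,
    k: 20,
    // Assumes filter values are serde_json::Value, as the metadata access in
    // the Agent Memory example suggests; a map cannot mix &str and bool directly
    filters: Some(HashMap::from([
        ("category", serde_json::json!("footwear")),
        ("in_stock", serde_json::json!(true)),
    ])),
    ..Default::default()
})?;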
Agent Memory Systems
Store and retrieve agent experiences:
// Store agent observations
struct AgentMemory {
db: VectorDB,
}
impl AgentMemory {
pub fn remember(&self, observation: &str, context: Vec<f32>) -> Result<()> {
self.db.insert(VectorEntry {
id: uuid::Uuid::new_v4().to_string(),
vector: context,
metadata: HashMap::from([
("observation", observation.into()),
("timestamp", now().into()),
]),
timestamp: now(),
})
}
pub fn recall(&self, query_context: Vec<f32>, k: usize) -> Result<Vec<String>> {
let results = self.db.search(SearchQuery {
vector: query_context,
k,
..Default::default()
})?;
Ok(results.iter()
    .filter_map(|r| r.metadata.get("observation"))
    // Skip non-string values instead of panicking on unwrap()
    .filter_map(|v| v.as_str().map(str::to_string))
    .collect())
}
}
🔧 Configuration Guide
Optimizing for Different Workloads
High Throughput (Batch Processing)
let db = VectorDB::builder()
.dimensions(384)
.hnsw_m(16) // Lower M for faster queries
.hnsw_ef_construction(100) // Faster build
.hnsw_ef_search(50) // Lower default search accuracy
.quantization(QuantizationType::Scalar) // Compress for speed
.mmap_vectors(true) // Reduce memory pressure
.build()?;
High Accuracy (Research/Analysis)
let db = VectorDB::builder()
.dimensions(768)
.hnsw_m(64) // Maximum connections
.hnsw_ef_construction(400) // High build accuracy
.hnsw_ef_search(200) // High search accuracy
.quantization(QuantizationType::None) // No compression
.build()?;
Memory Constrained (Edge Devices)
let db = VectorDB::builder()
.dimensions(256) // Smaller embeddings
.max_elements(100_000) // Limit dataset size
.hnsw_m(16) // Fewer connections
.quantization(QuantizationType::Binary) // 32x compression
.mmap_vectors(true) // Use disk instead of RAM
.build()?;
Balanced (Production Default)
let db = VectorDB::builder()
.dimensions(384)
.hnsw_m(32)
.hnsw_ef_construction(200)
.hnsw_ef_search(100)
.quantization(QuantizationType::Scalar)
.mmap_vectors(true)
.build()?;
📈 Performance Characteristics
Latency Benchmarks
Configuration          Query Latency (p50)   Recall@10
─────────────────────────────────────────────────────────
Uncompressed, M=64     0.3ms                 98.5%
Scalar Quant, M=32     0.4ms                 96.2%
Product Quant, M=32    0.5ms                 94.8%
Binary Quant, M=16     0.6ms                 91.3%
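Recall@10 is the fraction of the true 10 nearest neighbors that the approximate search returns. A minimal sketch of that measurement, assuming you already have the exact neighbors from a brute-force scan, could look like this. `recall_at_k` is a hypothetical helper and not part of the crate or its benchmark suite.

```rust
use std::collections::HashSet;

/// Fraction of the exact top-k IDs that also appear in the approximate top-k.
/// Both slices are expected to be ordered best-first.
pub fn recall_at_k(approx: &[&str], exact: &[&str], k: usize) -> f64 {
    let truth: HashSet<&str> = exact.iter().take(k).copied().collect();
    if truth.is_empty() {
        return 0.0;
    }
    let hits = approx.iter().take(k).filter(|id| truth.contains(*id)).count();
    hits as f64 / truth.len() as f64
}
```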
Memory Usage (1M vectors @ 384 dims)
Quantization             Memory Usage   Compression Ratio
───────────────────────────────────────────────────────
None (float32)           1536 MB        1x
Scalar (int8)            392 MB         3.9x
Product (8 subspaces)    120 MB         12.8x
Binary (1 bit/dim)       52 MB          29.5x
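The arithmetic behind this table can be checked directly. The uncompressed figure is raw payload in decimal megabytes (1M vectors × 384 dims × 4 bytes), and each ratio is that baseline divided by the listed usage. The helper names below are hypothetical, and the compressed sizes are simply the numbers from the table, which presumably include some overhead beyond the raw codes.

```rust
/// Raw payload size in decimal megabytes (1 MB = 1,000,000 bytes).
pub fn raw_megabytes(num_vectors: u64, dims: u64, bytes_per_dim: u64) -> f64 {
    (num_vectors * dims * bytes_per_dim) as f64 / 1_000_000.0
}

/// Compression ratio relative to a baseline, rounded to one decimal place.
pub fn compression_ratio(baseline_mb: f64, compressed_mb: f64) -> f64 {
    (baseline_mb / compressed_mb * 10.0).round() / 10.0
}
```

For example, `raw_megabytes(1_000_000, 384, 4)` gives the 1536 MB baseline, and `compression_ratio(1536.0, 392.0)` gives the 3.9x listed for scalar quantization.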
Throughput (1M vectors)
Operation         Throughput    Notes
─────────────────────────────────────────────────────────
Single Insert     ~100K/sec     Sequential
Batch Insert      ~500K/sec     Parallel (rayon)
Query (k=10)      ~50K QPS      ef_search=100
Query (k=100)     ~20K QPS      ef_search=100
🏗️ Integration with Vector Database
Router Core integrates seamlessly with the main Ruvector database:
use ruvector_core::VectorDB as MainDB;
use router_core::VectorDB as RouterDB;
// Use router-core for specialized routing logic
let router_db = RouterDB::builder()
.dimensions(384)
.build()?;
// Or use main ruvector-core for full features
let main_db = MainDB::builder()
.dimensions(384)
.build()?;
// Both share the same API!
🧪 Building and Testing
Build
# Build library
cargo build --release -p router-core
# Build with all features
cargo build --release -p router-core --all-features
# Build only the library target
cargo build --release -p router-core --lib
Test
# Run all tests
cargo test -p router-core
# Run specific test
cargo test -p router-core test_hnsw_insert_and_search
# Run with logging
RUST_LOG=debug cargo test -p router-core
Benchmark
# Run benchmarks
cargo bench -p router-core
# Run specific benchmark
cargo bench -p router-core --bench vector_search
# With verbose Criterion output
cargo bench -p router-core -- --verbose
📚 API Documentation
Core Types
- VectorDB: Main database interface
- VectorEntry: Vector with ID, data, and metadata
- SearchQuery: Query parameters for similarity search
- SearchResult: Search result with ID, score, and metadata
- DistanceMetric: Enum for distance calculation methods
- QuantizationType: Enum for compression methods
Key Methods
// VectorDB
pub fn new(config: VectorDbConfig) -> Result<Self>
pub fn builder() -> VectorDbBuilder
pub fn insert(&self, entry: VectorEntry) -> Result<String>
pub fn insert_batch(&self, entries: Vec<VectorEntry>) -> Result<Vec<String>>
pub fn search(&self, query: SearchQuery) -> Result<Vec<SearchResult>>
pub fn delete(&self, id: &str) -> Result<bool>
pub fn get(&self, id: &str) -> Result<Option<VectorEntry>>
pub fn stats(&self) -> VectorDbStats
pub fn count(&self) -> Result<usize>
// Distance calculations
pub fn calculate_distance(a: &[f32], b: &[f32], metric: DistanceMetric) -> Result<f32>
pub fn batch_distance(query: &[f32], vectors: &[Vec<f32>], metric: DistanceMetric) -> Result<Vec<f32>>
// Quantization
pub fn quantize(vector: &[f32], qtype: QuantizationType) -> Result<QuantizedVector>
pub fn dequantize(quantized: &QuantizedVector) -> Vec<f32>
pub fn calculate_compression_ratio(original_dims: usize, qtype: QuantizationType) -> f32
🔗 Links
- Main Repository: github.com/ruvnet/ruvector
- Documentation: docs/README.md
- API Reference: docs/api/RUST_API.md
- Performance Guide: docs/optimization/PERFORMANCE_TUNING_GUIDE.md
- Examples: examples/
📊 Related Crates
- ruvector-core: Full-featured vector database (superset of router-core)
- ruvector-node: Node.js bindings via NAPI-RS
- ruvector-wasm: WebAssembly bindings for browsers
- router-cli: Command-line interface for router operations
- router-ffi: Foreign function interface for C/C++
- router-wasm: WebAssembly bindings for router
🤝 Contributing
Contributions are welcome!
📜 License
MIT License - see LICENSE for details.
🙏 Acknowledgments
Built with battle-tested technologies:
- HNSW: Hierarchical Navigable Small World algorithm
- Product Quantization: Memory-efficient vector compression
- simsimd: SIMD-accelerated similarity computations
- redb: Embedded database for persistent storage
- rayon: Data parallelism for batch operations
- parking_lot: High-performance synchronization primitives