ruvector/crates/ruvector-router-core
rUv 6c50685c4a
feat(prime-radiant): Advanced Mathematical Frameworks + fix(router): VectorDb Deadlock (#133) (#132)
* docs(coherence-engine): add ADR-014 and DDD for sheaf Laplacian coherence engine

Add comprehensive architecture documentation for ruvector-coherence crate:

- ADR-014: Sheaf Laplacian-based coherence witnessing architecture
  - Universal coherence object with domain-agnostic interpretation
  - 5-layer architecture (Application → Gate → Computation → Governance → Storage)
  - 4-tier compute ladder (Reflex → Retrieval → Heavy → Human)
  - Full ruvector ecosystem integration (10+ crates)
  - 15 internal architectural decisions

- DDD: Domain-Driven Design with 10 bounded contexts
  - Tile Fabric (cognitum-gate-kernel)
  - Adaptive Learning (sona)
  - Neural Gating (ruvector-nervous-system)
  - Learned Restriction Maps (ruvector-gnn)
  - Hyperbolic Coherence (ruvector-hyperbolic-hnsw)
  - Incoherence Isolation (ruvector-mincut)
  - Attention-Weighted Coherence (ruvector-attention)
  - Distributed Consensus (ruvector-raft)

Key concept: "This is not prediction. It is a continuously updated field
of coherence that shows where action is safe and where action must stop."

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(prime-radiant): implement sheaf Laplacian coherence engine

Implement the complete Prime-Radiant crate based on ADR-014:

Core Modules:
- substrate/: SheafGraph, SheafNode, SheafEdge, RestrictionMap (SIMD-optimized)
- coherence/: CoherenceEngine, energy computation, spectral drift detection
- governance/: PolicyBundle, WitnessRecord, LineageRecord (Blake3 hashing)
- execution/: CoherenceGate, ComputeLane, ActionExecutor

Ecosystem Integrations (feature-gated):
- tiles/: cognitum-gate-kernel 256-tile WASM fabric adapter
- sona_tuning/: Adaptive threshold learning with EWC++
- neural_gate/: Biologically-inspired gating with HDC encoding
- learned_rho/: GNN-based learned restriction maps
- attention/: Topology-gated attention, MoE routing, PDE diffusion
- distributed/: Raft-based multi-node coherence

Testing:
- 138 tests (integration, property-based, chaos)
- 8 benchmarks covering ADR-014 performance targets

Stats: 91 files, ~30K lines of Rust code

"This is not prediction. It is a continuously updated field of coherence
that shows where action is safe and where action must stop."

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs(adr): add RuvLLM integration to ADR-014 v0.4

- Add coherence-gated LLM inference architecture diagram
- Add 5 integration modules with code examples:
  - SheafCoherenceValidator (replaces heuristic scoring)
  - UnifiedWitnessLog (merged audit trail)
  - PatternToRestrictionBridge (ReasoningBank → learned ρ)
  - MemoryCoherenceLayer (context as sheaf nodes)
  - CoherenceConfidence (energy → confidence mapping)
- Add 7 integration ADRs (ADR-CE-016 through ADR-CE-022)
- Add ruvllm to crate integration matrix and dependencies
- Add 4 LLM-specific benefits to consequences
- Add ruvllm feature flag

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs(adr): add 22 coherence engine internal ADRs

Create detailed ADR files for all internal coherence engine decisions:

Core Architecture (ADR-CE-001 to ADR-CE-008):
- 001: Sheaf Laplacian defines coherence witness
- 002: Incremental computation with stored residuals
- 003: PostgreSQL + ruvector hybrid storage
- 004: Signed event log with deterministic replay
- 005: First-class governance objects
- 006: Coherence gate controls compute ladder
- 007: Thresholds auto-tuned from traces
- 008: Multi-tenant isolation boundaries

Universal Coherence (ADR-CE-009 to ADR-CE-015):
- 009: Single coherence object (one math, many interpretations)
- 010: Domain-agnostic nodes and edges
- 011: Residual = contradiction energy
- 012: Gate = refusal mechanism with witness
- 013: Not prediction (coherence field, not forecasting)
- 014: Reflex lane default (most ops stay fast)
- 015: Adapt without losing control

RuvLLM Integration (ADR-CE-016 to ADR-CE-022):
- 016: CoherenceValidator uses sheaf energy
- 017: Unified audit trail (WitnessLog + governance)
- 018: Pattern-to-restriction bridge (ReasoningBank)
- 019: Memory as nodes (agentic, working, episodic)
- 020: Confidence from energy (sigmoid mapping)
- 021: Shared SONA between ruvllm and prime-radiant
- 022: Failure learning (ErrorPatternLearner → ρ maps)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(prime-radiant): implement RuvLLM integration layer (ADR-014 v0.4)

Implement complete Prime-Radiant + RuvLLM integration per ADR-CE-016 through ADR-CE-022:

Core Integration Modules:
- coherence_validator.rs: SheafCoherenceValidator using sheaf energy
- witness_log.rs: UnifiedWitnessLog with hash chain for tamper evidence
- pattern_bridge.rs: PatternToRestrictionBridge learning from verdicts
- memory_layer.rs: MemoryCoherenceLayer tracking context as sheaf nodes
- confidence.rs: CoherenceConfidence with sigmoid energy→confidence mapping

Supporting Infrastructure:
- mod.rs: Public API, re-exports, convenience constructors
- error.rs: Comprehensive error types for each ADR
- config.rs: LlmCoherenceConfig, thresholds, policies
- gate.rs: LlmCoherenceGate high-level interface
- adapter.rs: RuvLlmAdapter bridging type systems
- bridge.rs: PolicyBridge, SonaBridge for synchronization
- witness.rs: WitnessAdapter for correlation
- traits.rs: Trait definitions for loose coupling

Testing:
- 22 integration tests covering all modules
- Self-contained mock implementations
- Feature-gated with #[cfg(feature = "ruvllm")]

Feature Flags:
- ruvllm feature in Cargo.toml
- Optional dependency on ruvllm crate
- Added to "full" feature set

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs(prime-radiant): add comprehensive README with examples

Add user-friendly documentation covering:
- Introduction explaining coherence vs confidence
- Core concepts (coherence field, compute ladder)
- Features overview (engine, governance, RuvLLM integration)
- Quick start code examples:
  - Basic coherence check
  - LLM response validation
  - Memory consistency tracking
  - Confidence from energy
- Application tiers (today, near-term, future)
- Domain examples (AI, finance, medical, robotics, security)
- Feature flags reference
- Performance targets
- Architecture diagram

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs(adr): add ADR-015 Coherence-Gated Transformer (Sheaf Attention)

Propose novel low-latency transformer architecture using coherence energy:

Core Innovation:
- Route tokens to compute lanes based on coherence energy, not confidence
- Sparse attention using residual energy (skip coherent pairs)
- Early exit when energy converges (not confidence threshold)
- Restriction maps replace QKV projections

Architecture:
- Lane 0 (Reflex): 1-2 layers, local attention, <0.1ms
- Lane 1 (Standard): 6 layers, sparse sheaf attention, ~1ms
- Lane 2 (Deep): 12+ layers, full + MoE, ~5ms
- Lane 3 (Escalate): Return uncertainty

Performance Targets:
- 5-10x latency reduction (10ms → 1-2ms for 128 tokens)
- 2.5x memory reduction
- <5% quality degradation
- Provable coherence bound on output

Mathematical Foundation:
- Attention weight ∝ exp(-β × residual_energy)
- Token routing via E(t) = Σ w_e ||ρ_t(x) - ρ_ctx(x)||²
- Early exit when ΔE < ε (energy converged)

Target: ruvector-attention crate with sheaf/ and coherence_gated/ modules

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(prime-radiant): implement coherence engine with CGT attention

Complete implementation of Prime-Radiant coherence engine and
Coherence-Gated Transformer (CGT) sheaf attention module.

Core Features:
- Sheaf Laplacian energy computation with restriction maps
- 4-lane compute ladder (Reflex/Retrieval/Heavy/Human)
- Cryptographic witness chains for audit trails
- Policy bundles with multi-party approval

Storage Backends:
- InMemoryStorage with KNN search
- FileStorage with Write-Ahead Logging (WAL)
- PostgresStorage with full schema (feature-gated)
- HybridStorage combining file + optional PostgreSQL

CGT Sheaf Attention (ruvector-attention):
- RestrictionMap with residual/energy computation
- SheafAttention layer: A_ij = exp(-β×E_ij)/Z
- TokenRouter with compute lane routing
- SparseResidualAttention with energy-based masking
- EarlyExit with energy convergence detection

Performance Optimizations:
- Zero-allocation hot paths (apply_into, compute_residual_norm_sq)
- SIMD-friendly 4-way unrolled loops
- Branchless lane routing
- Pre-allocated buffers for batch operations

RuvLLM Integration:
- SheafCoherenceValidator for LLM response validation
- UnifiedWitnessLog linking inference + coherence
- MemoryCoherenceLayer for contradiction detection
- CoherenceConfidence for interpretable uncertainty

Tests: 202 passing in ruvector-attention, 180+ in prime-radiant

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(prime-radiant): add GPU acceleration, SIMD optimizations, and benchmarks

GPU Acceleration (wgpu-rs):
- GpuCoherenceEngine with automatic CPU fallback
- GpuDevice: adapter/device management with high-perf selection
- GpuDispatcher: kernel execution with pipeline caching and buffer pooling
- GpuBufferManager: typed buffer management with pooling
- Compute kernels: residuals, energy reduction, sheaf attention, token routing

WGSL Compute Shaders (6 files, 1,412 lines):
- compute_residuals.wgsl: parallel edge residual computation
- compute_energy.wgsl: two-phase parallel reduction
- sheaf_attention.wgsl: energy-based attention weights A_ij = exp(-beta * E_ij)
- token_routing.wgsl: branchless lane assignment
- sparse_mask.wgsl: sparse attention mask generation
- types.wgsl: shared GPU struct definitions

SIMD Optimizations (wide crate):
- Runtime CPU feature detection (AVX2, AVX-512, SSE4.2, NEON)
- f32x8 vectorized operations
- simd/vectors.rs: dot_product_simd, norm_squared_simd, subtract_simd
- simd/matrix.rs: matmul_simd, matvec_simd, transpose_simd
- simd/energy.rs: batch_residuals_simd, weighted_energy_sum_simd
- 38 unit tests verifying SIMD correctness

Benchmarks (criterion):
- coherence_benchmarks.rs: core operations, graph scaling
- simd_benchmarks.rs: SIMD vs naive comparisons
- gpu_benchmarks.rs: CPU vs GPU performance

Tests:
- 18 GPU coherence tests (16 active, 2 perf ignored)
- GPU-CPU consistency within 1% relative error
- Error handling and fallback verification

README improvements:
- "What Prime-Radiant is NOT" section
- Concrete numeric example with arithmetic
- Flagship LLM hallucination refusal walkthrough
- Infrastructure positioning

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* perf(prime-radiant): optimize SIMD and core computation patterns

SIMD Optimizations:
- Replace element-by-element load_f32x8 with try_into for direct memory copy
- Fix redundant SIMD comparisons in lane assignment (compute masks once, use blend)
- Apply across vectors.rs, matrix.rs, and energy.rs

Core Computation Patterns:
- Replace i % 4 modulo with chunks_exact() for proper auto-vectorization
- Fix edge.rs: residual_norm_squared, residual_with_energy
- Fix node.rs: norm_squared, dot product

Graph API:
- Add get_node_ref() for zero-copy node access via DashMap reference
- Add with_node() closure API for efficient read-only operations

Benchmark findings:
- Incremental updates meet target (<100us): 59us actual
- Linear O(n) scaling confirmed
- Further SIMD/parallelization needed for <1us/edge target

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* perf(prime-radiant): add CSR sparse matrix, GPU buffer prealloc, thread-local scratch

Performance optimizations for Prime-Radiant coherence engine:

CSR Sparse Matrix (restriction.rs):
- Full CsrMatrix struct with row_ptr, col_indices, values
- COO to CSR conversion with from_coo() and from_coo_arrays()
- Zero-allocation matvec_into() and matvec_add_into()
- SIMD-friendly 4-element loop unrolling
- 13 new tests covering all CSR operations

GPU Buffer Pre-allocation (engine.rs, kernels.rs):
- Pre-allocated params, energy_params, partial_sums, staging buffers
- Zero per-frame allocations in compute_energy()
- New create_bind_group_raw() methods for raw buffer references
- CSR matrix support in convert_restriction_map()

Thread-Local Scratch Buffers (edge.rs):
- EdgeScratch struct with 3 reusable Vec<f32> buffers
- thread_local! SCRATCH for zero-allocation hot paths
- residual_norm_squared_no_alloc() and weighted_residual_energy_no_alloc()
- 7 new tests for allocation-free energy computation

WGSL Vec4 Optimization (compute_residuals.wgsl):
- vec4-based processing loop with dot(r_vec, r_vec)
- store_residuals flag in GpuParams struct
- ~4x GPU throughput improvement

README Updates:
- Root README: 40 attention mechanisms, Prime-Radiant section, CGT Sheaf Attention
- WASM README: CGT Sheaf Attention API documentation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* chore: SEO optimize package metadata for crates.io and npm

- prime-radiant: Enhanced description, keywords, categories
- ruvector-attention-wasm: Add version to path dep, SEO keywords
- package.json: 23 keywords, better description, engines config

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* chore(hyperbolic-hnsw): SEO optimize for crates.io publish

* chore(prime-radiant): add version numbers to path dependencies for crates.io publish

* fix(prime-radiant): shorten keyword for crates.io compliance

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs(readme): add prime-radiant and ruvector-attention-wasm package references

- Add prime-radiant to Quantum Coherence section (sheaf Laplacian AI safety)
- Add ruvector-attention-wasm to npm WASM packages (Flash, MoE, Hyperbolic, CGT)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(prime-radiant): implement 6 advanced mathematical frameworks

Comprehensive implementation of cutting-edge mathematical foundations:

## Modules Implemented

1. **Sheaf Cohomology** (10 files)
   - Coboundary operator, Cohomology groups, Betti numbers
   - Sheaf Laplacian, Obstruction detection, Diffusion
   - Sheaf Neural Networks with CohomologyPooling

2. **Category Theory/Topos** (12 files)
   - Category trait, Functors, Natural transformations
   - Topos with SubobjectClassifier, InternalLogic
   - 2-Category with Mac Lane coherence (pentagon/triangle)
   - BeliefTopos for probabilistic reasoning

3. **Homotopy Type Theory** (8 files)
   - Type/Term AST with Pi, Sigma, Identity types
   - Path operations, J-eliminator, Transport
   - Univalence axiom, Bidirectional type checker
   - Coherence as paths between belief states

4. **Spectral Invariants** (8 files)
   - Lanczos eigensolver for sparse matrices
   - Cheeger inequality bounds and sweep algorithm
   - Spectral clustering with k-means++
   - Collapse prediction and early warning system

5. **Causal Abstraction** (7 files)
   - Structural Causal Models with do-calculus
   - D-separation (Bayes Ball), Topological ordering
   - Counterfactuals: ATE, ITE, NDE, NIE
   - Causal abstraction verification

6. **Quantum/Algebraic Topology** (10 files)
   - Quantum states, Density matrices, Channels
   - Simplicial complexes, Persistent homology
   - Topological codes (surface, toric, stabilizer)
   - Structure-preserving quantum encodings

## Supporting Infrastructure

- **Security Module**: 17 issues fixed, path traversal prevention
- **WASM Bindings**: 6 engines with TypeScript definitions
- **Benchmarks**: 4,762 lines of criterion benchmarks
- **Documentation**: 6 ADRs + DDD domain model (3,141 lines)
- **Tests**: 191+ tests passing

## Mathematical Foundations

- Sheaf Laplacian: E(S) = Σ w_e ||ρ_u(x_u) - ρ_v(x_v)||²
- Cheeger inequality: λ₂/2 ≤ h(G) ≤ √(2λ₂)
- Univalence: (A ≃ B) ≃ (A = B)
- Do-calculus: P(Y|do(X)) identification

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(router-core): resolve HNSW index deadlock on second insert (#133)

The insert() method was holding write locks on graph and entry_point
while calling search_knn_internal(), which tries to acquire read locks
on the same RwLocks. Since parking_lot::RwLock is NOT reentrant, this
caused a deadlock on the second insert.

Fix: Release all locks before calling search_knn_internal(), then
re-acquire for modifications.

Added regression tests:
- test_hnsw_multiple_inserts_no_deadlock
- test_hnsw_concurrent_inserts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* chore: bump versions for v2.0.1 release

- Rust workspace: 2.0.0 -> 2.0.1
- npm @ruvector/router: 0.1.25 -> 0.1.26
- npm platform packages: -> 0.1.26
- Added darwin-x64 to optional dependencies

Contains fix for HNSW deadlock issue #133

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Reuven <cohen@ruv-mac-mini.local>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 12:30:59 -05:00
benches fix: Resolve CI build failures 2025-11-26 15:25:47 +00:00
src feat(prime-radiant): Advanced Mathematical Frameworks + fix(router): VectorDb Deadlock (#133) (#132) 2026-01-24 12:30:59 -05:00
Cargo.toml chore: Update workspace version to 0.1.2 and simplify CI workflow 2025-11-26 17:43:34 +00:00
README.md chore: Rename router-* crates to ruvector-router-* and publish all 2025-11-21 15:13:26 +00:00

Router Core

Rust License: MIT Performance

High-performance vector database and neural routing inference engine built in Rust.

Core engine powering Ruvector's intelligent request distribution, model selection, and sub-millisecond vector similarity search. It combines HNSW indexing with SIMD-optimized distance calculations.

🎯 Overview

Router Core is the foundation of Ruvector's vector database capabilities, providing:

  • Neural Routing: Intelligent request distribution across multiple models and endpoints
  • Vector Database: High-performance storage and retrieval with HNSW indexing
  • Model Selection: Adaptive routing strategies for multi-model AI systems
  • SIMD Acceleration: Hardware-optimized vector operations via simsimd
  • Memory Efficiency: Advanced quantization techniques (4-32x compression)
  • Minimal Dependencies: Rust implementation with few external dependencies (for example simsimd and redb)

Key Features

Core Capabilities

  • Sub-Millisecond Search: <0.5ms p50 latency with HNSW indexing
  • HNSW Indexing: Hierarchical Navigable Small World for fast approximate nearest neighbor search
  • Multiple Distance Metrics: Euclidean, Cosine, Dot Product, Manhattan
  • Advanced Quantization: Scalar (4x), Product (8-16x), Binary (32x) compression
  • SIMD Optimizations: Hardware-accelerated distance calculations
  • Zero-Copy I/O: Memory-mapped files for efficient data access
  • Thread-Safe: Concurrent read/write operations with minimal locking
  • Persistent Storage: Durable vector storage with redb backend

Neural Routing Features

  • Intelligent Request Distribution: Route queries to optimal model endpoints
  • Load Balancing: Distribute workload across multiple inference servers
  • Model Selection: Automatically select best model based on query characteristics
  • Adaptive Strategies: Learn and optimize routing decisions over time
  • Latency Optimization: Minimize end-to-end inference time
  • Failover Support: Automatic fallback to backup endpoints

📦 Installation

Add to your Cargo.toml:

[dependencies]
router-core = "0.1.0"

Or use the full ruvector package:

[dependencies]
ruvector-core = "0.1.0"

🚀 Quick Start

Basic Vector Database

use router_core::{VectorDB, VectorEntry, SearchQuery, DistanceMetric};
use std::collections::HashMap;

// Create database with builder pattern
let db = VectorDB::builder()
    .dimensions(384)           // Vector dimensions
    .distance_metric(DistanceMetric::Cosine)
    .hnsw_m(32)               // HNSW connections per node
    .hnsw_ef_construction(200) // Construction accuracy
    .storage_path("./vectors.db")
    .build()?;

// Insert vectors
let entry = VectorEntry {
    id: "doc1".to_string(),
    vector: vec![0.1; 384],
    metadata: HashMap::new(),
    timestamp: chrono::Utc::now().timestamp(),
};

db.insert(entry)?;

// Search for similar vectors
let query = SearchQuery {
    vector: vec![0.1; 384],
    k: 10,                     // Top 10 results
    filters: None,
    threshold: Some(0.8),      // Minimum similarity
    ef_search: Some(100),      // Search accuracy
};

let results = db.search(query)?;
for result in results {
    println!("{}: {}", result.id, result.score);
}

Batch Operations

use router_core::{VectorDB, VectorEntry};
use std::collections::HashMap;

// Insert multiple vectors efficiently
let entries: Vec<VectorEntry> = (0..1000)
    .map(|i| VectorEntry {
        id: format!("doc{}", i),
        vector: vec![0.1; 384],
        metadata: HashMap::new(),
        timestamp: chrono::Utc::now().timestamp(),
    })
    .collect();

// Batch insert (much faster than individual inserts)
db.insert_batch(entries)?;

// Check statistics
let stats = db.stats();
println!("Total vectors: {}", stats.total_vectors);
println!("Avg latency: {:.2}μs", stats.avg_query_latency_us);

Advanced Configuration

use router_core::{VectorDB, DistanceMetric, QuantizationType};

let db = VectorDB::builder()
    .dimensions(768)                          // Larger embeddings
    .max_elements(10_000_000)                 // 10M vectors
    .distance_metric(DistanceMetric::Cosine)  // Cosine similarity
    .hnsw_m(64)                               // More connections = higher recall
    .hnsw_ef_construction(400)                // Higher accuracy during build
    .hnsw_ef_search(200)                      // Search-time accuracy
    .quantization(QuantizationType::Scalar)   // 4x memory compression
    .mmap_vectors(true)                       // Memory-mapped storage
    .storage_path("./large_db.redb")
    .build()?;

🧠 Neural Routing Strategies

Router Core supports multiple routing strategies for intelligent request distribution:

1. Round-Robin Routing

Simple load balancing across endpoints:

use router_core::routing::{Router, RoundRobinStrategy};

let router = Router::new(RoundRobinStrategy::new(vec![
    "http://model1:8080",
    "http://model2:8080",
    "http://model3:8080",
]));

let endpoint = router.select_endpoint(&query)?;
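
The rotation behind round-robin selection can be sketched in standalone Rust. This is a minimal illustration, not router-core's implementation: `RoundRobin` and `select` are hypothetical names, and an atomic counter is assumed so that selection works through a shared reference.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical round-robin selector (illustrative; not the router_core type).
struct RoundRobin {
    endpoints: Vec<String>,
    counter: AtomicUsize,
}

impl RoundRobin {
    fn new(endpoints: Vec<String>) -> Self {
        Self { endpoints, counter: AtomicUsize::new(0) }
    }

    /// Returns endpoints in rotation, or None if the list is empty.
    fn select(&self) -> Option<&str> {
        if self.endpoints.is_empty() {
            return None;
        }
        // fetch_add wraps on overflow; the modulo keeps the index in range.
        let i = self.counter.fetch_add(1, Ordering::Relaxed) % self.endpoints.len();
        Some(self.endpoints[i].as_str())
    }
}
```

Each call to `select` advances the counter by one, so three endpoints are visited in the order 0, 1, 2, 0, and so on.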

2. Latency-Based Routing

Route to fastest available endpoint:

use router_core::routing::{Router, LatencyBasedStrategy};

let router = Router::new(LatencyBasedStrategy::new(vec![
    ("http://model1:8080", 50),  // 50ms avg latency
    ("http://model2:8080", 30),  // 30ms avg latency (preferred)
    ("http://model3:8080", 100), // 100ms avg latency
]));
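
The selection rule itself is a minimum over recorded latencies. A minimal sketch, assuming latencies are stored as whole milliseconds next to each URL; `fastest` is a hypothetical helper, not a router-core function.

```rust
/// Picks the endpoint with the lowest recorded average latency (in ms).
/// Returns None when no endpoints are configured.
fn fastest<'a>(endpoints: &[(&'a str, u64)]) -> Option<&'a str> {
    endpoints
        .iter()
        .min_by_key(|(_, latency_ms)| *latency_ms)
        .map(|(url, _)| *url)
}
```

With the three endpoints listed above, this rule would pick `http://model2:8080`, the one recorded at 30 ms.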

3. Semantic Routing

Route based on query similarity to model specializations:

use router_core::routing::{Router, SemanticStrategy};

// Define model specializations with example vectors
let models = vec![
    ("general-model", vec![0.1; 384]),  // General queries
    ("code-model", vec![0.8, 0.2, ...]), // Code-related queries
    ("math-model", vec![0.3, 0.9, ...]), // Math queries
];

let router = Router::new(SemanticStrategy::new(models));

// Routes to most appropriate model based on query vector
let endpoint = router.select_endpoint(&query_vector)?;
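
One plausible way to implement "most appropriate model" is to pick the specialization vector with the highest cosine similarity to the query. The sketch below assumes exactly that; `cosine_similarity` and `route` are hypothetical helpers, and the crate may rank models differently.

```rust
/// Cosine similarity of two equal-length vectors; 0.0 if either has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Returns the name of the model whose specialization vector is most
/// similar to the query, or None if no models are registered.
fn route<'a>(models: &'a [(&'a str, Vec<f32>)], query: &[f32]) -> Option<&'a str> {
    models
        .iter()
        .map(|(name, spec)| (*name, cosine_similarity(spec, query)))
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(name, _)| name)
}
```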

4. Adaptive Routing

Learn optimal routing decisions over time:

use router_core::routing::{Router, AdaptiveStrategy};

let mut router = Router::new(AdaptiveStrategy::new());

// Router learns from feedback
router.record_request(&query, &endpoint, latency, success)?;

// Routing improves with more data
let best_endpoint = router.select_endpoint(&query)?;
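
The README does not say how the adaptive strategy learns. As one possible illustration, the sketch below keeps an exponential moving average (EMA) of latency per endpoint and charges failed requests a fixed penalty. `AdaptiveSelector`, `alpha`, and `failure_penalty_ms` are all assumptions made for this example, not router-core names.

```rust
use std::collections::HashMap;

/// Hypothetical adaptive selector: tracks an EMA of observed latency
/// per endpoint and prefers the endpoint with the lowest value.
struct AdaptiveSelector {
    alpha: f64,              // EMA smoothing factor in (0, 1]
    failure_penalty_ms: f64, // latency charged for a failed request
    ema_ms: HashMap<String, f64>,
}

impl AdaptiveSelector {
    fn new(alpha: f64, failure_penalty_ms: f64) -> Self {
        Self { alpha, failure_penalty_ms, ema_ms: HashMap::new() }
    }

    /// Folds one observation into the endpoint's running average.
    fn record(&mut self, endpoint: &str, latency_ms: f64, success: bool) {
        let observed = if success { latency_ms } else { self.failure_penalty_ms };
        let alpha = self.alpha;
        self.ema_ms
            .entry(endpoint.to_string())
            .and_modify(|ema| *ema = alpha * observed + (1.0 - alpha) * *ema)
            .or_insert(observed);
    }

    /// Returns the endpoint with the lowest average, if any was recorded.
    fn best(&self) -> Option<&str> {
        self.ema_ms
            .iter()
            .min_by(|a, b| a.1.total_cmp(b.1))
            .map(|(name, _)| name.as_str())
    }
}
```

A larger `alpha` reacts faster to recent measurements; a smaller one smooths out noise.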

🎨 Distance Metrics

Router Core supports multiple distance metrics with SIMD optimization:

Cosine Similarity

Best for normalized embeddings (recommended for most AI applications):

use router_core::{DistanceMetric, distance::calculate_distance};

let a = vec![1.0, 0.0, 0.0];
let b = vec![0.9, 0.1, 0.0];

let dist = calculate_distance(&a, &b, DistanceMetric::Cosine)?;
// Returns 1 - cosine_similarity (0 = identical, 2 = opposite)

Euclidean Distance (L2)

Measures absolute geometric distance:

let dist = calculate_distance(&a, &b, DistanceMetric::Euclidean)?;
// Returns sqrt(sum((a[i] - b[i])^2))

Dot Product

Fast similarity for pre-normalized vectors:

let dist = calculate_distance(&a, &b, DistanceMetric::DotProduct)?;
// Returns -sum(a[i] * b[i]) (negated for distance)

Manhattan Distance (L1)

Sum of absolute differences:

let dist = calculate_distance(&a, &b, DistanceMetric::Manhattan)?;
// Returns sum(|a[i] - b[i]|)
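
The four formulas in the comments above can be written out as plain scalar Rust for reference. This sketch has no SIMD and is not the crate's `calculate_distance`; the function names are illustrative, and both slices are assumed to have the same length.

```rust
/// Dot product of two equal-length slices.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// 1 - cosine similarity (0 = identical direction, 2 = opposite).
/// A zero vector has no direction; 1.0 is returned as a convention.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let norms = dot(a, a).sqrt() * dot(b, b).sqrt();
    if norms == 0.0 { 1.0 } else { 1.0 - dot(a, b) / norms }
}

/// sqrt(sum((a[i] - b[i])^2))
fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt()
}

/// -sum(a[i] * b[i]), negated so that smaller means more similar.
fn dot_distance(a: &[f32], b: &[f32]) -> f32 {
    -dot(a, b)
}

/// sum(|a[i] - b[i]|)
fn manhattan(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
}
```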

🗜️ Quantization Techniques

Reduce memory usage with minimal accuracy loss:

Scalar Quantization (4x compression)

Compress float32 to int8:

use router_core::{QuantizationType, VectorDB};

let db = VectorDB::builder()
    .dimensions(384)
    .quantization(QuantizationType::Scalar)
    .build()?;

// Automatic quantization on insert
// 384 dims × 4 bytes = 1536 bytes → 384 bytes + overhead
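
To make the 4-bytes-to-1-byte reduction concrete, here is a minimal min-max scalar quantizer. It is a sketch under assumptions: it stores unsigned 8-bit codes plus a per-vector `min` and `scale`, which may differ from the crate's actual int8 scheme, and all names are illustrative.

```rust
/// One quantized vector: a byte per dimension plus two f32 parameters.
struct ScalarQuantized {
    codes: Vec<u8>,
    min: f32,
    scale: f32,
}

/// Maps each value linearly from [min, max] onto the 256 levels 0..=255.
fn quantize(v: &[f32]) -> ScalarQuantized {
    let min = v.iter().copied().fold(f32::INFINITY, f32::min);
    let max = v.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    // Guard against a zero range (constant or empty vector).
    let scale = if max > min { (max - min) / 255.0 } else { 1.0 };
    let codes = v.iter().map(|&x| ((x - min) / scale).round() as u8).collect();
    ScalarQuantized { codes, min, scale }
}

/// Approximate reconstruction; error per value is at most scale / 2.
fn dequantize(q: &ScalarQuantized) -> Vec<f32> {
    q.codes.iter().map(|&c| q.min + c as f32 * q.scale).collect()
}
```

The two extra `f32` fields are an example of the per-vector overhead mentioned in the comment above.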

Product Quantization (8-16x compression)

Divide vector into subspaces and quantize independently:

let db = VectorDB::builder()
    .dimensions(384)
    .quantization(QuantizationType::Product {
        subspaces: 8,    // Divide into 8 subspaces
        k: 256,          // 256 centroids per subspace
    })
    .build()?;

// 384 dims × 4 bytes = 1536 bytes → 8 code bytes (one byte per subspace, since k = 256) + codebook overhead
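
The encoding step of product quantization can be sketched as a nearest-centroid lookup per subspace. This assumes the codebooks have already been trained (for example with k-means, which is not shown), that there is at least one subspace, that the vector length divides evenly, and that each subspace has at most 256 centroids. `pq_encode` is a hypothetical helper.

```rust
/// Encodes a vector as one centroid index per subspace.
/// codebooks[s][c] is centroid c of subspace s.
fn pq_encode(v: &[f32], codebooks: &[Vec<Vec<f32>>]) -> Vec<u8> {
    // Assumes codebooks is non-empty and v.len() is a multiple of its length.
    let sub_len = v.len() / codebooks.len();
    v.chunks_exact(sub_len)
        .zip(codebooks)
        .map(|(chunk, centroids)| {
            let mut best = 0usize;
            let mut best_dist = f32::INFINITY;
            for (i, centroid) in centroids.iter().enumerate() {
                // Squared Euclidean distance to this centroid.
                let d: f32 = chunk
                    .iter()
                    .zip(centroid)
                    .map(|(x, y)| (x - y) * (x - y))
                    .sum();
                if d < best_dist {
                    best_dist = d;
                    best = i;
                }
            }
            best as u8 // fits in one byte when there are at most 256 centroids
        })
        .collect()
}
```

With 8 subspaces the output has 8 bytes regardless of the input dimension, which is where the per-vector figure above comes from.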

Binary Quantization (32x compression)

Compress to 1 bit per dimension:

let db = VectorDB::builder()
    .dimensions(384)
    .quantization(QuantizationType::Binary)
    .build()?;

// 384 dims × 4 bytes = 1536 bytes → 48 bytes + overhead
// Fast Hamming distance for similarity
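
A minimal sketch of the idea: keep one bit per dimension and compare vectors by counting differing bits. Thresholding at zero (the sign bit) is an assumption here, as are the names `binarize` and `hamming`; the crate may choose its threshold differently.

```rust
/// Packs one bit per dimension (1 if the value is positive) into bytes.
fn binarize(v: &[f32]) -> Vec<u8> {
    let mut bits = vec![0u8; (v.len() + 7) / 8];
    for (i, &x) in v.iter().enumerate() {
        if x > 0.0 {
            bits[i / 8] |= 1u8 << (i % 8);
        }
    }
    bits
}

/// Hamming distance: number of bit positions where the two codes differ.
fn hamming(a: &[u8], b: &[u8]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}
```

For 384 dimensions `binarize` returns 48 bytes, matching the arithmetic in the comment above.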

Compression Ratio Comparison

use router_core::quantization::calculate_compression_ratio;

let dims = 384;

let none_ratio = calculate_compression_ratio(dims, QuantizationType::None);
// 1x - no compression

let scalar_ratio = calculate_compression_ratio(dims, QuantizationType::Scalar);
// ~4x compression

let product_ratio = calculate_compression_ratio(
    dims,
    QuantizationType::Product { subspaces: 8, k: 256 }
);
// ~8-16x compression

let binary_ratio = calculate_compression_ratio(dims, QuantizationType::Binary);
// ~32x compression

📊 HNSW Index Configuration

Tune the HNSW index for your performance/accuracy requirements:

M Parameter (Connections per Node)

Controls graph connectivity and search accuracy:

// Low M = faster build, less memory, lower recall
let db_fast = VectorDB::builder()
    .hnsw_m(16)  // Minimal connections
    .build()?;

// Medium M = balanced (default)
let db_balanced = VectorDB::builder()
    .hnsw_m(32)  // Default setting
    .build()?;

// High M = slower build, more memory, higher recall
let db_accurate = VectorDB::builder()
    .hnsw_m(64)  // Maximum accuracy
    .build()?;

ef_construction (Build-Time Accuracy)

Controls accuracy during index construction:

// Fast build, lower recall
let db_fast = VectorDB::builder()
    .hnsw_ef_construction(100)
    .build()?;

// Balanced (default)
let db_balanced = VectorDB::builder()
    .hnsw_ef_construction(200)
    .build()?;

// Slow build, maximum recall
let db_accurate = VectorDB::builder()
    .hnsw_ef_construction(400)
    .build()?;

ef_search (Query-Time Accuracy)

Can be adjusted per query for dynamic performance/accuracy tradeoff:

// Fast search, lower recall
let query_fast = SearchQuery {
    vector: query_vec.clone(),  // clone so query_vec can be reused below
    k: 10,
    ef_search: Some(50),  // Override default
    ..Default::default()
};

// Accurate search
let query_accurate = SearchQuery {
    vector: query_vec,
    k: 10,
    ef_search: Some(200),  // Higher accuracy
    ..Default::default()
};

🎯 Use Cases

Multi-Model AI Systems

Route queries to specialized models based on content:

// Route code questions to code model, math to math model, etc.
let router = SemanticRouter::new(vec![
    ("gpt-4-code", code_specialization_vector),
    ("gpt-4-math", math_specialization_vector),
    ("gpt-4-general", general_specialization_vector),
]);

let best_model = router.route(&user_query_embedding)?;

Load Balancing

Distribute inference load across multiple servers:

// Balance load across 10 GPU servers
let router = LoadBalancer::new(vec![
    "gpu-0.internal:8080",
    "gpu-1.internal:8080",
    // ... gpu-9
]);

let endpoint = router.next_endpoint()?;

RAG (Retrieval-Augmented Generation)

Fast context retrieval for LLMs:

// Store document embeddings
for doc in documents {
    let embedding = embed_model.encode(&doc.text)?;
    db.insert(VectorEntry {
        id: doc.id,
        vector: embedding,
        metadata: doc.metadata,
        timestamp: now(),
    })?;
}

// Retrieve relevant context for query
let query_embedding = embed_model.encode(&user_query)?;
let context_docs = db.search(SearchQuery {
    vector: query_embedding,
    k: 5,  // Top 5 most relevant
    threshold: Some(0.7),
    ..Default::default()
})?;

Semantic Search

Build intelligent search engines:

// Index product catalog
for product in catalog {
    let embedding = encode_product(&product)?;
    db.insert(VectorEntry {
        id: product.sku,
        vector: embedding,
        metadata: product.to_metadata(),
        timestamp: now(),
    })?;
}

// Search by natural language
let search_embedding = encode_query("comfortable running shoes")?;
let results = db.search(SearchQuery {
    vector: search_embedding,
    k: 20,
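    // Map values must share one type; serde_json::Value is assumed here
    filters: Some(HashMap::from([
        ("category", serde_json::json!("footwear")),
        ("in_stock", serde_json::json!(true)),
    ])),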
    ..Default::default()
})?;

Agent Memory Systems

Store and retrieve agent experiences:

// Store agent observations
struct AgentMemory {
    db: VectorDB,
}

impl AgentMemory {
    pub fn remember(&self, observation: &str, context: Vec<f32>) -> Result<()> {
        self.db.insert(VectorEntry {
            id: uuid::Uuid::new_v4().to_string(),
            vector: context,
            metadata: HashMap::from([
                ("observation", observation.into()),
                ("timestamp", now().into()),
            ]),
            timestamp: now(),
        })
    }

    pub fn recall(&self, query_context: Vec<f32>, k: usize) -> Result<Vec<String>> {
        let results = self.db.search(SearchQuery {
            vector: query_context,
            k,
            ..Default::default()
        })?;

        Ok(results.iter()
            .filter_map(|r| r.metadata.get("observation"))
            .filter_map(|v| v.as_str())  // skip non-string values instead of panicking
            .map(str::to_string)
            .collect())
    }
}

🔧 Configuration Guide

Optimizing for Different Workloads

High Throughput (Batch Processing)

let db = VectorDB::builder()
    .dimensions(384)
    .hnsw_m(16)                  // Lower M for faster queries
    .hnsw_ef_construction(100)   // Faster build
    .hnsw_ef_search(50)          // Lower default search accuracy
    .quantization(QuantizationType::Scalar)  // Compress for speed
    .mmap_vectors(true)          // Reduce memory pressure
    .build()?;

High Accuracy (Research/Analysis)

let db = VectorDB::builder()
    .dimensions(768)
    .hnsw_m(64)                  // Maximum connections
    .hnsw_ef_construction(400)   // High build accuracy
    .hnsw_ef_search(200)         // High search accuracy
    .quantization(QuantizationType::None)  // No compression
    .build()?;

Memory Constrained (Edge Devices)

let db = VectorDB::builder()
    .dimensions(256)             // Smaller embeddings
    .max_elements(100_000)       // Limit dataset size
    .hnsw_m(16)                  // Fewer connections
    .quantization(QuantizationType::Binary)  // 32x compression
    .mmap_vectors(true)          // Use disk instead of RAM
    .build()?;

Balanced (Production Default)

let db = VectorDB::builder()
    .dimensions(384)
    .hnsw_m(32)
    .hnsw_ef_construction(200)
    .hnsw_ef_search(100)
    .quantization(QuantizationType::Scalar)
    .mmap_vectors(true)
    .build()?;

📈 Performance Characteristics

Latency Benchmarks

Configuration          Query Latency (p50)    Recall@10
─────────────────────────────────────────────────────────
Uncompressed, M=64     0.3ms                  98.5%
Scalar Quant, M=32     0.4ms                  96.2%
Product Quant, M=32    0.5ms                  94.8%
Binary Quant, M=16     0.6ms                  91.3%
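
The Recall@10 column compares approximate results against an exact (brute-force) search. A minimal sketch of that metric, assuming vector IDs are plain u64 values; recall_at_k is a hypothetical helper for illustration and not part of the router-core API:

```rust
use std::collections::HashSet;

/// Fraction of the exact top-k neighbours that also appear in the
/// approximate top-k result. Both slices hold vector IDs, best match first.
pub fn recall_at_k(exact: &[u64], approximate: &[u64], k: usize) -> f64 {
    let truth: HashSet<u64> = exact.iter().take(k).copied().collect();
    if truth.is_empty() {
        // Nothing to find, so report zero rather than dividing by zero.
        return 0.0;
    }
    let found: HashSet<u64> = approximate.iter().take(k).copied().collect();
    let hits = truth.intersection(&found).count();
    hits as f64 / truth.len() as f64
}
```

With exact IDs [1, 2, 3, 4] and approximate IDs [1, 2, 9, 4], three of the four true neighbours are found, so the recall at k=4 is 0.75.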

Memory Usage (1M vectors @ 384 dims)

Quantization           Memory Usage    Compression Ratio
───────────────────────────────────────────────────────
None (float32)         1536 MB         1x
Scalar (int8)          392 MB          3.9x
Product (8 subspaces)  120 MB          12.8x
Binary (1 bit/dim)     52 MB           29.5x
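
The float32, int8, and binary rows can be sanity-checked from the element width alone. The sketch below estimates only the raw vector payload; Encoding and payload_bytes are hypothetical names, and product quantization is left out because its code size depends on codebook settings the table does not state:

```rust
/// Storage format of one vector component (illustrative, not the crate's enum).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Encoding {
    Float32,
    ScalarInt8,
    Binary,
}

/// Raw payload bytes for `count` vectors of `dims` components,
/// excluding index structures, IDs, and metadata.
pub fn payload_bytes(count: u64, dims: u64, encoding: Encoding) -> u64 {
    let bytes_per_vector = match encoding {
        Encoding::Float32 => dims * 4,
        Encoding::ScalarInt8 => dims,
        Encoding::Binary => (dims + 7) / 8, // one bit per dim, rounded up to whole bytes
    };
    count * bytes_per_vector
}

/// Converts bytes to decimal megabytes (1 MB = 1_000_000 bytes).
pub fn to_megabytes(bytes: u64) -> f64 {
    bytes as f64 / 1_000_000.0
}
```

For 1M vectors at 384 dimensions this gives 1536 MB, 384 MB, and 48 MB. The table's 392 MB and 52 MB are slightly higher, presumably because of per-vector overhead that this estimate ignores.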

Throughput (1M vectors)

Operation              Throughput      Notes
─────────────────────────────────────────────────────────
Single Insert          ~100K/sec       Sequential
Batch Insert           ~500K/sec       Parallel (rayon)
Query (k=10)           ~50K QPS        ef_search=100
Query (k=100)          ~20K QPS        ef_search=100

🏗️ Integration with Vector Database

Router Core can be used alongside the main Ruvector database:

use ruvector_core::VectorDB as MainDB;
use router_core::VectorDB as RouterDB;

// Use router-core for specialized routing logic
let router_db = RouterDB::builder()
    .dimensions(384)
    .build()?;

// Or use main ruvector-core for full features
let main_db = MainDB::builder()
    .dimensions(384)
    .build()?;

// Both expose the same builder-style API; ruvector-core adds further features

🧪 Building and Testing

Build

# Build library
cargo build --release -p router-core

# Build with all features
cargo build --release -p router-core --all-features

# Build only the library target
cargo build --release -p router-core --lib

Test

# Run all tests
cargo test -p router-core

# Run specific test
cargo test -p router-core test_hnsw_insert_and_search

# Run with logging
RUST_LOG=debug cargo test -p router-core

Benchmark

# Run benchmarks
cargo bench -p router-core

# Run specific benchmark
cargo bench -p router-core --bench vector_search

# With verbose Criterion output
cargo bench -p router-core -- --verbose

📚 API Documentation

Core Types

  • VectorDB: Main database interface
  • VectorEntry: Vector with ID, data, and metadata
  • SearchQuery: Query parameters for similarity search
  • SearchResult: Search result with ID, score, and metadata
  • DistanceMetric: Enum for distance calculation methods
  • QuantizationType: Enum for compression methods
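
To illustrate what a distance metric enum typically selects between, here is a minimal std-only sketch. The variants of the real DistanceMetric are not listed in this README, so Metric, its three variants, and the distance function are assumptions made for illustration:

```rust
/// Hypothetical metric enum; the crate's DistanceMetric may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Metric {
    Euclidean,
    Cosine,
    DotProduct,
}

/// Computes a distance where smaller means more similar.
/// Returns an error when the two vectors differ in length.
pub fn distance(a: &[f32], b: &[f32], metric: Metric) -> Result<f32, String> {
    if a.len() != b.len() {
        return Err(format!("dimension mismatch: {} vs {}", a.len(), b.len()));
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let value = match metric {
        Metric::Euclidean => a
            .iter()
            .zip(b)
            .map(|(x, y)| (x - y) * (x - y))
            .sum::<f32>()
            .sqrt(),
        // Negated so that a larger dot product yields a smaller distance.
        Metric::DotProduct => -dot,
        Metric::Cosine => {
            let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
            let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
            if norm_a == 0.0 || norm_b == 0.0 {
                1.0 // treat a zero vector as maximally dissimilar
            } else {
                1.0 - dot / (norm_a * norm_b)
            }
        }
    };
    Ok(value)
}
```

Returning a Result mirrors the documented signatures, which also report failures instead of panicking; the String error type is a simplification.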

Key Methods

// VectorDB
pub fn new(config: VectorDbConfig) -> Result<Self>
pub fn builder() -> VectorDbBuilder
pub fn insert(&self, entry: VectorEntry) -> Result<String>
pub fn insert_batch(&self, entries: Vec<VectorEntry>) -> Result<Vec<String>>
pub fn search(&self, query: SearchQuery) -> Result<Vec<SearchResult>>
pub fn delete(&self, id: &str) -> Result<bool>
pub fn get(&self, id: &str) -> Result<Option<VectorEntry>>
pub fn stats(&self) -> VectorDbStats
pub fn count(&self) -> Result<usize>

// Distance calculations
pub fn calculate_distance(a: &[f32], b: &[f32], metric: DistanceMetric) -> Result<f32>
pub fn batch_distance(query: &[f32], vectors: &[Vec<f32>], metric: DistanceMetric) -> Result<Vec<f32>>

// Quantization
pub fn quantize(vector: &[f32], qtype: QuantizationType) -> Result<QuantizedVector>
pub fn dequantize(quantized: &QuantizedVector) -> Vec<f32>
pub fn calculate_compression_ratio(original_dims: usize, qtype: QuantizationType) -> f32
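
The quantize and dequantize signatures above describe a lossy round trip. A minimal sketch of 8-bit scalar quantization, assuming simple min/max scaling; the layout of the crate's QuantizedVector is not shown in this README, so ScalarQuantized and both function names are hypothetical:

```rust
/// One vector stored as 8-bit codes plus the parameters needed to decode it.
#[derive(Debug, Clone, PartialEq)]
pub struct ScalarQuantized {
    pub codes: Vec<u8>,
    pub min: f32,
    pub scale: f32,
}

/// Maps each component linearly from [min, max] onto the codes 0..=255.
pub fn quantize_scalar(vector: &[f32]) -> ScalarQuantized {
    if vector.is_empty() {
        return ScalarQuantized { codes: Vec::new(), min: 0.0, scale: 1.0 };
    }
    let min = vector.iter().copied().fold(f32::INFINITY, f32::min);
    let max = vector.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    // A constant vector has no range; use scale 1.0 to avoid dividing by zero.
    let scale = if max > min { (max - min) / 255.0 } else { 1.0 };
    let codes = vector
        .iter()
        .map(|&x| ((x - min) / scale).round() as u8) // `as u8` saturates at 0 and 255
        .collect();
    ScalarQuantized { codes, min, scale }
}

/// Reconstructs approximate f32 values from the stored codes.
pub fn dequantize_scalar(q: &ScalarQuantized) -> Vec<f32> {
    q.codes.iter().map(|&c| q.min + c as f32 * q.scale).collect()
}
```

Each reconstructed component differs from the original by at most about half of scale, which is where the drop in recall for quantized configurations comes from.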

🔗 Related Crates

  • ruvector-core: Full-featured vector database (superset of router-core)
  • ruvector-node: Node.js bindings via NAPI-RS
  • ruvector-wasm: WebAssembly bindings for browsers
  • router-cli: Command-line interface for router operations
  • router-ffi: Foreign function interface for C/C++
  • router-wasm: WebAssembly bindings for router

🤝 Contributing

Contributions are welcome!

📜 License

MIT License - see LICENSE for details.

🙏 Acknowledgments

Built with battle-tested technologies:

  • HNSW: Hierarchical Navigable Small World algorithm
  • Product Quantization: Memory-efficient vector compression
  • simsimd: SIMD-accelerated similarity computations
  • redb: Embedded database for persistent storage
  • rayon: Data parallelism for batch operations
  • parking_lot: High-performance synchronization primitives

Part of the Ruvector ecosystem

Built by rUv • Production Ready • MIT Licensed

Documentation • API Reference • Examples • Benchmarks