- Run cargo fmt --all to fix formatting in 362 files across the entire workspace - Add PGDG repository for PostgreSQL 17 in CI test-all-features and benchmark jobs - Add missing rvf dependency crates to standalone Dockerfile for domain-expansion - Add sona-learning and domain-expansion features to standalone Dockerfile build - Create npu.rs stub for ruvector-sparse-inference (fixes rustfmt resolution error) Co-Authored-By: claude-flow <ruv@ruv.net> |
||
|---|---|---|
| .. | ||
| benches | ||
| src | ||
| tests | ||
| build.rs | ||
| Cargo.toml | ||
| README.md | ||
ruvector-solver
Sublinear-time sparse solvers for RuVector — O(log n) to O(sqrt(n)) algorithms that power graph analytics, spectral methods, and AI coherence.
| Dense Solvers (nalgebra) | ruvector-solver | |
|---|---|---|
| Complexity | O(n^3) | O(nnz * log n) to O(log n) |
| Memory | O(n^2) dense | O(nnz) sparse CSR |
| SIMD | Partial | AVX2 8-wide + fused kernels |
| Algorithms | LU, QR | 7 specialized + auto router |
| WASM | No | Full wasm-bindgen bindings |
| PageRank | Not supported | 3 sublinear algorithms |
All solvers operate on a shared CSR (Compressed Sparse Row) matrix representation and
expose a uniform SolverEngine trait for seamless algorithm swapping and
automatic routing.
Algorithms
| Algorithm | Module | Complexity | Applicable to |
|---|---|---|---|
| Jacobi-preconditioned Neumann series | neumann |
O(nnz * log(1/eps)) | Diagonally dominant Ax = b |
| Conjugate Gradient (Hestenes-Stiefel) | cg |
O(nnz * sqrt(kappa)) | Symmetric positive-definite Ax = b |
| Forward Push (Andersen-Chung-Lang) | forward_push |
O(1/epsilon) | Single-source Personalized PageRank |
| Backward Push | backward_push |
O(1/epsilon) | Reverse relevance / target-centric PPR |
| Hybrid Random Walk | random_walk |
O(sqrt(n)/epsilon) | Large-graph PPR with push initialisation |
| TRUE (JL + sparsification + Neumann) | true_solver |
O(nnz * log n) | Batch linear systems with shared A |
| BMSSP Multigrid (V-cycle + Jacobi) | bmssp |
O(n log n) | Ill-conditioned / graph Laplacian systems |
Quick Start
Add the crate to your Cargo.toml:
[dependencies]
ruvector-solver = "0.1"
Solve a diagonally dominant 3x3 system with the Neumann solver:
use ruvector_solver::types::CsrMatrix;
use ruvector_solver::neumann::NeumannSolver;
// Build a diagonally dominant 3x3 matrix in COO format
// [ 2.0 -0.5 0.0]
// A = [-0.5 2.0 -0.5]
// [ 0.0 -0.5 2.0]
let a = CsrMatrix::<f32>::from_coo(3, 3, vec![
(0, 0, 2.0_f32), (0, 1, -0.5),
(1, 0, -0.5), (1, 1, 2.0), (1, 2, -0.5),
(2, 1, -0.5), (2, 2, 2.0),
]);
let b = vec![1.0_f32, 0.0, 1.0];
let solver = NeumannSolver::new(1e-6, 500);
let result = solver.solve(&a, &b).unwrap();
println!("solution: {:?}", result.solution);
println!("iterations: {}", result.iterations);
println!("residual: {:.2e}", result.residual_norm);
assert!(result.residual_norm < 1e-4);
Use the SolverEngine trait for algorithm-agnostic code:
use ruvector_solver::types::{ComputeBudget, CsrMatrix};
use ruvector_solver::traits::SolverEngine;
use ruvector_solver::neumann::NeumannSolver;
let engine: Box<dyn SolverEngine> = Box::new(NeumannSolver::new(1e-6, 500));
let a = CsrMatrix::<f64>::from_coo(3, 3, vec![
(0, 0, 2.0), (0, 1, -0.5),
(1, 0, -0.5), (1, 1, 2.0), (1, 2, -0.5),
(2, 1, -0.5), (2, 2, 2.0),
]);
let b = vec![1.0_f64, 0.0, 1.0];
let budget = ComputeBudget::default();
let result = engine.solve(&a, &b, &budget).unwrap();
assert!(result.residual_norm < 1e-4);
Feature Flags
| Feature | Default | Description |
|---|---|---|
neumann |
Yes | Jacobi-preconditioned Neumann series solver |
cg |
Yes | Conjugate Gradient (Hestenes-Stiefel) solver |
forward-push |
Yes | Forward push for single-source PPR |
backward-push |
No | Backward push for reverse relevance computation |
hybrid-random-walk |
No | Hybrid random walk with push initialisation (enables getrandom) |
true-solver |
No | TRUE batch solver (implies neumann) |
bmssp |
No | BMSSP multigrid solver (V-cycle with Jacobi smoothing) |
all-algorithms |
No | Enables every algorithm above |
simd |
No | AVX2 SIMD-accelerated SpMV (x86_64 only) |
wasm |
No | WebAssembly target support |
parallel |
No | Multi-threaded SpMV and solver loops (enables rayon, crossbeam) |
full |
No | All algorithms + parallel + nalgebra-backend |
Enable all algorithms:
[dependencies]
ruvector-solver = { version = "0.1", features = ["all-algorithms"] }
Performance Optimisations
Bounds-check-free SpMV (spmv_unchecked)
The inner SpMV loop is the single hottest path in every iterative solver. The
spmv_unchecked method on CsrMatrix<f32> and CsrMatrix<f64> uses raw
pointers to eliminate per-element bounds checks, relying on a one-time CSR
structural validation (validation::validate_csr_matrix) performed before
entering the solve loop.
Fused residual + norm computation (fused_residual_norm_sq)
Standard implementations compute the residual r = b - Ax and its squared norm
||r||^2 in separate passes (SpMV, vector subtraction, dot product -- three
full memory traversals). fused_residual_norm_sq computes both in a single
pass, reducing memory traffic by roughly 3x per iteration.
AVX2 8-wide SIMD SpMV
When the simd feature is enabled on x86_64, spmv_simd dispatches to an
AVX2 kernel that processes 8 f32 values per instruction using _mm256
intrinsics with a horizontal sum reduction at the end of each row. Falls back
to a portable scalar loop on other architectures.
4-wide unrolled Jacobi update
The Neumann iteration's update step x[j] += d_inv[j] * r[j] is manually
unrolled 4-wide for instruction-level parallelism, with a scalar remainder loop
for dimensions not divisible by 4.
Arena allocator
SolverArena provides bump allocation for per-solve scratch buffers. All
temporary vectors are allocated from a single contiguous backing buffer and
reclaimed in O(1) via arena.reset(), eliminating heap allocation overhead
inside the iteration loop.
Architecture
+-------------------+
| SolverRouter |
| (algorithm select)|
+--------+----------+
|
+----------+-----------+-----------+----------+
| | | | |
+----v---+ +----v---+ +----v------+ +--v----+ +---v----+
|Neumann | | CG | |ForwardPush| | TRUE | | BMSSP |
|Solver | | Solver | | Solver | |Solver | |Solver |
+----+---+ +----+---+ +-----+-----+ +--+----+ +---+----+
| | | | |
+-----+----+-----+-----+-----+----+-----+---+
| | | |
+----v---+ +---v----+ +----v----+ +----v-----+
|types.rs| |simd.rs | |arena.rs | |budget.rs |
|CsrMatrix| |AVX2 | |Bump | |ComputeBudget|
+--------+ |SpMV | |Alloc | |enforcement|
+-------+ +--------+ +----------+
| | |
+----v---+ +---v------+ +--v---------+
|traits.rs| |validate.rs| |error.rs |
|SolverEngine| |CSR check| |SolverError |
+--------+ +---------+ +-----------+
The SolverRouter analyses the matrix SparsityProfile and QueryType
to select the optimal algorithm. When the selected algorithm fails,
SolverOrchestrator::solve_with_fallback tries a deterministic fallback
chain: selected -> CG -> Dense.
API Overview
Core types (types.rs)
| Type | Description |
|---|---|
CsrMatrix<T> |
Compressed Sparse Row matrix with spmv, spmv_unchecked, from_coo, transpose |
SolverResult |
Solution vector, iteration count, residual norm, wall time, convergence history |
ComputeBudget |
Maximum time, max iterations, target tolerance |
Algorithm |
Enum of all solver algorithms (Neumann, CG, ForwardPush, ...) |
SparsityProfile |
Matrix structural analysis (density, diagonal dominance, spectral radius estimate) |
QueryType |
What the caller wants to solve (LinearSystem, PageRankSingle, Batch, ...) |
ComplexityEstimate |
Predicted flops, iterations, memory, and complexity class |
Traits (traits.rs)
| Trait | Description |
|---|---|
SolverEngine |
Core trait: solve(matrix, rhs, budget) -> SolverResult |
SparseLaplacianSolver |
Extension for graph Laplacian systems and effective resistance |
SublinearPageRank |
Extension for sublinear PPR: ppr(matrix, source, alpha, epsilon) |
Error hierarchy (error.rs)
| Error | Cause |
|---|---|
SolverError::NonConvergence |
Iteration budget exhausted without reaching tolerance |
SolverError::NumericalInstability |
NaN/Inf or residual growth > 2x detected |
SolverError::SpectralRadiusExceeded |
Spectral radius >= 1.0 (Neumann series would diverge) |
SolverError::BudgetExhausted |
Wall-clock time limit exceeded |
SolverError::InvalidInput |
Dimension mismatch, non-finite values, index out of bounds |
SolverError::BackendError |
Backend-specific failure (nalgebra, BLAS) |
Infrastructure modules
| Module | Description |
|---|---|
router.rs |
SolverRouter for automatic algorithm selection; SolverOrchestrator with fallback |
simd.rs |
AVX2-accelerated SpMV with runtime feature detection |
validation.rs |
CSR structural validation (index bounds, monotonic row_ptr, NaN/Inf) |
arena.rs |
SolverArena bump allocator for zero per-iteration heap allocation |
budget.rs |
ComputeBudget enforcement during solve |
audit.rs |
Audit logging for solver invocations |
events.rs |
Event system for solver lifecycle hooks |
Testing
The crate includes 177 tests (138 unit tests + 39 integration/doctests):
# Run all tests
cargo test -p ruvector-solver
# Run tests with all algorithms enabled
cargo test -p ruvector-solver --features all-algorithms
# Run a specific test module
cargo test -p ruvector-solver -- neumann::tests
Benchmarks
Five Criterion benchmark groups are provided:
# Run all benchmarks
cargo bench -p ruvector-solver
# Run a specific benchmark
cargo bench -p ruvector-solver --bench solver_neumann
| Benchmark | Description |
|---|---|
solver_baseline |
Baseline SpMV and vector operations |
solver_neumann |
Neumann solver convergence on tridiagonal systems |
solver_cg |
Conjugate Gradient on SPD matrices |
solver_push |
Forward/backward push on graph adjacency matrices |
solver_e2e |
End-to-end solve through the router with algorithm selection |
Tutorial: Solving a Sparse Linear System
Step 1: Build a CSR matrix
use ruvector_solver::types::CsrMatrix;
// 4x4 tridiagonal system (diagonally dominant)
let a = CsrMatrix::<f32>::from_coo(4, 4, vec![
(0, 0, 3.0), (0, 1, -1.0),
(1, 0, -1.0), (1, 1, 3.0), (1, 2, -1.0),
(2, 1, -1.0), (2, 2, 3.0), (2, 3, -1.0),
(3, 2, -1.0), (3, 3, 3.0),
]);
let b = vec![2.0f32, 1.0, 1.0, 2.0];
Step 2: Choose a solver
use ruvector_solver::neumann::NeumannSolver;
let solver = NeumannSolver::new(1e-6, 500);
let result = solver.solve(&a, &b).unwrap();
println!("Solution: {:?}", result.solution);
println!("Iterations: {}", result.iterations);
println!("Residual: {:.2e}", result.residual_norm);
Step 3: Use the automatic router
use ruvector_solver::router::{SolverRouter, QueryType};
use ruvector_solver::types::{CsrMatrix, ComputeBudget};
let a64 = CsrMatrix::<f64>::from_coo(4, 4, vec![/* ... */]);
let b64 = vec![2.0, 1.0, 1.0, 2.0];
let budget = ComputeBudget::default();
let router = SolverRouter::new();
let (algo, result) = router.solve(&a64, &b64, &budget, QueryType::LinearSystem).unwrap();
println!("Router selected: {:?}", algo);
Step 4: Validate input
use ruvector_solver::validation::validate_csr_matrix;
let errors = validate_csr_matrix(&a);
assert!(errors.is_empty(), "CSR validation failed: {:?}", errors);
Step 5: Benchmark
cargo bench -p ruvector-solver --bench solver_neumann
cargo bench -p ruvector-solver --bench solver_e2e
Tutorial: PageRank with Forward Push
use ruvector_solver::forward_push::ForwardPushSolver;
use ruvector_solver::types::CsrMatrix;
// Build adjacency matrix for a small graph
let adj = CsrMatrix::<f32>::from_coo(4, 4, vec![
(0, 1, 1.0), (1, 0, 1.0),
(1, 2, 1.0), (2, 1, 1.0),
(2, 3, 1.0), (3, 2, 1.0),
(0, 3, 1.0), (3, 0, 1.0),
]);
let solver = ForwardPushSolver::new(0.85, 1e-6); // alpha=0.85
let ppr = solver.ppr(&adj, 0); // PPR from node 0
println!("PPR scores: {:?}", ppr);
Related Crates
| Crate | Role |
|---|---|
ruvector-attn-mincut |
Min-cut gating using graph solvers |
ruvector-coherence |
Coherence metrics for attention comparison |
ruvector-profiler |
Benchmarking memory, power, latency |
Minimum Supported Rust Version
Rust 1.77 or later.
License
Licensed under the MIT License.