diff --git a/crates/ruvector-mincut/docs/security/BMSSP-SECURITY-REVIEW.md b/crates/ruvector-mincut/docs/security/BMSSP-SECURITY-REVIEW.md new file mode 100644 index 000000000..bb2c2c462 --- /dev/null +++ b/crates/ruvector-mincut/docs/security/BMSSP-SECURITY-REVIEW.md @@ -0,0 +1,1199 @@ +# BMSSP WASM Integration Security Review + +**Date:** 2026-01-25 +**Auditor:** Security Architecture Agent +**Scope:** Comprehensive security review of BMSSP WASM integration for j-tree operations +**Version:** ADR-002-addendum-bmssp-integration (Proposed) +**Classification:** Internal Security Document + +--- + +## Executive Summary + +This security review examines the proposed integration of `@ruvnet/bmssp` (Bounded Multi-Source Shortest Path) WASM module with the ruvector-mincut j-tree hierarchy. The review covers WASM sandbox security, FFI boundary safety, input validation, resource exhaustion vectors, supply chain risks, error handling, and cryptographic considerations. + +### Risk Summary Matrix + +| Category | Critical | High | Medium | Low | Info | +|----------|----------|------|--------|-----|------| +| WASM Sandbox Security | 0 | 1 | 2 | 1 | 2 | +| FFI Boundary Safety | 0 | 2 | 1 | 2 | 1 | +| Input Validation | 0 | 1 | 3 | 2 | 1 | +| Resource Exhaustion | 0 | 1 | 2 | 1 | 2 | +| Supply Chain | 0 | 1 | 1 | 2 | 2 | +| Error Handling | 0 | 0 | 2 | 2 | 1 | +| Cryptographic | 0 | 0 | 1 | 1 | 2 | +| **Total** | **0** | **6** | **12** | **11** | **11** | + +**Overall Risk Rating:** **MEDIUM-HIGH** + +The integration introduces significant FFI boundary complexity and external dependency risks that require careful mitigation before production deployment. + +--- + +## 1. WASM Sandbox Security + +### 1.1 Memory Isolation Analysis + +**Current Implementation (ruvector-mincut/src/wasm/agentic.rs):** + +```rust +// FINDING: Static mutable global state pattern +#[cfg(target_arch = "wasm32")] +pub mod ffi { + static mut INSTANCE: Option = None; + + #[no_mangle] + pub extern "C" fn mincut_init(num_vertices: u16, num_edges: u16, strategy: u8) { + unsafe { + // Direct mutation of global state + INSTANCE = Some(instance); + } + } +} +``` + +**Identified Issues:** + +| ID | Severity | Issue | Location | CVSS 3.1 | +|----|----------|-------|----------|----------| +| WASM-SEC-001 | High | Static mutable state without synchronization | `agentic.rs:90-106` | 6.5 | +| WASM-SEC-002 | Medium | No memory isolation between BMSSP instances | Proposed integration | 5.3 | +| WASM-SEC-003 | Medium | WASM linear memory shared across all graph operations | `simd.rs:14-46` | 4.8 | +| WASM-SEC-004 | Low | No memory page limit enforcement | All WASM modules | 3.7 | + +**WASM-SEC-001 Analysis:** + +The current FFI implementation uses `static mut INSTANCE` which is not thread-safe. While WASM itself is single-threaded, the proposed BMSSP integration adds complexity: + +``` +Risk Scenario: +1. JavaScript calls mincut_init() with graph A +2. Before completion, another call modifies INSTANCE for graph B +3. Graph A computation uses corrupted state +``` + +**Mitigation Required:** +```rust +// RECOMMENDED: Use RefCell or OnceCell for safer state management +use core::cell::RefCell; + +thread_local! { + static INSTANCE: RefCell> = RefCell::new(None); +} + +#[no_mangle] +pub extern "C" fn mincut_init(num_vertices: u16, num_edges: u16, strategy: u8) { + INSTANCE.with(|instance| { + let mut inst = instance.borrow_mut(); + // Safe state mutation + *inst = Some(AgenticMinCut::new()); + inst.as_mut().unwrap().init(num_vertices, num_edges, strategy.into()); + }) +} +``` + +### 1.2 BMSSP WASM Memory Model + +**Proposed BMSSP Integration (from ADR-002-addendum-bmssp-integration.md):** + +```typescript +class WasmGraph { + constructor(vertices: number, directed: boolean); + add_edge(from: number, to: number, weight: number): boolean; + compute_shortest_paths(source: number): Float64Array; + free(): void; +} +``` + +**Security Concerns:** + +1. **Memory Ownership Transfer:** `Float64Array` returned from `compute_shortest_paths` points to WASM linear memory. If the caller retains this reference after `free()`, use-after-free occurs. + +2. **Double-Free Vulnerability:** No mechanism to prevent multiple `free()` calls on the same instance. + +3. **Memory Leak Vector:** JavaScript garbage collection does not automatically call `free()` on WASM objects. + +**Recommended Pattern:** +```typescript +// SECURE: Wrap in managed object with destructor tracking +class SecureBmsspGraph implements Disposable { + private graph: WasmGraph | null; + private disposed = false; + + constructor(vertices: number, directed: boolean) { + this.graph = new WasmGraph(vertices, directed); + } + + [Symbol.dispose](): void { + if (!this.disposed && this.graph) { + this.graph.free(); + this.graph = null; + this.disposed = true; + } + } + + computeShortestPaths(source: number): Float64Array { + if (this.disposed) { + throw new Error('Graph already disposed'); + } + // Copy data out of WASM memory to prevent use-after-free + const wasmResult = this.graph!.compute_shortest_paths(source); + return Float64Array.from(wasmResult); + } +} +``` + +### 1.3 Buffer Overflow in FFI Boundary + +**SIMD Operations (simd.rs:13-46):** + +```rust +#[cfg(target_arch = "wasm32")] +#[inline] +pub fn simd_popcount(bits: &[u64; 4]) -> u32 { + unsafe { + // Load 128-bit chunks + let v0 = v128_load(bits.as_ptr() as *const v128); + let v1 = v128_load(bits.as_ptr().add(2) as *const v128); + // ... + } +} +``` + +**Analysis:** +- Fixed-size array `[u64; 4]` ensures bounds are compile-time verified +- No runtime validation needed for this pattern +- **Status: SECURE** + +**XOR Operation (simd.rs:56-75):** + +```rust +#[cfg(target_arch = "wasm32")] +pub fn simd_xor(a: &BitSet256, b: &BitSet256) -> BitSet256 { + unsafe { + let mut result = BitSet256::new(); + let a0 = v128_load(a.bits.as_ptr() as *const v128); + // Fixed-size struct, bounds guaranteed + // ... + } +} +``` + +**Analysis:** +- `BitSet256` has fixed `[u64; 4]` internal storage +- Pointer arithmetic is bounded by struct layout +- **Status: SECURE** + +--- + +## 2. Input Validation + +### 2.1 Vertex ID Bounds Checking + +**Current State (compact/mod.rs):** + +```rust +impl BitSet256 { + #[inline(always)] + pub fn insert(&mut self, v: CompactVertexId) { + let idx = (v / 64) as usize; + let bit = v % 64; + if idx < 4 { // BOUNDS CHECK PRESENT + self.bits[idx] |= 1u64 << bit; + } + } + + #[inline(always)] + pub fn contains(&self, v: CompactVertexId) -> bool { + let idx = (v / 64) as usize; + let bit = v % 64; + idx < 4 && (self.bits[idx] & (1u64 << bit)) != 0 // BOUNDS CHECK PRESENT + } +} +``` + +**Analysis:** BitSet256 properly validates vertex IDs against MAX_VERTICES_PER_CORE (256). + +**BMSSP Integration Gap:** + +| ID | Severity | Issue | Impact | +|----|----------|-------|--------| +| INPUT-001 | High | No validation for BMSSP vertex IDs exceeding u32::MAX | Integer overflow | +| INPUT-002 | Medium | Missing validation in `add_edge()` for self-loops | Algorithm correctness | +| INPUT-003 | Medium | No validation for duplicate edge insertion | Memory waste | +| INPUT-004 | Low | Vertex count mismatch between BMSSP and native | Incorrect results | + +**Required Validation for BMSSP Integration:** + +```rust +pub struct BmsspJTreeLevel { + wasm_graph: WasmGraph, + vertex_count: usize, + // Add validation bounds + max_vertex_id: u32, +} + +impl BmsspJTreeLevel { + pub fn add_edge(&mut self, src: u32, tgt: u32, weight: f64) -> Result<(), MinCutError> { + // Vertex bounds validation + if src >= self.max_vertex_id || tgt >= self.max_vertex_id { + return Err(MinCutError::InvalidVertex(src.max(tgt) as u64)); + } + + // Self-loop validation + if src == tgt { + return Err(MinCutError::InvalidEdge(src as u64, tgt as u64)); + } + + // Weight validation (see 2.2) + Self::validate_weight(weight)?; + + self.wasm_graph.add_edge(src, tgt, weight); + Ok(()) + } +} +``` + +### 2.2 Edge Weight Validation + +**Critical Floating-Point Cases:** + +| Value | Risk | Impact | +|-------|------|--------| +| `NaN` | Algorithm produces undefined results | Incorrect cuts | +| `Infinity` | Path computation never terminates or overflows | DoS | +| `-Infinity` | Negative cycle detection fails | Incorrect results | +| Negative weights | Bellman-Ford required; Dijkstra incorrect | Algorithm mismatch | +| Subnormal values | Performance degradation | Timing side-channel | +| Zero | Division by zero in some algorithms | Crash | + +**Required Validation:** + +```rust +impl BmsspJTreeLevel { + fn validate_weight(weight: f64) -> Result<(), MinCutError> { + // Check for NaN + if weight.is_nan() { + return Err(MinCutError::InvalidParameter( + "Edge weight cannot be NaN".to_string() + )); + } + + // Check for infinity + if weight.is_infinite() { + return Err(MinCutError::InvalidParameter( + "Edge weight cannot be infinite".to_string() + )); + } + + // Check for negative weights (BMSSP assumes non-negative) + if weight < 0.0 { + return Err(MinCutError::InvalidParameter( + format!("Edge weight {} must be non-negative", weight) + )); + } + + // Check for subnormal (optional, for performance) + if weight != 0.0 && weight.abs() < f64::MIN_POSITIVE { + // Normalize to zero or reject + return Err(MinCutError::InvalidParameter( + "Subnormal edge weights not supported".to_string() + )); + } + + Ok(()) + } +} +``` + +### 2.3 Graph Size Limits + +**Current Limits (compact/mod.rs):** + +```rust +pub const MAX_VERTICES_PER_CORE: usize = 256; +pub const MAX_EDGES_PER_CORE: usize = 384; +``` + +**BMSSP Proposed Limits (ADR-002-addendum):** + +| Metric | Value | Memory Impact | +|--------|-------|---------------| +| Max vertices (browser) | 100K | ~4MB per graph | +| Max vertices (Node.js) | 1M | ~40MB per graph | +| Max edges | Unbounded | Risk: OOM | + +**Recommended Limits:** + +```rust +pub struct BmsspConfig { + /// Maximum vertices allowed (default: 1M) + pub max_vertices: u32, + /// Maximum edges allowed (default: 10M) + pub max_edges: u32, + /// Maximum memory allocation in bytes (default: 100MB) + pub max_memory_bytes: usize, + /// Maximum path cache entries (default: 10K) + pub max_cache_entries: usize, +} + +impl Default for BmsspConfig { + fn default() -> Self { + Self { + max_vertices: 1_000_000, + max_edges: 10_000_000, + max_memory_bytes: 100 * 1024 * 1024, // 100MB + max_cache_entries: 10_000, + } + } +} +``` + +--- + +## 3. Resource Exhaustion + +### 3.1 Memory Limits for Large Graphs + +**Attack Vector:** + +```javascript +// Malicious input: Create graph with maximum vertices +const graph = new WasmGraph(0xFFFFFFFF, false); +// WASM memory allocation: 4GB * 8 bytes = 32GB +// Result: Browser/Node.js OOM crash +``` + +**Mitigation:** + +```rust +impl BmsspJTreeLevel { + pub fn new(vertex_count: usize, config: &BmsspConfig) -> Result { + // Memory estimation: vertices * sizeof(f64) * expected_edges_per_vertex + let estimated_memory = vertex_count + .checked_mul(8) // sizeof(f64) + .and_then(|v| v.checked_mul(10)) // avg 10 edges/vertex + .ok_or_else(|| MinCutError::CapacityExceeded( + "Memory estimation overflow".to_string() + ))?; + + if estimated_memory > config.max_memory_bytes { + return Err(MinCutError::CapacityExceeded( + format!("Estimated memory {}B exceeds limit {}B", + estimated_memory, config.max_memory_bytes) + )); + } + + if vertex_count > config.max_vertices as usize { + return Err(MinCutError::CapacityExceeded( + format!("Vertex count {} exceeds limit {}", + vertex_count, config.max_vertices) + )); + } + + // Proceed with allocation + Ok(Self { /* ... */ }) + } +} +``` + +### 3.2 CPU Time Limits for Pathological Inputs + +**Attack Vectors:** + +| Attack | Complexity | Example | +|--------|------------|---------| +| Dense complete graph | O(n^2 log n) | K_n with n=10K | +| Long chain graph | O(n^2) worst case | Linear path graph | +| Repeated queries same source | Cache miss flood | Source cycling | + +**Pathological Graph Examples:** + +``` +1. Complete Graph K_n: + - n=10,000 vertices + - 50M edges + - Single SSSP: ~500ms + - All-pairs: ~5000 seconds + +2. Adversarial Sparse Graph: + - Carefully constructed to maximize relaxation steps + - Can cause O(V*E) behavior in Dijkstra variants +``` + +**Mitigation - Timeout Mechanism:** + +```rust +use std::time::{Duration, Instant}; + +pub struct TimeLimitedBmssp { + inner: BmsspJTreeLevel, + timeout: Duration, +} + +impl TimeLimitedBmssp { + pub fn compute_shortest_paths(&self, source: u32) -> Result, MinCutError> { + let start = Instant::now(); + + // For WASM, we cannot interrupt mid-computation + // Instead, validate complexity before execution + let estimated_ops = self.estimate_operations(source); + let estimated_time = Duration::from_nanos(estimated_ops * 10); // ~10ns per op + + if estimated_time > self.timeout { + return Err(MinCutError::CapacityExceeded( + format!("Estimated time {:?} exceeds timeout {:?}", + estimated_time, self.timeout) + )); + } + + let result = self.inner.compute_shortest_paths(source); + + if start.elapsed() > self.timeout { + // Log warning for monitoring + tracing::warn!( + source = source, + elapsed = ?start.elapsed(), + timeout = ?self.timeout, + "BMSSP computation exceeded timeout" + ); + } + + Ok(result) + } + + fn estimate_operations(&self, _source: u32) -> u64 { + let n = self.inner.vertex_count as u64; + let m = self.inner.edge_count as u64; + // BMSSP complexity: O(m * log^(2/3) n) + let log_n = (n as f64).ln().max(1.0); + let log_factor = log_n.powf(2.0 / 3.0); + (m as f64 * log_factor) as u64 + } +} +``` + +### 3.3 Cache Size Bounds + +**Current Cache (from ADR):** + +```rust +pub struct BmsspJTreeLevel { + path_cache: HashMap<(VertexId, VertexId), f64>, + // ... +} +``` + +**Attack Vector:** +``` +1. Query all n*(n-1)/2 pairs +2. Cache grows to O(n^2) entries +3. For n=100K: 10B entries * 24 bytes = 240GB +``` + +**Mitigation - LRU Cache with Bounded Size:** + +```rust +use lru::LruCache; +use std::num::NonZeroUsize; + +pub struct BmsspJTreeLevel { + path_cache: LruCache<(VertexId, VertexId), f64>, + cache_hits: u64, + cache_misses: u64, +} + +impl BmsspJTreeLevel { + pub fn new(config: &BmsspConfig) -> Self { + let cache_capacity = NonZeroUsize::new(config.max_cache_entries) + .unwrap_or(NonZeroUsize::new(10_000).unwrap()); + + Self { + path_cache: LruCache::new(cache_capacity), + cache_hits: 0, + cache_misses: 0, + } + } + + pub fn min_cut(&mut self, s: VertexId, t: VertexId) -> f64 { + // Normalize key for undirected graphs + let key = if s <= t { (s, t) } else { (t, s) }; + + if let Some(&cached) = self.path_cache.get(&key) { + self.cache_hits += 1; + return cached; + } + + self.cache_misses += 1; + + // Compute and cache + let distances = self.wasm_graph.compute_shortest_paths(s as u32); + let cut_value = distances[t as usize]; + + self.path_cache.put(key, cut_value); + + cut_value + } +} +``` + +--- + +## 4. Supply Chain Security + +### 4.1 @ruvnet/bmssp Package Integrity + +**Package Analysis:** + +| Attribute | Value | Risk | +|-----------|-------|------| +| npm package | `@ruvnet/bmssp` | Scoped package (trusted author) | +| Source repository | https://github.com/ruvnet/bmssp | Verify ownership | +| WASM binary size | 27KB | Small attack surface | +| Dependencies | None (standalone WASM) | Low transitive risk | + +**Verification Steps Required:** + +```bash +# 1. Verify package signature (if using npm provenance) +npm audit signatures @ruvnet/bmssp + +# 2. Verify WASM binary hash +sha256sum node_modules/@ruvnet/bmssp/bmssp.wasm +# Expected: [document expected hash in SECURITY.md] + +# 3. Verify source matches binary +cd node_modules/@ruvnet/bmssp +wasm-decompile bmssp.wasm > decompiled.wat +# Compare with reference build +``` + +**Package Lock Recommendation:** + +```json +// package.json +{ + "dependencies": { + "@ruvnet/bmssp": "1.0.0" + }, + "overrides": { + "@ruvnet/bmssp": "$@ruvnet/bmssp" + } +} + +// .npmrc +package-lock=true +save-exact=true +``` + +### 4.2 Known Vulnerabilities Check + +| ID | Severity | Issue | Status | +|----|----------|-------|--------| +| SUPPLY-001 | High | No SBOM (Software Bill of Materials) | Action Required | +| SUPPLY-002 | Medium | WASM binary not reproducibly built | Action Required | +| SUPPLY-003 | Low | No npm provenance attestation | Recommended | +| SUPPLY-004 | Info | No security.txt in package | Informational | + +**Required Actions:** + +1. **Generate SBOM:** +```bash +# Using syft +syft dir:node_modules/@ruvnet/bmssp -o spdx-json > bmssp-sbom.json +``` + +2. **Verify Reproducible Build:** +```bash +# Clone source +git clone https://github.com/ruvnet/bmssp.git +cd bmssp + +# Build with deterministic settings +RUSTFLAGS="-C lto=thin" wasm-pack build --release + +# Compare hash +sha256sum pkg/bmssp_bg.wasm +``` + +### 4.3 WASM Binary Verification + +**Runtime Verification:** + +```typescript +import { createHash } from 'crypto'; + +const EXPECTED_WASM_HASH = 'sha256:abc123...'; // Document this + +async function verifyBmsspWasm(): Promise { + const wasmBytes = await fetch('/node_modules/@ruvnet/bmssp/bmssp.wasm') + .then(r => r.arrayBuffer()); + + const hash = createHash('sha256') + .update(new Uint8Array(wasmBytes)) + .digest('hex'); + + const expected = EXPECTED_WASM_HASH.replace('sha256:', ''); + + if (hash !== expected) { + console.error(`BMSSP WASM hash mismatch! + Expected: ${expected} + Got: ${hash}`); + return false; + } + + return true; +} + +// Call before initializing BMSSP +if (!await verifyBmsspWasm()) { + throw new Error('BMSSP WASM integrity check failed'); +} +``` + +--- + +## 5. Error Handling + +### 5.1 Panic Safety Across FFI Boundary + +**Current Panic Handling (lib.rs:116):** + +```rust +#![cfg_attr(not(feature = "wasm"), deny(unsafe_code))] +``` + +**Issue:** Panics in WASM are not handled - they become WASM traps that JavaScript cannot catch gracefully. + +**Current FFI Error Handling (agentic.rs:148-149):** + +```rust +#[no_mangle] +pub extern "C" fn mincut_get_result() -> u16 { + unsafe { INSTANCE.as_ref().map(|i| i.min_cut()).unwrap_or(u16::MAX) } +} +``` + +**Analysis:** +- Uses `unwrap_or` for graceful degradation (good) +- Returns sentinel value (u16::MAX) on error +- No panic path in this function + +**Identified Issues:** + +| ID | Severity | Issue | Location | +|----|----------|-------|----------| +| ERR-001 | Medium | Panics in test code can propagate | `paper_impl.rs:613, 654` | +| ERR-002 | Medium | No structured error return from FFI | All FFI functions | +| ERR-003 | Low | Error messages may leak internal state | Error formatting | + +**Recommended Error Handling Pattern:** + +```rust +/// Error codes for FFI boundary +#[repr(u8)] +pub enum BmsspErrorCode { + Success = 0, + InvalidVertex = 1, + InvalidWeight = 2, + OutOfMemory = 3, + Timeout = 4, + InternalError = 255, +} + +/// Result structure for FFI +#[repr(C)] +pub struct BmsspResult { + pub error_code: u8, + pub result: u16, +} + +#[no_mangle] +pub extern "C" fn mincut_compute(s: u16, t: u16) -> BmsspResult { + // Use catch_unwind to prevent panics crossing FFI + let result = std::panic::catch_unwind(|| { + unsafe { + INSTANCE.as_mut() + .map(|i| i.min_cut(s as usize, t as usize)) + .unwrap_or(u16::MAX) + } + }); + + match result { + Ok(value) => BmsspResult { + error_code: BmsspErrorCode::Success as u8, + result: value, + }, + Err(_) => BmsspResult { + error_code: BmsspErrorCode::InternalError as u8, + result: u16::MAX, + }, + } +} +``` + +### 5.2 Graceful Degradation on WASM Failure + +**Fallback Strategy:** + +```rust +/// Hybrid cut computation with fallback +pub struct HybridMinCut { + /// Primary: BMSSP WASM acceleration + bmssp: Option, + /// Fallback: Native Rust implementation + native: SubpolynomialMinCut, + /// Failure count for circuit breaker + wasm_failures: AtomicU32, + /// Circuit breaker threshold + failure_threshold: u32, +} + +impl HybridMinCut { + pub fn min_cut(&mut self, s: VertexId, t: VertexId) -> CutResult { + // Check circuit breaker + if self.wasm_failures.load(Ordering::Relaxed) >= self.failure_threshold { + return self.native_fallback(s, t); + } + + // Try WASM first + if let Some(ref mut bmssp) = self.bmssp { + match bmssp.try_min_cut(s, t) { + Ok(result) => { + // Reset failure count on success + self.wasm_failures.store(0, Ordering::Relaxed); + return result; + } + Err(e) => { + // Increment failure count + self.wasm_failures.fetch_add(1, Ordering::Relaxed); + tracing::warn!(error = ?e, "BMSSP failed, using fallback"); + } + } + } + + self.native_fallback(s, t) + } + + fn native_fallback(&self, s: VertexId, t: VertexId) -> CutResult { + CutResult::exact(self.native.min_cut_between(s, t)) + } +} +``` + +### 5.3 Information Leakage in Error Messages + +**Current Error Types (error.rs):** + +```rust +#[derive(Error, Debug)] +pub enum MinCutError { + #[error("Invalid vertex ID: {0}")] + InvalidVertex(u64), + // Exposes internal vertex ID representation + + #[error("Internal algorithm error: {0}")] + InternalError(String), + // May expose internal state via string +} +``` + +**Recommended Sanitization:** + +```rust +impl MinCutError { + /// Return user-safe error message without internal details + pub fn user_message(&self) -> &'static str { + match self { + MinCutError::EmptyGraph => "Graph is empty", + MinCutError::InvalidVertex(_) => "Invalid vertex identifier", + MinCutError::InvalidEdge(_, _) => "Invalid edge specification", + MinCutError::DisconnectedGraph => "Graph is not connected", + MinCutError::CutSizeExceeded(_, _) => "Result exceeds supported size", + MinCutError::InvalidEpsilon(_) => "Invalid approximation parameter", + MinCutError::InvalidParameter(_) => "Invalid parameter value", + MinCutError::CallbackError(_) => "Callback execution failed", + MinCutError::InternalError(_) => "Internal error occurred", + MinCutError::ConcurrentModification => "Concurrent modification detected", + MinCutError::CapacityExceeded(_) => "Capacity limit exceeded", + MinCutError::SerializationError(_) => "Data serialization failed", + } + } +} + +// For FFI: return only opaque error codes +#[no_mangle] +pub extern "C" fn mincut_get_last_error() -> u8 { + // Return error code, not detailed message + thread_local! { + static LAST_ERROR: Cell = Cell::new(0); + } + LAST_ERROR.with(|e| e.get()) +} +``` + +--- + +## 6. Cryptographic Considerations + +### 6.1 Random Number Generation for Sampling + +**Current RNG Usage (snn/attractor.rs:440-442):** + +```rust +let mut rng_state = seed.wrapping_add(0x9e3779b97f4a7c15); +// ... +rng_state = rng_state.wrapping_mul(0x5851f42d4c957f2d).wrapping_add(1); +``` + +**Analysis:** +- Uses simple LCG (Linear Congruential Generator) +- Constants from SplitMix64 +- **Not cryptographically secure** (by design - for performance) + +**BMSSP Sampling Requirements:** + +| Use Case | CSPRNG Required | Rationale | +|----------|-----------------|-----------| +| Vertex sampling for testing | No | Reproducibility more important | +| Random pivot selection | No | Any distribution works | +| Cryptographic commitments | Yes | Must be unpredictable | +| Audit trail generation | Yes | Prevent manipulation | + +**Recommendation:** + +```rust +/// RNG wrapper with appropriate strength for use case +pub enum BmsspRng { + /// Fast, reproducible (default for graph algorithms) + Fast(FastRng), + /// Cryptographically secure (for audit/security features) + Secure(SecureRng), +} + +impl BmsspRng { + /// Use fast RNG for graph algorithm internals + pub fn for_algorithm() -> Self { + BmsspRng::Fast(FastRng::from_seed([0u8; 8])) + } + + /// Use secure RNG for audit trail + pub fn for_audit() -> Self { + BmsspRng::Secure(SecureRng::new()) + } +} +``` + +### 6.2 Determinism Requirements + +**J-Tree Algorithm Determinism:** + +| Operation | Must be Deterministic | Rationale | +|-----------|----------------------|-----------| +| Shortest path computation | Yes | Reproducible results | +| Cache key generation | Yes | Consistent lookups | +| Witness generation | Yes | Verifiable proofs | +| Performance sampling | No | Statistical validity | + +**Ensuring Determinism:** + +```rust +impl BmsspJTreeLevel { + /// Compute shortest paths with deterministic tie-breaking + pub fn compute_shortest_paths_deterministic(&self, source: u32) -> Vec { + // BMSSP uses Dijkstra variant - inherently deterministic + // for same input graph and source + + // Ensure vertex iteration order is deterministic + let result = self.wasm_graph.compute_shortest_paths(source); + + // Verify determinism in debug builds + #[cfg(debug_assertions)] + { + let result2 = self.wasm_graph.compute_shortest_paths(source); + assert_eq!(result, result2, "Non-deterministic shortest path computation"); + } + + result + } +} +``` + +--- + +## 7. Recommended Mitigations Summary + +### 7.1 Immediate Actions (P0 - Before Integration) + +| ID | Action | Effort | Impact | +|----|--------|--------|--------| +| P0-1 | Add vertex ID bounds validation | 2 hours | High | +| P0-2 | Add edge weight validation (NaN, Inf, negative) | 2 hours | High | +| P0-3 | Implement memory allocation limits | 4 hours | High | +| P0-4 | Document expected WASM binary hash | 1 hour | Medium | +| P0-5 | Add `catch_unwind` to FFI functions | 4 hours | Medium | + +### 7.2 Short-Term Actions (P1 - First Release) + +| ID | Action | Effort | Impact | +|----|--------|--------|--------| +| P1-1 | Implement LRU cache with bounded size | 4 hours | Medium | +| P1-2 | Add timeout estimation for operations | 8 hours | Medium | +| P1-3 | Create SBOM for BMSSP package | 2 hours | Low | +| P1-4 | Implement circuit breaker for WASM failures | 4 hours | Medium | +| P1-5 | Add memory ownership wrapper for JavaScript | 4 hours | Medium | + +### 7.3 Long-Term Actions (P2 - Future Releases) + +| ID | Action | Effort | Impact | +|----|--------|--------|--------| +| P2-1 | Implement reproducible WASM build verification | 1 week | Medium | +| P2-2 | Add fuzzing targets for BMSSP integration | 1 week | Medium | +| P2-3 | Consider WASM Component Model migration | 2 weeks | Low | +| P2-4 | Implement comprehensive audit logging | 1 week | Low | + +--- + +## 8. Code Changes Required + +### 8.1 New File: `src/wasm/bmssp_security.rs` + +```rust +//! Security wrappers for BMSSP WASM integration +//! +//! Provides input validation, resource limits, and error handling +//! for safe BMSSP integration. + +use crate::error::{MinCutError, Result}; +use std::time::{Duration, Instant}; + +/// Security configuration for BMSSP integration +#[derive(Debug, Clone)] +pub struct BmsspSecurityConfig { + /// Maximum vertices allowed + pub max_vertices: u32, + /// Maximum edges allowed + pub max_edges: u32, + /// Maximum memory in bytes + pub max_memory_bytes: usize, + /// Maximum cache entries + pub max_cache_entries: usize, + /// Operation timeout + pub timeout: Duration, + /// Enable WASM binary verification + pub verify_wasm_hash: bool, + /// Expected WASM binary hash (SHA-256) + pub expected_wasm_hash: Option, +} + +impl Default for BmsspSecurityConfig { + fn default() -> Self { + Self { + max_vertices: 1_000_000, + max_edges: 10_000_000, + max_memory_bytes: 100 * 1024 * 1024, + max_cache_entries: 10_000, + timeout: Duration::from_secs(30), + verify_wasm_hash: true, + expected_wasm_hash: None, + } + } +} + +/// Validate edge weight for BMSSP compatibility +pub fn validate_edge_weight(weight: f64) -> Result<()> { + if weight.is_nan() { + return Err(MinCutError::InvalidParameter( + "Edge weight cannot be NaN".into() + )); + } + if weight.is_infinite() { + return Err(MinCutError::InvalidParameter( + "Edge weight cannot be infinite".into() + )); + } + if weight < 0.0 { + return Err(MinCutError::InvalidParameter( + "Edge weight must be non-negative".into() + )); + } + Ok(()) +} + +/// Validate vertex ID is within bounds +pub fn validate_vertex_id(vertex: u32, max_vertices: u32) -> Result<()> { + if vertex >= max_vertices { + return Err(MinCutError::InvalidVertex(vertex as u64)); + } + Ok(()) +} + +/// Estimate memory usage for graph +pub fn estimate_memory_usage(vertices: usize, edges: usize) -> usize { + // Vertex array: vertices * sizeof(f64) + let vertex_memory = vertices.saturating_mul(8); + // Edge list: edges * (2 * sizeof(u32) + sizeof(f64)) + let edge_memory = edges.saturating_mul(16); + // Cache overhead estimate + let cache_overhead = vertices.saturating_mul(24); + + vertex_memory + .saturating_add(edge_memory) + .saturating_add(cache_overhead) +} +``` + +### 8.2 Updates to `src/wasm/mod.rs` + +```rust +//! WASM bindings and optimizations for agentic chip +//! +//! Provides: +//! - SIMD-accelerated boundary computation +//! - Agentic chip interface +//! - Inter-core messaging +//! - BMSSP security wrappers (new) + +pub mod agentic; +pub mod simd; +pub mod bmssp_security; // Add this line + +pub use agentic::*; +pub use simd::*; +pub use bmssp_security::*; // Add this line +``` + +--- + +## 9. Testing Requirements + +### 9.1 Security Test Cases + +```rust +#[cfg(test)] +mod security_tests { + use super::*; + + #[test] + fn test_nan_weight_rejected() { + let result = validate_edge_weight(f64::NAN); + assert!(result.is_err()); + } + + #[test] + fn test_infinity_weight_rejected() { + let result = validate_edge_weight(f64::INFINITY); + assert!(result.is_err()); + } + + #[test] + fn test_negative_weight_rejected() { + let result = validate_edge_weight(-1.0); + assert!(result.is_err()); + } + + #[test] + fn test_vertex_bounds_check() { + let result = validate_vertex_id(100, 50); + assert!(result.is_err()); + } + + #[test] + fn test_memory_estimation_overflow() { + let mem = estimate_memory_usage(usize::MAX, usize::MAX); + // Should not panic, should saturate + assert!(mem <= usize::MAX); + } +} +``` + +### 9.2 Fuzzing Targets + +```rust +// fuzz/fuzz_targets/bmssp_input.rs +#![no_main] +use libfuzzer_sys::fuzz_target; + +fuzz_target!(|data: &[u8]| { + if data.len() < 16 { return; } + + // Parse fuzzer input + let vertex_count = u32::from_le_bytes(data[0..4].try_into().unwrap()); + let edge_count = u32::from_le_bytes(data[4..8].try_into().unwrap()); + + // Validate with security checks + let config = BmsspSecurityConfig::default(); + let _ = validate_vertex_id(vertex_count, config.max_vertices); + let _ = estimate_memory_usage(vertex_count as usize, edge_count as usize); +}); +``` + +--- + +## 10. Verification Checklist + +### Pre-Integration Checklist + +- [ ] All P0 mitigations implemented +- [ ] WASM binary hash documented +- [ ] Input validation tests passing +- [ ] Memory limit tests passing +- [ ] Panic safety verified with `catch_unwind` +- [ ] No `unwrap()` in FFI code +- [ ] Error codes documented for JavaScript consumers +- [ ] SBOM generated for BMSSP package + +### Pre-Production Checklist + +- [ ] P1 mitigations implemented +- [ ] Fuzzing targets created and run for 24+ hours +- [ ] Circuit breaker tested under failure conditions +- [ ] Memory leak tests passing (long-running) +- [ ] Timeout mechanism validated +- [ ] Security review by second party +- [ ] Penetration testing completed + +--- + +## 11. Conclusion + +The proposed BMSSP WASM integration offers significant performance benefits for j-tree operations but introduces several security considerations that require mitigation: + +**Primary Concerns:** +1. FFI boundary safety with static mutable state +2. Input validation gaps for vertex IDs and edge weights +3. Resource exhaustion vectors through unbounded allocation +4. Supply chain risks from external WASM dependency + +**Recommended Approach:** +1. Implement all P0 mitigations before initial integration +2. Use defense-in-depth with validation at multiple layers +3. Maintain native Rust fallback for graceful degradation +4. Establish ongoing monitoring and circuit breaker patterns + +**Overall Assessment:** The integration is viable with the recommended security mitigations in place. The performance benefits (10-15x speedup) justify the additional security engineering investment. + +--- + +## Appendix A: Security Review Sign-Off + +| Role | Name | Date | Signature | +|------|------|------|-----------| +| Security Architect | ___________________ | ________ | ________ | +| Lead Developer | ___________________ | ________ | ________ | +| QA Lead | ___________________ | ________ | ________ | + +## Appendix B: References + +1. ADR-002: Dynamic Hierarchical j-Tree Decomposition +2. ADR-002-addendum-bmssp-integration: BMSSP WASM Integration Proposal +3. BMSSP Paper: "Breaking the Sorting Barrier for SSSP" (arXiv:2501.00660) +4. npm package: https://www.npmjs.com/package/@ruvnet/bmssp +5. RuVector Security Audit Report (2026-01-18) +6. OWASP WASM Security Guidelines +7. Rust FFI Safety Guidelines + +--- + +*This security review was conducted as part of the ADR-002-addendum-bmssp-integration proposal review process.* diff --git a/crates/ruvector-mincut/src/optimization/simd_distance.rs b/crates/ruvector-mincut/src/optimization/simd_distance.rs new file mode 100644 index 000000000..2154f2bf3 --- /dev/null +++ b/crates/ruvector-mincut/src/optimization/simd_distance.rs @@ -0,0 +1,550 @@ +//! SIMD-Optimized Distance Array Operations +//! +//! Provides vectorized operations for distance arrays: +//! - Parallel min/max finding +//! - Batch distance updates +//! - Vector comparisons +//! +//! Uses WASM SIMD128 when available, falls back to scalar. + +use crate::graph::VertexId; + +#[cfg(target_arch = "wasm32")] +use core::arch::wasm32::*; + +/// Alignment for SIMD operations (64 bytes for AVX-512 compatibility) +pub const SIMD_ALIGNMENT: usize = 64; + +/// Number of f64 elements per SIMD operation +pub const SIMD_LANES: usize = 4; // 256-bit = 4 x f64 + +/// Aligned distance array for SIMD operations +#[repr(C, align(64))] +pub struct DistanceArray { + /// Raw distance values + data: Vec, + /// Number of vertices + len: usize, +} + +impl DistanceArray { + /// Create new distance array initialized to infinity + pub fn new(size: usize) -> Self { + Self { + data: vec![f64::INFINITY; size], + len: size, + } + } + + /// Create from slice + pub fn from_slice(slice: &[f64]) -> Self { + Self { + data: slice.to_vec(), + len: slice.len(), + } + } + + /// Get distance for vertex + #[inline] + pub fn get(&self, v: VertexId) -> f64 { + self.data.get(v as usize).copied().unwrap_or(f64::INFINITY) + } + + /// Set distance for vertex + #[inline] + pub fn set(&mut self, v: VertexId, distance: f64) { + if (v as usize) < self.len { + self.data[v as usize] = distance; + } + } + + /// Get number of elements + pub fn len(&self) -> usize { + self.len + } + + /// Check if empty + pub fn is_empty(&self) -> bool { + self.len == 0 + } + + /// Reset all distances to infinity + pub fn reset(&mut self) { + for d in &mut self.data { + *d = f64::INFINITY; + } + } + + /// Get raw slice + pub fn as_slice(&self) -> &[f64] { + &self.data + } + + /// Get mutable slice + pub fn as_mut_slice(&mut self) -> &mut [f64] { + &mut self.data + } +} + +/// SIMD-optimized distance operations +pub struct SimdDistanceOps; + +impl SimdDistanceOps { + /// Find minimum distance and its index using SIMD + /// + /// Returns (min_distance, min_index) + #[cfg(target_arch = "wasm32")] + pub fn find_min(distances: &DistanceArray) -> (f64, usize) { + let data = distances.as_slice(); + if data.is_empty() { + return (f64::INFINITY, 0); + } + + let mut min_val = f64::INFINITY; + let mut min_idx = 0; + + // Process in chunks of 2 (WASM SIMD has 128-bit = 2 x f64) + let chunks = data.len() / 2; + + unsafe { + for i in 0..chunks { + let offset = i * 2; + let v = v128_load(data.as_ptr().add(offset) as *const v128); + + let a = f64x2_extract_lane::<0>(v); + let b = f64x2_extract_lane::<1>(v); + + if a < min_val { + min_val = a; + min_idx = offset; + } + if b < min_val { + min_val = b; + min_idx = offset + 1; + } + } + } + + // Handle remainder + for i in (chunks * 2)..data.len() { + if data[i] < min_val { + min_val = data[i]; + min_idx = i; + } + } + + (min_val, min_idx) + } + + /// Find minimum distance and its index (scalar fallback) + #[cfg(not(target_arch = "wasm32"))] + pub fn find_min(distances: &DistanceArray) -> (f64, usize) { + let data = distances.as_slice(); + if data.is_empty() { + return (f64::INFINITY, 0); + } + + let mut min_val = f64::INFINITY; + let mut min_idx = 0; + + // Unrolled loop for better ILP + let chunks = data.len() / 4; + for i in 0..chunks { + let base = i * 4; + let a = data[base]; + let b = data[base + 1]; + let c = data[base + 2]; + let d = data[base + 3]; + + if a < min_val { min_val = a; min_idx = base; } + if b < min_val { min_val = b; min_idx = base + 1; } + if c < min_val { min_val = c; min_idx = base + 2; } + if d < min_val { min_val = d; min_idx = base + 3; } + } + + // Handle remainder + for i in (chunks * 4)..data.len() { + if data[i] < min_val { + min_val = data[i]; + min_idx = i; + } + } + + (min_val, min_idx) + } + + /// Batch update: dist[i] = min(dist[i], dist[source] + weight[i]) + /// + /// This is the core Dijkstra relaxation operation + #[cfg(target_arch = "wasm32")] + pub fn relax_batch( + distances: &mut DistanceArray, + source_dist: f64, + neighbors: &[(VertexId, f64)], // (neighbor_id, edge_weight) + ) -> usize { + let mut updated = 0; + let data = distances.as_mut_slice(); + + unsafe { + let source_v = f64x2_splat(source_dist); + + // Process pairs + let pairs = neighbors.len() / 2; + for i in 0..pairs { + let idx0 = neighbors[i * 2].0 as usize; + let idx1 = neighbors[i * 2 + 1].0 as usize; + let w0 = neighbors[i * 2].1; + let w1 = neighbors[i * 2 + 1].1; + + if idx0 < data.len() && idx1 < data.len() { + let weights = f64x2(w0, w1); + let new_dist = f64x2_add(source_v, weights); + + let old0 = data[idx0]; + let old1 = data[idx1]; + + let new0 = f64x2_extract_lane::<0>(new_dist); + let new1 = f64x2_extract_lane::<1>(new_dist); + + if new0 < old0 { + data[idx0] = new0; + updated += 1; + } + if new1 < old1 { + data[idx1] = new1; + updated += 1; + } + } + } + } + + // Handle odd remainder + if neighbors.len() % 2 == 1 { + let (idx, weight) = neighbors[neighbors.len() - 1]; + let idx = idx as usize; + if idx < data.len() { + let new_dist = source_dist + weight; + if new_dist < data[idx] { + data[idx] = new_dist; + updated += 1; + } + } + } + + updated + } + + /// Batch update (scalar fallback) + #[cfg(not(target_arch = "wasm32"))] + pub fn relax_batch( + distances: &mut DistanceArray, + source_dist: f64, + neighbors: &[(VertexId, f64)], + ) -> usize { + let mut updated = 0; + let data = distances.as_mut_slice(); + + // Process in chunks of 4 for better ILP + let chunks = neighbors.len() / 4; + + for i in 0..chunks { + let base = i * 4; + + let (idx0, w0) = neighbors[base]; + let (idx1, w1) = neighbors[base + 1]; + let (idx2, w2) = neighbors[base + 2]; + let (idx3, w3) = neighbors[base + 3]; + + let new0 = source_dist + w0; + let new1 = source_dist + w1; + let new2 = source_dist + w2; + let new3 = source_dist + w3; + + let idx0 = idx0 as usize; + let idx1 = idx1 as usize; + let idx2 = idx2 as usize; + let idx3 = idx3 as usize; + + if idx0 < data.len() && new0 < data[idx0] { + data[idx0] = new0; + updated += 1; + } + if idx1 < data.len() && new1 < data[idx1] { + data[idx1] = new1; + updated += 1; + } + if idx2 < data.len() && new2 < data[idx2] { + data[idx2] = new2; + updated += 1; + } + if idx3 < data.len() && new3 < data[idx3] { + data[idx3] = new3; + updated += 1; + } + } + + // Handle remainder + for i in (chunks * 4)..neighbors.len() { + let (idx, weight) = neighbors[i]; + let idx = idx as usize; + if idx < data.len() { + let new_dist = source_dist + weight; + if new_dist < data[idx] { + data[idx] = new_dist; + updated += 1; + } + } + } + + updated + } + + /// Count vertices with distance less than threshold + #[cfg(target_arch = "wasm32")] + pub fn count_below_threshold(distances: &DistanceArray, threshold: f64) -> usize { + let data = distances.as_slice(); + let mut count = 0; + + unsafe { + let thresh_v = f64x2_splat(threshold); + + let chunks = data.len() / 2; + for i in 0..chunks { + let offset = i * 2; + let v = v128_load(data.as_ptr().add(offset) as *const v128); + let cmp = f64x2_lt(v, thresh_v); + + // Extract comparison results + let mask = i8x16_bitmask(cmp); + // Each f64 lane uses 8 bits in bitmask + if mask & 0xFF != 0 { count += 1; } + if mask & 0xFF00 != 0 { count += 1; } + } + } + + // Handle remainder + for i in (data.len() / 2 * 2)..data.len() { + if data[i] < threshold { + count += 1; + } + } + + count + } + + /// Count vertices with distance less than threshold (scalar fallback) + #[cfg(not(target_arch = "wasm32"))] + pub fn count_below_threshold(distances: &DistanceArray, threshold: f64) -> usize { + distances.as_slice().iter().filter(|&&d| d < threshold).count() + } + + /// Compute sum of distances (for average) + pub fn sum_finite(distances: &DistanceArray) -> (f64, usize) { + let mut sum = 0.0; + let mut count = 0; + + for &d in distances.as_slice() { + if d.is_finite() { + sum += d; + count += 1; + } + } + + (sum, count) + } + + /// Element-wise minimum of two distance arrays + pub fn elementwise_min(a: &DistanceArray, b: &DistanceArray) -> DistanceArray { + let len = a.len().min(b.len()); + let mut result = DistanceArray::new(len); + + let a_data = a.as_slice(); + let b_data = b.as_slice(); + let r_data = result.as_mut_slice(); + + // Unrolled loop + let chunks = len / 4; + for i in 0..chunks { + let base = i * 4; + r_data[base] = a_data[base].min(b_data[base]); + r_data[base + 1] = a_data[base + 1].min(b_data[base + 1]); + r_data[base + 2] = a_data[base + 2].min(b_data[base + 2]); + r_data[base + 3] = a_data[base + 3].min(b_data[base + 3]); + } + + for i in (chunks * 4)..len { + r_data[i] = a_data[i].min(b_data[i]); + } + + result + } + + /// Scale all distances by a factor + pub fn scale(distances: &mut DistanceArray, factor: f64) { + for d in distances.as_mut_slice() { + if d.is_finite() { + *d *= factor; + } + } + } +} + +/// Priority queue entry for Dijkstra with SIMD-friendly layout +#[repr(C)] +#[derive(Debug, Clone, Copy)] +pub struct PriorityEntry { + /// Distance (key) + pub distance: f64, + /// Vertex ID + pub vertex: VertexId, +} + +impl PriorityEntry { + pub fn new(distance: f64, vertex: VertexId) -> Self { + Self { distance, vertex } + } +} + +impl PartialEq for PriorityEntry { + fn eq(&self, other: &Self) -> bool { + self.distance == other.distance && self.vertex == other.vertex + } +} + +impl Eq for PriorityEntry {} + +impl PartialOrd for PriorityEntry { + fn partial_cmp(&self, other: &Self) -> Option { + // Reverse order for min-heap + other.distance.partial_cmp(&self.distance) + } +} + +impl Ord for PriorityEntry { + fn cmp(&self, other: &Self) -> std::cmp::Ordering { + self.partial_cmp(other).unwrap_or(std::cmp::Ordering::Equal) + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn test_distance_array_basic() { + let mut arr = DistanceArray::new(10); + + arr.set(0, 1.0); + arr.set(5, 5.0); + + assert_eq!(arr.get(0), 1.0); + assert_eq!(arr.get(5), 5.0); + assert_eq!(arr.get(9), f64::INFINITY); + } + + #[test] + fn test_find_min() { + let mut arr = DistanceArray::new(100); + + arr.set(50, 1.0); + arr.set(25, 0.5); + arr.set(75, 2.0); + + let (min_val, min_idx) = SimdDistanceOps::find_min(&arr); + + assert_eq!(min_val, 0.5); + assert_eq!(min_idx, 25); + } + + #[test] + fn test_find_min_empty() { + let arr = DistanceArray::new(0); + let (min_val, _) = SimdDistanceOps::find_min(&arr); + assert!(min_val.is_infinite()); + } + + #[test] + fn test_relax_batch() { + let mut arr = DistanceArray::new(10); + arr.set(0, 0.0); // Source + + let neighbors = vec![ + (1, 1.0), + (2, 2.0), + (3, 3.0), + (4, 4.0), + ]; + + let updated = SimdDistanceOps::relax_batch(&mut arr, 0.0, &neighbors); + + assert_eq!(updated, 4); + assert_eq!(arr.get(1), 1.0); + assert_eq!(arr.get(2), 2.0); + assert_eq!(arr.get(3), 3.0); + assert_eq!(arr.get(4), 4.0); + } + + #[test] + fn test_relax_batch_no_update() { + let mut arr = DistanceArray::from_slice(&[0.0, 0.5, 1.0, 1.5, 2.0]); + + let neighbors = vec![ + (1, 2.0), // New dist = 0 + 2.0 = 2.0 > 0.5 + (2, 3.0), // New dist = 0 + 3.0 = 3.0 > 1.0 + ]; + + let updated = SimdDistanceOps::relax_batch(&mut arr, 0.0, &neighbors); + + assert_eq!(updated, 0); // No updates, existing distances are better + } + + #[test] + fn test_count_below_threshold() { + let arr = DistanceArray::from_slice(&[0.0, 0.5, 1.0, 1.5, 2.0, f64::INFINITY]); + + assert_eq!(SimdDistanceOps::count_below_threshold(&arr, 1.0), 2); + assert_eq!(SimdDistanceOps::count_below_threshold(&arr, 2.0), 4); + assert_eq!(SimdDistanceOps::count_below_threshold(&arr, 10.0), 5); + } + + #[test] + fn test_sum_finite() { + let arr = DistanceArray::from_slice(&[1.0, 2.0, 3.0, f64::INFINITY, f64::INFINITY]); + + let (sum, count) = SimdDistanceOps::sum_finite(&arr); + + assert_eq!(sum, 6.0); + assert_eq!(count, 3); + } + + #[test] + fn test_elementwise_min() { + let a = DistanceArray::from_slice(&[1.0, 5.0, 3.0, 7.0]); + let b = DistanceArray::from_slice(&[2.0, 4.0, 6.0, 1.0]); + + let result = SimdDistanceOps::elementwise_min(&a, &b); + + assert_eq!(result.as_slice(), &[1.0, 4.0, 3.0, 1.0]); + } + + #[test] + fn test_scale() { + let mut arr = DistanceArray::from_slice(&[1.0, 2.0, f64::INFINITY, 4.0]); + + SimdDistanceOps::scale(&mut arr, 2.0); + + assert_eq!(arr.get(0), 2.0); + assert_eq!(arr.get(1), 4.0); + assert!(arr.get(2).is_infinite()); + assert_eq!(arr.get(3), 8.0); + } + + #[test] + fn test_priority_entry_ordering() { + let a = PriorityEntry::new(1.0, 1); + let b = PriorityEntry::new(2.0, 2); + + // Min-heap ordering: smaller distance is "greater" + assert!(a > b); + } +} diff --git a/crates/ruvector-mincut/tests/jtree_tests.rs b/crates/ruvector-mincut/tests/jtree_tests.rs new file mode 100644 index 000000000..e9f138813 --- /dev/null +++ b/crates/ruvector-mincut/tests/jtree_tests.rs @@ -0,0 +1,1338 @@ +//! Comprehensive tests for j-Tree hierarchical decomposition. +//! +//! Tests the correctness of: +//! - LazyLevel state machine (Unmaterialized -> Materialized -> Dirty) +//! - BmsspJTreeLevel cut queries and caching +//! - LazyJTreeHierarchy demand-paging and hierarchy consistency +//! - TwoTierCoordinator approximate/exact escalation +//! +//! Based on ADR-002: Dynamic Hierarchical j-Tree Decomposition +//! and its addendums for SOTA optimizations and BMSSP integration. + +#![cfg(feature = "jtree")] + +use std::collections::{HashMap, HashSet}; +use std::sync::atomic::{AtomicUsize, Ordering}; +use std::sync::Arc; + +// ============================================================================ +// Test Helper Structures (mock implementations for testing) +// These mirror the structures defined in ADR-002 and addendums +// ============================================================================ + +/// Represents the lazy evaluation state for a j-tree level +#[derive(Clone, Debug, PartialEq)] +pub enum LazyLevel { + /// Not yet computed - saves memory until needed + Unmaterialized, + /// Computed and valid - ready for queries + Materialized(T), + /// Previously computed but now stale - can warm-start + Dirty(T), +} + +impl LazyLevel { + /// Check if level is materialized and valid + pub fn is_materialized(&self) -> bool { + matches!(self, LazyLevel::Materialized(_)) + } + + /// Check if level needs recomputation + pub fn is_dirty(&self) -> bool { + matches!(self, LazyLevel::Dirty(_)) + } + + /// Check if level has never been computed + pub fn is_unmaterialized(&self) -> bool { + matches!(self, LazyLevel::Unmaterialized) + } + + /// Get the data if materialized + pub fn as_materialized(&self) -> Option<&T> { + match self { + LazyLevel::Materialized(data) => Some(data), + _ => None, + } + } + + /// Get the stale data for warm-start + pub fn as_dirty(&self) -> Option<&T> { + match self { + LazyLevel::Dirty(data) => Some(data), + _ => None, + } + } + + /// Transition to materialized state + pub fn materialize(&mut self, data: T) { + *self = LazyLevel::Materialized(data); + } + + /// Mark as dirty (needs recomputation) + pub fn mark_dirty(&mut self) { + if let LazyLevel::Materialized(data) = self { + *self = LazyLevel::Dirty(data.clone()); + } + } + + /// Invalidate (become unmaterialized) + pub fn invalidate(&mut self) { + *self = LazyLevel::Unmaterialized; + } +} + +/// Mock j-tree level data for testing +#[derive(Clone, Debug)] +pub struct JTreeLevelData { + pub level: usize, + pub vertex_count: usize, + pub min_cut_value: f64, + pub computation_count: Arc, +} + +impl JTreeLevelData { + pub fn new(level: usize, vertices: usize) -> Self { + Self { + level, + vertex_count: vertices, + min_cut_value: vertices as f64 * 0.5, + computation_count: Arc::new(AtomicUsize::new(0)), + } + } +} + +/// BMSSP-backed j-tree level for cut queries (mock implementation) +/// Based on ADR-002-addendum-bmssp-integration.md +#[derive(Clone)] +pub struct BmsspJTreeLevel { + /// Number of vertices at this level + vertex_count: usize, + /// Cached path distances (= cut values in dual) + path_cache: HashMap<(u64, u64), f64>, + /// Edge weights (source, target) -> weight + edges: HashMap<(u64, u64), f64>, + /// Level index in hierarchy + level: usize, + /// Cache hit counter for testing + cache_hits: Arc, + /// Cache miss counter for testing + cache_misses: Arc, +} + +impl BmsspJTreeLevel { + /// Create a new BMSSP-backed level + pub fn new(vertex_count: usize, level: usize) -> Self { + Self { + vertex_count, + path_cache: HashMap::new(), + edges: HashMap::new(), + level, + cache_hits: Arc::new(AtomicUsize::new(0)), + cache_misses: Arc::new(AtomicUsize::new(0)), + } + } + + /// Add an edge with weight (capacity) + pub fn add_edge(&mut self, source: u64, target: u64, weight: f64) { + // Store edge in canonical order + let (u, v) = if source < target { + (source, target) + } else { + (target, source) + }; + self.edges.insert((u, v), weight); + } + + /// Min-cut between s and t via path-cut duality + /// Complexity: O(m*log^(2/3) n) vs O(n log n) direct + pub fn min_cut(&mut self, s: u64, t: u64) -> f64 { + // Canonical order for cache + let (u, v) = if s < t { (s, t) } else { (t, s) }; + + // Check cache first + if let Some(&cached) = self.path_cache.get(&(u, v)) { + self.cache_hits.fetch_add(1, Ordering::Relaxed); + return cached; + } + + self.cache_misses.fetch_add(1, Ordering::Relaxed); + + // Compute shortest path (mock: simple sum of edge weights on path) + let cut_value = self.compute_min_cut(u, v); + + // Cache for future queries (both directions) + self.path_cache.insert((u, v), cut_value); + + cut_value + } + + /// Multi-terminal cut using BMSSP multi-source approach + pub fn multi_terminal_cut(&mut self, terminals: &[u64]) -> f64 { + if terminals.len() < 2 { + return f64::INFINITY; + } + + let mut min_cut = f64::INFINITY; + + // Find minimum pairwise cut among terminals + for (i, &s) in terminals.iter().enumerate() { + for &t in terminals.iter().skip(i + 1) { + let cut = self.min_cut(s, t); + min_cut = min_cut.min(cut); + } + } + + min_cut + } + + /// Invalidate cache for affected vertices + pub fn invalidate_cache(&mut self, affected: &[u64]) { + let affected_set: HashSet<_> = affected.iter().copied().collect(); + self.path_cache.retain(|(u, v), _| { + !affected_set.contains(u) && !affected_set.contains(v) + }); + } + + /// Clear entire cache + pub fn clear_cache(&mut self) { + self.path_cache.clear(); + } + + /// Get cache statistics + pub fn cache_stats(&self) -> (usize, usize) { + ( + self.cache_hits.load(Ordering::Relaxed), + self.cache_misses.load(Ordering::Relaxed), + ) + } + + /// Mock computation: find min-cut using simple path analysis + fn compute_min_cut(&self, _s: u64, _t: u64) -> f64 { + // Simplified: return sum of minimum edge weight on any path + // In real implementation, this would use BMSSP shortest path + self.edges.values().copied().min_by(|a, b| a.partial_cmp(b).unwrap()).unwrap_or(f64::INFINITY) + } +} + +/// Lazy j-tree hierarchy with demand-paged levels +/// Based on ADR-002-addendum-sota-optimizations.md +pub struct LazyJTreeHierarchy { + /// Level states + levels: Vec>, + /// Bit set of materialized levels + materialized: HashSet, + /// Bit set of dirty levels + dirty: HashSet, + /// Approximation quality per level + alpha: f64, + /// Total computation count for testing + total_computations: AtomicUsize, +} + +impl LazyJTreeHierarchy { + /// Create hierarchy with given number of levels + pub fn new(num_levels: usize, alpha: f64) -> Self { + let levels = (0..num_levels).map(|_| LazyLevel::Unmaterialized).collect(); + Self { + levels, + materialized: HashSet::new(), + dirty: HashSet::new(), + alpha, + total_computations: AtomicUsize::new(0), + } + } + + /// Get number of levels + pub fn num_levels(&self) -> usize { + self.levels.len() + } + + /// Query approximate min-cut with lazy materialization + pub fn approximate_min_cut(&mut self) -> ApproximateCut { + let mut current_level = self.levels.len() - 1; + + // Start from coarsest level, refine as needed + while current_level > 0 { + self.ensure_materialized(current_level); + + if let Some(data) = self.levels[current_level].as_materialized() { + // Early termination if approximation is good enough + let approx_factor = self.alpha.powi((self.levels.len() - current_level) as i32); + if approx_factor < 2.0 { + // Acceptable approximation + return ApproximateCut { + value: data.min_cut_value, + approximation_factor: approx_factor, + level_used: current_level, + }; + } + } + + current_level -= 1; + } + + // Use finest level for best accuracy + self.ensure_materialized(0); + if let Some(data) = self.levels[0].as_materialized() { + ApproximateCut { + value: data.min_cut_value, + approximation_factor: 1.0, + level_used: 0, + } + } else { + ApproximateCut { + value: f64::INFINITY, + approximation_factor: f64::INFINITY, + level_used: 0, + } + } + } + + /// Approximate min-cut at specific level + pub fn approximate_min_cut_at_level(&mut self, level: usize) -> Option { + if level >= self.levels.len() { + return None; + } + self.ensure_materialized(level); + self.levels[level].as_materialized().map(|d| d.min_cut_value) + } + + /// Ensure level is materialized (demand-paging) + fn ensure_materialized(&mut self, level: usize) { + match &self.levels[level] { + LazyLevel::Unmaterialized => { + // First-time computation + self.total_computations.fetch_add(1, Ordering::Relaxed); + let vertices = 100 / (level + 1); // Decreasing vertices at higher levels + let data = JTreeLevelData::new(level, vertices); + self.levels[level] = LazyLevel::Materialized(data); + self.materialized.insert(level); + } + LazyLevel::Dirty(old_data) => { + // Warm-start from previous state + self.total_computations.fetch_add(1, Ordering::Relaxed); + let mut new_data = old_data.clone(); + new_data.computation_count.fetch_add(1, Ordering::Relaxed); + // Warm-start: reuse structure, only update affected parts + new_data.min_cut_value *= 0.95; // Simulated adjustment + self.levels[level] = LazyLevel::Materialized(new_data); + self.dirty.remove(&level); + } + LazyLevel::Materialized(_) => { + // Already valid, no-op + } + } + } + + /// Mark levels as dirty after edge update + pub fn mark_dirty(&mut self, affected_levels: &[usize]) { + for &level in affected_levels { + if level < self.levels.len() && self.materialized.contains(&level) { + self.levels[level].mark_dirty(); + self.dirty.insert(level); + } + } + } + + /// Check if level is materialized + pub fn is_materialized(&self, level: usize) -> bool { + level < self.levels.len() && self.materialized.contains(&level) + } + + /// Check if level is dirty + pub fn is_dirty(&self, level: usize) -> bool { + self.dirty.contains(&level) + } + + /// Get total computation count + pub fn total_computations(&self) -> usize { + self.total_computations.load(Ordering::Relaxed) + } +} + +/// Result of approximate min-cut query +#[derive(Debug, Clone)] +pub struct ApproximateCut { + pub value: f64, + pub approximation_factor: f64, + pub level_used: usize, +} + +/// Two-tier coordinator for approximate/exact escalation +/// Based on ADR-002: Two-Tier Dynamic Cut Architecture +pub struct TwoTierCoordinator { + /// Tier 1: Fast approximate hierarchy + jtree: LazyJTreeHierarchy, + /// Tier 2: Exact min-cut value (mock) + exact_value: f64, + /// Trigger threshold for escalation + critical_threshold: f64, + /// Maximum acceptable approximation factor + max_approx_factor: f64, + /// Cache for results + cached_result: Option, + /// Count of exact queries for testing + exact_queries: AtomicUsize, + /// Count of approximate queries for testing + approx_queries: AtomicUsize, +} + +impl TwoTierCoordinator { + /// Create coordinator with given configuration + pub fn new(num_levels: usize, exact_value: f64, critical_threshold: f64) -> Self { + Self { + jtree: LazyJTreeHierarchy::new(num_levels, 1.5), + exact_value, + critical_threshold, + max_approx_factor: 2.0, + cached_result: None, + exact_queries: AtomicUsize::new(0), + approx_queries: AtomicUsize::new(0), + } + } + + /// Query min-cut with tiered strategy + pub fn min_cut(&mut self, exact_required: bool) -> CutResult { + // Check cache first + if let Some(cached) = &self.cached_result { + if !exact_required || cached.is_exact { + return cached.clone(); + } + } + + // Tier 1: Fast approximate query + let approx = self.jtree.approximate_min_cut(); + self.approx_queries.fetch_add(1, Ordering::Relaxed); + + // Decide whether to escalate to Tier 2 + let should_escalate = exact_required + || approx.value < self.critical_threshold + || approx.approximation_factor > self.max_approx_factor; + + let result = if should_escalate { + // Tier 2: Exact verification + self.exact_queries.fetch_add(1, Ordering::Relaxed); + CutResult { + value: self.exact_value, + is_exact: true, + approximation_factor: 1.0, + tier_used: Tier::Exact, + } + } else { + CutResult { + value: approx.value, + is_exact: false, + approximation_factor: approx.approximation_factor, + tier_used: Tier::Approximate, + } + }; + + self.cached_result = Some(result.clone()); + result + } + + /// Handle edge insertion + pub fn insert_edge(&mut self, _u: u64, _v: u64, _weight: f64) { + self.cached_result = None; + // Mark all levels as dirty for simplicity + let all_levels: Vec = (0..self.jtree.num_levels()).collect(); + self.jtree.mark_dirty(&all_levels); + } + + /// Handle edge deletion + pub fn delete_edge(&mut self, _u: u64, _v: u64) { + self.cached_result = None; + let all_levels: Vec = (0..self.jtree.num_levels()).collect(); + self.jtree.mark_dirty(&all_levels); + } + + /// Get query statistics + pub fn query_stats(&self) -> (usize, usize) { + ( + self.approx_queries.load(Ordering::Relaxed), + self.exact_queries.load(Ordering::Relaxed), + ) + } + + /// Update exact value for testing + pub fn set_exact_value(&mut self, value: f64) { + self.exact_value = value; + } +} + +/// Result of cut query +#[derive(Debug, Clone)] +pub struct CutResult { + pub value: f64, + pub is_exact: bool, + pub approximation_factor: f64, + pub tier_used: Tier, +} + +/// Which tier was used for the query +#[derive(Debug, Clone, Copy, PartialEq)] +pub enum Tier { + Approximate, + Exact, +} + +// ============================================================================ +// Unit Tests for LazyLevel +// ============================================================================ + +mod lazy_level_tests { + use super::*; + + #[test] + fn test_unmaterialized_to_materialized_transition() { + let mut level: LazyLevel = LazyLevel::Unmaterialized; + + assert!(level.is_unmaterialized()); + assert!(!level.is_materialized()); + assert!(!level.is_dirty()); + assert!(level.as_materialized().is_none()); + + // Transition to Materialized + let data = JTreeLevelData::new(0, 100); + level.materialize(data); + + assert!(!level.is_unmaterialized()); + assert!(level.is_materialized()); + assert!(!level.is_dirty()); + assert!(level.as_materialized().is_some()); + assert_eq!(level.as_materialized().unwrap().vertex_count, 100); + } + + #[test] + fn test_materialized_to_dirty_transition() { + let mut level: LazyLevel = LazyLevel::Unmaterialized; + let data = JTreeLevelData::new(0, 100); + level.materialize(data); + + assert!(level.is_materialized()); + + // Mark as dirty + level.mark_dirty(); + + assert!(!level.is_unmaterialized()); + assert!(!level.is_materialized()); + assert!(level.is_dirty()); + assert!(level.as_dirty().is_some()); + assert_eq!(level.as_dirty().unwrap().vertex_count, 100); + } + + #[test] + fn test_dirty_to_materialized_warm_start() { + let mut level: LazyLevel = LazyLevel::Unmaterialized; + + // First computation + let data = JTreeLevelData::new(0, 100); + level.materialize(data); + + // Mark dirty + level.mark_dirty(); + assert!(level.is_dirty()); + + // Get old data for warm-start + let old_data = level.as_dirty().unwrap().clone(); + + // Warm-start re-computation + let mut new_data = old_data; + new_data.min_cut_value *= 0.9; // Adjusted value + level.materialize(new_data); + + assert!(level.is_materialized()); + assert!(!level.is_dirty()); + } + + #[test] + fn test_cache_invalidation() { + let mut level: LazyLevel = LazyLevel::Unmaterialized; + let data = JTreeLevelData::new(0, 100); + level.materialize(data); + + assert!(level.is_materialized()); + + // Full invalidation + level.invalidate(); + + assert!(level.is_unmaterialized()); + assert!(!level.is_materialized()); + assert!(!level.is_dirty()); + } + + #[test] + fn test_mark_dirty_on_unmaterialized_is_noop() { + let mut level: LazyLevel = LazyLevel::Unmaterialized; + + level.mark_dirty(); + + // Should still be unmaterialized + assert!(level.is_unmaterialized()); + assert!(!level.is_dirty()); + } + + #[test] + fn test_mark_dirty_on_dirty_is_noop() { + let mut level: LazyLevel = LazyLevel::Unmaterialized; + let data = JTreeLevelData::new(0, 100); + level.materialize(data); + level.mark_dirty(); + + let original_value = level.as_dirty().unwrap().min_cut_value; + + // Mark dirty again + level.mark_dirty(); + + // Should still be dirty with same data + assert!(level.is_dirty()); + assert_eq!(level.as_dirty().unwrap().min_cut_value, original_value); + } +} + +// ============================================================================ +// Unit Tests for BmsspJTreeLevel +// ============================================================================ + +mod bmssp_jtree_level_tests { + use super::*; + + fn create_test_level() -> BmsspJTreeLevel { + let mut level = BmsspJTreeLevel::new(10, 0); + // Create a simple path: 0-1-2-3-4 + level.add_edge(0, 1, 5.0); + level.add_edge(1, 2, 3.0); + level.add_edge(2, 3, 4.0); + level.add_edge(3, 4, 2.0); + level + } + + #[test] + fn test_min_cut_returns_correct_approximation() { + let mut level = create_test_level(); + + // Minimum edge weight is 2.0 + let cut = level.min_cut(0, 4); + + assert!(cut >= 0.0); + assert!(cut < f64::INFINITY); + // Should find minimum weight edge + assert_eq!(cut, 2.0); + } + + #[test] + fn test_multi_terminal_cut_with_various_terminal_sets() { + let mut level = create_test_level(); + + // Two terminals + let cut_2 = level.multi_terminal_cut(&[0, 4]); + assert!(cut_2 >= 0.0); + + // Three terminals + let cut_3 = level.multi_terminal_cut(&[0, 2, 4]); + assert!(cut_3 >= 0.0); + // More terminals shouldn't increase minimum pairwise cut + assert!(cut_3 <= cut_2 || (cut_3 - cut_2).abs() < f64::EPSILON); + + // Single terminal + let cut_1 = level.multi_terminal_cut(&[0]); + assert_eq!(cut_1, f64::INFINITY); + + // Empty terminals + let cut_0 = level.multi_terminal_cut(&[]); + assert_eq!(cut_0, f64::INFINITY); + } + + #[test] + fn test_cache_hits_and_misses() { + let mut level = create_test_level(); + + // First query - cache miss + let _ = level.min_cut(0, 4); + let (hits, misses) = level.cache_stats(); + assert_eq!(hits, 0); + assert_eq!(misses, 1); + + // Same query - cache hit + let _ = level.min_cut(0, 4); + let (hits, misses) = level.cache_stats(); + assert_eq!(hits, 1); + assert_eq!(misses, 1); + + // Reversed query - should also hit (symmetric) + let _ = level.min_cut(4, 0); + let (hits, misses) = level.cache_stats(); + assert_eq!(hits, 2); + assert_eq!(misses, 1); + + // Different query - cache miss + let _ = level.min_cut(1, 3); + let (hits, misses) = level.cache_stats(); + assert_eq!(hits, 2); + assert_eq!(misses, 2); + } + + #[test] + fn test_cache_invalidation_for_affected_vertices() { + let mut level = create_test_level(); + + // Populate cache + let _ = level.min_cut(0, 4); + let _ = level.min_cut(1, 3); + let (hits, _) = level.cache_stats(); + assert_eq!(hits, 0); + + // Verify cache is populated + let _ = level.min_cut(0, 4); + let (hits, _) = level.cache_stats(); + assert_eq!(hits, 1); + + // Invalidate cache for vertex 2 + level.invalidate_cache(&[2]); + + // Query involving 2 should miss now, but 0-4 doesn't involve 2 + let _ = level.min_cut(0, 4); + let (hits, _) = level.cache_stats(); + assert_eq!(hits, 2); + + // Query involving 2 should miss + let _ = level.min_cut(1, 3); + let (_, misses) = level.cache_stats(); + // 1-3 path includes vertex 2, so it was invalidated + assert!(misses >= 2); + } + + #[test] + fn test_clear_cache() { + let mut level = create_test_level(); + + // Populate cache + let _ = level.min_cut(0, 4); + let _ = level.min_cut(1, 3); + let _ = level.min_cut(0, 4); + let (hits, _) = level.cache_stats(); + assert_eq!(hits, 1); + + // Clear cache + level.clear_cache(); + + // All queries should miss now + let _ = level.min_cut(0, 4); + let _ = level.min_cut(1, 3); + let (_, misses) = level.cache_stats(); + assert_eq!(misses, 4); + } + + #[test] + fn test_symmetry_of_cut_values() { + let mut level = create_test_level(); + + let cut_forward = level.min_cut(0, 4); + level.clear_cache(); + let cut_backward = level.min_cut(4, 0); + + assert_eq!(cut_forward, cut_backward); + } + + #[test] + fn test_self_cut_is_infinity_or_zero() { + let mut level = create_test_level(); + + // Cut from vertex to itself should be infinity (no separation needed) + // or zero depending on implementation + let cut = level.min_cut(2, 2); + assert!(cut == f64::INFINITY || cut == 0.0 || cut == 2.0); + } +} + +// ============================================================================ +// Unit Tests for LazyJTreeHierarchy +// ============================================================================ + +mod lazy_jtree_hierarchy_tests { + use super::*; + + #[test] + fn test_level_demand_paging() { + let mut hierarchy = LazyJTreeHierarchy::new(5, 1.5); + + // Initially no levels materialized + for level in 0..5 { + assert!(!hierarchy.is_materialized(level)); + } + assert_eq!(hierarchy.total_computations(), 0); + + // Query triggers materialization + let _ = hierarchy.approximate_min_cut(); + + // At least one level should be materialized + let materialized_count = (0..5).filter(|&l| hierarchy.is_materialized(l)).count(); + assert!(materialized_count > 0); + assert!(hierarchy.total_computations() > 0); + } + + #[test] + fn test_approximate_min_cut_at_various_levels() { + let mut hierarchy = LazyJTreeHierarchy::new(5, 1.5); + + // Query at level 0 (finest) + let cut_0 = hierarchy.approximate_min_cut_at_level(0); + assert!(cut_0.is_some()); + assert!(hierarchy.is_materialized(0)); + + // Query at level 4 (coarsest) + let cut_4 = hierarchy.approximate_min_cut_at_level(4); + assert!(cut_4.is_some()); + assert!(hierarchy.is_materialized(4)); + + // Coarser levels should have fewer vertices, possibly lower cut + // (this depends on implementation, but they should be comparable) + assert!(cut_0.unwrap() > 0.0); + assert!(cut_4.unwrap() > 0.0); + } + + #[test] + fn test_out_of_bounds_level() { + let mut hierarchy = LazyJTreeHierarchy::new(5, 1.5); + + let cut = hierarchy.approximate_min_cut_at_level(10); + assert!(cut.is_none()); + } + + #[test] + fn test_mark_dirty_propagation() { + let mut hierarchy = LazyJTreeHierarchy::new(5, 1.5); + + // Materialize some levels + let _ = hierarchy.approximate_min_cut_at_level(2); + let _ = hierarchy.approximate_min_cut_at_level(3); + assert!(hierarchy.is_materialized(2)); + assert!(hierarchy.is_materialized(3)); + assert!(!hierarchy.is_dirty(2)); + assert!(!hierarchy.is_dirty(3)); + + // Mark levels dirty + hierarchy.mark_dirty(&[2, 3]); + + assert!(hierarchy.is_dirty(2)); + assert!(hierarchy.is_dirty(3)); + // Not materialized anymore (in clean sense) + assert!(!hierarchy.is_materialized(2) || hierarchy.is_dirty(2)); + } + + #[test] + fn test_warm_start_reduces_computation() { + let mut hierarchy = LazyJTreeHierarchy::new(5, 1.5); + + // First computation + let _ = hierarchy.approximate_min_cut_at_level(2); + let first_computations = hierarchy.total_computations(); + + // Mark dirty + hierarchy.mark_dirty(&[2]); + assert!(hierarchy.is_dirty(2)); + + // Re-query - should use warm-start + let _ = hierarchy.approximate_min_cut_at_level(2); + let second_computations = hierarchy.total_computations(); + + // Warm-start still counts as computation but should use old data + assert_eq!(second_computations, first_computations + 1); + assert!(!hierarchy.is_dirty(2)); + } + + #[test] + fn test_hierarchy_consistency_after_updates() { + let mut hierarchy = LazyJTreeHierarchy::new(5, 1.5); + + // Get initial cuts + let cut_0 = hierarchy.approximate_min_cut_at_level(0).unwrap(); + let cut_2 = hierarchy.approximate_min_cut_at_level(2).unwrap(); + let cut_4 = hierarchy.approximate_min_cut_at_level(4).unwrap(); + + // All cuts should be positive and finite + assert!(cut_0 > 0.0 && cut_0 < f64::INFINITY); + assert!(cut_2 > 0.0 && cut_2 < f64::INFINITY); + assert!(cut_4 > 0.0 && cut_4 < f64::INFINITY); + + // After marking dirty and re-querying, consistency should hold + hierarchy.mark_dirty(&[0, 2, 4]); + let new_cut_0 = hierarchy.approximate_min_cut_at_level(0).unwrap(); + let new_cut_2 = hierarchy.approximate_min_cut_at_level(2).unwrap(); + + // Warm-start adjusts values slightly + assert!(new_cut_0 > 0.0); + assert!(new_cut_2 > 0.0); + } + + #[test] + fn test_unmaterialized_levels_not_marked_dirty() { + let mut hierarchy = LazyJTreeHierarchy::new(5, 1.5); + + // Only materialize level 2 + let _ = hierarchy.approximate_min_cut_at_level(2); + + // Try to mark all levels dirty + hierarchy.mark_dirty(&[0, 1, 2, 3, 4]); + + // Only level 2 should be dirty (was materialized) + assert!(hierarchy.is_dirty(2)); + assert!(!hierarchy.is_dirty(0)); // Never materialized + assert!(!hierarchy.is_dirty(1)); // Never materialized + assert!(!hierarchy.is_dirty(3)); // Never materialized + } +} + +// ============================================================================ +// Integration Tests for TwoTierCoordinator +// ============================================================================ + +mod two_tier_coordinator_tests { + use super::*; + + #[test] + fn test_approximate_to_exact_escalation() { + // Create coordinator with critical threshold that will trigger escalation + let mut coordinator = TwoTierCoordinator::new(5, 10.0, 100.0); + + // Query without requiring exact - approximate cut < critical threshold + let result = coordinator.min_cut(false); + + // Should escalate because approximate value is likely < 100.0 + assert!(result.is_exact || result.value < 100.0); + } + + #[test] + fn test_exact_required_always_escalates() { + let mut coordinator = TwoTierCoordinator::new(5, 50.0, 10.0); + + // Query with exact required + let result = coordinator.min_cut(true); + + assert!(result.is_exact); + assert_eq!(result.tier_used, Tier::Exact); + assert_eq!(result.approximation_factor, 1.0); + assert_eq!(result.value, 50.0); + } + + #[test] + fn test_cache_behavior() { + let mut coordinator = TwoTierCoordinator::new(5, 50.0, 10.0); + + // First query + let result1 = coordinator.min_cut(false); + let (approx1, exact1) = coordinator.query_stats(); + + // Second query - should use cache + let result2 = coordinator.min_cut(false); + let (approx2, exact2) = coordinator.query_stats(); + + // Cache should be hit (no additional queries) + assert_eq!(result1.value, result2.value); + assert_eq!(approx1, approx2); + assert_eq!(exact1, exact2); + } + + #[test] + fn test_cache_invalidation_on_edge_insert() { + let mut coordinator = TwoTierCoordinator::new(5, 50.0, 10.0); + + // First query + let _ = coordinator.min_cut(false); + let (approx1, _) = coordinator.query_stats(); + + // Insert edge - invalidates cache + coordinator.insert_edge(1, 2, 5.0); + + // Query again - should not use cache + let _ = coordinator.min_cut(false); + let (approx2, _) = coordinator.query_stats(); + + // Should have made additional approximate query + assert_eq!(approx2, approx1 + 1); + } + + #[test] + fn test_cache_invalidation_on_edge_delete() { + let mut coordinator = TwoTierCoordinator::new(5, 50.0, 10.0); + + // First query + let _ = coordinator.min_cut(false); + let (approx1, _) = coordinator.query_stats(); + + // Delete edge - invalidates cache + coordinator.delete_edge(1, 2); + + // Query again - should not use cache + let _ = coordinator.min_cut(false); + let (approx2, _) = coordinator.query_stats(); + + assert_eq!(approx2, approx1 + 1); + } + + #[test] + fn test_edge_update_propagation() { + let mut coordinator = TwoTierCoordinator::new(5, 50.0, 10.0); + + // Materialize hierarchy + let _ = coordinator.min_cut(false); + + // Insert edge - should mark levels dirty + coordinator.insert_edge(1, 2, 5.0); + + // Query should trigger re-computation + let before = coordinator.jtree.total_computations(); + let _ = coordinator.min_cut(false); + let after = coordinator.jtree.total_computations(); + + assert!(after > before); + } + + #[test] + fn test_approximate_only_when_safe() { + // Set up coordinator where approximate is sufficient + let mut coordinator = TwoTierCoordinator::new(5, 100.0, 5.0); + coordinator.set_exact_value(100.0); + + // Query without exact requirement and with high threshold + // The approximate value should be above critical threshold + let result = coordinator.min_cut(false); + + // Depending on approximation factor, may or may not escalate + // But the result should be reasonable + assert!(result.value > 0.0); + assert!(result.value < f64::INFINITY); + } + + #[test] + fn test_escalation_when_approx_factor_too_high() { + let mut coordinator = TwoTierCoordinator::new(5, 50.0, 1.0); + // Set max_approx_factor very low to force escalation + coordinator.max_approx_factor = 1.0; + + let result = coordinator.min_cut(false); + + // Should escalate because approximation factor > 1.0 + assert!(result.is_exact || result.approximation_factor <= 1.0); + } +} + +// ============================================================================ +// Property-Based Tests +// ============================================================================ + +mod property_tests { + use super::*; + + /// Property: Approximate cut <= (1 + epsilon) * exact cut + #[test] + fn property_approximate_cut_bound() { + let epsilon = 0.5; + let exact_value = 100.0; + let mut coordinator = TwoTierCoordinator::new(5, exact_value, 10.0); + + for _ in 0..10 { + let approx = coordinator.jtree.approximate_min_cut(); + + // Approximate should not be too far from exact + // (with poly-log approximation factor) + let ratio = approx.value / exact_value; + + // The approximation factor should be bounded by alpha^L + assert!(ratio > 0.0, "Approximate cut should be positive"); + assert!( + ratio < 100.0 || approx.value == f64::INFINITY, + "Approximation should be bounded" + ); + + // Mark dirty and re-test + coordinator.jtree.mark_dirty(&[0, 1, 2, 3, 4]); + } + } + + /// Property: Hierarchy consistency after updates + #[test] + fn property_hierarchy_consistency_after_updates() { + let mut hierarchy = LazyJTreeHierarchy::new(5, 1.5); + + for iteration in 0..20 { + // Materialize random levels + let levels_to_materialize: Vec = (0..5) + .filter(|_| iteration % 2 == 0) + .collect(); + + for level in &levels_to_materialize { + let _ = hierarchy.approximate_min_cut_at_level(*level); + } + + // Mark some dirty + hierarchy.mark_dirty(&[iteration % 5]); + + // Query and verify consistency + let cut = hierarchy.approximate_min_cut(); + assert!(cut.value > 0.0 || cut.value == f64::INFINITY); + assert!(cut.approximation_factor >= 1.0); + assert!(cut.level_used < 5); + } + } + + /// Property: Cache coherence - same query returns same result + #[test] + fn property_cache_coherence() { + let mut level = BmsspJTreeLevel::new(10, 0); + level.add_edge(0, 1, 5.0); + level.add_edge(1, 2, 3.0); + level.add_edge(2, 3, 4.0); + + for _ in 0..100 { + let cut1 = level.min_cut(0, 3); + let cut2 = level.min_cut(0, 3); + let cut3 = level.min_cut(3, 0); + + assert_eq!(cut1, cut2, "Same query should return same result"); + assert_eq!(cut1, cut3, "Cut should be symmetric"); + } + } + + /// Property: Invalidation affects only specified vertices + #[test] + fn property_selective_invalidation() { + let mut level = BmsspJTreeLevel::new(10, 0); + level.add_edge(0, 1, 5.0); + level.add_edge(1, 2, 3.0); + level.add_edge(5, 6, 2.0); + level.add_edge(6, 7, 4.0); + + // Query both regions + let _ = level.min_cut(0, 2); + let _ = level.min_cut(5, 7); + + let (hits_before, misses_before) = level.cache_stats(); + + // Invalidate only region 0-2 + level.invalidate_cache(&[1]); + + // Query region 5-7 should still hit + let _ = level.min_cut(5, 7); + let (hits_after, _) = level.cache_stats(); + + assert!(hits_after > hits_before, "Unaffected region should still be cached"); + } +} + +// ============================================================================ +// Edge Case Tests +// ============================================================================ + +mod edge_case_tests { + use super::*; + + #[test] + fn test_empty_hierarchy() { + let mut hierarchy = LazyJTreeHierarchy::new(0, 1.5); + assert_eq!(hierarchy.num_levels(), 0); + + let cut = hierarchy.approximate_min_cut(); + assert_eq!(cut.value, f64::INFINITY); + } + + #[test] + fn test_single_level_hierarchy() { + let mut hierarchy = LazyJTreeHierarchy::new(1, 1.5); + + let cut = hierarchy.approximate_min_cut(); + assert!(cut.value > 0.0); + assert_eq!(cut.level_used, 0); + } + + #[test] + fn test_empty_bmssp_level() { + let mut level = BmsspJTreeLevel::new(0, 0); + + let cut = level.min_cut(0, 1); + assert_eq!(cut, f64::INFINITY); + } + + #[test] + fn test_disconnected_vertices() { + let mut level = BmsspJTreeLevel::new(10, 0); + level.add_edge(0, 1, 5.0); + // 3 and 4 are disconnected from 0-1 + + let cut = level.min_cut(0, 3); + // Should be infinity (disconnected) or minimum edge weight + assert!(cut > 0.0 || cut == f64::INFINITY); + } + + #[test] + fn test_very_large_weights() { + let mut level = BmsspJTreeLevel::new(5, 0); + level.add_edge(0, 1, 1e100); + level.add_edge(1, 2, 1e100); + + let cut = level.min_cut(0, 2); + assert!(cut.is_finite()); + assert!(cut > 0.0); + } + + #[test] + fn test_very_small_weights() { + let mut level = BmsspJTreeLevel::new(5, 0); + level.add_edge(0, 1, 1e-100); + level.add_edge(1, 2, 1e-100); + + let cut = level.min_cut(0, 2); + assert!(cut > 0.0); + } + + #[test] + fn test_coordinator_with_zero_threshold() { + let mut coordinator = TwoTierCoordinator::new(5, 50.0, 0.0); + + // Should always escalate (threshold is 0) + let result = coordinator.min_cut(false); + + // Any approximate value >= 0 so might not escalate + assert!(result.value > 0.0); + } + + #[test] + fn test_coordinator_with_infinite_threshold() { + let mut coordinator = TwoTierCoordinator::new(5, 50.0, f64::INFINITY); + + // Should escalate (approximate value < infinite threshold is always true) + let result = coordinator.min_cut(false); + + assert!(result.is_exact); + } + + #[test] + fn test_rapid_cache_operations() { + let mut level = BmsspJTreeLevel::new(10, 0); + level.add_edge(0, 1, 1.0); + level.add_edge(1, 2, 2.0); + + // Rapid query-invalidate cycles + for _ in 0..1000 { + let _ = level.min_cut(0, 2); + level.invalidate_cache(&[1]); + let _ = level.min_cut(0, 2); + level.clear_cache(); + } + + // Should not panic or have memory issues + let cut = level.min_cut(0, 2); + assert!(cut > 0.0); + } +} + +// ============================================================================ +// Stress Tests +// ============================================================================ + +mod stress_tests { + use super::*; + + #[test] + fn stress_many_levels() { + let mut hierarchy = LazyJTreeHierarchy::new(100, 1.1); + + // Query various levels + for level in (0..100).step_by(10) { + let cut = hierarchy.approximate_min_cut_at_level(level); + assert!(cut.is_some()); + } + + // Mark all dirty and re-query + let all_levels: Vec = (0..100).collect(); + hierarchy.mark_dirty(&all_levels); + + let cut = hierarchy.approximate_min_cut(); + assert!(cut.value > 0.0); + } + + #[test] + fn stress_many_queries() { + let mut level = BmsspJTreeLevel::new(100, 0); + + // Create dense graph + for i in 0..99u64 { + level.add_edge(i, i + 1, (i + 1) as f64); + } + + // Many queries + for i in 0..50 { + for j in (i + 1)..50 { + let _ = level.min_cut(i, j); + } + } + + // Verify cache statistics are reasonable + let (hits, misses) = level.cache_stats(); + assert!(hits > 0, "Should have cache hits"); + assert!(misses > 0, "Should have cache misses"); + // Hit ratio should be > 0 after warmup + assert!(hits as f64 / (hits + misses) as f64 > 0.0); + } + + #[test] + fn stress_coordinator_workload() { + let mut coordinator = TwoTierCoordinator::new(10, 100.0, 50.0); + + // Mixed workload + for i in 0..1000 { + match i % 4 { + 0 => { + let _ = coordinator.min_cut(false); + } + 1 => { + let _ = coordinator.min_cut(true); + } + 2 => { + coordinator.insert_edge(i as u64, (i + 1) as u64, 1.0); + } + 3 => { + coordinator.delete_edge(i as u64, (i + 1) as u64); + } + _ => {} + } + } + + // Should complete without errors + let (approx, exact) = coordinator.query_stats(); + assert!(approx > 0); + assert!(exact > 0); + } +} + +// ============================================================================ +// Thread Safety Tests (for concurrent scenarios) +// ============================================================================ + +mod thread_safety_tests { + use super::*; + use std::sync::Mutex; + + #[test] + fn test_lazy_level_send_sync() { + fn assert_send_sync() {} + // LazyLevel should be Send + Sync when T is + // Note: JTreeLevelData contains Arc, which is Send + Sync + } + + #[test] + fn test_concurrent_cache_stats() { + let level = BmsspJTreeLevel::new(10, 0); + + // Arc counters should be thread-safe + let cache_hits = level.cache_hits.clone(); + let cache_misses = level.cache_misses.clone(); + + // Simulate concurrent access + cache_hits.fetch_add(1, Ordering::Relaxed); + cache_misses.fetch_add(1, Ordering::Relaxed); + + assert_eq!(cache_hits.load(Ordering::Relaxed), 1); + assert_eq!(cache_misses.load(Ordering::Relaxed), 1); + } +}