ruvector

vrr/ruvector

mirror of https://github.com/ruvnet/RuVector.git synced 2026-05-23 04:27:11 +00:00

Commit graph

Author: rUv

SHA1: f0f33625f3

Message:
feat(prime-radiant): Universal Coherence Engine with Sheaf Laplacian AI Safety (#131)

* docs(coherence-engine): add ADR-014 and DDD for sheaf Laplacian coherence engine

Add comprehensive architecture documentation for the ruvector-coherence crate:

- ADR-014: Sheaf Laplacian-based coherence witnessing architecture
  - Universal coherence object with domain-agnostic interpretation
  - 5-layer architecture (Application → Gate → Computation → Governance → Storage)
  - 4-tier compute ladder (Reflex → Retrieval → Heavy → Human)
  - Full ruvector ecosystem integration (10+ crates)
  - 15 internal architectural decisions

- DDD: Domain-Driven Design with 10 bounded contexts
  - Tile Fabric (cognitum-gate-kernel)
  - Adaptive Learning (sona)
  - Neural Gating (ruvector-nervous-system)
  - Learned Restriction Maps (ruvector-gnn)
  - Hyperbolic Coherence (ruvector-hyperbolic-hnsw)
  - Incoherence Isolation (ruvector-mincut)
  - Attention-Weighted Coherence (ruvector-attention)
  - Distributed Consensus (ruvector-raft)

Key concept: "This is not prediction. It is a continuously updated field
of coherence that shows where action is safe and where action must stop."

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(prime-radiant): implement sheaf Laplacian coherence engine

Implement the complete Prime-Radiant crate based on ADR-014:

Core Modules:
- substrate/: SheafGraph, SheafNode, SheafEdge, RestrictionMap (SIMD-optimized)
- coherence/: CoherenceEngine, energy computation, spectral drift detection
- governance/: PolicyBundle, WitnessRecord, LineageRecord (Blake3 hashing)
- execution/: CoherenceGate, ComputeLane, ActionExecutor

Ecosystem Integrations (feature-gated):
- tiles/: cognitum-gate-kernel 256-tile WASM fabric adapter
- sona_tuning/: Adaptive threshold learning with EWC++
- neural_gate/: Biologically-inspired gating with HDC encoding
- learned_rho/: GNN-based learned restriction maps
- attention/: Topology-gated attention, MoE routing, PDE diffusion
- distributed/: Raft-based multi-node coherence

Testing:
- 138 tests (integration, property-based, chaos)
- 8 benchmarks covering ADR-014 performance targets

Stats: 91 files, ~30K lines of Rust code

"This is not prediction. It is a continuously updated field of coherence
that shows where action is safe and where action must stop."

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs(adr): add RuvLLM integration to ADR-014 v0.4

- Add coherence-gated LLM inference architecture diagram
- Add 5 integration modules with code examples:
  - SheafCoherenceValidator (replaces heuristic scoring)
  - UnifiedWitnessLog (merged audit trail)
  - PatternToRestrictionBridge (ReasoningBank → learned ρ)
  - MemoryCoherenceLayer (context as sheaf nodes)
  - CoherenceConfidence (energy → confidence mapping)
- Add 7 integration ADRs (ADR-CE-016 through ADR-CE-022)
- Add ruvllm to crate integration matrix and dependencies
- Add 4 LLM-specific benefits to consequences
- Add ruvllm feature flag

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs(adr): add 22 coherence engine internal ADRs

Create detailed ADR files for all internal coherence engine decisions:

Core Architecture (ADR-CE-001 to ADR-CE-008):
- 001: Sheaf Laplacian defines coherence witness
- 002: Incremental computation with stored residuals
- 003: PostgreSQL + ruvector hybrid storage
- 004: Signed event log with deterministic replay
- 005: First-class governance objects
- 006: Coherence gate controls compute ladder
- 007: Thresholds auto-tuned from traces
- 008: Multi-tenant isolation boundaries
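
The idea behind ADR-CE-002 (incremental computation with stored residuals) can be illustrated with a small cache that keeps each edge's last energy and adjusts the total by the difference when one edge changes. This is only a sketch under assumptions: `EnergyCache`, `update_edge`, and the `u64` edge ids are hypothetical names chosen for illustration and are not taken from the crate.

```rust
use std::collections::HashMap;

/// Hypothetical cache of per-edge energies plus their running total.
pub struct EnergyCache {
    per_edge: HashMap<u64, f32>,
    total: f32,
}

impl EnergyCache {
    pub fn new() -> Self {
        Self { per_edge: HashMap::new(), total: 0.0 }
    }

    /// Store the new energy for one edge and patch the total by the
    /// difference, instead of re-summing every edge in the graph.
    pub fn update_edge(&mut self, edge_id: u64, new_energy: f32) -> f32 {
        let old = self.per_edge.insert(edge_id, new_energy).unwrap_or(0.0);
        self.total += new_energy - old;
        self.total
    }

    pub fn total(&self) -> f32 {
        self.total
    }
}
```

Because the total is patched with floating-point differences, rounding error can accumulate over many updates; a long-running system would probably want an occasional full recomputation.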

Universal Coherence (ADR-CE-009 to ADR-CE-015):
- 009: Single coherence object (one math, many interpretations)
- 010: Domain-agnostic nodes and edges
- 011: Residual = contradiction energy
- 012: Gate = refusal mechanism with witness
- 013: Not prediction (coherence field, not forecasting)
- 014: Reflex lane default (most ops stay fast)
- 015: Adapt without losing control

RuvLLM Integration (ADR-CE-016 to ADR-CE-022):
- 016: CoherenceValidator uses sheaf energy
- 017: Unified audit trail (WitnessLog + governance)
- 018: Pattern-to-restriction bridge (ReasoningBank)
- 019: Memory as nodes (agentic, working, episodic)
- 020: Confidence from energy (sigmoid mapping)
- 021: Shared SONA between ruvllm and prime-radiant
- 022: Failure learning (ErrorPatternLearner → ρ maps)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(prime-radiant): implement RuvLLM integration layer (ADR-014 v0.4)

Implement complete Prime-Radiant + RuvLLM integration per ADR-CE-016 through ADR-CE-022:

Core Integration Modules:
- coherence_validator.rs: SheafCoherenceValidator using sheaf energy
- witness_log.rs: UnifiedWitnessLog with hash chain for tamper evidence
- pattern_bridge.rs: PatternToRestrictionBridge learning from verdicts
- memory_layer.rs: MemoryCoherenceLayer tracking context as sheaf nodes
- confidence.rs: CoherenceConfidence with sigmoid energy→confidence mapping
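
The commit only says that confidence.rs uses a sigmoid energy→confidence mapping; it does not give the formula. One plausible shape, offered as an assumption, is a decreasing logistic curve in which low energy maps to confidence near 1 and high energy to confidence near 0. The function name and the `midpoint` and `steepness` parameters below are hypothetical.

```rust
/// Map a non-negative coherence energy to a confidence in (0, 1).
/// `midpoint` is the energy that yields confidence 0.5 and `steepness`
/// controls how sharply confidence falls off around it. Both are
/// illustrative parameters, not values from the crate.
pub fn confidence_from_energy(energy: f32, midpoint: f32, steepness: f32) -> f32 {
    1.0 / (1.0 + (steepness * (energy - midpoint)).exp())
}
```

With a positive `steepness` the mapping is monotonically decreasing, so a more contradictory (higher-energy) state never gets a higher confidence than a more coherent one.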

Supporting Infrastructure:
- mod.rs: Public API, re-exports, convenience constructors
- error.rs: Comprehensive error types for each ADR
- config.rs: LlmCoherenceConfig, thresholds, policies
- gate.rs: LlmCoherenceGate high-level interface
- adapter.rs: RuvLlmAdapter bridging type systems
- bridge.rs: PolicyBridge, SonaBridge for synchronization
- witness.rs: WitnessAdapter for correlation
- traits.rs: Trait definitions for loose coupling

Testing:
- 22 integration tests covering all modules
- Self-contained mock implementations
- Feature-gated with #[cfg(feature = "ruvllm")]

Feature Flags:
- ruvllm feature in Cargo.toml
- Optional dependency on ruvllm crate
- Added to "full" feature set

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs(prime-radiant): add comprehensive README with examples

Add user-friendly documentation covering:
- Introduction explaining coherence vs confidence
- Core concepts (coherence field, compute ladder)
- Features overview (engine, governance, RuvLLM integration)
- Quick start code examples:
  - Basic coherence check
  - LLM response validation
  - Memory consistency tracking
  - Confidence from energy
- Application tiers (today, near-term, future)
- Domain examples (AI, finance, medical, robotics, security)
- Feature flags reference
- Performance targets
- Architecture diagram

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs(adr): add ADR-015 Coherence-Gated Transformer (Sheaf Attention)

Propose novel low-latency transformer architecture using coherence energy:

Core Innovation:
- Route tokens to compute lanes based on coherence energy, not confidence
- Sparse attention using residual energy (skip coherent pairs)
- Early exit when energy converges (not confidence threshold)
- Restriction maps replace QKV projections

Architecture:
- Lane 0 (Reflex): 1-2 layers, local attention, <0.1ms
- Lane 1 (Standard): 6 layers, sparse sheaf attention, ~1ms
- Lane 2 (Deep): 12+ layers, full + MoE, ~5ms
- Lane 3 (Escalate): Return uncertainty

Performance Targets:
- 5-10x latency reduction (10ms → 1-2ms for 128 tokens)
- 2.5x memory reduction
- <5% quality degradation
- Provable coherence bound on output

Mathematical Foundation:
- Attention weight ∝ exp(-β × residual_energy)
- Token routing via E(t) = Σ w_e ||ρ_t(x) - ρ_ctx(x)||²
- Early exit when ΔE < ε (energy converged)
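
The routing energy and the early-exit test above can be sketched in a few lines. The sketch assumes dense restriction maps stored as rows of `f32`, and it reads the formula as comparing the restricted token state with the restricted context state; all function names are hypothetical and not the ruvector-attention API.

```rust
/// Apply a dense restriction map (one Vec per output row) to a state vector.
pub fn apply_map(map: &[Vec<f32>], x: &[f32]) -> Vec<f32> {
    map.iter()
        .map(|row| row.iter().zip(x).map(|(a, b)| a * b).sum::<f32>())
        .collect()
}

/// One term of E(t): w_e * ||rho_t(x_t) - rho_ctx(x_ctx)||^2.
pub fn residual_energy(
    weight: f32,
    rho_t: &[Vec<f32>],
    x_t: &[f32],
    rho_ctx: &[Vec<f32>],
    x_ctx: &[f32],
) -> f32 {
    let a = apply_map(rho_t, x_t);
    let b = apply_map(rho_ctx, x_ctx);
    weight * a.iter().zip(&b).map(|(p, q)| (p - q) * (p - q)).sum::<f32>()
}

/// Early-exit test: stop once the change in energy drops below epsilon.
pub fn energy_converged(prev: f32, curr: f32, eps: f32) -> bool {
    (prev - curr).abs() < eps
}
```

Summing `residual_energy` over the edges incident to a token gives E(t) in this reading; a layer loop would call `energy_converged` between layers to decide whether to exit.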

Target: ruvector-attention crate with sheaf/ and coherence_gated/ modules

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(prime-radiant): implement coherence engine with CGT attention

Complete implementation of Prime-Radiant coherence engine and
Coherence-Gated Transformer (CGT) sheaf attention module.

Core Features:
- Sheaf Laplacian energy computation with restriction maps
- 4-lane compute ladder (Reflex/Retrieval/Heavy/Human)
- Cryptographic witness chains for audit trails
- Policy bundles with multi-party approval
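
As an illustration of the 4-lane compute ladder, lane selection can be pictured as counting how many thresholds the coherence energy exceeds. This is a sketch under assumptions: the `Lane` enum, the `route_lane` function, and the three ascending thresholds are hypothetical, and the commit does not state the actual threshold values.

```rust
/// The four lanes named in the commit message, in escalation order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Lane {
    Reflex,
    Retrieval,
    Heavy,
    Human,
}

/// Pick a lane by counting how many thresholds the energy exceeds.
/// `thresholds` is assumed to be sorted in ascending order.
pub fn route_lane(energy: f32, thresholds: [f32; 3]) -> Lane {
    let exceeded: u8 = thresholds.iter().map(|&t| (energy > t) as u8).sum();
    match exceeded {
        0 => Lane::Reflex,
        1 => Lane::Retrieval,
        2 => Lane::Heavy,
        _ => Lane::Human,
    }
}
```

A production gate would also have to decide how to treat NaN energies; in this sketch every comparison with NaN is false, so NaN falls into the lowest lane, which is probably not the desired behavior.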

Storage Backends:
- InMemoryStorage with KNN search
- FileStorage with Write-Ahead Logging (WAL)
- PostgresStorage with full schema (feature-gated)
- HybridStorage combining file + optional PostgreSQL

CGT Sheaf Attention (ruvector-attention):
- RestrictionMap with residual/energy computation
- SheafAttention layer: A_ij = exp(-β×E_ij)/Z
- TokenRouter with compute lane routing
- SparseResidualAttention with energy-based masking
- EarlyExit with energy convergence detection
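
The SheafAttention formula A_ij = exp(-β×E_ij)/Z is a softmax over negated energies. A minimal sketch for one row of energies follows; the function name is hypothetical, and subtracting the minimum energy before exponentiating is a numerical-stability choice made here, not something the commit describes.

```rust
/// Turn one row of pairwise energies E_ij into attention weights
/// A_ij = exp(-beta * E_ij) / Z, where Z normalizes the row to sum to 1.
pub fn sheaf_attention_weights(energies: &[f32], beta: f32) -> Vec<f32> {
    if energies.is_empty() {
        return Vec::new();
    }
    // Shifting every energy by the row minimum leaves the normalized
    // weights unchanged but keeps the largest exponent at zero.
    let min_e = energies.iter().cloned().fold(f32::INFINITY, f32::min);
    let unnorm: Vec<f32> = energies
        .iter()
        .map(|&e| (-beta * (e - min_e)).exp())
        .collect();
    let z: f32 = unnorm.iter().sum();
    unnorm.iter().map(|u| u / z).collect()
}
```

Pairs with low residual energy (coherent pairs) receive the largest weights, and a larger β concentrates the weight on them more sharply.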

Performance Optimizations:
- Zero-allocation hot paths (apply_into, compute_residual_norm_sq)
- SIMD-friendly 4-way unrolled loops
- Branchless lane routing
- Pre-allocated buffers for batch operations

RuvLLM Integration:
- SheafCoherenceValidator for LLM response validation
- UnifiedWitnessLog linking inference + coherence
- MemoryCoherenceLayer for contradiction detection
- CoherenceConfidence for interpretable uncertainty

Tests: 202 passing in ruvector-attention, 180+ in prime-radiant

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(prime-radiant): add GPU acceleration, SIMD optimizations, and benchmarks

GPU Acceleration (wgpu-rs):
- GpuCoherenceEngine with automatic CPU fallback
- GpuDevice: adapter/device management with high-perf selection
- GpuDispatcher: kernel execution with pipeline caching and buffer pooling
- GpuBufferManager: typed buffer management with pooling
- Compute kernels: residuals, energy reduction, sheaf attention, token routing

WGSL Compute Shaders (6 files, 1,412 lines):
- compute_residuals.wgsl: parallel edge residual computation
- compute_energy.wgsl: two-phase parallel reduction
- sheaf_attention.wgsl: energy-based attention weights A_ij = exp(-beta * E_ij)
- token_routing.wgsl: branchless lane assignment
- sparse_mask.wgsl: sparse attention mask generation
- types.wgsl: shared GPU struct definitions

SIMD Optimizations (wide crate):
- Runtime CPU feature detection (AVX2, AVX-512, SSE4.2, NEON)
- f32x8 vectorized operations
- simd/vectors.rs: dot_product_simd, norm_squared_simd, subtract_simd
- simd/matrix.rs: matmul_simd, matvec_simd, transpose_simd
- simd/energy.rs: batch_residuals_simd, weighted_energy_sum_simd
- 38 unit tests verifying SIMD correctness

Benchmarks (criterion):
- coherence_benchmarks.rs: core operations, graph scaling
- simd_benchmarks.rs: SIMD vs naive comparisons
- gpu_benchmarks.rs: CPU vs GPU performance

Tests:
- 18 GPU coherence tests (16 active, 2 perf ignored)
- GPU-CPU consistency within 1% relative error
- Error handling and fallback verification

README improvements:
- "What Prime-Radiant is NOT" section
- Concrete numeric example with arithmetic
- Flagship LLM hallucination refusal walkthrough
- Infrastructure positioning

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* perf(prime-radiant): optimize SIMD and core computation patterns

SIMD Optimizations:
- Replace element-by-element load_f32x8 with try_into for direct memory copy
- Fix redundant SIMD comparisons in lane assignment (compute masks once, use blend)
- Apply across vectors.rs, matrix.rs, and energy.rs

Core Computation Patterns:
- Replace i % 4 modulo with chunks_exact() for proper auto-vectorization
- Fix edge.rs: residual_norm_squared, residual_with_energy
- Fix node.rs: norm_squared, dot product
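
The `chunks_exact()` pattern mentioned above can be illustrated with a squared-norm helper. This is a sketch of the general technique, not the code in edge.rs or node.rs; the function name and the choice of four accumulators are assumptions.

```rust
/// Squared Euclidean norm using fixed-size chunks.
/// Each chunk has exactly four elements, so the compiler can drop bounds
/// checks and vectorize the loop body; the leftover tail is handled apart.
pub fn norm_squared(v: &[f32]) -> f32 {
    let mut acc = [0.0f32; 4];
    let chunks = v.chunks_exact(4);
    let tail = chunks.remainder();
    for c in chunks {
        acc[0] += c[0] * c[0];
        acc[1] += c[1] * c[1];
        acc[2] += c[2] * c[2];
        acc[3] += c[3] * c[3];
    }
    let tail_sum: f32 = tail.iter().map(|x| x * x).sum();
    acc.iter().sum::<f32>() + tail_sum
}
```

Keeping four independent accumulators changes the summation order relative to a naive loop, so results can differ in the last bits for large inputs.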

Graph API:
- Add get_node_ref() for zero-copy node access via DashMap reference
- Add with_node() closure API for efficient read-only operations

Benchmark findings:
- Incremental updates meet target (<100us): 59us actual
- Linear O(n) scaling confirmed
- Further SIMD/parallelization needed for <1us/edge target

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* perf(prime-radiant): add CSR sparse matrix, GPU buffer prealloc, thread-local scratch

Performance optimizations for Prime-Radiant coherence engine:

CSR Sparse Matrix (restriction.rs):
- Full CsrMatrix struct with row_ptr, col_indices, values
- COO to CSR conversion with from_coo() and from_coo_arrays()
- Zero-allocation matvec_into() and matvec_add_into()
- SIMD-friendly 4-element loop unrolling
- 13 new tests covering all CSR operations
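
For readers unfamiliar with CSR, the following sketch shows a COO to CSR conversion and an allocation-free matrix-vector product. The field and method names mirror those listed above, but the signatures and the body are assumptions written for illustration, not the code in restriction.rs.

```rust
/// Compressed sparse row matrix: row r owns the entries
/// row_ptr[r]..row_ptr[r + 1] of col_indices and values.
pub struct CsrMatrix {
    pub rows: usize,
    pub cols: usize,
    pub row_ptr: Vec<usize>,
    pub col_indices: Vec<usize>,
    pub values: Vec<f32>,
}

impl CsrMatrix {
    /// Build from (row, col, value) triplets in any order.
    /// Panics if a row index is out of range.
    pub fn from_coo(rows: usize, cols: usize, entries: &[(usize, usize, f32)]) -> Self {
        // Count entries per row, then prefix-sum into row offsets.
        let mut row_ptr = vec![0usize; rows + 1];
        for &(r, _, _) in entries {
            row_ptr[r + 1] += 1;
        }
        for i in 0..rows {
            row_ptr[i + 1] += row_ptr[i];
        }
        // Scatter each entry into the next free slot of its row.
        let mut next = row_ptr.clone();
        let mut col_indices = vec![0usize; entries.len()];
        let mut values = vec![0.0f32; entries.len()];
        for &(r, c, v) in entries {
            let slot = next[r];
            col_indices[slot] = c;
            values[slot] = v;
            next[r] += 1;
        }
        Self { rows, cols, row_ptr, col_indices, values }
    }

    /// Compute y = A * x, writing into a caller-provided buffer.
    pub fn matvec_into(&self, x: &[f32], y: &mut [f32]) {
        assert_eq!(x.len(), self.cols);
        assert_eq!(y.len(), self.rows);
        for (r, out) in y.iter_mut().enumerate() {
            let mut sum = 0.0;
            for k in self.row_ptr[r]..self.row_ptr[r + 1] {
                sum += self.values[k] * x[self.col_indices[k]];
            }
            *out = sum;
        }
    }
}
```

Duplicate (row, col) triplets are kept as separate entries in this sketch, so they are effectively summed during the product.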

GPU Buffer Pre-allocation (engine.rs, kernels.rs):
- Pre-allocated params, energy_params, partial_sums, staging buffers
- Zero per-frame allocations in compute_energy()
- New create_bind_group_raw() methods for raw buffer references
- CSR matrix support in convert_restriction_map()

Thread-Local Scratch Buffers (edge.rs):
- EdgeScratch struct with 3 reusable Vec<f32> buffers
- thread_local! SCRATCH for zero-allocation hot paths
- residual_norm_squared_no_alloc() and weighted_residual_energy_no_alloc()
- 7 new tests for allocation-free energy computation
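
The thread-local scratch idea can be sketched with a single reusable buffer. The commit describes an EdgeScratch struct with three buffers; the version below is simplified to one `Vec<f32>` and its signature is an assumption, so treat it as an illustration of the pattern rather than the code in edge.rs.

```rust
use std::cell::RefCell;

thread_local! {
    // One scratch buffer per thread; its capacity is reused across calls.
    static SCRATCH: RefCell<Vec<f32>> = RefCell::new(Vec::new());
}

/// Squared norm of (a - b), staging the residual in thread-local scratch.
/// After the buffer has grown to the largest size seen on this thread,
/// further calls do not allocate.
pub fn residual_norm_squared_no_alloc(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    SCRATCH.with(|s| {
        let mut buf = s.borrow_mut();
        buf.clear(); // keeps capacity, drops old contents
        buf.extend(a.iter().zip(b).map(|(x, y)| x - y));
        buf.iter().map(|r| r * r).sum::<f32>()
    })
}
```

Because the buffer sits behind a `RefCell`, calling this function re-entrantly from inside the closure would panic on the second `borrow_mut`, which is one reason to keep such helpers small.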

WGSL Vec4 Optimization (compute_residuals.wgsl):
- vec4-based processing loop with dot(r_vec, r_vec)
- store_residuals flag in GpuParams struct
- ~4x GPU throughput improvement

README Updates:
- Root README: 40 attention mechanisms, Prime-Radiant section, CGT Sheaf Attention
- WASM README: CGT Sheaf Attention API documentation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* chore: SEO optimize package metadata for crates.io and npm

- prime-radiant: Enhanced description, keywords, categories
- ruvector-attention-wasm: Add version to path dep, SEO keywords
- package.json: 23 keywords, better description, engines config

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* chore(hyperbolic-hnsw): SEO optimize for crates.io publish

* chore(prime-radiant): add version numbers to path dependencies for crates.io publish

* fix(prime-radiant): shorten keyword for crates.io compliance

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs(readme): add prime-radiant and ruvector-attention-wasm package references

- Add prime-radiant to Quantum Coherence section (sheaf Laplacian AI safety)
- Add ruvector-attention-wasm to npm WASM packages (Flash, MoE, Hyperbolic, CGT)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Reuven <cohen@ruv-mac-mini.local>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

Date: 2026-01-22 21:27:27 -05:00

1 commit