ruvector/docs
rUv 9d8006ae26
ruvllm_sparse_attention v0.1.1 — FastGRNN-gated near-linear attention + no_std/ESP32-S3 + ADR-191/192 (#429)
* docs(sparse-attn): plain-language README intro, SEO, and tutorial gist

- Rewrite README opening for non-experts: what it is, why it matters,
  who it's for, what it is NOT. Adds a Table of Contents and an FAQ.
- Document the new FastGRNN-gated near-linear path with a measured
  scaling table and runnable example pointer.
- Add SEO-friendly keyword block at the bottom (rust llm inference,
  sparse attention rust, near-linear attention, edge ai rust,
  raspberry pi llm, gguf rust, mistral / llama / smollm2 / phi-2).
- New docs/TUTORIAL.md walks through the full pipeline end-to-end
  (Cargo.toml → forward → KvCache decode → FP16 KV → FastGRNN gate
  → cross-compile to Pi). Published as
  https://gist.github.com/ruvnet/790214c832928d6f2ec7ebe593bb3def

Co-Authored-By: claude-flow <ruv@ruv.net>

* chore(sparse-attn): add crates.io metadata for v0.1.0 publish

- repository, documentation, homepage URLs
- keywords (llm, attention, transformer, inference, edge)
- categories (algorithms, science, mathematics)
- expanded description mentioning subquadratic + FastGRNN near-linear
- rust-version = 1.77 (matches workspace MSRV)

Published v0.1.0 to crates.io: https://crates.io/crates/ruvllm_sparse_attention

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(sparse-attn): FastGRNN salience gate + forward_gated for near-linear scale

Adds a recurrent O(N · D_h²) FastGRNN pass that produces a per-token
salience score, then prunes the sparse-attention candidate set against
that score. Combined cost is O(N · (D_h² + W + G + K_keep + dim)),
linear in seq when the gate budget K_keep is constant.

New module `fastgrnn_gate`:
  - FastGrnnGate cell (matches cognitum-agent's sparse_fastgrnn math
    so weights round-trip via from_weights / score_sequence)
  - score_sequence / score_kv: per-position salience over a sequence
  - keep_mask_quantile / keep_mask_top_k: turn salience into a binary
    keep-mask the attention candidate selector consumes
  - step_with_hidden: streaming variant for online inference

New methods on SubquadraticSparseAttention:
  - forward_gated(q, k, v, keep_mask) — drops below-threshold tokens
    from the long-range candidate set; window + globals + current
    are always retained (causality preservation)
  - forward_gated_with_fastgrnn(q, k, v, gate, top_k) — convenience
    wrapper that does FastGRNN scoring + top-K masking + gated forward

Tests (5 new + 8 gate tests, all passing alongside 25 baseline):
  - all-true mask is bit-identical to plain forward
  - all-false mask preserves window + globals + current, output finite
  - wrong mask length returns InvalidConfig
  - smaller top_k provably reduces total candidate count
  - end-to-end FastGRNN-driven path produces finite output

Scaling demo (examples/fastgrnn_gated_scaling.rs):
  seq | ungated/N | gated/N | growth ratio
  ----|-----------|---------|-------------
  128 |   0.0021  |  0.0029 |
  2048|   0.0029  |  0.0036 |
  ungated grows ~1.38× over 16× seq (log-linear);
  gated grows ~1.24× over 16× seq (sub-logarithmic, near-linear).

Zero new runtime dependencies (ADR-183 invariant preserved).

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(sparse-attn): no_std + alloc support, ESP32-S3 cross-compile verified

ADR-192 implementation. Crate is now no_std + alloc behind a default-on
`std` feature (purely additive — std consumers see zero behavioural change).

Changes:
- lib.rs: #![cfg_attr(not(feature = "std"), no_std)] + extern crate alloc
- F32Ext trait restores .exp/.sqrt/.tanh/.powi method syntax via libm
  in no_std mode; std mode uses inherent f32 methods unchanged
- attention.rs / fastgrnn_gate.rs / tensor.rs: replace std:: with
  core:: and alloc:: imports; HashSet → BTreeSet (no hashing in no_std)
- Error trait impl gated on std (core::error::Error needs MSRV bump)
- Cargo.toml: std default-on, parallel = ["std", "rayon"], libm always-on

Verified:
- cargo test --lib                                   38/38 pass
- cargo build --no-default-features                  clean
- cargo build --no-default-features --features fp16  clean
- cargo +esp build --target xtensa-esp32s3-none-elf  1.02s release,
                                                     376 KB rlib
- examples/esp32s3_smoke runs natively               all checks passed

Tested against attached hardware: ESP32-S3 v0.2, MAC ac:a7:04:e2:66:24,
16 MB flash, on /dev/ttyACM0 (USB-Serial-JTAG).

Bump version 0.1.0 → 0.1.1 (patch — additive). Adds "no-std" to crates.io
categories. Adds libm 0.2 as always-on dep (~60 KB, pure Rust).

Co-Authored-By: claude-flow <ruv@ruv.net>

* docs(adr): ADR-191 Pi Zero 2W production hardening for ruvllm_sparse_attention

Proposes four additive changes to the sparse-attention crate based on
production data from the cognitum-agent deployment on cognitum-v0
(Pi Zero 2W, SmolLM2-135M Q4_0, cognitum-one/seed PR #133):

1. decode_step_with_deadline / decode_step_f16_with_deadline /
   decode_batch_with_deadline — sub-step wall-clock deadline so
   integrators can bound latency at finer granularity than per-token.
   Returns AttentionError::DeadlineExceeded { elapsed_ms, checkpoint }.

2. SparseAttentionConfig::pi_zero_2w() — codify the empirically
   validated window=64, tile=16, FP16 KV preset that cognitum-agent
   currently records as a Cargo.toml comment.

3. SubquadraticSparseAttention::warm_up() — synthetic 1-token decode
   to prime caches and shrink the measured 99 s → 56 s cold→warm gap
   before the first user inference.

4. Stochastic Q4 dequant pass-through for KV cache reload (feature-gated,
   off by default). Reuses the splitmix64 seeding pattern from
   cognitum-agent commit 1675c20 — naive `seed | 1` xorshift collapses
   adjacent seeds 42 and 43 to the same state, an outright bug.

Status: proposed. Test plan covers correctness (deadline does not
perturb output), unbiasedness (mean within 0.06 of deterministic over
256 trials), and a cluster bench comparing pre/post cold first-decode
latency on cognitum-v0.

Co-Authored-By: claude-flow <ruv@ruv.net>

* style(sparse-attn): cargo fmt over crate sources after no_std refactor

Co-Authored-By: claude-flow <ruv@ruv.net>

---------

Co-authored-by: ruvnet <ruvnet@gmail.com>
2026-05-07 11:14:16 -04:00
..
adr ruvllm_sparse_attention v0.1.1 — FastGRNN-gated near-linear attention + no_std/ESP32-S3 + ADR-191/192 (#429) 2026-05-07 11:14:16 -04:00
analysis fix(brain): defer sparsifier build on startup for large graphs 2026-03-24 12:29:52 +00:00
api fix(brain): defer sparsifier build on startup for large graphs 2026-03-24 12:29:52 +00:00
architecture fix(brain): defer sparsifier build on startup for large graphs 2026-03-24 12:29:52 +00:00
benchmarks fix(brain): defer sparsifier build on startup for large graphs 2026-03-24 12:29:52 +00:00
cloud-architecture fix(brain): defer sparsifier build on startup for large graphs 2026-03-24 12:29:52 +00:00
cnn feat(demo): add Self-Learning tab with 6 interactive training demos 2026-03-11 19:31:23 -04:00
code-reviews docs: reorganize into subfolders 2026-01-21 23:43:50 -05:00
dag docs(dag): add comprehensive Neural DAG Learning implementation plan 2025-12-29 22:15:55 +00:00
development feat(micro-hnsw-wasm): Add Neuromorphic HNSW v2.3 with SNN Integration (#40) 2025-12-01 22:30:15 -05:00
examples feat(musica): structure-first audio separation via dynamic mincut (#337) 2026-04-08 12:23:48 -05:00
gnn fix(brain): defer sparsifier build on startup for large graphs 2026-03-24 12:29:52 +00:00
guides docs: add missing capabilities to advanced features guide 2026-02-26 16:09:06 +00:00
hailo feat(ruvector-hailo): NPU embedding backend + multi-Pi cluster (ADRs 167-170) (#413) 2026-05-04 08:30:40 -04:00
hnsw fix(brain): defer sparsifier build on startup for large graphs 2026-03-24 12:29:52 +00:00
hooks feat(cli): Implement full hooks system in Rust CLI 2025-12-27 01:08:36 +00:00
implementation fix(brain): defer sparsifier build on startup for large graphs 2026-03-24 12:29:52 +00:00
integration fix(brain): defer sparsifier build on startup for large graphs 2026-03-24 12:29:52 +00:00
nervous-system docs: reorganize into subfolders 2026-01-21 23:43:50 -05:00
optimization fix(brain): defer sparsifier build on startup for large graphs 2026-03-24 12:29:52 +00:00
plans/subpolynomial-time-mincut chore(docs): Clean up and reorganize documentation structure 2025-12-25 19:39:44 +00:00
postgres fix(brain): defer sparsifier build on startup for large graphs 2026-03-24 12:29:52 +00:00
project-phases Clean up repository structure and organize documentation 2025-11-20 19:50:03 +00:00
publishing fix(brain): defer sparsifier build on startup for large graphs 2026-03-24 12:29:52 +00:00
research research(nightly): ACORN — predicate-agnostic filtered HNSW (#391) 2026-04-27 00:29:37 -04:00
reviews perf(ruvllm): optimize MoE routing with buffer reuse and optional metrics 2026-03-12 23:27:00 -04:00
ruvllm docs: reorganize into subfolders 2026-01-21 23:43:50 -05:00
rvagent feat(rvAgent): Complete DeepAgents Rust Conversion (ADR-093 → ADR-103) (#262) 2026-03-16 09:52:32 -04:00
sdk docs(sdk): add deep planning review for ruvector Python SDK 2026-04-25 20:28:54 -04:00
security feat(rvAgent): Complete DeepAgents Rust Conversion (ADR-093 → ADR-103) (#262) 2026-03-16 09:52:32 -04:00
sparse-inference feat: Add PowerInfer-style sparse inference engine with precision lanes (#106) 2026-01-04 23:40:31 -05:00
sql feat(postgres): Add ruvector-postgres extension with SIMD optimizations (#42) 2025-12-02 09:55:07 -05:00
testing Clean up repository structure and organize documentation 2025-11-20 19:50:03 +00:00
training fix(brain): defer sparsifier build on startup for large graphs 2026-03-24 12:29:52 +00:00
.gitkeep Clean up repository structure and organize documentation 2025-11-20 19:50:03 +00:00
.nojekyll fix: add .nojekyll to disable Jekyll processing 2026-03-11 17:53:19 -04:00
agi-container.md feat(rvAgent): Complete DeepAgents Rust Conversion (ADR-093 → ADR-103) (#262) 2026-03-16 09:52:32 -04:00
C2-shell-execution-hardening.md feat(rvAgent): Complete DeepAgents Rust Conversion (ADR-093 → ADR-103) (#262) 2026-03-16 09:52:32 -04:00
C8_RESULT_VALIDATION_IMPLEMENTATION.md feat(rvAgent): Complete DeepAgents Rust Conversion (ADR-093 → ADR-103) (#262) 2026-03-16 09:52:32 -04:00
consciousness-api.md feat(consciousness): SOTA IIT Φ, causal emergence, quantum collapse crate (ADR-131) 2026-03-31 16:36:25 -04:00
IMPLEMENTATION-C5.md feat(rvAgent): Complete DeepAgents Rust Conversion (ADR-093 → ADR-103) (#262) 2026-03-16 09:52:32 -04:00
index.html refactor: move CNN demo to docs/cnn/ for shorter URL 2026-03-11 17:52:13 -04:00
INDEX.md fix(brain): defer sparsifier build on startup for large graphs 2026-03-24 12:29:52 +00:00
moe-routing-optimization-analysis.md perf(ruvllm): optimize MoE routing with buffer reuse and optional metrics 2026-03-12 23:27:00 -04:00
README.md fix(brain): defer sparsifier build on startup for large graphs 2026-03-24 12:29:52 +00:00
REPO_STRUCTURE.md fix(brain): defer sparsifier build on startup for large graphs 2026-03-24 12:29:52 +00:00
research-openfang.md Add OpenFang project research document 2026-02-26 14:14:58 +00:00

RuVector Documentation

Complete documentation for RuVector, the high-performance Rust vector database with global scale capabilities.

📚 Documentation Structure

docs/
├── adr/                    # Architecture Decision Records
├── analysis/               # Research & analysis docs
├── api/                    # API references (Rust, Node.js, Cypher)
├── architecture/           # System design docs
├── benchmarks/             # Performance benchmarks & results
├── cloud-architecture/     # Cloud deployment guides
├── code-reviews/           # Code review documentation
├── dag/                    # DAG implementation
├── development/            # Developer guides
├── examples/               # SQL examples
├── gnn/                    # GNN/Graph implementation
├── guides/                 # User guides & tutorials
├── hnsw/                   # HNSW index documentation
├── hooks/                  # Hooks system documentation
├── implementation/         # Implementation details & summaries
├── integration/            # Integration guides
├── nervous-system/         # Nervous system architecture
├── optimization/           # Performance optimization guides
├── plans/                  # Implementation plans
├── postgres/               # PostgreSQL extension docs
├── project-phases/         # Development phases
├── publishing/             # NPM publishing guides
├── research/               # Research documentation
├── ruvllm/                 # RuVLLM documentation
├── security/               # Security audits & reports
├── sparse-inference/       # Sparse inference docs
├── sql/                    # SQL examples
├── testing/                # Testing documentation
└── training/               # Training & LoRA docs

Getting Started

Architecture & Design

API Reference

Performance & Benchmarks

Security

Implementation

Specialized Topics

Development

Research

  • research/ - Research documentation
    • cognitive-frontier/ - Cognitive frontier research
    • gnn-v2/ - GNN v2 research
    • latent-space/ - HNSW & attention research
    • mincut/ - MinCut algorithm research

For New Users

  1. Start with Getting Started Guide
  2. Try the Basic Tutorial
  3. Review API Documentation

For Cloud Deployment

  1. Read Architecture Overview
  2. Follow Deployment Guide
  3. Apply Performance Optimizations

For Contributors

  1. Read Contributing Guidelines
  2. Review Architecture Decisions
  3. Check Migration Guide

For Performance Tuning

  1. Review Optimization Guide
  2. Run Benchmarks
  3. Check Analysis

📊 Documentation Status

Category Directory Status
Getting Started guides/ Complete
Architecture architecture/, adr/ Complete
API Reference api/ Complete
Performance benchmarks/, optimization/, analysis/ Complete
Security security/ Complete
Implementation implementation/, integration/ Complete
Development development/, testing/ Complete
Research research/ 📚 Ongoing

Total Documentation: 460+ documents across 60+ directories


🔗 External Resources


Last Updated: 2026-02-26 | Version: 2.0.4 (core) / 0.1.100 (npm) | Status: Production Ready