mirror of https://github.com/ruvnet/RuVector.git synced 2026-05-23 04:27:11 +00:00


rUv 9d8006ae26 ruvllm_sparse_attention v0.1.1 — FastGRNN-gated near-linear attention + no_std/ESP32-S3 + ADR-191/192 (#429)

* docs(sparse-attn): plain-language README intro, SEO, and tutorial gist
  - Rewrite README opening for non-experts: what it is, why it matters, who it's for, what it is NOT. Adds a Table of Contents and an FAQ.
  - Document the new FastGRNN-gated near-linear path with a measured scaling table and runnable example pointer.
  - Add SEO-friendly keyword block at the bottom (rust llm inference, sparse attention rust, near-linear attention, edge ai rust, raspberry pi llm, gguf rust, mistral / llama / smollm2 / phi-2).
  - New docs/TUTORIAL.md walks through the full pipeline end-to-end (Cargo.toml → forward → KvCache decode → FP16 KV → FastGRNN gate → cross-compile to Pi). Published as https://gist.github.com/ruvnet/790214c832928d6f2ec7ebe593bb3def
  Co-Authored-By: claude-flow <ruv@ruv.net>

* chore(sparse-attn): add crates.io metadata for v0.1.0 publish
  - repository, documentation, homepage URLs
  - keywords (llm, attention, transformer, inference, edge)
  - categories (algorithms, science, mathematics)
  - expanded description mentioning subquadratic + FastGRNN near-linear
  - rust-version = 1.77 (matches workspace MSRV)
  Published v0.1.0 to crates.io: https://crates.io/crates/ruvllm_sparse_attention
  Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(sparse-attn): FastGRNN salience gate + forward_gated for near-linear scale
  Adds a recurrent O(N · D_h²) FastGRNN pass that produces a per-token salience score, then prunes the sparse-attention candidate set against that score. Combined cost is O(N · (D_h² + W + G + K_keep + dim)), linear in seq when the gate budget K_keep is constant.
  New module `fastgrnn_gate`:
  - FastGrnnGate cell (matches cognitum-agent's sparse_fastgrnn math so weights round-trip via from_weights / score_sequence)
  - score_sequence / score_kv: per-position salience over a sequence
  - keep_mask_quantile / keep_mask_top_k: turn salience into a binary keep-mask the attention candidate selector consumes
  - step_with_hidden: streaming variant for online inference
  New methods on SubquadraticSparseAttention:
  - forward_gated(q, k, v, keep_mask) — drops below-threshold tokens from the long-range candidate set; window + globals + current are always retained (causality preservation)
  - forward_gated_with_fastgrnn(q, k, v, gate, top_k) — convenience wrapper that does FastGRNN scoring + top-K masking + gated forward
  Tests (5 new + 8 gate tests, all passing alongside 25 baseline):
  - all-true mask is bit-identical to plain forward
  - all-false mask preserves window + globals + current, output finite
  - wrong mask length returns InvalidConfig
  - smaller top_k provably reduces total candidate count
  - end-to-end FastGRNN-driven path produces finite output
  Scaling demo (examples/fastgrnn_gated_scaling.rs):
  seq  | ungated/N | gated/N
  128  | 0.0021    | 0.0029
  2048 | 0.0029    | 0.0036
  Growth ratio: ungated grows ~1.38× over 16× seq (log-linear); gated grows ~1.24× over 16× seq (sub-logarithmic, near-linear).
  Zero new runtime dependencies (ADR-183 invariant preserved).
  Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(sparse-attn): no_std + alloc support, ESP32-S3 cross-compile verified
  ADR-192 implementation. Crate is now no_std + alloc behind a default-on `std` feature (purely additive — std consumers see zero behavioural change).
  Changes:
  - lib.rs: #![cfg_attr(not(feature = "std"), no_std)] + extern crate alloc
  - F32Ext trait restores .exp/.sqrt/.tanh/.powi method syntax via libm in no_std mode; std mode uses inherent f32 methods unchanged
  - attention.rs / fastgrnn_gate.rs / tensor.rs: replace std:: with core:: and alloc:: imports; HashSet → BTreeSet (no hashing in no_std)
  - Error trait impl gated on std (core::error::Error needs MSRV bump)
  - Cargo.toml: std default-on, parallel = ["std", "rayon"], libm always-on
  Verified:
  - cargo test --lib: 38/38 pass
  - cargo build --no-default-features: clean
  - cargo build --no-default-features --features fp16: clean
  - cargo +esp build --target xtensa-esp32s3-none-elf: 1.02s release, 376 KB rlib
  - examples/esp32s3_smoke runs natively: all checks passed
  Tested against attached hardware: ESP32-S3 v0.2, MAC ac:a7:04:e2:66:24, 16 MB flash, on /dev/ttyACM0 (USB-Serial-JTAG).
  Bump version 0.1.0 → 0.1.1 (patch — additive). Adds "no-std" to crates.io categories. Adds libm 0.2 as always-on dep (~60 KB, pure Rust).
  Co-Authored-By: claude-flow <ruv@ruv.net>

* docs(adr): ADR-191 Pi Zero 2W production hardening for ruvllm_sparse_attention
  Proposes four additive changes to the sparse-attention crate based on production data from the cognitum-agent deployment on cognitum-v0 (Pi Zero 2W, SmolLM2-135M Q4_0, cognitum-one/seed PR #133):
  1. decode_step_with_deadline / decode_step_f16_with_deadline / decode_batch_with_deadline — sub-step wall-clock deadline so integrators can bound latency at finer granularity than per-token. Returns AttentionError::DeadlineExceeded { elapsed_ms, checkpoint }.
  2. SparseAttentionConfig::pi_zero_2w() — codify the empirically validated window=64, tile=16, FP16 KV preset that cognitum-agent currently records as a Cargo.toml comment.
  3. SubquadraticSparseAttention::warm_up() — synthetic 1-token decode to prime caches and shrink the measured 99 s → 56 s cold→warm gap before the first user inference.
  4. Stochastic Q4 dequant pass-through for KV cache reload (feature-gated, off by default). Reuses the splitmix64 seeding pattern from cognitum-agent commit 1675c20 — naive `seed | 1` xorshift collapses adjacent seeds 42 and 43 to the same state, an outright bug.
  Status: proposed. Test plan covers correctness (deadline does not perturb output), unbiasedness (mean within 0.06 of deterministic over 256 trials), and a cluster bench comparing pre/post cold first-decode latency on cognitum-v0.
  Co-Authored-By: claude-flow <ruv@ruv.net>

* style(sparse-attn): cargo fmt over crate sources after no_std refactor
  Co-Authored-By: claude-flow <ruv@ruv.net>

---------

Co-authored-by: ruvnet <ruvnet@gmail.com>
2026-05-07 11:14:16 -04:00
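The commit message above describes the FastGRNN-gated path as three steps. First it scores each token for salience. It then keeps the top-K tokens. Finally it prunes only the long-range attention candidates, so the window, globals and current token always survive.

What follows is a toy, self-contained sketch of that idea. It is not the ruvllm_sparse_attention API. `ToyFastGrnn`, `salience`, `top_k_mask` and `gated_candidates` are hypothetical names. The cell uses the standard FastGRNN update from Kusupati et al., NeurIPS 2018, and this sketch does not confirm that it matches the crate's cell.

The sketch also rests on two more assumptions:
- Salience is taken to be the L2 norm of the hidden state.
- Long-range candidates are taken to be log-strided (i − 2^k beyond the window).

```rust
// Toy sketch of FastGRNN-gated candidate selection.
// All names are hypothetical; this is NOT the ruvllm_sparse_attention API.
use std::collections::BTreeSet;

fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

/// Row-major matrix-vector product: (rows x cols) * (cols) -> (rows).
fn matvec(m: &[f32], rows: usize, cols: usize, v: &[f32]) -> Vec<f32> {
    (0..rows)
        .map(|r| (0..cols).map(|c| m[r * cols + c] * v[c]).sum())
        .collect()
}

/// FastGRNN cell: W and U are shared by the gate z and the candidate state,
/// which differ only in their biases; zeta and nu scale the update term.
struct ToyFastGrnn {
    input: usize,
    hidden: usize,
    w: Vec<f32>, // hidden x input
    u: Vec<f32>, // hidden x hidden
    b_z: Vec<f32>,
    b_h: Vec<f32>,
    zeta: f32,
    nu: f32,
}

impl ToyFastGrnn {
    /// One recurrent step, O(hidden * (input + hidden)).
    fn step(&self, x: &[f32], h: &[f32]) -> Vec<f32> {
        let wx = matvec(&self.w, self.hidden, self.input, x);
        let uh = matvec(&self.u, self.hidden, self.hidden, h);
        (0..self.hidden)
            .map(|i| {
                let pre = wx[i] + uh[i];
                let z = sigmoid(pre + self.b_z[i]);
                let h_tilde = (pre + self.b_h[i]).tanh();
                (self.zeta * (1.0 - z) + self.nu) * h_tilde + z * h[i]
            })
            .collect()
    }

    /// Assumed salience: L2 norm of the hidden state after each token.
    fn salience(&self, xs: &[Vec<f32>]) -> Vec<f32> {
        let mut h = vec![0.0f32; self.hidden];
        xs.iter()
            .map(|x| {
                h = self.step(x, &h);
                h.iter().map(|v| v * v).sum::<f32>().sqrt()
            })
            .collect()
    }
}

/// Keep the k highest-salience positions.
fn top_k_mask(scores: &[f32], k: usize) -> Vec<bool> {
    let mut idx: Vec<usize> = (0..scores.len()).collect();
    idx.sort_by(|&a, &b| scores[b].total_cmp(&scores[a]));
    let mut mask = vec![false; scores.len()];
    for &i in idx.iter().take(k) {
        mask[i] = true;
    }
    mask
}

/// Causal candidates for query position `i`. The current token, the local
/// window and the global prefix are always kept; only the (assumed
/// log-strided) long-range candidates are filtered by `keep`.
fn gated_candidates(i: usize, window: usize, globals: usize, keep: &[bool]) -> Vec<usize> {
    let mut set = BTreeSet::new();
    set.insert(i);
    set.extend(i.saturating_sub(window)..i);
    set.extend(0..globals.min(i + 1));
    let mut stride = 1usize;
    while stride <= i {
        if stride > window && keep[i - stride] {
            set.insert(i - stride);
        }
        stride *= 2;
    }
    set.into_iter().collect()
}
```

The two mask extremes behave as follows in this sketch:
- An all-true mask makes `gated_candidates` return the full, ungated candidate set.
- An all-false mask leaves only the window, the globals and the current token.

The commit's own tests check the same two properties.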
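ADR-191 item 4 in the commit message above says that naive `seed | 1` seeding collapses seeds 42 and 43. The reason is that 42 is even, so `42 | 1 == 43 == 43 | 1`.

The sketch below contrasts that with deriving the xorshift state through splitmix64, which keeps adjacent seeds distinct. It is an illustration only, not the code from cognitum-agent commit 1675c20, and the function names are hypothetical. It uses the widely published splitmix64 constants and the (13, 7, 17) xorshift64 shift triple.

```rust
// Hypothetical helper names; not the cognitum-agent implementation.

/// One splitmix64 step: advance by the golden-ratio increment, then mix.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Naive seeding: forcing the low bit maps every even seed onto seed + 1.
fn naive_xorshift_state(seed: u64) -> u64 {
    seed | 1
}

/// splitmix64-derived seeding. The add-then-mix map is a bijection on u64,
/// so distinct seeds give distinct outputs; the single seed whose output is 0
/// (an invalid xorshift state) is remapped to a fixed nonzero constant.
fn splitmix_xorshift_state(seed: u64) -> u64 {
    let mut s = seed;
    match splitmix64(&mut s) {
        0 => 0x9E37_79B9_7F4A_7C15,
        z => z,
    }
}

/// Marsaglia xorshift64 with the (13, 7, 17) shift triple; state must be nonzero.
fn xorshift64(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}
```

xorshift64 is itself an invertible map on the state. Two generators seeded to the same state therefore emit identical streams forever, and two generators seeded to different states never emit the same next value.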
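ADR-191 item 1 in the commit message above proposes deadline-aware decode steps that fail with `AttentionError::DeadlineExceeded { elapsed_ms, checkpoint }`. A minimal sketch of the idea, assuming a single-query softmax attention step split into stages with a wall-clock check between them. `decode_step_with_deadline`, the stage names, and the enum below are hypothetical stand-ins, not the crate's (still proposed) API.

```rust
use std::time::{Duration, Instant};

/// Hypothetical error shape modeled on the ADR-191 description.
#[derive(Debug, PartialEq)]
enum AttentionError {
    DeadlineExceeded { elapsed_ms: u64, checkpoint: &'static str },
}

/// Fail if the deadline has already passed when a checkpoint is reached.
fn check(start: Instant, deadline: Duration, checkpoint: &'static str) -> Result<(), AttentionError> {
    let elapsed = start.elapsed();
    if elapsed >= deadline {
        Err(AttentionError::DeadlineExceeded {
            elapsed_ms: elapsed.as_millis() as u64,
            checkpoint,
        })
    } else {
        Ok(())
    }
}

/// Single-query scaled dot-product attention with deadline checks between stages.
fn decode_step_with_deadline(
    q: &[f32],
    keys: &[Vec<f32>],
    values: &[Vec<f32>],
    deadline: Duration,
) -> Result<Vec<f32>, AttentionError> {
    let start = Instant::now();

    check(start, deadline, "scores")?;
    let scale = 1.0 / (q.len() as f32).sqrt();
    let scores: Vec<f32> = keys
        .iter()
        .map(|k| k.iter().zip(q).map(|(a, b)| a * b).sum::<f32>() * scale)
        .collect();

    check(start, deadline, "softmax")?;
    let max = scores.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();

    check(start, deadline, "weighted_sum")?;
    let dim = values.first().map_or(0, |v| v.len());
    let mut out = vec![0.0f32; dim];
    for (e, v) in exps.iter().zip(values) {
        for (o, x) in out.iter_mut().zip(v) {
            *o += e / sum * x;
        }
    }
    Ok(out)
}
```

The checks only read the clock, so a step that finishes within its deadline returns exactly what it would without one. That is the "deadline does not perturb output" property in the ADR's test plan. Note that this sketch relies on `std::time::Instant`, so it would not carry over as-is to the crate's no_std build.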
adr	ruvllm_sparse_attention v0.1.1 — FastGRNN-gated near-linear attention + no_std/ESP32-S3 + ADR-191/192 (#429)	2026-05-07 11:14:16 -04:00
analysis	fix(brain): defer sparsifier build on startup for large graphs	2026-03-24 12:29:52 +00:00
api	fix(brain): defer sparsifier build on startup for large graphs	2026-03-24 12:29:52 +00:00
architecture	fix(brain): defer sparsifier build on startup for large graphs	2026-03-24 12:29:52 +00:00
benchmarks	fix(brain): defer sparsifier build on startup for large graphs	2026-03-24 12:29:52 +00:00
cloud-architecture	fix(brain): defer sparsifier build on startup for large graphs	2026-03-24 12:29:52 +00:00
cnn	feat(demo): add Self-Learning tab with 6 interactive training demos	2026-03-11 19:31:23 -04:00
code-reviews	docs: reorganize into subfolders	2026-01-21 23:43:50 -05:00
dag	docs(dag): add comprehensive Neural DAG Learning implementation plan	2025-12-29 22:15:55 +00:00
development	feat(micro-hnsw-wasm): Add Neuromorphic HNSW v2.3 with SNN Integration (#40)	2025-12-01 22:30:15 -05:00
examples	feat(musica): structure-first audio separation via dynamic mincut (#337)	2026-04-08 12:23:48 -05:00
gnn	fix(brain): defer sparsifier build on startup for large graphs	2026-03-24 12:29:52 +00:00
guides	docs: add missing capabilities to advanced features guide	2026-02-26 16:09:06 +00:00
hailo	feat(ruvector-hailo): NPU embedding backend + multi-Pi cluster (ADRs 167-170) (#413)	2026-05-04 08:30:40 -04:00
hnsw	fix(brain): defer sparsifier build on startup for large graphs	2026-03-24 12:29:52 +00:00
hooks	feat(cli): Implement full hooks system in Rust CLI	2025-12-27 01:08:36 +00:00
implementation	fix(brain): defer sparsifier build on startup for large graphs	2026-03-24 12:29:52 +00:00
integration	fix(brain): defer sparsifier build on startup for large graphs	2026-03-24 12:29:52 +00:00
nervous-system	docs: reorganize into subfolders	2026-01-21 23:43:50 -05:00
optimization	fix(brain): defer sparsifier build on startup for large graphs	2026-03-24 12:29:52 +00:00
plans/subpolynomial-time-mincut	chore(docs): Clean up and reorganize documentation structure	2025-12-25 19:39:44 +00:00
postgres	fix(brain): defer sparsifier build on startup for large graphs	2026-03-24 12:29:52 +00:00
project-phases	Clean up repository structure and organize documentation	2025-11-20 19:50:03 +00:00
publishing	fix(brain): defer sparsifier build on startup for large graphs	2026-03-24 12:29:52 +00:00
research	research(nightly): ACORN — predicate-agnostic filtered HNSW (#391)	2026-04-27 00:29:37 -04:00
reviews	perf(ruvllm): optimize MoE routing with buffer reuse and optional metrics	2026-03-12 23:27:00 -04:00
ruvllm	docs: reorganize into subfolders	2026-01-21 23:43:50 -05:00
rvagent	feat(rvAgent): Complete DeepAgents Rust Conversion (ADR-093 → ADR-103) (#262)	2026-03-16 09:52:32 -04:00
sdk	docs(sdk): add deep planning review for ruvector Python SDK	2026-04-25 20:28:54 -04:00
security	feat(rvAgent): Complete DeepAgents Rust Conversion (ADR-093 → ADR-103) (#262)	2026-03-16 09:52:32 -04:00
sparse-inference	feat: Add PowerInfer-style sparse inference engine with precision lanes (#106)	2026-01-04 23:40:31 -05:00
sql	feat(postgres): Add ruvector-postgres extension with SIMD optimizations (#42)	2025-12-02 09:55:07 -05:00
testing	Clean up repository structure and organize documentation	2025-11-20 19:50:03 +00:00
training	fix(brain): defer sparsifier build on startup for large graphs	2026-03-24 12:29:52 +00:00
.gitkeep	Clean up repository structure and organize documentation	2025-11-20 19:50:03 +00:00
.nojekyll	fix: add .nojekyll to disable Jekyll processing	2026-03-11 17:53:19 -04:00
agi-container.md	feat(rvAgent): Complete DeepAgents Rust Conversion (ADR-093 → ADR-103) (#262)	2026-03-16 09:52:32 -04:00
C2-shell-execution-hardening.md	feat(rvAgent): Complete DeepAgents Rust Conversion (ADR-093 → ADR-103) (#262)	2026-03-16 09:52:32 -04:00
C8_RESULT_VALIDATION_IMPLEMENTATION.md	feat(rvAgent): Complete DeepAgents Rust Conversion (ADR-093 → ADR-103) (#262)	2026-03-16 09:52:32 -04:00
consciousness-api.md	feat(consciousness): SOTA IIT Φ, causal emergence, quantum collapse crate (ADR-131)	2026-03-31 16:36:25 -04:00
IMPLEMENTATION-C5.md	feat(rvAgent): Complete DeepAgents Rust Conversion (ADR-093 → ADR-103) (#262)	2026-03-16 09:52:32 -04:00
index.html	refactor: move CNN demo to docs/cnn/ for shorter URL	2026-03-11 17:52:13 -04:00
INDEX.md	fix(brain): defer sparsifier build on startup for large graphs	2026-03-24 12:29:52 +00:00
moe-routing-optimization-analysis.md	perf(ruvllm): optimize MoE routing with buffer reuse and optional metrics	2026-03-12 23:27:00 -04:00
README.md	fix(brain): defer sparsifier build on startup for large graphs	2026-03-24 12:29:52 +00:00
REPO_STRUCTURE.md	fix(brain): defer sparsifier build on startup for large graphs	2026-03-24 12:29:52 +00:00
research-openfang.md	Add OpenFang project research document	2026-02-26 14:14:58 +00:00

README.md

RuVector Documentation

Complete documentation for RuVector, the high-performance Rust vector database with global-scale capabilities.

📚 Documentation Structure

docs/
├── adr/                    # Architecture Decision Records
├── analysis/               # Research & analysis docs
├── api/                    # API references (Rust, Node.js, Cypher)
├── architecture/           # System design docs
├── benchmarks/             # Performance benchmarks & results
├── cloud-architecture/     # Cloud deployment guides
├── code-reviews/           # Code review documentation
├── dag/                    # DAG implementation
├── development/            # Developer guides
├── examples/               # SQL examples
├── gnn/                    # GNN/Graph implementation
├── guides/                 # User guides & tutorials
├── hnsw/                   # HNSW index documentation
├── hooks/                  # Hooks system documentation
├── implementation/         # Implementation details & summaries
├── integration/            # Integration guides
├── nervous-system/         # Nervous system architecture
├── optimization/           # Performance optimization guides
├── plans/                  # Implementation plans
├── postgres/               # PostgreSQL extension docs
├── project-phases/         # Development phases
├── publishing/             # NPM publishing guides
├── research/               # Research documentation
├── ruvllm/                 # RuVLLM documentation
├── security/               # Security audits & reports
├── sparse-inference/       # Sparse inference docs
├── sql/                    # SQL examples
├── testing/                # Testing documentation
└── training/               # Training & LoRA docs

Getting Started

guides/GETTING_STARTED.md - Getting started guide
guides/BASIC_TUTORIAL.md - Basic tutorial
guides/INSTALLATION.md - Installation instructions
guides/AGENTICDB_QUICKSTART.md - AgenticDB quick start
guides/wasm-api.md - WebAssembly API documentation

Architecture & Design

architecture/ - System architecture details
cloud-architecture/ - Global cloud deployment
adr/ - Architecture Decision Records
nervous-system/ - Nervous system architecture

API Reference

api/RUST_API.md - Rust API reference
api/NODEJS_API.md - Node.js API reference
api/CYPHER_REFERENCE.md - Cypher query reference

Performance & Benchmarks

benchmarks/ - Performance benchmarks & results
optimization/ - Performance optimization guides
analysis/ - Research & analysis docs

Security

security/ - Security audits & reports

Implementation

implementation/ - Implementation details & summaries
integration/ - Integration guides
code-reviews/ - Code review documentation

Specialized Topics

gnn/ - GNN/Graph implementation
hnsw/ - HNSW index documentation
postgres/ - PostgreSQL extension docs
ruvllm/ - RuVLLM documentation
training/ - Training & LoRA docs

Development

development/CONTRIBUTING.md - Contribution guidelines
development/MIGRATION.md - Migration guide
testing/ - Testing documentation
publishing/ - NPM publishing guides

Research

research/ - Research documentation
- cognitive-frontier/ - Cognitive frontier research
- gnn-v2/ - GNN v2 research
- latent-space/ - HNSW & attention research
- mincut/ - MinCut algorithm research

🚀 Quick Links

For New Users

For Cloud Deployment

For Contributors

For Performance Tuning

📊 Documentation Status

Category	Directory	Status
Getting Started	guides/	✅ Complete
Architecture	architecture/, adr/	✅ Complete
API Reference	api/	✅ Complete
Performance	benchmarks/, optimization/, analysis/	✅ Complete
Security	security/	✅ Complete
Implementation	implementation/, integration/	✅ Complete
Development	development/, testing/	✅ Complete
Research	research/	📚 Ongoing

Total Documentation: 460+ documents across 60+ directories

🔗 External Resources

GitHub Repository: https://github.com/ruvnet/ruvector
Main README: ../README.md
Changelog: ../CHANGELOG.md
License: ../LICENSE

Last Updated: 2026-02-26 | Version: 2.0.4 (core) / 0.1.100 (npm) | Status: Production Ready