Mirror of https://github.com/ruvnet/RuVector.git (synced 2026-05-23 04:27:11 +00:00)
* docs(sparse-attn): plain-language README intro, SEO, and tutorial gist

  - Rewrite README opening for non-experts: what it is, why it matters, who it's for, what it is NOT. Adds a Table of Contents and an FAQ.
  - Document the new FastGRNN-gated near-linear path with a measured scaling table and runnable example pointer.
  - Add SEO-friendly keyword block at the bottom (rust llm inference, sparse attention rust, near-linear attention, edge ai rust, raspberry pi llm, gguf rust, mistral / llama / smollm2 / phi-2).
  - New docs/TUTORIAL.md walks through the full pipeline end-to-end (Cargo.toml → forward → KvCache decode → FP16 KV → FastGRNN gate → cross-compile to Pi).

  Published as https://gist.github.com/ruvnet/790214c832928d6f2ec7ebe593bb3def

  Co-Authored-By: claude-flow <ruv@ruv.net>

* chore(sparse-attn): add crates.io metadata for v0.1.0 publish

  - repository, documentation, homepage URLs
  - keywords (llm, attention, transformer, inference, edge)
  - categories (algorithms, science, mathematics)
  - expanded description mentioning subquadratic + FastGRNN near-linear
  - rust-version = 1.77 (matches workspace MSRV)

  Published v0.1.0 to crates.io: https://crates.io/crates/ruvllm_sparse_attention

  Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(sparse-attn): FastGRNN salience gate + forward_gated for near-linear scale

  Adds a recurrent O(N · D_h²) FastGRNN pass that produces a per-token salience score, then prunes the sparse-attention candidate set against that score. Combined cost is O(N · (D_h² + W + G + K_keep + dim)), linear in seq when the gate budget K_keep is constant.

  New module `fastgrnn_gate`:
  - FastGrnnGate cell (matches cognitum-agent's sparse_fastgrnn math so weights round-trip via from_weights / score_sequence)
  - score_sequence / score_kv: per-position salience over a sequence
  - keep_mask_quantile / keep_mask_top_k: turn salience into a binary keep-mask the attention candidate selector consumes
  - step_with_hidden: streaming variant for online inference

  New methods on SubquadraticSparseAttention:
  - forward_gated(q, k, v, keep_mask): drops below-threshold tokens from the long-range candidate set; window + globals + current are always retained (causality preservation)
  - forward_gated_with_fastgrnn(q, k, v, gate, top_k): convenience wrapper that does FastGRNN scoring + top-K masking + gated forward

  Tests (5 new + 8 gate tests, all passing alongside 25 baseline):
  - all-true mask is bit-identical to plain forward
  - all-false mask preserves window + globals + current, output finite
  - wrong mask length returns InvalidConfig
  - smaller top_k provably reduces total candidate count
  - end-to-end FastGRNN-driven path produces finite output

  Scaling demo (examples/fastgrnn_gated_scaling.rs):

  | seq  | ungated/N | gated/N | growth ratio |
  |------|-----------|---------|--------------|
  | 128  | 0.0021    | 0.0029  |              |
  | 2048 | 0.0029    | 0.0036  |              |

  ungated grows ~1.38× over 16× seq (log-linear); gated grows ~1.24× over 16× seq (sub-logarithmic, near-linear).

  Zero new runtime dependencies (ADR-183 invariant preserved).

  Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(sparse-attn): no_std + alloc support, ESP32-S3 cross-compile verified

  ADR-192 implementation. Crate is now no_std + alloc behind a default-on `std` feature (purely additive; std consumers see zero behavioural change).

  Changes:
  - lib.rs: #![cfg_attr(not(feature = "std"), no_std)] + extern crate alloc
  - F32Ext trait restores .exp/.sqrt/.tanh/.powi method syntax via libm in no_std mode; std mode uses inherent f32 methods unchanged
  - attention.rs / fastgrnn_gate.rs / tensor.rs: replace std:: with core:: and alloc:: imports; HashSet → BTreeSet (no hashing in no_std)
  - Error trait impl gated on std (core::error::Error needs MSRV bump)
  - Cargo.toml: std default-on, parallel = ["std", "rayon"], libm always-on

  Verified:
  - cargo test --lib: 38/38 pass
  - cargo build --no-default-features: clean
  - cargo build --no-default-features --features fp16: clean
  - cargo +esp build --target xtensa-esp32s3-none-elf: 1.02s release, 376 KB rlib
  - examples/esp32s3_smoke runs natively: all checks passed

  Tested against attached hardware: ESP32-S3 v0.2, MAC ac:a7:04:e2:66:24, 16 MB flash, on /dev/ttyACM0 (USB-Serial-JTAG).

  Bump version 0.1.0 → 0.1.1 (patch, additive). Adds "no-std" to crates.io categories. Adds libm 0.2 as always-on dep (~60 KB, pure Rust).

  Co-Authored-By: claude-flow <ruv@ruv.net>

* docs(adr): ADR-191 Pi Zero 2W production hardening for ruvllm_sparse_attention

  Proposes four additive changes to the sparse-attention crate based on production data from the cognitum-agent deployment on cognitum-v0 (Pi Zero 2W, SmolLM2-135M Q4_0, cognitum-one/seed PR #133):

  1. decode_step_with_deadline / decode_step_f16_with_deadline / decode_batch_with_deadline: sub-step wall-clock deadline so integrators can bound latency at finer granularity than per-token. Returns AttentionError::DeadlineExceeded { elapsed_ms, checkpoint }.
  2. SparseAttentionConfig::pi_zero_2w(): codify the empirically validated window=64, tile=16, FP16 KV preset that cognitum-agent currently records as a Cargo.toml comment.
  3. SubquadraticSparseAttention::warm_up(): synthetic 1-token decode to prime caches and shrink the measured 99 s → 56 s cold→warm gap before the first user inference.
  4. Stochastic Q4 dequant pass-through for KV cache reload (feature-gated, off by default). Reuses the splitmix64 seeding pattern from cognitum-agent commit 1675c20; naive `seed | 1` xorshift collapses adjacent seeds 42 and 43 to the same state, an outright bug.

  Status: proposed. Test plan covers correctness (deadline does not perturb output), unbiasedness (mean within 0.06 of deterministic over 256 trials), and a cluster bench comparing pre/post cold first-decode latency on cognitum-v0.

  Co-Authored-By: claude-flow <ruv@ruv.net>

* style(sparse-attn): cargo fmt over crate sources after no_std refactor

  Co-Authored-By: claude-flow <ruv@ruv.net>

---------

Co-authored-by: ruvnet <ruvnet@gmail.com>
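The FastGRNN commit above describes turning per-token salience into a binary keep-mask with a top-K budget. The sketch below shows one way such a mask could be built. It is not the crate's `keep_mask_top_k` (whose signature is not shown here); the function name `top_k_mask_sketch`, the `&[f32]` input, and the tie-breaking rule are all assumptions made for illustration.

```rust
/// Hypothetical sketch: mark the `k` highest-salience positions as kept.
/// Ties are broken in favour of the lower index (an assumption, not
/// necessarily what the real crate does).
fn top_k_mask_sketch(salience: &[f32], k: usize) -> Vec<bool> {
    let mut order: Vec<usize> = (0..salience.len()).collect();
    // Sort indices by descending score; total_cmp gives a total order on f32.
    order.sort_by(|&a, &b| salience[b].total_cmp(&salience[a]).then(a.cmp(&b)));

    let mut mask = vec![false; salience.len()];
    for &i in order.iter().take(k) {
        mask[i] = true;
    }
    mask
}

fn main() {
    let salience = [0.1_f32, 0.9, 0.5, 0.7];
    let mask = top_k_mask_sketch(&salience, 2);
    // Positions 1 and 3 carry the two largest scores.
    println!("{:?}", mask);
}
```

With a constant `k`, the number of long-range candidates per token stays bounded regardless of sequence length, which is the property the commit message relies on for its near-linear cost claim.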
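The seeding bug mentioned in the ADR-191 entry above is easy to reproduce: `42 | 1` and `43 | 1` are both 43, so the two seeds start xorshift from the same state. The sketch below contrasts that with a single SplitMix64 mixing step using its commonly published constants. It is not the cognitum-agent code; the helper names `naive_seed` and `splitmix64_seed` are hypothetical.

```rust
/// Naive seeding: force the low bit on so the xorshift state is never zero.
/// Hypothetical helper, shown only to reproduce the collision.
fn naive_seed(seed: u64) -> u64 {
    seed | 1
}

/// One SplitMix64 step. Every operation here is invertible on u64, so
/// distinct seeds always map to distinct states.
fn splitmix64_seed(seed: u64) -> u64 {
    let mut z = seed.wrapping_add(0x9E37_79B9_7F4A_7C15);
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Plain xorshift64 step (shift triple 13, 7, 17); needs a non-zero state.
fn xorshift64(mut x: u64) -> u64 {
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    x
}

fn main() {
    // Adjacent seeds collapse under the naive scheme.
    assert_eq!(naive_seed(42), naive_seed(43));

    // After SplitMix64 mixing they start from different states.
    let (a, b) = (splitmix64_seed(42), splitmix64_seed(43));
    assert_ne!(a, b);
    println!("first draws differ: {}", xorshift64(a) != xorshift64(b));
}
```

A production seeding routine would also need to handle the single input whose mixed state is zero, since xorshift never leaves the all-zero state; that guard is omitted here for brevity.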
RuVector Documentation
Complete documentation for RuVector, a high-performance Rust vector database with global-scale capabilities.
📚 Documentation Structure
docs/
├── adr/ # Architecture Decision Records
├── analysis/ # Research & analysis docs
├── api/ # API references (Rust, Node.js, Cypher)
├── architecture/ # System design docs
├── benchmarks/ # Performance benchmarks & results
├── cloud-architecture/ # Cloud deployment guides
├── code-reviews/ # Code review documentation
├── dag/ # DAG implementation
├── development/ # Developer guides
├── examples/ # SQL examples
├── gnn/ # GNN/Graph implementation
├── guides/ # User guides & tutorials
├── hnsw/ # HNSW index documentation
├── hooks/ # Hooks system documentation
├── implementation/ # Implementation details & summaries
├── integration/ # Integration guides
├── nervous-system/ # Nervous system architecture
├── optimization/ # Performance optimization guides
├── plans/ # Implementation plans
├── postgres/ # PostgreSQL extension docs
├── project-phases/ # Development phases
├── publishing/ # NPM publishing guides
├── research/ # Research documentation
├── ruvllm/ # RuVLLM documentation
├── security/ # Security audits & reports
├── sparse-inference/ # Sparse inference docs
├── sql/ # SQL examples
├── testing/ # Testing documentation
└── training/ # Training & LoRA docs
Getting Started
- guides/GETTING_STARTED.md - Getting started guide
- guides/BASIC_TUTORIAL.md - Basic tutorial
- guides/INSTALLATION.md - Installation instructions
- guides/AGENTICDB_QUICKSTART.md - AgenticDB quick start
- guides/wasm-api.md - WebAssembly API documentation
Architecture & Design
- architecture/ - System architecture details
- cloud-architecture/ - Global cloud deployment
- adr/ - Architecture Decision Records
- nervous-system/ - Nervous system architecture
API Reference
- api/RUST_API.md - Rust API reference
- api/NODEJS_API.md - Node.js API reference
- api/CYPHER_REFERENCE.md - Cypher query reference
Performance & Benchmarks
- benchmarks/ - Performance benchmarks & results
- optimization/ - Performance optimization guides
- analysis/ - Research & analysis docs
Security
- security/ - Security audits & reports
Implementation
- implementation/ - Implementation details & summaries
- integration/ - Integration guides
- code-reviews/ - Code review documentation
Specialized Topics
- gnn/ - GNN/Graph implementation
- hnsw/ - HNSW index documentation
- postgres/ - PostgreSQL extension docs
- ruvllm/ - RuVLLM documentation
- training/ - Training & LoRA docs
Development
- development/CONTRIBUTING.md - Contribution guidelines
- development/MIGRATION.md - Migration guide
- testing/ - Testing documentation
- publishing/ - NPM publishing guides
Research
- research/ - Research documentation
  - cognitive-frontier/ - Cognitive frontier research
  - gnn-v2/ - GNN v2 research
  - latent-space/ - HNSW & attention research
  - mincut/ - MinCut algorithm research
🚀 Quick Links
For New Users
- Start with Getting Started Guide
- Try the Basic Tutorial
- Review API Documentation
For Cloud Deployment
- Read Architecture Overview
- Follow Deployment Guide
- Apply Performance Optimizations
For Contributors
- Read Contributing Guidelines
- Review Architecture Decisions
- Check Migration Guide
For Performance Tuning
- Review Optimization Guide
- Run Benchmarks
- Check Analysis
📊 Documentation Status
| Category | Directory | Status |
|---|---|---|
| Getting Started | guides/ | ✅ Complete |
| Architecture | architecture/, adr/ | ✅ Complete |
| API Reference | api/ | ✅ Complete |
| Performance | benchmarks/, optimization/, analysis/ | ✅ Complete |
| Security | security/ | ✅ Complete |
| Implementation | implementation/, integration/ | ✅ Complete |
| Development | development/, testing/ | ✅ Complete |
| Research | research/ | 📚 Ongoing |
Total Documentation: 460+ documents across 60+ directories
🔗 External Resources
- GitHub Repository: https://github.com/ruvnet/ruvector
- Main README: ../README.md
- Changelog: ../CHANGELOG.md
- License: ../LICENSE
Last Updated: 2026-02-26 | Version: 2.0.4 (core) / 0.1.100 (npm) | Status: Production Ready