ruvector/docs/research
ruvnet 3a1afa2284 feat(rulake): vector-native federation intermediary — ADR-155 + MVP crate
Implements the M1 scope of docs/research/ruLake/: an intermediary that
fans out vector queries across heterogeneous backends (Parquet, BigQuery,
Snowflake, Delta, Iceberg, local) behind a single RVF wire protocol, with
a RaBitQ-compressed cache in front. M1 ships only the local backend; the
real-backend adapters are M2-M5 scope.

## What ships

- **Research docs** under docs/research/ruLake/ (9 files, ~2.5k lines),
  reframed from the earlier "plug RVF into BigQuery" shape to the
  intermediary/federation shape. BigQuery-native compute becomes a Tier-2
  push-down optimization inside the BigQueryBackend adapter, not a new
  product shape.
- **ADR-155 v2** as "Proposed" — captures the seven alternatives
  considered (plug-in-per-lake, standalone vector DB, Iceberg extension,
  Trino connector, JVM intermediary, notebook-only, push-through-only),
  consequences, and eight open questions.
- **crates/ruvector-rulake/** — new workspace member:
  - `BackendAdapter` trait with minimum surface (id / list_collections /
    pull_vectors / generation / supports_pushdown).
  - `LocalBackend` in-memory reference implementation (thread-safe).
  - `VectorCache` wrapping `ruvector_rabitq::RabitqPlusIndex`, with
    per-collection generation tracking and `Consistency::{Fresh, Eventual}`
    policies.
  - `RuLake` entry point: register backends, search single or federated,
    cache-stats introspection.
  - 7 smoke tests (`tests/federation_smoke.rs`): byte-exact match vs
    direct RaBitQ, cache-coherence after backend mutation, cross-backend
    fan-out with correct score ordering, cache-hit-faster-than-miss,
    three error-path tests.
  - `rulake-demo` bin: unified benchmark producing the same-run table in
    BENCHMARK.md.
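
A minimal sketch of the adapter surface listed above, written against assumed signatures. The commit names the five methods but not their types, so `BackendError`, the `(u64, Vec<f32>)` row shape, and the `insert_batch` write helper are hypothetical. The in-memory backend uses an `RwLock` for thread safety and bumps a per-collection generation counter on every write, which is the value a cache can compare to detect staleness.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum BackendError {
    UnknownCollection(String),
}

/// The five-method surface named in the commit; every signature here is an assumption.
pub trait BackendAdapter: Send + Sync {
    fn id(&self) -> &str;
    fn list_collections(&self) -> Vec<String>;
    /// All (vector id, vector) rows in a collection.
    fn pull_vectors(&self, collection: &str) -> Result<Vec<(u64, Vec<f32>)>, BackendError>;
    /// Counter that changes on every mutation, so a cache can detect staleness.
    fn generation(&self, collection: &str) -> Result<u64, BackendError>;
    /// Whether the backend can run vector ops natively (always false here).
    fn supports_pushdown(&self) -> bool;
}

struct Collection {
    generation: u64,
    rows: Vec<(u64, Vec<f32>)>,
}

/// In-memory backend; the RwLock allows concurrent readers and exclusive writers.
pub struct LocalBackend {
    id: String,
    collections: RwLock<HashMap<String, Collection>>,
}

impl LocalBackend {
    pub fn new(id: &str) -> Self {
        LocalBackend { id: id.to_string(), collections: RwLock::new(HashMap::new()) }
    }

    /// Hypothetical write path: append rows, then bump the collection's generation.
    pub fn insert_batch(&self, collection: &str, batch: Vec<(u64, Vec<f32>)>) {
        let mut map = self.collections.write().unwrap();
        let c = map
            .entry(collection.to_string())
            .or_insert_with(|| Collection { generation: 0, rows: Vec::new() });
        c.rows.extend(batch);
        c.generation += 1;
    }

    fn unknown(collection: &str) -> BackendError {
        BackendError::UnknownCollection(collection.to_string())
    }
}

impl BackendAdapter for LocalBackend {
    fn id(&self) -> &str {
        &self.id
    }

    fn list_collections(&self) -> Vec<String> {
        let mut names: Vec<String> = self.collections.read().unwrap().keys().cloned().collect();
        names.sort(); // HashMap order is unspecified; sort for determinism
        names
    }

    fn pull_vectors(&self, collection: &str) -> Result<Vec<(u64, Vec<f32>)>, BackendError> {
        let map = self.collections.read().unwrap();
        map.get(collection).map(|c| c.rows.clone()).ok_or_else(|| Self::unknown(collection))
    }

    fn generation(&self, collection: &str) -> Result<u64, BackendError> {
        let map = self.collections.read().unwrap();
        map.get(collection).map(|c| c.generation).ok_or_else(|| Self::unknown(collection))
    }

    fn supports_pushdown(&self) -> bool {
        false
    }
}
```

Returning owned rows from `pull_vectors` keeps the read lock held only briefly, at the cost of a copy per pull.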
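
One way the `Consistency::{Fresh, Eventual}` policies could gate a cache rebuild, sketched under assumptions: `Fresh` re-reads the backend generation on every query and rebuilds when it has moved, while `Eventual` serves whatever is cached. The `get` method, its closure parameters, and the plain `Vec<Vec<f32>>` standing in for a built `RabitqPlusIndex` are hypothetical, not the crate's API.

```rust
use std::collections::HashMap;

/// The two policy names from the commit; the semantics below are assumptions.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Consistency {
    /// Compare against the backend generation on every query.
    Fresh,
    /// Serve the cached copy without checking the backend.
    Eventual,
}

/// Stand-in for a built index; the real cache wraps RabitqPlusIndex instead.
struct Entry {
    generation: u64,
    vectors: Vec<Vec<f32>>,
}

#[derive(Default)]
pub struct VectorCache {
    entries: HashMap<String, Entry>,
    pub hits: u64,
    pub misses: u64,
}

impl VectorCache {
    /// Hypothetical lookup: `generation` reads the backend's counter and `pull`
    /// fetches the collection. An `Eventual` hit calls neither.
    pub fn get<G, P>(&mut self, collection: &str, policy: Consistency, generation: G, pull: P) -> &[Vec<f32>]
    where
        G: Fn() -> u64,
        P: Fn() -> Vec<Vec<f32>>,
    {
        let stale = match (self.entries.get(collection), policy) {
            (None, _) => true,                         // cold cache
            (Some(_), Consistency::Eventual) => false, // trust the cached copy
            (Some(e), Consistency::Fresh) => e.generation != generation(),
        };
        if stale {
            self.misses += 1;
            // Read the generation *before* pulling, so a write that lands
            // mid-pull leaves the entry behind and forces another rebuild.
            let current = generation();
            let vectors = pull();
            self.entries.insert(collection.to_string(), Entry { generation: current, vectors });
        } else {
            self.hits += 1;
        }
        &self.entries[collection].vectors
    }
}
```

Recording the pre-pull generation is conservative: it can cause an unnecessary rebuild after a concurrent write, but a `Fresh` query never keeps serving an index built from data older than its recorded generation.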

## Measured numbers (LocalBackend, D=128, rerank×20, 300 queries)

| n       | direct RaBitQ+ QPS | ruLake Fresh QPS | ruLake Eventual QPS | tax (direct ÷ Fresh) |
|--------:|-------------------:|-----------------:|--------------------:|---------------------:|
|   5,000 |             17,311 |           17,874 |              17,858 |                0.97× |
|  50,000 |              5,162 |            5,123 |               5,050 |                1.01× |
| 100,000 |              3,122 |            3,117 |               3,114 |                1.00× |
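
The commit does not spell out how the tax column is computed. Reading it as direct RaBitQ+ QPS divided by ruLake Fresh QPS (values above 1.0 mean ruLake is slower) reproduces all three rows, as this small check shows; the definition is inferred from the numbers, and `tax`, `round2`, and `ROWS` are hypothetical helpers.

```rust
/// Hypothetical helper: tax = direct QPS / ruLake QPS (inferred definition).
fn tax(direct_qps: f64, rulake_qps: f64) -> f64 {
    direct_qps / rulake_qps
}

/// Round to two decimals to match the table's precision.
fn round2(x: f64) -> f64 {
    (x * 100.0).round() / 100.0
}

/// (n, direct RaBitQ+ QPS, ruLake Fresh QPS, reported tax) copied from the table.
const ROWS: [(u32, f64, f64, f64); 3] = [
    (5_000, 17_311.0, 17_874.0, 0.97),
    (50_000, 5_162.0, 5_123.0, 1.01),
    (100_000, 3_122.0, 3_117.0, 1.00),
];

/// True when every row's recomputed tax matches the reported value.
fn table_is_consistent() -> bool {
    ROWS.iter()
        .all(|&(_, direct, fresh, reported)| (round2(tax(direct, fresh)) - reported).abs() < 1e-9)
}
```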

**Intermediary tax is effectively zero on a local backend**: the tax stays
between 0.97× and 1.01× at every n. Federated across 2 shards: 2,470 QPS
@ n=100k (0.79× of single-shard); 4 shards: 1,781 QPS (0.57×). Fan-out is
sequential in this MVP; parallel merge is the v2 optimization per ADR-155
§Consequences.
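
A sketch of the federated path's two stages, under stated assumptions: each shard returns its own top-k, and the per-shard lists are merged into a global top-k by ascending distance. Brute-force squared-L2 search stands in for RaBitQ+, the `Hit` type and function names are hypothetical, and the parallel variant uses `std::thread::scope` rather than the rayon approach the commit plans for v2.

```rust
use std::cmp::Ordering;

/// One scored result; lower distance is better in this sketch.
#[derive(Clone, Debug, PartialEq)]
pub struct Hit {
    pub shard: usize,
    pub id: u64,
    pub distance: f32,
}

/// Brute-force top-k over one shard (squared L2; same ordering as L2).
fn search_shard(shard: usize, rows: &[(u64, Vec<f32>)], query: &[f32], k: usize) -> Vec<Hit> {
    let mut hits: Vec<Hit> = rows
        .iter()
        .map(|(id, v)| Hit {
            shard,
            id: *id,
            distance: v.iter().zip(query).map(|(a, b)| (a - b) * (a - b)).sum(),
        })
        .collect();
    hits.sort_by(|a, b| a.distance.partial_cmp(&b.distance).unwrap_or(Ordering::Equal));
    hits.truncate(k);
    hits
}

/// Merge per-shard top-k lists into one global top-k ordered by distance.
fn merge_top_k(per_shard: Vec<Vec<Hit>>, k: usize) -> Vec<Hit> {
    let mut all: Vec<Hit> = per_shard.into_iter().flatten().collect();
    all.sort_by(|a, b| a.distance.partial_cmp(&b.distance).unwrap_or(Ordering::Equal));
    all.truncate(k);
    all
}

/// Sequential fan-out: shards are searched one after another.
pub fn federated_search_sequential(shards: &[Vec<(u64, Vec<f32>)>], query: &[f32], k: usize) -> Vec<Hit> {
    let per_shard: Vec<Vec<Hit>> =
        shards.iter().enumerate().map(|(i, s)| search_shard(i, s, query, k)).collect();
    merge_top_k(per_shard, k)
}

/// Parallel fan-out with scoped std threads: one thread per shard, then merge.
pub fn federated_search_parallel(shards: &[Vec<(u64, Vec<f32>)>], query: &[f32], k: usize) -> Vec<Hit> {
    let per_shard: Vec<Vec<Hit>> = std::thread::scope(|s| {
        // Spawn every shard first; joining lazily would serialize the work.
        let handles: Vec<_> = shards
            .iter()
            .enumerate()
            .map(|(i, shard)| s.spawn(move || search_shard(i, shard, query, k)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    merge_top_k(per_shard, k)
}
```

Both variants feed the same merge, so they return identical results. Spawning OS threads per query has its own cost; a thread pool such as rayon's amortizes it.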

## Build + test status (this crate only)

```
cargo build  -p ruvector-rulake --release                                ✓
cargo test   -p ruvector-rulake --release                                ✓ 7 passed
cargo clippy -p ruvector-rulake --release --all-targets -- -D warnings   ✓ clean
cargo fmt    -p ruvector-rulake -- --check                               ✓ clean
cargo run    -p ruvector-rulake --release --bin rulake-demo              ✓ reproduces BENCHMARK.md
```

## Scope this commit does NOT cover (M2-M5, see 07-implementation-plan.md)

- ParquetBackend, BigQueryBackend, SnowflakeBackend, IcebergBackend,
  DeltaBackend (real-backend adapters).
- Push-down paths into backends with native vector ops.
- Governance / RBAC / PII / lineage / audit (M4).
- SIFT1M recall measurement on the real-backend path.
- Parallel fan-out via rayon.
- LRU cache eviction.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-23 18:38:49 -04:00

| Name | Last commit | Date |
|------|-------------|------|
| agentic-robotics | feat: Add unified ruvector-robotics crate with bridge, perception, cognitive, and MCP modules | 2026-02-27 03:35:54 +00:00 |
| claude-code-rvsource | feat(decompiler): add graph-derived folder hierarchy for Claude Code v2.1.91 | 2026-04-03 16:00:41 +00:00 |
| climate-consciousness | feat(examples): gene, climate, ecosystem, quantum consciousness explorers | 2026-03-31 22:01:55 +00:00 |
| cnn | feat(ruvector-cnn): CNN contrastive learning + SIMD optimization fixes (#252) | 2026-03-11 17:41:53 -04:00 |
| cognitive-frontier | feat(ruvix): implement CLI, kernel shell, and PBFT consensus (#261) | 2026-03-14 16:25:03 -04:00 |
| DrAgnes | feat: DrAgnes + Common Crawl WET + Gemini grounding agents (#282) | 2026-03-23 10:12:50 -04:00 |
| dspy | feat(ruvix): implement CLI, kernel shell, and PBFT consensus (#261) | 2026-03-14 16:25:03 -04:00 |
| exotic-structure-discovery | research(boundary-first): 17 experiments proving boundary-first detection across 11 domains (#347) | 2026-04-13 12:01:47 -04:00 |
| FalkorDB | Add developer quickstart guide and knowledge export JSON | 2026-02-27 03:41:13 +00:00 |
| federated-rvf | Add developer quickstart guide and knowledge export JSON | 2026-02-27 03:41:13 +00:00 |
| gene-consciousness | feat(examples): gene, climate, ecosystem, quantum consciousness explorers | 2026-03-31 22:01:55 +00:00 |
| gnn-v2 | feat(ruvix): implement CLI, kernel shell, and PBFT consensus (#261) | 2026-03-14 16:25:03 -04:00 |
| knowledge-export | Add developer quickstart guide and knowledge export JSON | 2026-02-27 03:41:13 +00:00 |
| latent-space | fix(brain): defer sparsifier build on startup for large graphs | 2026-03-24 12:29:52 +00:00 |
| miller-rabin-optimizations | docs(adr): ADR-151 + PRD §6 — Phase 0 findings, revised perf targets, Grok review | 2026-04-16 14:41:02 -04:00 |
| mincut | fix(brain): defer sparsifier build on startup for large graphs | 2026-03-24 12:29:52 +00:00 |
| models | feat(ruvix): implement CLI, kernel shell, and PBFT consensus (#261) | 2026-03-14 16:25:03 -04:00 |
| nightly/2026-04-23-rabitq | feat(rabitq): add RaBitQ rotation-based 1-bit quantization crate (ADR-154) | 2026-04-23 07:56:23 +00:00 |
| pglite | feat(ruvix): implement CLI, kernel shell, and PBFT consensus (#261) | 2026-03-14 16:25:03 -04:00 |
| pi-brain | feat(ruvix): implement CLI, kernel shell, and PBFT consensus (#261) | 2026-03-14 16:25:03 -04:00 |
| quantization-edge | research(kv-cache): TriAttention + TurboQuant stacked compression analysis (#342) | 2026-04-08 13:29:16 -05:00 |
| quantum-crypto | feat(ruvix): implement CLI, kernel shell, and PBFT consensus (#261) | 2026-03-14 16:25:03 -04:00 |
| ruLake | feat(rulake): vector-native federation intermediary — ADR-155 + MVP crate | 2026-04-23 18:38:49 -04:00 |
| ruvm | feat(rvm): RVM — Coherence-Native Microhypervisor for the Agentic Age | 2026-04-04 12:10:19 -04:00 |
| rv2 | feat(rvAgent): Complete DeepAgents Rust Conversion (ADR-093 → ADR-103) (#262) | 2026-03-16 09:52:32 -04:00 |
| rvagent-gemini-grounding | feat: DrAgnes + Common Crawl WET + Gemini grounding agents (#282) | 2026-03-23 10:12:50 -04:00 |
| rvf | feat(rvf): add WASM_SEG (0x10) for self-bootstrapping RVF files | 2026-02-15 15:36:34 +00:00 |
| seizure-prediction | research(boundary-first): 17 experiments proving boundary-first detection across 11 domains (#347) | 2026-04-13 12:01:47 -04:00 |
| sota-gap-implementation | Add SOTA gap implementations: hybrid search, MLA, KV-cache, SSM, Graph RAG (#304) | 2026-03-27 10:12:48 -04:00 |
| sparql | fix(brain): defer sparsifier build on startup for large graphs | 2026-03-24 12:29:52 +00:00 |
| spectral-sparsification | feat: add ruvector-sparsifier — dynamic spectral graph sparsification | 2026-03-20 10:37:39 -04:00 |
| sublinear-time-solver | fix(brain): defer sparsifier build on startup for large graphs | 2026-03-24 12:29:52 +00:00 |
| wasm-integration-2026 | docs: add WASM integration research series (6 documents, 3465 lines) | 2026-02-22 21:18:46 +00:00 |
| sota-gap-analysis-2026.md | Add SOTA gap implementations: hybrid search, MLA, KV-cache, SSM, Graph RAG (#304) | 2026-03-27 10:12:48 -04:00 |