Commit graph

33 commits

Author SHA1 Message Date
ruvnet
e63fda6339 chore(rulake): add version to ruvector-rabitq path dep for crates.io
cargo publish requires both `path` and `version` for cross-crate
workspace deps so crates.io can resolve them once the published
version lands. Adds `version = "2.2"` alongside the existing path
(path wins during local dev, version is what crates.io indexes).

Dry-run now passes the manifest-validation stage and only fails on
the expected "ruvector-rabitq not yet on crates.io" line —
publishing rabitq first + waiting ~1 min for indexing will clear it.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-24 10:15:18 -04:00
ruvnet
d60c802889 feat(rabitq,rulake): external_ids accessor + warm_restart example — close wave-5 gaps
=== Agent A: rabitq — non-dense ID preservation through persist ===
crates/ruvector-rabitq/src/{index,persist}.rs

Wave-5's warm_from_dir collapsed external u64 ids to (0..n) identity
because RabitqPlusIndex lacked an outer ids accessor. Surprise finding:
the persist LOAD path was already id-preserving — the pipeline reads
`id:u32` from disk and hands (id, v) into from_vectors_parallel, which
writes `id` into inner.ids. The only missing piece was the outer-layer
accessor so ruLake could read them back.

Added:
  - RabitqPlusIndex::external_ids(&self) -> &[u32]  (thin forward)
  - RabitqPlusIndex::ids_u64(&self)    -> Vec<u64>  (widening clone)

Regression test `persist_preserves_non_dense_ids` builds an index with
non-dense external ids (13*i + 7 for i in 0..50), save/load, asserts
byte-identical ids after round-trip. 37 → 38 rabitq tests.
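
The two accessors can be pictured with a toy stand-in. This is a minimal sketch, not the crate's code: `ToyIndex`, `from_ids` and `non_dense_ids` are hypothetical names, and only the two accessor signatures come from the message above.

```rust
/// Hypothetical stand-in for the index; the real RabitqPlusIndex
/// layout is not shown in this log.
pub struct ToyIndex {
    ids: Vec<u32>,
}

impl ToyIndex {
    pub fn from_ids(ids: Vec<u32>) -> Self {
        Self { ids }
    }

    /// Thin forward: borrow the stored ids without copying.
    pub fn external_ids(&self) -> &[u32] {
        &self.ids
    }

    /// Widening clone: copy each u32 id into a u64.
    pub fn ids_u64(&self) -> Vec<u64> {
        self.ids.iter().map(|&id| u64::from(id)).collect()
    }
}

/// Non-dense ids in the same shape as the regression test:
/// 13*i + 7 for i in 0..50.
pub fn non_dense_ids() -> Vec<u32> {
    (0..50u32).map(|i| 13 * i + 7).collect()
}
```

A caller that previously synthesized (0..n) would instead take `ids_u64()` as its position-to-id table, so ids such as 7, 20, 33 survive a save/load cycle.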

=== rulake: drop the (0..n) workaround ===
crates/ruvector-rulake/src/lake.rs

warm_from_dir now calls `idx.ids_u64()` instead of synthesizing
(0..n). Non-dense external ids round-trip faithfully. The
~15-line inline comment documenting the old limitation is gone;
replaced with a 4-line pointer to the wave-6 close.

=== Agent B: warm_restart runnable example ===
crates/ruvector-rulake/examples/warm_restart.rs (new)

Runnable demo of the full save → ship → warm-restart cycle:
  - Phase 1: prime from backend, save to disk
  - Phase 2: spin up a FRESH RuLake with NO backend, warm_from_dir,
    query, assert warm_installs=1 / primes=0
  - Phase 3: cold-prime from backend for comparison
  - Final: report cold/warm speedup

Measured at n=5000 D=128 (agent's single-run numbers):
  Phase 1 prime:     5.03 ms
  save_cache_to_dir: 3.44 ms  (2.46 MiB rbpx)
  Phase 2 warm:      5.00 ms  (warm_installs=1, primes=0)
  Phase 3 cold:      3.60 ms
  Speedup cold/warm: 0.70×

Honest finding: at n=5k D=128, cold-prime is actually faster than
warm-load because our parallel prime is <5ms and parsing 2.5 MB of
rbpx is slower. The warm-restart win shows up at larger n where
compression dominates; documented in the example's closing block.

Steady-state QPS matches within 1.2% (same compressed index in both).

38 rabitq + 21 rulake lib + 22 rulake federation = 81 tests. Clippy
-D warnings clean across both crates.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-23 23:52:51 -04:00
ruvnet
0ceba2a032 feat(rabitq,rulake): persist end-to-end — save_cache_to_dir + warm_from_dir
Wires the previously-shipped rabitq::persist module into ruLake's
lake.rs as first-class cache-save/restore APIs. The architectural
blocker I've deferred across 3 waves is now closed.

=== Agent A: rabitq::RabitqPlusIndex::export_items() ===
crates/ruvector-rabitq/src/index.rs +1 method, +1 test.

Exposes `export_items() -> Vec<(usize, Vec<f32>)>` — each row as
(pos, original_vec) extracted from originals_flat with one clone per
row. Feeds directly into persist::save_index or
from_vectors_parallel_with_rotation. No new deps, no public API
breakage.

Regression test (`export_items_roundtrip_via_from_vectors_parallel`)
builds via serial add(), exports, rebuilds via the parallel path,
asserts byte-identical search results on 5 queries. Tests: 36 → 37.

=== Agent B: RuLake save_cache_to_dir + warm_from_dir ===
crates/ruvector-rulake/src/{cache.rs, lake.rs, tests/federation_smoke.rs}.

New API:
  pub fn save_cache_to_dir(&self, key, dir) -> Result<PathBuf>
    — writes dir/index.rbpx (atomic temp+rename+fsync) alongside
      the table.rulake.json bundle sidecar. Uses export_items +
      persist::save_index.
  pub fn warm_from_dir(&self, key, dir) -> Result<usize>
    — reads bundle, witness-verifies, loads index.rbpx via
      persist::load_index, cross-checks dim+rerank_factor, installs
      into cache via the new install_prebuilt path. Returns n vectors.
      Does NOT require the backend to be registered — warm restart
      without backend RTT is the point.

New on CacheStats: warm_installs counter (separate from primes so
warm-restart cost isn't confused with cold-prime cost).

New on VectorCache: install_prebuilt + install_prebuilt_interned —
insert a pre-built Arc<RabitqPlusIndex> at a known witness without
any prime-timer bookkeeping. Respects the LRU cap. Shared-entry
path reuses an existing witness entry if another pointer already
holds it (witness-addressed cache sharing remains the headline).

New test: `warm_from_dir_skips_backend_and_returns_bit_exact_results`
Prime a 50-vec D=8 collection, save, spin up a FRESH RuLake with
NO backend registered + Consistency::Frozen, warm_from_dir, run the
same query, assert byte-identical ids + f32 score bits,
warm_installs=1, primes=0. Closes the "restart without re-prime"
gap end-to-end.

Documented limitation: pos_to_id reconstructed as (0..n) identity
because RabitqPlusIndex doesn't expose an outer ids() accessor, and
the rabitq agent's scope prohibited adding it. Every current prime
path uses positional ids so this is byte-equivalent to the real
ids; external non-dense u64 ids would collapse (a known M2+ issue
filed inline).

Tests: 37 rabitq + 21 rulake lib + 22 rulake federation = 80 total.
Clippy -D warnings clean across both crates.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-23 23:34:38 -04:00
ruvnet
21396313f1 docs(bench): add Hadamard rotation table — 3× build speedup at D=128
Documents the randomized Hadamard feature shipped in commit 835f35087
+ bench results from bf48f16e2. ADR-158 has the full rationale.

Key numbers preserved in BENCHMARK.md so operators can see the
tradeoff without reading two ADRs.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-23 23:10:37 -04:00
ruvnet
bf48f16e27 bench(rabitq,rulake): Hadamard vs Haar — 3× prime speedup at D=128
Adds direct comparison in rulake-demo. RandomRotationKind re-exported
at the crate root so callers don't need to reach into the rotation
module.

Measured (clustered Gaussian, D=128, rerank×20):

  n= 5 000  Haar build: 22.4 ms   Hadamard: 7.2 ms    (3.09×)
  n=50 000  Haar build: 211.6 ms  Hadamard: 72.7 ms   (2.91×)
  n=100 000 Haar build: 421.1 ms  Hadamard: 142.9 ms  (2.95×)

Matches the O(D²) → O(D log D) theoretical speedup: at D=128,
~16 K flops for the dense matrix multiply vs ~900 flops for three
FWHT passes + three sign-vector multiplies. The 3× ceiling reflects
that other allocations + SoA writes take a non-negligible fraction of
build time.

Per-query QPS is flat (±3% noise) because the query-side rotation
is only one of many per-query steps — the scan + rerank dominate,
especially at n ≥ 50k. Hadamard's win is entirely on the prime /
cold-start path, which was already the critical-path latency for
cache-miss queries.

Hadamard + existing parallel prime stack:
  n=100k total prime (incl. compression + SoA writes) still ~40 ms
  (parallel prime already dominates), but single-threaded rabitq-
  demo shows the pure-rotation win at 3×.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-23 23:09:52 -04:00
ruvnet
835f35087e feat(rabitq): persistence + randomized Hadamard rotation — 2 M2 items
Two parallel swarm agents delivered disjoint features for M2:

=== Agent A: seed-based index persistence ===
NEW: crates/ruvector-rabitq/src/persist.rs (+393 LoC)

save_index / load_index serialize a RabitqPlusIndex via its *build
inputs* (dim, seed, rerank_factor, ids, vectors) rather than the
opaque internal SoA state. Rationale: (dim, seed, data) →
bit-identical index by construction (RaBitQ is deterministic), and
the public API doesn't expose packed / rotation / cos_lut — so
seed-based reconstruction is the only path without touching index.rs.

On-disk format (32-byte header + payload):
  magic "rbpx0001" | version:u32 | dim:u32 | seed:u64
    | rerank_factor:u32 | n:u32 | (id:u32, v:f32[dim])*n

DoS caps: dim ≤ 8192, n ≤ 100M, rerank_factor ≤ 1024. Format is
portable — no matrix, no packed codes stored (rebuilt on load).
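
The 32-byte header adds up as 8 + 4 + 4 + 8 + 4 + 4 bytes. A minimal sketch of encoding and validating it follows. Little-endian byte order, the `Header` struct, the function names and `SUPPORTED_VERSION = 1` are assumptions for illustration; only the field order and the three caps come from the message above.

```rust
pub const MAGIC: &[u8; 8] = b"rbpx0001";
pub const SUPPORTED_VERSION: u32 = 1; // assumption: not stated in the log
pub const MAX_DIM: u32 = 8192;
pub const MAX_N: u32 = 100_000_000;
pub const MAX_RERANK: u32 = 1024;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Header {
    pub version: u32,
    pub dim: u32,
    pub seed: u64,
    pub rerank_factor: u32,
    pub n: u32,
}

/// Serialize the header. Little-endian is an assumption.
pub fn encode_header(h: &Header) -> [u8; 32] {
    let mut out = [0u8; 32];
    out[0..8].copy_from_slice(MAGIC);
    out[8..12].copy_from_slice(&h.version.to_le_bytes());
    out[12..16].copy_from_slice(&h.dim.to_le_bytes());
    out[16..24].copy_from_slice(&h.seed.to_le_bytes());
    out[24..28].copy_from_slice(&h.rerank_factor.to_le_bytes());
    out[28..32].copy_from_slice(&h.n.to_le_bytes());
    out
}

fn read_u32(buf: &[u8], at: usize) -> u32 {
    let mut b = [0u8; 4];
    b.copy_from_slice(&buf[at..at + 4]);
    u32::from_le_bytes(b)
}

fn read_u64(buf: &[u8], at: usize) -> u64 {
    let mut b = [0u8; 8];
    b.copy_from_slice(&buf[at..at + 8]);
    u64::from_le_bytes(b)
}

/// Parse and validate before any payload allocation happens.
pub fn decode_header(buf: &[u8; 32]) -> Result<Header, String> {
    if buf[0..8] != MAGIC[..] {
        return Err("bad magic".to_string());
    }
    let h = Header {
        version: read_u32(buf, 8),
        dim: read_u32(buf, 12),
        seed: read_u64(buf, 16),
        rerank_factor: read_u32(buf, 24),
        n: read_u32(buf, 28),
    };
    if h.version > SUPPORTED_VERSION {
        return Err("version too new".to_string());
    }
    if h.dim > MAX_DIM || h.n > MAX_N || h.rerank_factor > MAX_RERANK {
        return Err("oversize field".to_string());
    }
    Ok(h)
}
```

Checking the caps on the fixed-size header first means a corrupt or hostile file is rejected before the reader sizes any buffer from `n` or `dim`.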

Tests: serialize_roundtrip_preserves_search_results (10 queries,
byte-exact ids + score bits), reject_bad_magic, reject_version_too_new,
reject_oversize_fields (4 sub-cases).

=== Agent B: randomized Hadamard (HD-HD-HD) rotation ===
MODIFIED: crates/ruvector-rabitq/src/rotation.rs (+219 LoC)

Adds RandomRotation::hadamard(dim, seed) as an opt-in O(D log D)
rotation. Storage is 3 × padded_dim × 4 bytes of ±1 signs instead
of D×D × 4 bytes of Haar matrix (1.5 KiB vs 64 KiB at D=128).

Based on TurboQuant 2025 (arXiv:2504.19874 §3.2): D₃·FWHT·D₂·FWHT·D₁
is close-to-Haar-uniform in the Johnson–Lindenstrauss sense, which
is all RaBitQ's error bound requires. For non-power-of-2 dim:
zero-pad to next_power_of_two, apply, truncate.
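
A minimal sketch of the pad, rotate, truncate idea, under stated assumptions: it runs three sign-flip + FWHT rounds (the "HD-HD-HD" reading of the heading), each FWHT is scaled by 1/sqrt(n) so it is orthonormal, and the signs come from a SplitMix64 generator. The RNG, the struct name and the exact round order are illustrative choices, not taken from rotation.rs.

```rust
/// In-place fast Walsh-Hadamard transform, scaled by 1/sqrt(n) so the
/// transform is orthonormal. `x.len()` must be a power of two.
pub fn fwht(x: &mut [f32]) {
    let n = x.len();
    assert!(n.is_power_of_two());
    let mut h = 1;
    while h < n {
        let mut i = 0;
        while i < n {
            for j in i..i + h {
                let (a, b) = (x[j], x[j + h]);
                x[j] = a + b;
                x[j + h] = a - b;
            }
            i += 2 * h;
        }
        h *= 2;
    }
    let scale = 1.0 / (n as f32).sqrt();
    for v in x.iter_mut() {
        *v *= scale;
    }
}

/// SplitMix64 step, used only to derive reproducible signs
/// (assumption: the real generator is not described in the log).
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Hypothetical rotation holding three +/-1 sign vectors.
pub struct HadamardRotation {
    dim: usize,
    padded: usize,
    signs: [Vec<f32>; 3],
}

impl HadamardRotation {
    pub fn new(dim: usize, seed: u64) -> Self {
        let padded = dim.next_power_of_two();
        let mut state = seed;
        let mut draw = || -> Vec<f32> {
            (0..padded)
                .map(|_| if splitmix64(&mut state) & 1 == 0 { 1.0 } else { -1.0 })
                .collect()
        };
        let signs = [draw(), draw(), draw()];
        Self { dim, padded, signs }
    }

    /// Zero-pad to the next power of two, apply three sign + FWHT
    /// rounds, then truncate back to `dim`.
    pub fn apply(&self, v: &[f32]) -> Vec<f32> {
        assert_eq!(v.len(), self.dim);
        let mut x = vec![0.0f32; self.padded];
        x[..self.dim].copy_from_slice(v);
        for d in &self.signs {
            for (xi, s) in x.iter_mut().zip(d) {
                *xi *= *s;
            }
            fwht(&mut x);
        }
        x.truncate(self.dim);
        x
    }
}
```

For a power-of-two dimension every step is orthonormal, so the norm is preserved up to float rounding. For a padded dimension the truncation drops some energy, which is why the non-power-of-two test in the message only asserts a norm band rather than equality.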

Backward-compatible: RandomRotation::random() still returns the
Haar matrix. New RandomRotationKind { HaarDense, HadamardSigned }
enum for introspection. RabitqIndex unchanged — integration into
the scan path is future work (ADR-158 pending).

Tests: hadamard_apply_preserves_norm_power_of_two (D=128, 256),
hadamard_apply_preserves_norm_non_power_of_two (D=1000 → pad 1024,
norm ∈ [0.95, 1.05] on 100 unit vectors), hadamard_is_deterministic,
hadamard_is_fast.

=== Totals ===
25 → 33 rabitq lib tests (+4 persist, +4 hadamard). All 21 rulake
federation + 21 rulake lib tests unchanged and passing. Clippy -D
warnings clean across both crates.

Both agents worked on strictly disjoint file scopes (persist.rs +
lib.rs one-liner vs rotation.rs only) — no merge conflicts.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-23 22:42:19 -04:00
ruvnet
a6599197ac docs(bench): update BENCHMARK.md with wave-2 stacked numbers
Refreshed the headline tables to reflect the AVX2 popcount + CacheKey
Arc-intern commits. Wave-2 delta on top of the Arc-refactor baseline:

  n=100k direct RaBitQ+: 2,975 → 3,661 QPS (+23%)
  n=100k concurrent 1-shard: 23,681 → 27,814 QPS (+17%)
  n=100k concurrent 2-shard: 28,971 → 32,194 QPS (+11%)
  n=100k concurrent 4-shard: 33,094 → 36,715 QPS (+11%)

vs original pre-optimization M1 baseline:
  concurrent 4-shard: 2,791 → 36,715 QPS = 13.2× lift.

Intermediary tax on cache-hit path stays at 1.01-1.03× direct RaBitQ.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-23 22:22:30 -04:00
ruvnet
5f32fd4508 perf(rulake): CacheKey Arc<str> intern — cheap refcount clones on hot path
Memory-audit finding #1: the hot router path cloned (String, String)
keys ~3K times per federated query (one per mark_hit / mark_miss /
per_backend_mut call). At 10 k QPS × 8 shards that's 7.6 MB/s of
short-lived allocator traffic + hashmap rehashing on every step.

Fix: intern at the RuLake boundary into Arc<str>.

  pub type CacheKey = (BackendId, CollectionId);      // unchanged (public)
  pub(crate) type InternedKey = (Arc<str>, Arc<str>); // internal
  pub(crate) fn intern_key(b: &str, c: &str) -> InternedKey;

ensure_fresh interns once at entry; every downstream mark_hit /
mark_miss / per_backend_mut call takes refcount-cheap Arc<str>
clones instead of cloning owned Strings. The public CacheKey alias
stays (String, String) for API stability — callers passing owned
tuples at rare diagnostic sites (cache_witness_of, invalidate_cache)
keep working untouched.
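
A minimal sketch of why the interned key is cheap to pass around. The `mark_hit` body and the `Vec` log are hypothetical stand-ins for the real counters; `BackendId` and `CollectionId` are assumed to be `String` aliases, as the "(String, String)" remark above suggests.

```rust
use std::sync::Arc;

// Assumption: both ids are plain String aliases.
pub type BackendId = String;
pub type CollectionId = String;
pub type CacheKey = (BackendId, CollectionId);
pub type InternedKey = (Arc<str>, Arc<str>);

/// Copy each string once into a shared, immutable allocation.
pub fn intern_key(backend: &str, collection: &str) -> InternedKey {
    (Arc::from(backend), Arc::from(collection))
}

/// Public-facing callers can keep passing owned tuples.
pub fn intern_owned(key: &CacheKey) -> InternedKey {
    intern_key(&key.0, &key.1)
}

/// Stand-in for a hot-path call: it keeps its own handle to the key.
/// Cloning an InternedKey bumps two refcounts and copies no bytes.
pub fn mark_hit(log: &mut Vec<InternedKey>, key: &InternedKey) {
    log.push(key.clone());
}
```

Every clone points at the same string allocation, so the per-call cost no longer scales with key length and no allocator traffic is generated on the hot path.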

Bench delta (stacked with AVX2 popcount commit 5a4b0d782):
  n=100k single-thread Eventual: 2,963 → 3,626 QPS (+22%)
  n=100k concurrent 1-shard:    23,681 → 27,814 QPS (+17%)
  n=100k concurrent 4-shard:    33,094 → 36,715 QPS (+11%)

vs original pre-optimization M1 baseline: **13.2× concurrent QPS**.

21 federation tests + 21 rulake lib + 25 rabitq = 67 tests passing.
Clippy -D warnings clean.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-23 22:21:46 -04:00
ruvnet
a0fdd4d9b0 feat(rulake): per-shard over-request for federated top-K (SOTA folklore rule)
Closes the data-skew recall gap the 2026-04-23 cache/federation SOTA
agent flagged. Weaviate/Elasticsearch default to k-per-shard which
under-recalls when the true top-K is concentrated in one shard.

Policy:  k' = k + ceil(sqrt(k * ln(S))), clamped to [k, 4k]

Examples:
  k=10, S=2  → k'=13
  k=10, S=4  → k'=14
  k=10, S=16 → k'=16
  k=10, S=64 → k'=17

At k=10 the over-request is ~30-70% of k — cheap insurance against
skew. Formula is the folklore rule cited in SPIRE (arxiv 2512.17264),
HARMONY (SIGMOD'25), and the OpenSearch recall guide. Extra cost per
shard is O(k' × rerank) — negligible vs the scan cost at rerank=20.

Single-shard (S=1) returns k unchanged. Callers can still override
via search_federated_with_rerank to get exact parity.
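
The policy is small enough to write out. This is a sketch of the formula as stated above; the function name is hypothetical and the `k == 0` guard is an added assumption.

```rust
/// k' = k + ceil(sqrt(k * ln(S))), clamped to [k, 4k].
/// A single shard (S <= 1) returns k unchanged.
pub fn over_request_k(k: usize, shards: usize) -> usize {
    if shards <= 1 || k == 0 {
        return k;
    }
    let extra = ((k as f64) * (shards as f64).ln()).sqrt().ceil() as usize;
    (k + extra).clamp(k, 4 * k)
}
```

With k = 10 this reproduces the examples listed above (13, 14, 16, 17 for S = 2, 4, 16, 64). The 4k clamp only binds for very small k against a very large shard count.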

21 federation tests passing. Clippy -D warnings clean.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-23 22:04:58 -04:00
ruvnet
fdc43fb55c security(rulake): 3 audit findings — DoS caps + 32-bit truncation + Generation tag byte
All three from the 2026-04-23 round-2 security audit. Format version
bumped to 2 because Generation::hash_bytes() now prepends a variant
tag; v1 witnesses don't match v2 witnesses by design.

HIGH — unbounded pull_vectors DoS
  Hostile backend returning n=u64::MAX / dim=u32::MAX would OOM the
  host at prime. Added MAX_PULLED_VECTORS=100M, MAX_PULLED_DIM=8192,
  MAX_PULLED_BYTES=16 GiB caps with checked_mul overflow guard. Cache
  prime() validates the batch before any allocation. FsBackend::pull
  also bounds-checks the on-disk ruvec1 header fields before
  Vec::with_capacity so a corrupt file can't crash the reader.

MEDIUM — u64/u32 → usize truncation on 32-bit targets
  FsBackend::pull now uses usize::try_from(count_u64) and errors on
  overflow instead of silently truncating. wasm32 / armv7 deployments
  no longer have the content-smuggling-past-witness attack surface.
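
A minimal sketch of the two checks combined, assuming 4-byte f32 components and a hypothetical `validate_pulled` helper. The three cap values are the ones named above; the error strings and return shape are illustrative.

```rust
use std::convert::TryFrom;

pub const MAX_PULLED_VECTORS: u64 = 100_000_000;
pub const MAX_PULLED_DIM: u64 = 8192;
pub const MAX_PULLED_BYTES: u64 = 16 * 1024 * 1024 * 1024; // 16 GiB

/// Validate untrusted (n, dim) before allocating anything.
/// Returns the values narrowed to usize, or an error.
pub fn validate_pulled(n: u64, dim: u64) -> Result<(usize, usize), String> {
    if dim == 0 || dim > MAX_PULLED_DIM {
        return Err(format!("dim {} out of range", dim));
    }
    if n > MAX_PULLED_VECTORS {
        return Err(format!("n {} exceeds cap", n));
    }
    // Overflow-checked byte count: n * dim * size_of::<f32>().
    let bytes = n
        .checked_mul(dim)
        .and_then(|x| x.checked_mul(4))
        .ok_or_else(|| "size overflow".to_string())?;
    if bytes > MAX_PULLED_BYTES {
        return Err(format!("{} bytes exceeds cap", bytes));
    }
    // Error out instead of truncating on 32-bit targets.
    let n = usize::try_from(n).map_err(|_| "n does not fit usize".to_string())?;
    let dim = usize::try_from(dim).map_err(|_| "dim does not fit usize".to_string())?;
    Ok((n, dim))
}
```

Doing the arithmetic in u64 keeps the 16 GiB comparison meaningful on wasm32 and armv7, where a `usize` product would wrap long before reaching the cap.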

LOW — Generation::Num(n) vs Generation::Opaque(bytes) collision
  Before: Num(7) and Opaque("\x07\0\0\0\0\0\0\0") produced identical
  hash_bytes() output, so two different generations could share a
  witness.
  After: hash_bytes() prepends 0x00 for Num, 0x01 for Opaque. Tests
  cover the regression (generation_num_and_opaque_cannot_collide).
  Format version bumped 1→2 since witness bytes change.
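
The collision and its fix can be reproduced in a few lines. This sketch assumes `Num` is serialized as 8 little-endian bytes, which is consistent with the "\x07\0\0\0\0\0\0\0" example above; `hash_bytes_untagged` is a hypothetical name for the pre-fix behavior.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Generation {
    Num(u64),
    Opaque(Vec<u8>),
}

impl Generation {
    /// Pre-fix shape: the variant is not encoded, so Num and Opaque
    /// can produce the same bytes.
    pub fn hash_bytes_untagged(&self) -> Vec<u8> {
        match self {
            Generation::Num(n) => n.to_le_bytes().to_vec(),
            Generation::Opaque(b) => b.clone(),
        }
    }

    /// Post-fix shape: one tag byte per variant (0x00 Num, 0x01 Opaque).
    pub fn hash_bytes(&self) -> Vec<u8> {
        let (tag, body) = match self {
            Generation::Num(n) => (0x00u8, n.to_le_bytes().to_vec()),
            Generation::Opaque(b) => (0x01u8, b.clone()),
        };
        let mut out = Vec::with_capacity(1 + body.len());
        out.push(tag);
        out.extend_from_slice(&body);
        out
    }
}
```

Because the first byte now differs between variants, no choice of opaque bytes can reproduce the bytes of a numeric generation.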

Also added 4 backend-module tests for the pulled_batch validator
(accepts reasonable size, rejects dim=0, rejects dim>cap, rejects
ids/vectors length mismatch).

42 tests passing (21 lib + 21 federation). Clippy -D warnings clean.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-23 21:59:01 -04:00
ruvnet
4e541fd6be docs(rulake): rewrite README — clearer intro, benefits, RuVector positioning
Reworked the top-level README per review feedback. Changes:

- New intro: concrete one-line description + a plain-English
  "what is ruLake" that leads with the user's existing data ("you
  already have vectors somewhere"), not internal architecture.
- Added "Why ruLake exists" section contrasting the three existing
  options (managed vector DB / lakehouse / local library) and where
  ruLake fits.
- Features: simpler bullet-point descriptions, every claim backed
  by a measured number.
- New "Benefits" section grouping wins by reader role (app dev,
  platform, performance eng, security eng).
- Comparison section now leads with the "not a vector database"
  statement verbatim and positions ruLake explicitly as part of
  the RuVector ecosystem (RVF / rabitq / ruLake table).
- Usage recipes expanded with the 95% hit-rate target + acceptance
  test mention.
- Benchmarks section now includes the parallel-prime numbers
  (11× speedup at n=100k).
- "How it works" cleaned up: data-flow diagram, bundle structure
  example, adaptive rerank explanation, Arc concurrency, parallel
  prime.
- User guide: decision tables for consistency mode, metric
  interpretation, custom backend writing, running examples.
- Status: M1 shipped list (60 tests, zero unsafe) + M2 roadmap.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-23 21:51:34 -04:00
ruvnet
d9aedaabb2 perf(rabitq,rulake): parallel prime via rayon — 11× faster at n=100k
RabitqPlusIndex::from_vectors_parallel rotates + bit-packs every
vector in parallel using rayon, then commits the SoA serially.
Produces a bit-identical index to the serial add loop — rotation
matrix is seeded once at construction and encode is deterministic,
so parallel ordering cannot affect output bytes.

VectorCache::prime picks between serial add() and the new parallel
constructor based on batch size (PARALLEL_PRIME_THRESHOLD = 1024).
Below 1k vectors the rayon task-queue overhead outweighs the D×D
rotation savings; above it the parallel path dominates.
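
A dependency-free sketch of the "encode in parallel, commit serially" shape. It swaps rayon for `std::thread::scope` and replaces the real rotate + bit-pack step with a toy sign-bit encoder, so it illustrates only the ordering argument. All function names are hypothetical, and treating the threshold as inclusive is an assumption.

```rust
pub const PARALLEL_PRIME_THRESHOLD: usize = 1024;

/// Toy deterministic encoder: pack the sign bit of each component.
/// The real pipeline rotates first; that step is omitted here.
pub fn encode(v: &[f32]) -> Vec<u64> {
    let mut words = vec![0u64; (v.len() + 63) / 64];
    for (i, &x) in v.iter().enumerate() {
        if x >= 0.0 {
            words[i / 64] |= 1u64 << (i % 64);
        }
    }
    words
}

pub fn build_serial(vectors: &[Vec<f32>]) -> Vec<Vec<u64>> {
    vectors.iter().map(|v| encode(v)).collect()
}

/// Encode contiguous chunks on scoped threads, then concatenate the
/// results in chunk order so the output matches the serial build.
pub fn build_parallel(vectors: &[Vec<f32>], workers: usize) -> Vec<Vec<u64>> {
    let workers = workers.max(1);
    let chunk = ((vectors.len() + workers - 1) / workers).max(1);
    let parts: Vec<Vec<Vec<u64>>> = std::thread::scope(|s| {
        let handles: Vec<_> = vectors
            .chunks(chunk)
            .map(|c| s.spawn(move || c.iter().map(|v| encode(v)).collect::<Vec<_>>()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    parts.into_iter().flatten().collect()
}

/// Pick a path by batch size.
pub fn prime(vectors: &[Vec<f32>], workers: usize) -> Vec<Vec<u64>> {
    if vectors.len() >= PARALLEL_PRIME_THRESHOLD {
        build_parallel(vectors, workers)
    } else {
        build_serial(vectors)
    }
}
```

Each vector's code depends only on that vector, and results are joined in input order, so thread scheduling cannot change the output bytes.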

Measured (clustered D=128, rerank×20):

  n=5k    prime 22.3 ms → 4.5 ms     (4.9×)
  n=50k   prime 213 ms  → 19.6 ms    (10.9×)
  n=100k  prime 420 ms  → 37.6 ms    (11.2×)

This is the biggest cold-start-latency win available in M1. Real
backend deployments where prime cost is the critical-path latency
on a cache miss now see p99 drop by an order of magnitude.

rayon dep is no longer feature-gated in rabitq (it's already a
runtime dep via the workspace-pinned 1.10 that ruLake uses).

40 tests passing. Clippy -D warnings clean.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-23 21:48:41 -04:00
ruvnet
87c6ef2677 docs(rulake): comprehensive README — features, usages, benchmarks, guide
Root-level README for the ruvector-rulake crate in the style of the
workspace README. Sections:

- Introduction: one-line positioning + what ruLake is/isn't
- Features: 8 subsections covering the actual measured properties
  (cache-first, witness-authenticated, federated, three-mode
  consistency, sidecar protocol, KPIs, kernel plane, security)
- Quick start: minimal usable example
- Usage patterns: 5 concrete recipes (RAG, federated, Frozen audit,
  sidecar daemon, memory substrate tagging)
- Benchmarks: 4 measurement tables from the commit history
- Comparison: vs Pinecone / Weaviate / Milvus / LanceDB / BQ Vector
  Search (intermediary tax, federation, witness, cross-process
  cache sharing, GPU optionality, unsafe count)
- Technical details: architecture diagram, bundle protocol, coherence
  protocol state machine, adaptive rerank formula, Arc concurrency
  model, security model
- User guide: consistency-mode choice, cache sizing, observability
  metrics, writing a custom backend, running the examples
- Status: M1 shipped, M2 roadmap
- Links to ADR-155 / 156 / 157

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-23 21:36:16 -04:00
ruvnet
39e0b4f3ab security+perf(rulake): path-traversal fix + JSON caps + Arc-drop-lock (8-12× QPS)
Two items in one commit because they both touch broad surface:

=== Iter 27: deep security review + fixes ===

CRITICAL — path traversal in FsBackend
  FsBackend::register/write previously joined user-supplied filenames
  with the root dir without validation. A filename of "../etc/passwd"
  escaped the root. Also polluted the data_ref (witness input),
  letting an attacker forge witness values via canonicalization.

  Fix: validate_filename() rejects empty, ".", "..", control bytes,
  path separators (both / and \), colons (Windows drive/UNC), and
  names > 255 bytes (POSIX NAME_MAX). register() is now fallible.
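
A minimal sketch of a validator with the rejection rules listed above. The real `validate_filename` signature and error type are not shown in this log, so the `Result<(), String>` shape is an assumption.

```rust
/// Reject anything that is not a plain single-component file name.
pub fn validate_filename(name: &str) -> Result<(), String> {
    if name.is_empty() {
        return Err("empty filename".to_string());
    }
    if name == "." || name == ".." {
        return Err("reserved name".to_string());
    }
    if name.len() > 255 {
        return Err("name longer than 255 bytes".to_string());
    }
    // Control bytes, including NUL and DEL.
    if name.bytes().any(|b| b < 0x20 || b == 0x7f) {
        return Err("control byte in name".to_string());
    }
    // Both separator styles, so the check is platform-independent.
    if name.contains('/') || name.contains('\\') {
        return Err("path separator in name".to_string());
    }
    // Colons cover Windows drive and UNC-style prefixes.
    if name.contains(':') {
        return Err("colon in name".to_string());
    }
    Ok(())
}
```

Validating the name before joining it to the root keeps both the filesystem path and the data_ref (which feeds the witness) free of attacker-controlled components.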

  Test fs_register_rejects_path_traversal covers 12 malicious forms
  (../escape, ../../etc/passwd, /absolute, sub/foo, back\slash,
  "." "..", null bytes, control bytes, C:drive, empty string).

LOW — unbounded JSON metadata fields
  from_json had no cap on input size or field lengths. A 1 GB
  memory_class string would force the reader to allocate that much.

  Fix: MAX_JSON_BYTES=64 KiB total, MAX_FIELD_BYTES=4 KiB per
  (data_ref, pii_policy, lineage_id, memory_class), rvf_witness
  capped at 128 bytes (SHAKE-256(32) hex is exactly 64).

  Tests: from_json_rejects_oversize_input + from_json_rejects_
  oversize_metadata_field.

No unsafe in ruLake or the new kernel module. Bundle witness scheme
is already domain-separated + length-prefixed SHAKE-256. Mutex
poisoning on unwrap() is a deliberate fail-fast; documented.

=== Iter 28: Arc-drop-lock optimization ===

The cache Mutex was held for the entire scan duration. Under
concurrent load, every reader serialized behind the previous one.

Fix: CacheEntry::index is now Arc<RabitqPlusIndex>, pos_to_id is
Arc<Vec<u64>>. search_cached + search_cached_batch clone the Arcs
under the mutex (microseconds) and drop the lock before running
the scan. The index is immutable once built; no data race.
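
The locking pattern can be sketched with toy types. `ToyIndex`, `ToyCache` and the brute-force `nearest` are hypothetical stand-ins. Only the pattern (clone the Arc under the mutex, release, then scan) reflects the change described above.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Immutable once built, so it can be scanned without the cache lock.
pub struct ToyIndex {
    vectors: Vec<Vec<f32>>,
}

impl ToyIndex {
    pub fn new(vectors: Vec<Vec<f32>>) -> Self {
        Self { vectors }
    }

    /// Brute-force nearest neighbour by squared L2 distance.
    pub fn nearest(&self, q: &[f32]) -> Option<usize> {
        self.vectors
            .iter()
            .enumerate()
            .map(|(i, v)| {
                let d: f32 = v.iter().zip(q).map(|(a, b)| (a - b) * (a - b)).sum();
                (i, d)
            })
            .min_by(|a, b| a.1.partial_cmp(&b.1).unwrap_or(std::cmp::Ordering::Equal))
            .map(|(i, _)| i)
    }
}

pub struct ToyCache {
    entries: Mutex<HashMap<String, Arc<ToyIndex>>>,
}

impl ToyCache {
    pub fn new() -> Self {
        Self { entries: Mutex::new(HashMap::new()) }
    }

    pub fn insert(&self, key: &str, index: ToyIndex) {
        self.entries.lock().unwrap().insert(key.to_string(), Arc::new(index));
    }

    pub fn search(&self, key: &str, q: &[f32]) -> Option<usize> {
        // Hold the mutex only long enough to clone the Arc.
        let index = {
            let guard = self.entries.lock().unwrap();
            Arc::clone(guard.get(key)?)
        }; // guard dropped here, before the scan
        index.nearest(q)
    }
}
```

The critical section shrinks from "one full scan" to "one hashmap lookup plus a refcount bump", which is what lets concurrent readers stop serializing behind each other.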

Measured on n=100k, 8 clients × 300 queries:

  shards   QPS before   QPS after    lift
       1       2,854      23,681    8.3×
       2       2,959      28,971    9.8×
       4       2,791      33,094   11.9×

Federated sequential also wins big — previously each shard's scan
serialized on the same cache mutex; now rayon actually parallelizes:

  n=100k, 4-shard sequential: 1,673 → 8,289 qps (5.0×)

Single-threaded single-shard numbers are unchanged (mutex is
uncontended in that case) — this is purely a concurrency win, and
it's the single biggest optimization on the M1 branch.

All 40 tests passing (17 lib + 21 federation + 3 fs_backend).
Clippy -D warnings clean in both rabitq and rulake.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-23 21:33:22 -04:00
ruvnet
2bdfd342e3 feat(rabitq,rulake): VectorKernel + memory_class + per-collection stats + sidecar example
Four in-scope M1 items from the remaining backlog, landed together
because they cross-cut cleanly.

Iter 23 (rabitq): VectorKernel trait + CpuKernel default
  - Trait: id(), caps() → KernelCaps, scan(ScanRequest) → ScanResponse.
    Scan-phase determinism is the hard contract; rerank-phase nondet
    is declared via caps().deterministic = false and the caller's
    dispatch policy filters those out of Fresh/Frozen paths (ADR-157).
  - CpuKernel wraps RabitqPlusIndex::search_with_rerank, always
    available, unbounded dim, deterministic.
  - Tests: CPU kernel matches direct search byte-exactly + respects
    per-call rerank override + caps advertised correctly.

Iter 24 (rulake): memory_class on RuLakeBundle (ADR-156)
  - Opaque caller-defined tag — agent systems write "episodic" /
    "semantic" / etc; ruLake stores but never interprets.
  - Not part of the witness: two bundles with identical data but
    different memory_class share the cache.
  - Serde default+skip_if_none keeps old bundles forward-compatible.
  - Test: roundtrip + witness-unchanged + legacy bundles without the
    field still parse.

Iter 25 (rulake): examples/sidecar_daemon.rs
  - Runnable demo of publish_bundle / refresh_from_bundle_dir pair.
  - Publisher mutates backend + re-publishes; daemon poll loop
    detects witness change, invalidates; next query re-primes.
  - Includes a bug fix in refresh_from_bundle_dir: when the cache
    pointer is None (already invalidated), report UpToDate instead
    of Invalidated so daemons don't re-fire on every poll between
    "we invalidated" and "somebody queried."

Iter 26 (rulake): CacheStats::stats_by_collection
  - Per-(backend, collection) counters, one level finer than
    stats_by_backend. Operators can identify which specific
    collection is hot and pin it in LRU or increase its shard count.

21 federation + 11 bundle + 3 fs_backend + 3 kernel = 38 tests
passing across both crates. Clippy -D warnings clean. Example runs
end-to-end.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-23 21:27:04 -04:00
ruvnet
39110f09d9 bench(rulake): search_batch vs per-query — 1.05× warm, big latent wins
Adds a batch-vs-loop block to rulake-demo. Measures on an already-primed
LocalBackend under Eventual consistency (the hot path):

  batch=8     qps=2874   1.01×
  batch=32    qps=2961   1.04×
  batch=128   qps=2943   1.03×
  batch=300   qps=2986   1.05×
  per-query loop  2855   baseline

Modest on this workload because the warm cache path is already
uncontended (single-threaded + Eventual TTL makes ensure_fresh a
HashMap lookup, not a backend RTT). BENCHMARK.md is updated to
record the honest number and name the three latent wins the bench
does not measure:

  1. Fresh consistency — batch of N amortizes N backend RTTs to 1.
  2. Concurrent contention — fewer mutex acquires under multi-client.
  3. Kernel dispatch (ADR-157) — batch is the plug-point GPU / SIMD
     kernels need to cross over CPU.

The mechanical guarantee is unchanged and already tested
(search_batch_acquires_cache_lock_once): batch=32 registers as 1
coherence check, not 32. Speedup is workload-dependent; the shape
is correct.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-23 20:34:20 -04:00
ruvnet
3daa8b1b2a test(rulake): brain_substrate_acceptance — the six-guarantee loop
Ships the runnable acceptance test ADR-156 spec'd. Drives a single
LocalBackend through the full substrate contract in one test:

  1. Recall:     search_one → results
  2. Verify:     publish_bundle → read_from_dir → verify_witness
                 → cache pointer matches on-disk witness
  3. Forget:     invalidate_cache → pointer is None
  4. Rehydrate:  next search_one → primes+1, pointer reinstalled
  5. Location-transparency: results before forget ≡ results after
                 rehydrate (byte-exact ids + scores at the same
                 seed); the caller never touched data_ref or knew
                 which tier served the call
  6. Compact:    explicitly out of scope per ADR-156 — belongs to
                 RVM/Cognitum, not the substrate

If this test stays green on every commit, the agent-facing memory
substrate claim is mechanical, not aspirational.

Also closes ADR-156 open question #4 (substrate test needed) as
resolved.

21 federation + 9 bundle + 3 fs_backend = 33 tests passing. Clippy
-D warnings clean.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-23 20:28:16 -04:00
ruvnet
9e21d89a16 feat(rulake): PerBackendStats — hit rate attribution per backend id
Lets operators see which backend is hot (high hit_rate) vs cold (high
miss+prime cost). The global CacheStats are unchanged; the per-backend
counters are populated lazily on first activity against a given
backend id.

Attribution touches four events: hits, misses, primes, and
invalidations. Shared-hits (witness-match cache shares) attribute to
the *receiving* backend — that's the one that saved the prime work.
mark_hit/mark_miss take &CacheKey so attribution is explicit, not
threaded through shared state.

API:
  - VectorCache::stats_by_backend() -> HashMap<BackendId, PerBackendStats>
  - RuLake::cache_stats_by_backend() (thin delegate)
  - PerBackendStats::hit_rate() mirrors CacheStats::hit_rate()

Test cache_stats_by_backend_attributes_hits_to_the_right_backend:
two backends, hammer one, cold one stays at 0 hits. Hot hit_rate ≥
0.95 after warmup; both primed once.

Motivation (ADR-157): kernel dispatch decisions need per-backend
signals — a cold backend with high miss rate should not trigger GPU
dispatch the same way a hot backend serving at high hit rate does.
Also generally useful for capacity planning.

20 federation + 9 bundle + 3 fs_backend = 32 tests passing. Clippy
green.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-23 20:20:46 -04:00
ruvnet
bb6080e378 feat(rulake): search_batch API — one lock, one coherence check for N queries
Batched single-collection search that amortizes per-query overhead
across the batch:

  - ensure_fresh() runs once, not N times (big for Fresh consistency
    where it's a backend RTT each time).
  - VectorCache::search_cached_batch takes the cache mutex once and
    runs all N scans under that single acquisition.
  - pos_to_id clone is amortized across the whole batch.

Layered trait shape (ADR-157 preparation):

  RuLake::search_batch
    └─ VectorCache::search_cached_batch
         └─ N × RabitqPlusIndex::search (today — CPU per-query in loop)

A future VectorKernel trait plugs in under the innermost step. The
signature up to search_cached_batch is already kernel-agnostic —
GPU / SIMD kernels cross over CPU only at batch sizes ≥ their
min_batch, so a per-query API would never let dispatch pick them.

Tests:
  - search_batch_matches_per_query_results: byte-exact parity with
    search_one called individually. No recall loss.
  - search_batch_acquires_cache_lock_once: a batch of 32 registers
    as 1 coherence-skip hit (Eventual), not 32 — proves the
    amortization is real, not nominal.

19 federation + 9 bundle + 3 fs_backend = 31 tests passing. Clippy
-D warnings clean.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-23 20:11:12 -04:00
ruvnet
773d05c9c4 feat(rulake): Consistency::Frozen + ADR-156 substrate positioning
Two changes from the 2026-04-22 strategic review reframing ruLake as
the memory substrate for agent brain systems:

1. Consistency::Frozen variant — caller asserts bundle immutability;
   no automatic backend recheck ever runs. Maps to "Frozen for audit" from
   the reviewer's three-mode product knob. Automatic coherence is
   suppressed; explicit refresh_from_bundle_dir still works (lets
   operators invalidate frozen caches without needing Fresh mode).

   can_skip_check short-circuits when the pointer is already
   installed — first prime still runs, subsequent queries never
   round-trip to the backend.

   Test frozen_consistency_never_rechecks_after_prime: prime → bump
   backend → 10 warm searches still hit on the old witness, primes
   stay at 1. Explicit refresh on a re-published bundle correctly
   reports Invalidated, proving operator control remains.

2. ADR-156 — positioning addendum, not replacement of ADR-155.
   ruLake stays as substrate (memory hierarchy); brain system stays
   above (memory type, recall policy, mutation policy). Decomposes
   the reviewer's "recall / verify / forget / compact / rehydrate"
   acceptance test into six guarantees, five of which are shipped.

   Rejects:
   - absorbing the brain into ruLake (violates substrate separation)
   - a new rulake-memory crate (premature; M1 primitives suffice)
   - forking into two products (identical properties; no win)

17 federation + 9 bundle + 3 fs_backend = 29 tests passing. Clippy
green.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-23 20:02:13 -04:00
ruvnet
93146fe995 feat(rulake,rabitq): adaptive per-shard rerank — 4-shard 0.60× → 0.98×
Ships the cross-crate fix that iter 12's concurrent bench identified:
K-shard federation no longer pays K× the rerank cost.

Changes:
  - rabitq: RabitqPlusIndex::search_with_rerank(query, k, rerank_factor)
    — non-mutating per-call override, same body as search(). The stored
    field stays the default used by plain search().
  - rulake: VectorCache::search_cached_with_rerank(key, q, k, rf_opt)
    forwards through. search_cached() remains the default path.
  - rulake: RuLake::search_federated uses an adaptive default of
    max(MIN_PER_SHARD_RERANK=5, global / K). search_federated_with_rerank
    lets callers override explicitly (None = adaptive, Some(global) =
    byte-exact parity with single-shard).
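
The adaptive default reduces to one expression. A minimal sketch, with a hypothetical function name and the override modelled as an `Option` in the way the message describes (None = adaptive, Some = explicit):

```rust
pub const MIN_PER_SHARD_RERANK: usize = 5;

/// Per-shard rerank factor: max(MIN_PER_SHARD_RERANK, global / K)
/// unless the caller overrides it explicitly.
pub fn per_shard_rerank(global: usize, shards: usize, override_rf: Option<usize>) -> usize {
    match override_rf {
        Some(rf) => rf,
        None => (global / shards.max(1)).max(MIN_PER_SHARD_RERANK),
    }
}
```

With a global factor of 20 this gives 20, 10 and 5 for 1, 2 and 4 shards, matching the "per-shard rerank" column in the bench table below. Past 4 shards the floor of 5 takes over.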

Bench (n=100k, 8 clients × 300 queries, same box):

  shards   before QPS   after QPS   per-shard rerank
       1      2,963        2,854                 20
       2      2,500        2,959 (1.04×)         10
       4      1,778        2,791 (0.98×)          5

4-shard federation went from 0.60× the single-shard baseline to
0.98×. At 2 shards, the mutex serialization overhead even nets us
slightly above 1-shard. Federation is genuinely free now.

Recall gate: adaptive_per_shard_rerank_preserves_recall asserts
recall@10 ≥ 0.85 at K=2 and K=4 on clustered D=128 n=5k.

This closes the M2 cross-crate task filed in ADR-155 (iter 13). The
strategic review's "immediate optimization, high impact" is shipped.

27 → 28 tests passing. Clippy -D warnings clean in both crates.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-23 19:54:59 -04:00
ruvnet
45460153b8 feat(rulake): hit_rate + avg_prime_ms accessors on CacheStats
Cache-first reframe (ADR-155) makes hit_rate the primary KPI. Before
this, operators were flying blind — the raw hits/misses counters told
them nothing about the 95% gate the acceptance test targets.

- CacheStats gets total_prime_ms + last_prime_ms (prime timed inside
  prime() with Instant::now() around the lock-free build).
- CacheStats::hit_rate() → Option<f64> (None when no searches yet).
- CacheStats::avg_prime_ms() → Option<f64> (None when no primes).
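
A rough sketch of the two accessors, assuming plain counter fields. The
field names and types below are guesses for illustration; only the method
names and the `Option<f64>` return type come from this commit message.

```rust
/// Hypothetical mirror of the counters described above; field names are assumptions.
#[derive(Default)]
struct CacheStats {
    hits: u64,
    misses: u64,
    primes: u64,
    total_prime_ms: f64,
}

impl CacheStats {
    /// Fraction of searches served from cache; `None` before any search.
    fn hit_rate(&self) -> Option<f64> {
        let total = self.hits + self.misses;
        if total == 0 {
            None
        } else {
            Some(self.hits as f64 / total as f64)
        }
    }

    /// Mean prime duration in milliseconds; `None` before any prime.
    fn avg_prime_ms(&self) -> Option<f64> {
        if self.primes == 0 {
            None
        } else {
            Some(self.total_prime_ms / self.primes as f64)
        }
    }
}

fn main() {
    // One cold query (a miss that triggers a prime) followed by 99 warm queries.
    let stats = CacheStats { hits: 99, misses: 1, primes: 1, total_prime_ms: 400.0 };
    println!("hit_rate={:?} avg_prime_ms={:?}", stats.hit_rate(), stats.avg_prime_ms());
}
```

Returning `None` instead of `0.0` lets a dashboard tell "no traffic yet"
apart from "every query missed".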

Test stats_expose_hit_rate_and_prime_duration: 1 prime + 99 warm
queries → hit_rate ≥ 0.95, last_prime_ms in the expected range.

This is step 1 of the strategic reframe the latest review surfaced:
cache-coherent execution layer, federation as refill. The KPI it
exposes is what the 95% acceptance gate will measure.

15 federation + 9 bundle + 3 fs_backend = 27 passing. Clippy green.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-23 19:51:41 -04:00
ruvnet
a68a2ed68c bench(rulake): concurrent-clients × shard-count — honest rayon finding
Added measure_concurrent_fed: N client threads hammering federated
search against K shards on one box. Expected result was "rayon helps
under concurrent load where single-thread bench masks it." Actual
result is different and worth recording honestly.

At n=100k, 8 clients × 300 queries:
  1 shard: 810ms wall, 2,963 qps
  2 shards: 960ms wall, 2,500 qps  (0.84×)
  4 shards: 1,350ms wall, 1,778 qps (0.60×)

More shards = LOWER concurrent throughput for this "same data split K
ways on one box" workload. Root cause: each shard runs its own RaBitQ
rerank over rerank_factor × k = 200 candidates, so K-shard federation
does ~K× the rerank work. Parallel fan-out cuts scan cost but not
rerank cost.
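
The arithmetic behind these figures can be checked with a short sketch.
The wall times and client counts are the ones quoted above; splitting 200
into rerank_factor = 20 and k = 10 is an assumption, and the function
names are made up for the example.

```rust
/// Queries per second for `clients` threads each issuing `queries` requests.
fn qps(clients: u32, queries: u32, wall_ms: f64) -> f64 {
    f64::from(clients * queries) / (wall_ms / 1000.0)
}

/// Candidates reranked per query when every shard reranks `rerank_factor * k`.
fn rerank_work(shards: u32, rerank_factor: u32, k: u32) -> u32 {
    shards * rerank_factor * k
}

fn main() {
    for (shards, wall_ms) in [(1u32, 810.0), (2, 960.0), (4, 1350.0)] {
        println!(
            "{} shard(s): {:.0} qps, {} rerank candidates per query",
            shards,
            qps(8, 300, wall_ms),
            rerank_work(shards, 20, 10)
        );
    }
}
```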

Consequences documented in BENCHMARK.md:
  - Don't shard for throughput on same-box same-data; shard for
    reachability or memory bounds.
  - Per-shard rerank factor reduction is an obvious M2 optimization:
    fanning out at rerank=50 per shard when K≥2 keeps global recall above
    90% while cutting the rerank cost roughly K-fold. Measurement-driven,
    not speculative.
  - Real federation gain (disjoint data across network backends) is
    genuine; this bench just doesn't measure it.

Rayon fan-out is NOT reverted — still correct for the miss-path prime
(1.97× / 3.86× speedup retained) and for remote-backend I/O overlap.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-23 19:45:03 -04:00
ruvnet
ec5ce6c6af feat(rulake): refresh_from_bundle_dir — reader side of sidecar protocol
Completes the sidecar loop (publish → disk → refresh). Given a key
and a directory, read the on-disk table.rulake.json and:
  - UpToDate: witness matches cache pointer, nothing to do
  - Invalidated: witnesses differ, cache pointer for key is dropped
  - BundleMissing: no sidecar present (caller decides)

A corrupt/tampered sidecar surfaces as InvalidParameter via
RuLakeBundle::read_from_dir's witness verification — a poisoned
publish cannot silently invalidate the cache.

This is the minimal primitive a cache sidecar daemon needs. The
daemon itself is a ~10-line loop in user code: for each watched
(key, dir), call refresh_from_bundle_dir periodically or in
response to inotify events; handle the three outcomes.
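
One pass of such a daemon loop might look like the sketch below. The real
signature of `refresh_from_bundle_dir` is not shown in this log, so the
`refresh` callback, `poll_once` and `fake_refresh` are hypothetical
stand-ins; only the three outcome names come from the commit message.

```rust
use std::path::{Path, PathBuf};

/// The three refresh outcomes named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RefreshOutcome {
    UpToDate,
    Invalidated,
    BundleMissing,
}

/// One pass over the watched (key, dir) pairs. `refresh` stands in for
/// `refresh_from_bundle_dir`. Returns the keys whose cache pointer was dropped;
/// a corrupt sidecar aborts the pass with an error.
fn poll_once<F>(watched: &[(String, PathBuf)], refresh: F) -> Result<Vec<String>, String>
where
    F: Fn(&str, &Path) -> Result<RefreshOutcome, String>,
{
    let mut invalidated = Vec::new();
    for (key, dir) in watched {
        match refresh(key.as_str(), dir.as_path())? {
            RefreshOutcome::UpToDate => {} // nothing to do
            RefreshOutcome::Invalidated => invalidated.push(key.clone()),
            RefreshOutcome::BundleMissing => {
                eprintln!("no sidecar for {} in {}", key, dir.display())
            }
        }
    }
    Ok(invalidated)
}

/// Fake refresh used only to exercise the loop.
fn fake_refresh(key: &str, _dir: &Path) -> Result<RefreshOutcome, String> {
    match key {
        "fresh" => Ok(RefreshOutcome::UpToDate),
        "stale" => Ok(RefreshOutcome::Invalidated),
        "corrupt" => Err("witness mismatch".to_string()),
        _ => Ok(RefreshOutcome::BundleMissing),
    }
}

fn main() {
    let watched = vec![
        ("fresh".to_string(), PathBuf::from("bundles/a")),
        ("stale".to_string(), PathBuf::from("bundles/b")),
    ];
    println!("{:?}", poll_once(&watched, fake_refresh));
}
```

A real daemon would wrap `poll_once` in a timer or file-watch loop and
decide per deployment whether a missing bundle is an error.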

Closes the "cache sidecar daemon protocol" open question from
ADR-155. The protocol is: filesystem-based, witness-authenticated,
atomic-write on publish, three-state on refresh.

14 federation + 9 bundle + 3 fs_backend = 26 tests passing.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-23 19:21:39 -04:00
ruvnet
865e2c8e2c feat(rulake): publish_bundle — writer side of the bundle protocol
Pairs with iter 4's read_from_dir: given a registered (backend,
collection) key, emit the current table.rulake.json to a directory.
This is what a cache sidecar daemon calls when the warehouse triggers
a bundle refresh — the daemon publishes the new bundle, any serving
ruLake watching that directory swaps in the new witness on next
search.

Does NOT prime the cache — publish is a metadata emission, not a
data load. That keeps publish cheap and lets operators stage bundle
updates without moving any compressed data.

Test publish_bundle_roundtrips_through_disk: publish → read_from_dir
on a third party → witness matches what a cache prime would see.

13 federation + 9 bundle + 3 fs_backend = 25 passing. Clippy green.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-23 19:15:46 -04:00
ruvnet
fedaa85284 test(rulake): concurrent query hammer — M3 multi-client smoke
8 threads × 50 queries against a shared RuLake, alternating single-shard
and federated calls. Validates:
  - no deadlocks (bounded time to completion)
  - no panics from the cache Mutex or backend RwLock under contention
  - every returned hit is finite and the per-call result is sorted
  - prime count stays at ≤ 2 (one per shard) — hits serve the rest

Closes the M3 "concurrent multi-client throughput" smoke item from
BENCHMARK.md. The Send + Sync bound on RuLake is now exercised, not
just declared.

12/12 federation + 9 bundle + 3 fs_backend tests passing (24 total).
Clippy -D warnings green.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-23 19:12:52 -04:00
ruvnet
a604fd2d5e feat(rulake): FsBackend — real file-based adapter with mtime witness
First concrete adapter that reads real persistent data. Uses a simple
'ruvec1' binary format (8-byte magic + u64 count + u32 dim + records)
and takes the mtime as the generation token. This proves the full
bundle → witness → cache → search loop works against the filesystem
without pulling arrow/parquet deps — a real ParquetBackend reuses the
exact same shape, only the decoder and generation source change.

- current_bundle() reads only the 24-byte header to pick up dim —
  real-backend hot-path ergonomics; a full pull per coherence check
  would be catastrophic on a warehouse adapter.
- Atomic write via temp+rename so concurrent reads never observe a
  torn record stream (matches the bundle sidecar write pattern).
- data_ref is 'file://<path>', anchoring the witness on the local
  filesystem location — two FsBackends pointing at the same file
  share the cache entry (content-addressed, per ADR-155).
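
A header-only read can be sketched as below, assuming the three fields
listed above. Little-endian encoding and NUL padding of the magic are
assumptions. The commit also mentions a 24-byte header, so the real format
presumably carries padding or a field not listed here; this sketch encodes
only the 20 bytes that are described.

```rust
/// Assumed magic: "ruvec1" padded with NULs to 8 bytes.
const MAGIC: [u8; 8] = *b"ruvec1\0\0";
/// 8-byte magic + u64 count + u32 dim, as listed in the commit message.
const HEADER_LEN: usize = 8 + 8 + 4;

#[derive(Debug, PartialEq)]
struct Header {
    count: u64,
    dim: u32,
}

fn encode_header(h: &Header) -> Vec<u8> {
    let mut out = Vec::with_capacity(HEADER_LEN);
    out.extend_from_slice(&MAGIC);
    out.extend_from_slice(&h.count.to_le_bytes());
    out.extend_from_slice(&h.dim.to_le_bytes());
    out
}

/// Parse only the header so `dim` is available without reading any records.
fn decode_header(bytes: &[u8]) -> Result<Header, String> {
    if bytes.len() < HEADER_LEN {
        return Err("truncated header".to_string());
    }
    if !bytes.starts_with(&MAGIC) {
        return Err("bad magic".to_string());
    }
    let mut count = [0u8; 8];
    count.copy_from_slice(&bytes[8..16]);
    let mut dim = [0u8; 4];
    dim.copy_from_slice(&bytes[16..20]);
    Ok(Header {
        count: u64::from_le_bytes(count),
        dim: u32::from_le_bytes(dim),
    })
}

fn main() {
    let bytes = encode_header(&Header { count: 100_000, dim: 128 });
    println!("{:?}", decode_header(&bytes));
}
```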

Tests:
- fs_write_then_pull_roundtrip: write vectors, read them back bitwise.
- fs_bundle_has_file_uri_and_header_dim: verify witness + data_ref.
- fs_pull_rejects_bad_magic: magic-byte guard on pull.
- fs_backend_end_to_end_search_and_recache_on_mtime_bump (federation
  smoke): full RuLake → FsBackend → mtime bump → re-prime cycle.

23/23 passing (9 bundle + 3 fs_backend + 11 federation). Clippy green.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-23 19:11:13 -04:00
ruvnet
f88016cc50 feat(rulake): rayon parallel fan-out — 4× prime speedup on 4 shards
search_federated now par_iters over targets so that cache-miss primes
(the expensive case — pulling from the backend + building a RabitqPlus
index) run concurrently per shard. Measured speedups in BENCHMARK.md:

  n=100k:  1-shard prime 425ms → 2-shard 215ms (1.97×) → 4-shard 110ms (3.86×)
  n= 50k:  1-shard prime 213ms → 2-shard 110ms (1.95×) → 4-shard  56ms (3.83×)

Warm-cache QPS on a single-threaded benchmark drops slightly because
rayon's par_iter startup cost is measurable when each query takes less
than a millisecond. The win is in tail latency under miss and in real
remote-backend deployments where per-shard latency dominates; the bench
understates this.

Short-circuits on error (first shard to return Err wins), matching the
sequential loop's semantics.
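
The fan-out shape can be illustrated with std threads standing in for
rayon's par_iter. Everything here is a sketch: `Hit`, `search_shard` and
this `search_federated` are simplified stand-ins, and the error returned is
the first one in shard order, which may differ from the first shard to
fail in wall-clock time.

```rust
use std::thread;

/// (id, distance) pair; a smaller distance is a better hit.
type Hit = (u64, f32);

/// Stand-in for a per-shard search; an empty shard simulates a failing backend.
fn search_shard(shard: &[Hit], k: usize) -> Result<Vec<Hit>, String> {
    if shard.is_empty() {
        return Err("shard unavailable".to_string());
    }
    let mut hits = shard.to_vec();
    hits.sort_by(|a, b| a.1.total_cmp(&b.1));
    hits.truncate(k);
    Ok(hits)
}

/// Query every shard concurrently, then merge the per-shard top-k lists.
fn search_federated(shards: &[Vec<Hit>], k: usize) -> Result<Vec<Hit>, String> {
    let per_shard: Vec<Result<Vec<Hit>, String>> = thread::scope(|s| {
        // Spawn all shard searches before joining any of them.
        let handles: Vec<_> = shards
            .iter()
            .map(|shard| s.spawn(move || search_shard(shard, k)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("shard thread panicked"))
            .collect()
    });
    let mut merged = Vec::new();
    for result in per_shard {
        merged.extend(result?); // any shard error aborts the whole query
    }
    merged.sort_by(|a, b| a.1.total_cmp(&b.1));
    merged.truncate(k);
    Ok(merged)
}

fn main() {
    let shards = vec![vec![(1, 0.3), (2, 0.1)], vec![(3, 0.2)]];
    println!("{:?}", search_federated(&shards, 2));
}
```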

Rayon pinned via workspace.dependencies (rayon = "1.10").

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-23 19:08:15 -04:00
ruvnet
4df5bc86a0 feat(rulake): bundle FS persistence — write_to_dir + read_from_dir
Direct dependency of the BQ UDF + cache sidecar daemon: the daemon
needs to read `table.rulake.json` off GCS (or a local mount) and
verify its witness before swapping in a new compressed entry.

- Atomic write via temp+rename so concurrent readers never see a
  truncated sidecar (matches the pattern a warehouse-push path needs).
- Read verifies witness on-disk → malformed or tampered bundles
  surface as InvalidParameter with a "witness" message.
- Canonical filename is exposed as SIDECAR_FILENAME so callers
  don't hardcode the string.
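
The temp+rename pattern can be sketched with the standard library alone.
The `write_atomic` helper and the `.tmp.<pid>` suffix are assumptions for
illustration; the filename value is the one quoted above.

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

/// Canonical sidecar filename, as quoted in the commit message.
const SIDECAR_FILENAME: &str = "table.rulake.json";

/// Write `contents` to a temp file in `dir`, then rename it over the sidecar.
/// Readers see either the old file or the new one, never a partial write.
fn write_atomic(dir: &Path, contents: &[u8]) -> std::io::Result<()> {
    let final_path = dir.join(SIDECAR_FILENAME);
    // Temp file lives in the same directory so the rename stays on one filesystem.
    let tmp_path = dir.join(format!("{}.tmp.{}", SIDECAR_FILENAME, std::process::id()));
    {
        let mut file = fs::File::create(&tmp_path)?;
        file.write_all(contents)?;
        file.sync_all()?; // flush to disk before the rename makes it visible
    }
    fs::rename(&tmp_path, &final_path)
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir().join("rulake_atomic_demo");
    fs::create_dir_all(&dir)?;
    write_atomic(&dir, br#"{"dim":128}"#)?;
    println!("{}", fs::read_to_string(dir.join(SIDECAR_FILENAME))?);
    Ok(())
}
```

A crash between the write and the rename leaves only a stray temp file
behind, which is the situation the crash-simulation test below covers.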

Tests:
- fs_roundtrip: write + read preserves witness + optional fields.
- fs_read_rejects_tampered_sidecar: edit dim on disk → read errors.
- fs_write_is_atomic_under_crash_simulation: leftover .tmp.* files
  don't corrupt reads of the canonical sidecar.

19/19 passing (9 bundle + 10 federation). Clippy -D warnings green.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-23 19:05:42 -04:00
ruvnet
9d9bf30882 feat(rulake): LRU eviction with max_entries — bounded-memory mode
MVP shipped an unbounded cache. v1 must-have: a hard cap on the number
of distinct compressed entries, evicting the least-recently-used
*unpinned* (refcount=0) entry when the cap is exceeded.

Design note: entries pinned by a live `(backend, collection)` pointer
are never evicted, since dropping them would orphan a caller. If every
entry is pinned, the cap is temporarily exceeded rather than returning
an error. Correctness over strict bounds.

API:
- `VectorCache::with_max_entries(n)` — builder-mode cap.
- `RuLake::with_max_cache_entries(n)` — user-facing constructor flag.
- `RuLake::invalidate_cache(key)` — drop a pointer explicitly so its
  entry becomes evictable.
- `CacheEntry.last_used` bumped on every search_cached; LRU picks the
  oldest unpinned entry as victim.

Eviction runs opportunistically at the end of each prime when a cap
is set. Zero overhead when `max_entries == None` (default path).
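
The eviction rule can be sketched as a sweep over a map of entries. The
`Entry` fields, the logical clock and `evict_to_cap` are hypothetical; the
sketch only shows the stated policy of evicting the oldest unpinned entry
and tolerating an exceeded cap when everything is pinned.

```rust
use std::collections::HashMap;

/// Hypothetical cache entry: `refcount` counts live pointers,
/// `last_used` is a logical clock bumped on each search.
struct Entry {
    refcount: usize,
    last_used: u64,
}

/// Evict least-recently-used unpinned entries until at most `cap` remain.
/// Pinned entries (refcount > 0) are never evicted, so the cap may be exceeded.
fn evict_to_cap(entries: &mut HashMap<String, Entry>, cap: usize) -> Vec<String> {
    let mut evicted = Vec::new();
    while entries.len() > cap {
        let victim = entries
            .iter()
            .filter(|(_, e)| e.refcount == 0)
            .min_by_key(|(_, e)| e.last_used)
            .map(|(key, _)| key.clone());
        match victim {
            Some(key) => {
                entries.remove(&key);
                evicted.push(key);
            }
            None => break, // everything left is pinned
        }
    }
    evicted
}

fn main() {
    let mut entries = HashMap::new();
    entries.insert("a".to_string(), Entry { refcount: 1, last_used: 1 });
    entries.insert("b".to_string(), Entry { refcount: 0, last_used: 2 });
    entries.insert("c".to_string(), Entry { refcount: 1, last_used: 3 });
    println!("evicted: {:?}", evict_to_cap(&mut entries, 2));
}
```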

Test: `lru_eviction_caps_entry_count_when_pointers_dropped` pins three
entries, invalidates one, asserts the cap=2 holds after the next
prime runs the sweep.

16/16 tests pass. Clippy clean under -D warnings.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-23 19:00:56 -04:00
ruvnet
79e57f35ef feat(rulake): witness-addressed cache — cross-backend sharing via SHAKE-256
Implements the reviewer's "use RVF witness chain hash as cache-key
anchor" design. Cache entries are now keyed by the RuLakeBundle
witness, not (backend_id, collection). Two backends advertising the
same logical dataset (same data_ref + seed + rerank + generation)
produce the same witness and share one compressed index.

## The change

### BackendAdapter::current_bundle() (new trait method)

Returns the backend's authoritative bundle for a collection. Default
impl synthesizes from `id() + generation()`; real backends override to
report a shared data_ref when they're replicas of the same source of
truth. LocalBackend overrides to avoid the default's pull-to-read-dim
round-trip.

### VectorCache: two-layer storage

- `entries: HashMap<WitnessKey, CacheEntry>`  — content-addressed
- `pointers: HashMap<CacheKey, WitnessKey>`   — (backend, collection) → witness
- `last_checked: HashMap<CacheKey, Instant>`  — for Eventual-mode TTL

`CacheEntry` now carries a `refcount` so an entry is GC'd only when
its last pointer drops. New stat: `shared_hits` — incremented when a
pointer move finds the target witness already cached.

### RuLake::ensure_fresh flow

1. Eventual within TTL → skip check (fast).
2. Witness matches pointer → hit, no-op.
3. Witness mismatch, target witness already in pool (another pointer
   has it) → just swap the pointer, zero prime work. This is the
   cross-backend share.
4. Witness not in pool → pull + prime as before.
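
The four-step flow reduces to a small decision function. This is a sketch
under assumptions: `Action` and `ensure_fresh_action` are invented names,
witnesses are plain strings, and the current witness is passed in even for
step 1, where the real flow would skip fetching the bundle.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum Action {
    SkipCheck,    // 1. Eventual mode, still within TTL
    Hit,          // 2. pointer already targets the current witness
    SwapPointer,  // 3. another pointer already primed this witness
    PullAndPrime, // 4. nothing cached for this witness
}

/// Hypothetical decision function mirroring the numbered flow above.
fn ensure_fresh_action(
    eventual_within_ttl: bool,
    pointer: Option<&str>,  // witness the (backend, collection) key points at
    current: &str,          // witness from the backend's current bundle
    pool: &HashSet<String>, // witnesses that have a cached entry
) -> Action {
    if eventual_within_ttl {
        return Action::SkipCheck;
    }
    if pointer == Some(current) {
        return Action::Hit;
    }
    if pool.contains(current) {
        Action::SwapPointer
    } else {
        Action::PullAndPrime
    }
}

fn main() {
    let mut pool = HashSet::new();
    pool.insert("w2".to_string());
    // Pointer is stale, but "w2" is already cached via another backend.
    println!("{:?}", ensure_fresh_action(false, Some("w1"), "w2", &pool));
}
```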

### Prime is now race-tolerant

A concurrent thread racing to prime the same witness doesn't rebuild —
whichever thread gets the lock second observes the entry and drops
its own build. Two builds for the same witness are byte-identical by
determinism, so no data is lost.
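
A minimal sketch of a race-tolerant prime: build outside the lock, then
keep whichever entry reached the map first. The `Pool` alias, the byte
vector standing in for a compressed index, and this `prime` signature are
assumptions for illustration.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical pool: witness string -> shared entry (bytes stand in for an index).
type Pool = Mutex<HashMap<String, Arc<Vec<u8>>>>;

/// Return the cached entry for `witness`, building it if absent.
/// The expensive `build` runs without the lock held.
fn prime<F: FnOnce() -> Vec<u8>>(cache: &Pool, witness: &str, build: F) -> Arc<Vec<u8>> {
    {
        let guard = cache.lock().unwrap();
        if let Some(existing) = guard.get(witness) {
            return Arc::clone(existing);
        }
    } // lock released before the build starts

    let built = Arc::new(build());

    // If another thread inserted first, keep its entry and drop ours.
    let mut guard = cache.lock().unwrap();
    let entry = Arc::clone(guard.entry(witness.to_string()).or_insert(built));
    entry
}

fn main() {
    let cache: Pool = Mutex::new(HashMap::new());
    let first = prime(&cache, "witness-a", || vec![1, 2, 3]);
    let second = prime(&cache, "witness-a", || unreachable!("entry already cached"));
    println!("same entry: {}", Arc::ptr_eq(&first, &second));
}
```

Dropping the losing build is only safe because builds for the same witness
are deterministic, as noted above.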

## Test added

`two_backends_share_cache_when_witness_matches` — uses a
`SharedLocalBackend` shim that overrides `current_bundle()` to advertise
a shared data_ref. Two distinct `LocalBackend`s behind shims report
identical witnesses; the second search finds `primes=1, shared_hits=1`
and only ONE compressed entry in the pool despite two pointers. With
both pointers live, `refcount_of(witness) == 2`.

## Lint + test status

```
cargo test    -p ruvector-rulake --release                              ✓ 15/0
cargo clippy  -p ruvector-rulake --release --all-targets -- -D warnings  ✓ clean
cargo fmt     -p ruvector-rulake -- --check                              ✓ clean
```

## Closes open question from earlier ADR review

"Cache invalidation drift" — the witness is now the cache-key anchor.
Backend generation bumps become witness changes; witness changes are
content-addressable so old entries can drop but shared ones survive.
"Where does freshness truth live?" — answered: in the bundle.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-23 18:54:57 -04:00
ruvnet
8e574daa68 feat(rulake): cache-first reframe + bundle sidecar + recall gate
Applies the reviewer's architectural feedback (docs/research/ruLake/
chat thread): ruLake is a cache-first vector execution fabric, not a
federation engine. Federation is the cache's refill mechanism.

## Perf fix — cache prime now runs lock-free

`VectorCache::prime()` previously built a fresh `RabitqPlusIndex`
(~400 ms at n=100k) while holding the cache mutex, serialising all
other queries. Now builds entirely before touching `inner`; the lock
is only taken to swap the finished entry in. No benchmark regression —
intermediary tax still 1.00× on LocalBackend at n=100k.

## New: bundle sidecar (`table.rulake.json`)

`ruvector_rulake::bundle` — the portable unit that defines ruLake's
reproducibility + governance scope. Flagged by the reviewer as more
important than the UDF because it's what travels between teams,
clouds, and backups.

Carries: `data_ref`, `dim`, `rotation_seed`, `rerank_factor`,
`generation`, `rvf_witness` (SHAKE-256 over the preceding fields),
`pii_policy`, `lineage_id`.

`Generation` is a serde-untagged union of `Num(u64)` (Parquet mtime,
Iceberg version, Snowflake offset) and `Opaque(String)` (UUIDs,
hashes, base64 blobs) — fixes the "u64 doesn't fit an Iceberg snapshot
id" open question from the M1 review.

Witness fn is domain-separated, length-prefixed, and verifiable via
`bundle.verify_witness()`. 6 new tests: determinism,
field-change-detection, length-prefix-anti-collision, serde roundtrip,
tamper-detection, format-version-downgrade-rejected.
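
Why length prefixes matter can be shown on the hash input alone. This
sketch stops at the preimage because the standard library has no SHAKE-256;
the domain tag, the u64 little-endian prefixes and the function names are
assumptions, not the crate's actual encoding.

```rust
/// Made-up domain tag for this sketch.
const DOMAIN: &[u8] = b"example-witness-v1";

/// Build a domain-separated, length-prefixed byte string from `fields`.
/// A real witness would feed these bytes to SHAKE-256.
fn witness_preimage(fields: &[&[u8]]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(DOMAIN.len() as u64).to_le_bytes());
    out.extend_from_slice(DOMAIN);
    for field in fields {
        // The length prefix makes field boundaries part of the hashed input.
        out.extend_from_slice(&(field.len() as u64).to_le_bytes());
        out.extend_from_slice(field);
    }
    out
}

/// Naive concatenation, shown only to demonstrate the collision it allows.
fn naive_concat(fields: &[&[u8]]) -> Vec<u8> {
    fields.concat()
}

fn main() {
    let a = ["ab".as_bytes(), "c".as_bytes()];
    let b = ["a".as_bytes(), "bc".as_bytes()];
    println!("naive collide: {}", naive_concat(&a) == naive_concat(&b));
    println!("prefixed collide: {}", witness_preimage(&a) == witness_preimage(&b));
}
```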

## New: recall-vs-brute-force gate

`rulake_recall_at_10_above_90pct_vs_brute_force`: the missing
correctness test. Builds brute-force L2 truth over 5k clustered
Gaussian vectors and asserts that ruLake's recall@10 is ≥ 90% at
rerank×20.
Uses the same n + cluster-count + methodology as
`ruvector-rabitq::BENCHMARK.md` so a regression shows up as a
divergence from the known-good estimator baseline.
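
The recall metric itself is simple to state in code. The sketch below uses
a toy dataset and hypothetical helper names; it assumes the approximate
result list contains no duplicate ids.

```rust
/// Squared L2 distance between two equal-length vectors.
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Exact top-k ids by brute-force scan (the ground truth).
fn brute_force_top_k(data: &[Vec<f32>], query: &[f32], k: usize) -> Vec<usize> {
    let mut ids: Vec<usize> = (0..data.len()).collect();
    ids.sort_by(|&i, &j| l2_sq(&data[i], query).total_cmp(&l2_sq(&data[j], query)));
    ids.truncate(k);
    ids
}

/// Fraction of true neighbours that the approximate result recovered.
fn recall_at_k(truth: &[usize], approx: &[usize]) -> f64 {
    if truth.is_empty() {
        return 1.0;
    }
    let found = approx.iter().filter(|id| truth.contains(id)).count();
    found as f64 / truth.len() as f64
}

fn main() {
    let data = vec![vec![0.0, 0.0], vec![1.0, 0.0], vec![5.0, 5.0], vec![0.1, 0.0]];
    let truth = brute_force_top_k(&data, &[0.0, 0.0], 2);
    // Pretend an approximate index returned ids 0 and 1.
    println!("truth={:?} recall={}", truth, recall_at_k(&truth, &[0, 1]));
}
```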

## ADR-155 v2 — cache-first decision explicit

- Decision opens with "cache-first vector execution fabric; federation
  is the refill mechanism", lifts the reviewer's 5-axis decision
  matrix (cache-first wins 4/5 axes).
- New Decision §6 declares the bundle sidecar as the portable unit
  (not the UDF) and documents how the witness acts as the cache-key
  anchor, closing the "cache invalidation drift" failure mode.

## Test + lint status

```
cargo test    -p ruvector-rulake --release                             ✓ 14/0
cargo clippy  -p ruvector-rulake --release --all-targets -- -D warnings  ✓ clean
cargo fmt     -p ruvector-rulake -- --check                            ✓ clean
cargo run     -p ruvector-rulake --release --bin rulake-demo -- --fast  ✓ no regression
```

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-23 18:46:35 -04:00
ruvnet
3a1afa2284 feat(rulake): vector-native federation intermediary — ADR-155 + MVP crate
Implements the M1 scope of docs/research/ruLake/ as an intermediary that
fans out vector queries across heterogeneous backends (Parquet, BigQuery,
Snowflake, Delta, Iceberg, local) behind a single RVF wire protocol, with
a RaBitQ-compressed cache in front.

## What ships

- **Research docs** under docs/research/ruLake/ (9 files, ~2.5k lines),
  reframed from the earlier "plug RVF into BigQuery" shape to the
  intermediary/federation shape. BigQuery-native compute becomes a Tier-2
  push-down optimization inside the BigQueryBackend adapter, not a new
  product shape.
- **ADR-155 v2** as "Proposed" — captures the seven alternatives
  considered (plug-in-per-lake, standalone vector DB, Iceberg extension,
  Trino connector, JVM intermediary, notebook-only, push-through-only),
  consequences, and eight open questions.
- **crates/ruvector-rulake/** — new workspace member:
  - `BackendAdapter` trait with minimum surface (id / list_collections /
    pull_vectors / generation / supports_pushdown).
  - `LocalBackend` in-memory reference implementation (thread-safe).
  - `VectorCache` wrapping ruvector_rabitq::RabitqPlusIndex, with per-
    collection generation tracking and `Consistency::{Fresh, Eventual}`
    policies.
  - `RuLake` entry point: register backends, search single or federated,
    cache-stats introspection.
  - 7 smoke tests (`tests/federation_smoke.rs`): byte-exact match vs
    direct RaBitQ, cache-coherence after backend mutation, cross-backend
    fan-out with correct score ordering, cache-hit-faster-than-miss,
    three error-path tests.
  - `rulake-demo` bin: unified benchmark producing the same-run table in
    BENCHMARK.md.

## Measured numbers (LocalBackend, D=128, rerank×20, 300 queries)

| n       | direct RaBitQ+ QPS | ruLake Fresh QPS | ruLake Eventual QPS | tax   |
|--------:|-------------------:|-----------------:|--------------------:|------:|
|   5,000 |             17,311 |           17,874 |              17,858 | 0.97× |
|  50,000 |              5,162 |            5,123 |               5,050 | 1.01× |
| 100,000 |              3,122 |            3,117 |               3,114 | 1.00× |

**Intermediary tax is effectively zero on a local backend.** Federated
across 2 shards: 2,470 QPS @ n=100k (0.79× of single-shard); 4 shards:
1,781 QPS (0.57×) — sequential fan-out, parallel merge is the v2
optimisation per ADR-155 §Consequences.

## Build + test status (this crate only)

```
cargo build  -p ruvector-rulake --release                            ✓
cargo test   -p ruvector-rulake --release                            ✓ 7 passed
cargo clippy -p ruvector-rulake --release --all-targets -- -D warnings   ✓ clean
cargo fmt    -p ruvector-rulake -- --check                           ✓ clean
cargo run    -p ruvector-rulake --release --bin rulake-demo          ✓ reproduces BENCHMARK.md
```

## Scope this commit does NOT cover (M2-M5, see 07-implementation-plan.md)

- ParquetBackend, BigQueryBackend, SnowflakeBackend, IcebergBackend,
  DeltaBackend (real-backend adapters).
- Push-down paths into backends with native vector ops.
- Governance / RBAC / PII / lineage / audit (M4).
- SIFT1M recall measurement on the real-backend path.
- Parallel fan-out via rayon.
- LRU cache eviction.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-23 18:38:49 -04:00