ruvector/crates/ruvector-rairs/src
rUv 8f97421297
research(nightly): rairs-ivf — RAIRS IVF, ruvector's first Inverted File Index (ADR-193) (#459)
* feat(rairs-ivf): add RAIRS IVF — ruvector's first Inverted File Index (ADR-193)

Implements Yang & Chen, SIGMOD 2026 (arXiv:2601.07183): three variants of
IVF with Redundant Assignment + Amplified Inverse Residual + SEIL layout.

Three measurable variants (N=5K, D=128, 64 clusters, cargo --release):
  IvfFlat      nprobe=1 recall@10  61.3%  mem 2,571 KB  26,984 QPS
  RairsStrict  nprobe=1 recall@10  83.8%  mem 5,110 KB  13,243 QPS
  RairsSeil    nprobe=1 recall@10  93.1%  mem 2,571 KB  13,582 QPS

RairsSeil: +31.8 pp recall at nprobe=1 vs IvfFlat with identical memory.

Files:
  crates/ruvector-rairs/         — new crate (IvfFlat, RairsStrict, RairsSeil)
  docs/adr/ADR-193-rairs-ivf.md  — architecture decision record
  docs/research/nightly/2026-05-12-rairs-ivf/README.md — SOTA survey + results
  Cargo.toml                     — workspace member added

10/10 unit tests pass. cargo build --release -p ruvector-rairs green.

* perf(ruvector-rairs): SIMD-friendly distance kernels + partial-select top-k; fix clippy/fmt; flag unverified citation

Optimizations (recall unchanged; ~2.3–2.9× single-thread QPS across all
variants/nprobe on x86-64):
- index.rs: rewrite l2sq/dot as 8-lane unrolled reductions so LLVM
  auto-vectorises the f32 accumulation (the naïve iter().sum() can't — f32
  add isn't associative). This is the hot path: every centroid scan + every
  list-entry distance.
- index.rs: add finalize_topk() / top_nprobe_centroids() using
  select_nth_unstable (O(n) avg) instead of full O(n log n) sorts of every
  candidate / every centroid; all three search() impls use them. Distance
  ordering switched to f32::total_cmp — no more partial_cmp().unwrap() panics.
- rairs.rs: rair_score is now allocation-free (no per-call Vec for the diff);
  search() dedups ids with a reused bool scratch array instead of allocating
  a HashSet per query.
- seil.rs: block-visited dedup uses a flat bool array indexed via per-list
  prefix sums instead of a per-query HashSet<(usize,usize)>.

Fixes:
- clippy `-D warnings` now passes: documented the 6 RairsError struct fields
  + RairsSeil::lambda; elided the explicit lifetime on resolve_block.
- cargo fmt --check now passes (benches/rairs_bench.rs import ordering, etc.).
- lib.rs + ADR-193 + the research README now carry a Provenance note: the
  "RAIRS/SEIL" names and the SIGMOD-2026 / arXiv:2601.07183 citation are
  unverified; the crate is an original implementation of the redundant-
  assignment idea (cf. IVF spill lists / SOAR / multi-probe LSH) and should
  be judged on src/main.rs's reproducible benchmarks, not the reference.

cargo test -p ruvector-rairs: 10/10 pass; recall@10 at nprobe∈{1,4,16}
unchanged (61.3/97.9/100 IvfFlat, 83.8/99.4/100 RairsStrict,
93.1/99.9/100 RairsSeil); index memory unchanged.

Co-Authored-By: claude-flow <ruv@ruv.net>

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: ruvnet <ruvnet@gmail.com>
2026-05-12 09:47:19 -04:00
..
error.rs research(nightly): rairs-ivf — RAIRS IVF, ruvector's first Inverted File Index (ADR-193) (#459) 2026-05-12 09:47:19 -04:00
index.rs research(nightly): rairs-ivf — RAIRS IVF, ruvector's first Inverted File Index (ADR-193) (#459) 2026-05-12 09:47:19 -04:00
ivf.rs research(nightly): rairs-ivf — RAIRS IVF, ruvector's first Inverted File Index (ADR-193) (#459) 2026-05-12 09:47:19 -04:00
kmeans.rs research(nightly): rairs-ivf — RAIRS IVF, ruvector's first Inverted File Index (ADR-193) (#459) 2026-05-12 09:47:19 -04:00
lib.rs research(nightly): rairs-ivf — RAIRS IVF, ruvector's first Inverted File Index (ADR-193) (#459) 2026-05-12 09:47:19 -04:00
main.rs research(nightly): rairs-ivf — RAIRS IVF, ruvector's first Inverted File Index (ADR-193) (#459) 2026-05-12 09:47:19 -04:00
rairs.rs research(nightly): rairs-ivf — RAIRS IVF, ruvector's first Inverted File Index (ADR-193) (#459) 2026-05-12 09:47:19 -04:00
seil.rs research(nightly): rairs-ivf — RAIRS IVF, ruvector's first Inverted File Index (ADR-193) (#459) 2026-05-12 09:47:19 -04:00