Commit graph

2 commits

Author SHA1 Message Date
ruvnet
d60c802889 feat(rabitq,rulake): external_ids accessor + warm_restart example — close wave-5 gaps
=== Agent A: rabitq — non-dense ID preservation through persist ===
crates/ruvector-rabitq/src/{index,persist}.rs

Wave-5's warm_from_dir collapsed external u64 ids to (0..n) identity
because RabitqPlusIndex lacked an outer ids accessor. Surprise finding:
the persist LOAD path was already id-preserving — the pipeline reads
`id:u32` from disk and hands (id, v) into from_vectors_parallel, which
writes `id` into inner.ids. The only missing piece was the outer-layer
accessor so ruLake could read them back.

Added:
  - RabitqPlusIndex::external_ids(&self) -> &[u32]  (thin forward)
  - RabitqPlusIndex::ids_u64(&self)    -> Vec<u64>  (widening clone)

Regression test `persist_preserves_non_dense_ids` builds an index with
non-dense external ids (13*i + 7 for i in 0..50), saves and reloads it,
and asserts the ids are identical after the round-trip. 37 → 38 rabitq
tests.
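
For readers without the crate open, the two accessors can be sketched as
follows. This is a minimal stand-in, not the real RabitqPlusIndex: the
`Inner` and `PlusIndex` types, the `from_vectors` constructor, and the
`non_dense_ids` helper are hypothetical, and only the accessor names and
return types come from the list above.

```rust
// Hypothetical stand-in for the inner index; only the id column matters here.
struct Inner {
    ids: Vec<u32>,
}

// Hypothetical outer wrapper with the same accessor shapes as described above.
pub struct PlusIndex {
    inner: Inner,
}

impl PlusIndex {
    /// Build from (id, vector) pairs, keeping each caller-supplied id.
    pub fn from_vectors(items: Vec<(u32, Vec<f32>)>) -> Self {
        let ids = items.iter().map(|(id, _)| *id).collect();
        Self { inner: Inner { ids } }
    }

    /// Thin forward: borrow the stored ids without copying.
    pub fn external_ids(&self) -> &[u32] {
        &self.inner.ids
    }

    /// Widening clone: copy the ids out as u64 for callers that use 64-bit ids.
    pub fn ids_u64(&self) -> Vec<u64> {
        self.inner.ids.iter().map(|&id| u64::from(id)).collect()
    }
}

/// Non-dense ids in the same pattern as the regression test: 13*i + 7.
pub fn non_dense_ids(n: u32) -> Vec<u32> {
    (0..n).map(|i| 13 * i + 7).collect()
}
```

With ids 7, 20, 33, ... a collapse to (0..n) is immediately visible,
which is presumably why the test uses a non-dense pattern.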

=== rulake: drop the (0..n) workaround ===
crates/ruvector-rulake/src/lake.rs

warm_from_dir now calls `idx.ids_u64()` instead of synthesizing
(0..n). Non-dense external ids round-trip faithfully. The
~15-line inline comment documenting the old limitation is gone;
replaced with a 4-line pointer to the wave-6 close.

=== Agent B: warm_restart runnable example ===
crates/ruvector-rulake/examples/warm_restart.rs (new)

Runnable demo of the full save → ship → warm-restart cycle:
  - Phase 1: prime from backend, save to disk
  - Phase 2: spin up a FRESH RuLake with NO backend, warm_from_dir,
    query, assert warm_installs=1 / primes=0
  - Phase 3: cold-prime from backend for comparison
  - Final: report cold/warm speedup

Measured at n=5000 D=128 (agent's single-run numbers):
  Phase 1 prime:     5.03 ms
  save_cache_to_dir: 3.44 ms  (2.46 MiB rbpx)
  Phase 2 warm:      5.00 ms  (warm_installs=1, primes=0)
  Phase 3 cold:      3.60 ms
  Speedup cold/warm: 0.70×

Honest finding: at n=5k D=128, cold-prime is actually faster than
warm-load (cold/warm ratio below 1×) because our parallel prime takes
about 5 ms or less and parsing ~2.5 MB of rbpx is slower. The
warm-restart win shows up at larger n where compression dominates;
documented in the example's closing block.
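
How to read the ratio: a cold/warm value above 1 means the warm path was
faster, below 1 means the cold path was faster. A minimal timing sketch,
assuming single-shot wall-clock measurement as in the numbers above
(`time_once` and `cold_over_warm` are hypothetical helper names, not
taken from the example):

```rust
use std::time::{Duration, Instant};

/// Time a closure once and return its result with the elapsed wall time.
pub fn time_once<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}

/// cold/warm ratio: > 1.0 means warm was faster, < 1.0 means cold was faster.
pub fn cold_over_warm(cold: Duration, warm: Duration) -> f64 {
    cold.as_secs_f64() / warm.as_secs_f64()
}
```

Single-run timings at the millisecond scale are noisy, so a ratio like
this is best treated as indicative only.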

Steady-state QPS matches within 1.2% (same compressed index in both).

38 rabitq + 21 rulake lib + 22 rulake federation = 81 tests. Clippy
-D warnings clean across both crates.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-23 23:52:51 -04:00
ruvnet
2bdfd342e3 feat(rabitq,rulake): VectorKernel + memory_class + per-collection stats + sidecar example
Four in-scope M1 items from the remaining backlog, landed together
because they cross-cut cleanly.

Iter 23 (rabitq): VectorKernel trait + CpuKernel default
  - Trait: id(), caps() → KernelCaps, scan(ScanRequest) → ScanResponse.
    Scan-phase determinism is the hard contract; rerank-phase nondet
    is declared via caps().deterministic = false and the caller's
    dispatch policy filters those out of Fresh/Frozen paths (ADR-157).
  - CpuKernel wraps RabitqPlusIndex::search_with_rerank, always
    available, unbounded dim, deterministic.
  - Tests: CPU kernel matches direct search byte-exactly + respects
    per-call rerank override + caps advertised correctly.
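
A rough sketch of the trait shape described above. Only the method names
and the `deterministic` capability come from this message; the field
layouts, the by-reference `scan` argument, the brute-force scan body,
and the `deterministic_ids` helper are assumptions made for
illustration. The real CpuKernel wraps
RabitqPlusIndex::search_with_rerank rather than scanning raw vectors.

```rust
/// Capabilities a kernel advertises. Field names are assumptions.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct KernelCaps {
    pub deterministic: bool,
    pub max_dim: Option<usize>, // None = unbounded
}

pub struct ScanRequest<'a> {
    pub query: &'a [f32],
    pub k: usize,
}

#[derive(Debug, PartialEq)]
pub struct ScanResponse {
    pub hits: Vec<(u32, f32)>, // (id, squared L2 distance), nearest first
}

pub trait VectorKernel {
    fn id(&self) -> &str;
    fn caps(&self) -> KernelCaps;
    fn scan(&self, req: &ScanRequest<'_>) -> ScanResponse;
}

/// Toy CPU kernel: exact brute-force scan over stored rows.
pub struct CpuKernel {
    pub rows: Vec<(u32, Vec<f32>)>,
}

impl VectorKernel for CpuKernel {
    fn id(&self) -> &str {
        "cpu"
    }

    fn caps(&self) -> KernelCaps {
        KernelCaps { deterministic: true, max_dim: None }
    }

    fn scan(&self, req: &ScanRequest<'_>) -> ScanResponse {
        let mut hits: Vec<(u32, f32)> = self
            .rows
            .iter()
            .map(|(id, v)| {
                let d: f32 = v.iter().zip(req.query).map(|(a, b)| (a - b) * (a - b)).sum();
                (*id, d)
            })
            .collect();
        // Tie-break on id so equal distances always come back in the same order.
        hits.sort_by(|a, b| a.1.total_cmp(&b.1).then(a.0.cmp(&b.0)));
        hits.truncate(req.k);
        ScanResponse { hits }
    }
}

/// Dispatch-policy sketch: keep only kernels that declare themselves deterministic.
pub fn deterministic_ids(kernels: &[Box<dyn VectorKernel>]) -> Vec<&str> {
    kernels
        .iter()
        .filter(|k| k.caps().deterministic)
        .map(|k| k.id())
        .collect()
}
```

Putting determinism in `caps()` lets the caller filter kernels before
dispatch instead of discovering nondeterminism from results.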

Iter 24 (rulake): memory_class on RuLakeBundle (ADR-156)
  - Opaque caller-defined tag — agent systems write "episodic" /
    "semantic" / etc; ruLake stores but never interprets.
  - Not part of the witness: two bundles with identical data but
    different memory_class share the cache.
  - Serde `default` + `skip_serializing_if = "Option::is_none"` keeps
    old bundles backward-compatible.
  - Test: roundtrip + witness-unchanged + legacy bundles without the
    field still parse.
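
The "not part of the witness" rule can be illustrated with a toy bundle.
This is only a sketch: the `Bundle` fields other than `memory_class` are
hypothetical, and std's DefaultHasher stands in for whatever witness
function ruLake actually uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy bundle. `data_ref` and `generation` are hypothetical field names.
pub struct Bundle {
    pub data_ref: String,
    pub generation: u64,
    /// Opaque caller-defined tag, stored but never interpreted.
    pub memory_class: Option<String>,
}

impl Bundle {
    /// Stand-in witness: hashes the data-identifying fields only.
    pub fn witness(&self) -> u64 {
        let mut h = DefaultHasher::new();
        self.data_ref.hash(&mut h);
        self.generation.hash(&mut h);
        // memory_class is deliberately left out, so bundles that differ
        // only in their tag map to the same cache entry.
        h.finish()
    }
}
```

Two bundles tagged "episodic" and "semantic" over the same data then
produce the same witness and share the cache, as the bullet describes.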

Iter 25 (rulake): examples/sidecar_daemon.rs
  - Runnable demo of publish_bundle / refresh_from_bundle_dir pair.
  - Publisher mutates backend + re-publishes; daemon poll loop
    detects witness change, invalidates; next query re-primes.
  - Includes a bug fix in refresh_from_bundle_dir: when the cache
    pointer is None (already invalidated), report UpToDate instead
    of Invalidated so daemons don't re-fire on every poll between
    "we invalidated" and "somebody queried."
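
The fixed behaviour can be modelled as a small state machine. The sketch
below is an assumption-laden simplification: `CacheSlot`, its `u64`
witness, and the `refresh` / `prime` methods are hypothetical names, and
only the UpToDate / Invalidated outcomes come from the description
above.

```rust
#[derive(Debug, PartialEq)]
pub enum RefreshResult {
    UpToDate,
    Invalidated,
}

/// Toy cache slot: `None` means the entry was already invalidated
/// (or never primed) and the next query will re-prime it.
pub struct CacheSlot {
    pub cached_witness: Option<u64>,
}

impl CacheSlot {
    /// Compare the published witness against the cached one.
    pub fn refresh(&mut self, published: u64) -> RefreshResult {
        match self.cached_witness {
            // Nothing cached: there is nothing left to invalidate, so
            // report UpToDate rather than firing again on every poll.
            None => RefreshResult::UpToDate,
            Some(w) if w == published => RefreshResult::UpToDate,
            Some(_) => {
                self.cached_witness = None;
                RefreshResult::Invalidated
            }
        }
    }

    /// Called when a query re-primes the cache.
    pub fn prime(&mut self, witness: u64) {
        self.cached_witness = Some(witness);
    }
}
```

A daemon polling this slot sees Invalidated exactly once per witness
change, however many polls happen before the next query.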

Iter 26 (rulake): CacheStats::stats_by_collection
  - Per-(backend, collection) counters, one level finer than
    stats_by_backend. Operators can identify which specific
    collection is hot and pin it in LRU or increase its shard count.
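
One way to picture the finer granularity: key the counters on
(backend, collection) and derive the per-backend view by summing. This
is a sketch only; the `Counters` fields and the `record` method are
assumptions, and the real CacheStats layout may differ.

```rust
use std::collections::HashMap;

/// Hypothetical counter set; the real stats may track more fields.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct Counters {
    pub hits: u64,
    pub misses: u64,
}

#[derive(Default)]
pub struct CacheStats {
    by_collection: HashMap<(String, String), Counters>,
}

impl CacheStats {
    /// Record one lookup against a (backend, collection) pair.
    pub fn record(&mut self, backend: &str, collection: &str, hit: bool) {
        let c = self
            .by_collection
            .entry((backend.to_string(), collection.to_string()))
            .or_default();
        if hit {
            c.hits += 1;
        } else {
            c.misses += 1;
        }
    }

    /// Finest view: one entry per (backend, collection).
    pub fn stats_by_collection(&self) -> &HashMap<(String, String), Counters> {
        &self.by_collection
    }

    /// Coarser view: sum the collection counters for each backend.
    pub fn stats_by_backend(&self) -> HashMap<String, Counters> {
        let mut out: HashMap<String, Counters> = HashMap::new();
        for ((backend, _), c) in &self.by_collection {
            let agg = out.entry(backend.clone()).or_default();
            agg.hits += c.hits;
            agg.misses += c.misses;
        }
        out
    }
}
```

Storing only the fine-grained map and deriving the backend totals keeps
the two views consistent by construction.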

21 federation + 11 bundle + 3 fs_backend + 3 kernel = 38 tests
passing across both crates. Clippy -D warnings clean. Example runs
end-to-end.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-23 21:27:04 -04:00