Seven-file design review at docs/sdk/ covering the binding strategy,
API surface, M1-M4 milestones, risks, and a one-page decision record
for shipping a Python SDK.
Recommended path: **PyO3 + maturin, single in-tree
`crates/ruvector-py/` cdylib, abi3-py39 wheel via cibuildwheel,
`pyo3-asyncio` over a singleton tokio runtime.**
Why:
- The existing `*-node` NAPI templates (e.g.
`crates/ruvector-diskann-node/src/lib.rs`) already prove out the
opaque-handle + `Arc<RwLock<…>>` shape PyO3 mirrors line-for-line —
~70% port, ~30% lifetime gymnastics.
- abi3 collapses the wheel matrix from ~25 (5 CPython versions × 5 platforms)
to 5 (one wheel per platform, all py3.9+).
- Singleton tokio runtime avoids the "one runtime per call" overhead
while remaining compatible with asyncio + uvloop.
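A minimal sketch of the "single lazy runtime" shape, using only the standard library. `Runtime` here is a hypothetical stand-in for `tokio::runtime::Runtime`, and `shared_runtime` / `INIT_COUNT` are illustrative names, not part of the design docs:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for the tokio runtime the SDK would own.
pub struct Runtime {
    pub worker_threads: usize,
}

/// Counts constructions so the "built once" property can be observed.
pub static INIT_COUNT: AtomicUsize = AtomicUsize::new(0);

static RUNTIME: OnceLock<Runtime> = OnceLock::new();

/// Returns the process-wide runtime, building it on first use only.
pub fn shared_runtime() -> &'static Runtime {
    RUNTIME.get_or_init(|| {
        INIT_COUNT.fetch_add(1, Ordering::SeqCst);
        Runtime { worker_threads: 4 }
    })
}
```

Every binding entry point would call `shared_runtime()` instead of constructing its own runtime, so repeated calls share one instance.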
Milestone shape (each with explicit scope + acceptance tests):
M1 — RaBitQ-only Python wheel. Just the published
`ruvector-rabitq` crate exposed via PyO3. Smallest possible
useful surface. ~600 LoC, 3 weeks.
M2 — ruLake. Async via pyo3-asyncio. Witness verify exposed.
~900 LoC, 4 weeks.
M3 — Embeddings + ML helpers. Wrap consumer-facing parts of
`ruvector-cnn` / `ruvllm`. ~700 LoC, 3 weeks.
M4 — A2A agent client. Wrap `rvagent-a2a` so Python apps can
dispatch tasks to A2A peers, including signed AgentCard
discovery. ~800 LoC, 4 weeks.
Three acceptance gates for the whole effort:
1. A Python user can do RAG over 1 M vectors in <5 lines.
2. An asyncio user can stream A2A task updates without thread
fights.
3. `pip install ruvector` takes <10 s on a stock machine.
Top 3 risks identified:
R1 — tokio runtime + PyO3 + asyncio/uvloop interop. Mitigation:
single lazy runtime, `pyo3-asyncio` shim.
R3 — wheel size. M4 budget is 22 MB; A2A deps (axum + reqwest +
rustls) could blow it. Mitigation: feature-gate axum/reqwest
behind `agent` extra; default install is rabitq + rulake only.
R7 — PyPI name squat on `ruvector`. Mitigation: register placeholder
before M1 ships.
Nuance discovered: `ruvector-rabitq` has **no** sibling `*-node` or
`*-wasm` crate — unlike most consumer crates. M1 is therefore clean
greenfield: no parity-pressure to match a flaky NAPI signature, and
it confirms rabitq alone is the right starter target rather than the
umbrella `ruvector` crate the npm package wraps.
Planning doc only; no implementation.
Co-Authored-By: claude-flow <ruv@ruv.net>
Two unrelated bits of working-tree state cleaned up alongside the
ADR-159 branch:
1. `.gitignore`: add `.claude/worktrees/` — these are agent worktree
directories created at runtime for per-agent isolation; should
never be committed.
2. `docs/research/ruvllm/`: include 2 research notes from 2026-04-24
that were sitting uncommitted on this working tree. Both are pure
research / pre-design markdown:
- larql-integration.md: LARQL × RuvLLM integration assessment
- rust-rebuild-sota.md: clean-sheet Rust rebuild SOTA survey
`examples/connectome-fly/ui/` remains untracked — the directory has
no source code, only a stale `dist/`, `node_modules/`, and an
orphan `package-lock.json` from an abandoned scaffold. Whoever owns
that example can decide what to do with it.
Co-Authored-By: claude-flow <ruv@ruv.net>
Records the decision to add a third protocol surface (A2A) alongside
the existing rvagent-mcp (agent ↔ tool) and rvagent-acp (client ↔ agent)
stacks. Three review revisions captured in-document:
- r1: shape of the AgentCard, Task lifecycle, JSON-RPC surface
- r2: identity (signed AgentCards), per-task policy, routing selectors,
typed artifacts (RuLakeWitness for zero-copy memory handoff)
- r3: global budget, trace-level causality, recursion guard, artifact
versioning — second-order failure modes only visible under multi-agent
traffic at scale
Three-point acceptance test gates the deliverable:
1. Remote agent call indistinguishable from local
2. Memory transfer size constant regardless of payload
3. Cost bounded under recursive delegation
Implementation status addendum (2026-04-24) records what shipped against
each milestone with proof points.
Co-Authored-By: claude-flow <ruv@ruv.net>
Ships the runnable acceptance test ADR-156 spec'd. Drives a single
LocalBackend through the full substrate contract in one test:
1. Recall: search_one → results
2. Verify: publish_bundle → read_from_dir → verify_witness
→ cache pointer matches on-disk witness
3. Forget: invalidate_cache → pointer is None
4. Rehydrate: next search_one → primes+1, pointer reinstalled
5. Location-transparency: results before forget ≡ results after
rehydrate (byte-exact ids + scores at the same seed); the
caller never touched data_ref or knew which tier served the call
6. Compact: explicitly out of scope per ADR-156 — belongs to
RVM/Cognitum, not the substrate
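The recall / forget / rehydrate sequence above can be sketched against a toy in-memory cache. `ToyCache` and its fields are hypothetical and only mirror the shape of the contract; this is not the ruLake API:

```rust
/// Toy stand-in for the substrate under test (hypothetical, not ruLake).
pub struct ToyCache {
    data: Vec<(u64, f32)>,   // backend rows: (id, score)
    pointer: Option<String>, // witness of the primed entry, if any
    pub primes: u32,         // how many times the cache was primed
}

impl ToyCache {
    pub fn new(data: Vec<(u64, f32)>) -> Self {
        Self { data, pointer: None, primes: 0 }
    }

    /// Primes on demand, then returns the k lowest-score rows.
    pub fn search_one(&mut self, k: usize) -> Vec<(u64, f32)> {
        if self.pointer.is_none() {
            self.primes += 1;
            self.pointer = Some(format!("witness-{}", self.data.len()));
        }
        let mut hits = self.data.clone();
        hits.sort_by(|a, b| a.1.total_cmp(&b.1));
        hits.truncate(k);
        hits
    }

    /// Forget: drop the cache pointer; the next search rehydrates.
    pub fn invalidate_cache(&mut self) {
        self.pointer = None;
    }

    pub fn pointer(&self) -> Option<&str> {
        self.pointer.as_deref()
    }
}
```

A test in this style searches, invalidates, searches again, and then asserts that `primes` went up by one while the two result sets are identical.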
If this test stays green on every commit, the agent-facing memory
substrate claim is mechanical, not aspirational.
Also closes ADR-156 open question #4 (substrate test needed) as
resolved.
21 federation + 9 bundle + 3 fs_backend = 33 tests passing. Clippy
-D warnings clean.
Co-Authored-By: claude-flow <ruv@ruv.net>
Locks the CPU-first, GPU-optional architecture from the 2026-04-22
strategic review. Scaffolding-only ADR — no kernel implementations
ship with this decision.
Key positions:
1. VectorKernel trait lives in ruvector-rabitq (kernels are RaBitQ
primitives); dispatch lives in ruvector-rulake (has the live
signals — batch size, hit rate, rerank pressure).
2. GPU implementations (CUDA/ROCm/Metal) ship as separate crates
(ruvector-rabitq-cuda, -rocm, -metal) on their own cadence.
Laptop and WASM builds never pay the dep cost.
3. WASM SIMD is feature-gated in ruvector-rabitq itself (same source,
different target).
4. Determinism as a hard gate: scan-phase must be bit-reproducible
across kernels; rerank-phase may be float-nondeterministic but
caps().deterministic=false kernels are refused on Fresh/Frozen
paths. Witness chain stays anchored on data, not kernel identity.
5. Acceptance gate for promotion past experimental:
p95 ≥ 2× lower OR cost per 1M queries ≥ 30% lower,
at identical recall@10 on a reference workload
(clustered D=768 n=1M rerank×20).
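One way to read the promotion gate in position 5 as a predicate. `KernelRun` and `passes_promotion_gate` are hypothetical names, and treating "identical recall@10" as exact equality is an assumption of this sketch:

```rust
/// Measurements for one kernel on the reference workload (hypothetical).
#[derive(Clone, Copy)]
pub struct KernelRun {
    pub p95_ms: f64,
    pub cost_per_1m_queries: f64,
    pub recall_at_10: f64,
}

/// True when the candidate may be promoted past experimental.
pub fn passes_promotion_gate(baseline: KernelRun, candidate: KernelRun) -> bool {
    // Assumption: "identical recall" means exactly equal measured recall.
    if candidate.recall_at_10 != baseline.recall_at_10 {
        return false;
    }
    // p95 at least 2x lower than the baseline.
    let latency_win = candidate.p95_ms * 2.0 <= baseline.p95_ms;
    // Cost per 1M queries at least 30% lower than the baseline.
    let cost_win = candidate.cost_per_1m_queries <= baseline.cost_per_1m_queries * 0.70;
    latency_win || cost_win
}
```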
Considers and rejects: single-crate GPU kernels (build/CI bloat),
dispatch inside rabitq (wrong info), new ruvector-kernel crate
(premature), feature-flag-only static dispatch (no runtime detection),
wgpu-first (shader model not mature for popcount+reduction).
Co-Authored-By: claude-flow <ruv@ruv.net>
Two changes from the 2026-04-22 strategic review reframing ruLake as
the memory substrate for agent brain systems:
1. Consistency::Frozen variant — caller asserts bundle immutability;
no automatic backend recheck. Maps to "Frozen for audit" from
the reviewer's three-mode product knob. Automatic coherence is
suppressed; explicit refresh_from_bundle_dir still works (lets
operators invalidate frozen caches without needing Fresh mode).
can_skip_check short-circuits when the pointer is already
installed — first prime still runs, subsequent queries never
round-trip to the backend.
Test frozen_consistency_never_rechecks_after_prime: prime → bump
backend → 10 warm searches still hit on the old witness, primes
stay at 1. Explicit refresh on a re-published bundle correctly
reports Invalidated, proving operator control remains.
2. ADR-156 — positioning addendum, not replacement of ADR-155.
ruLake stays as substrate (memory hierarchy); brain system stays
above (memory type, recall policy, mutation policy). Decomposes
the reviewer's "recall / verify / forget / compact / rehydrate"
acceptance test into six guarantees, five of which are shipped.
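A toy model of the Frozen short-circuit described in change 1. Only `Fresh` and `Frozen` are modelled, `ToyTable` is hypothetical, and the real `can_skip_check` may take different inputs:

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Consistency {
    Fresh,
    Frozen,
}

/// Frozen skips the backend check once a pointer is installed.
pub fn can_skip_check(mode: Consistency, pointer_installed: bool) -> bool {
    mode == Consistency::Frozen && pointer_installed
}

/// Hypothetical table state, just enough to count primes and rechecks.
pub struct ToyTable {
    pub mode: Consistency,
    pub pointer_installed: bool,
    pub primes: u32,
    pub backend_checks: u32,
}

impl ToyTable {
    pub fn new(mode: Consistency) -> Self {
        Self { mode, pointer_installed: false, primes: 0, backend_checks: 0 }
    }

    pub fn search(&mut self) {
        if can_skip_check(self.mode, self.pointer_installed) {
            return; // warm Frozen query: no backend round-trip
        }
        self.backend_checks += 1;
        if !self.pointer_installed {
            self.primes += 1; // first prime still runs in every mode
            self.pointer_installed = true;
        }
    }
}
```

In this model eleven Frozen searches cost one backend check, while eleven Fresh searches cost eleven.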
Rejects:
- absorbing the brain into ruLake (violates substrate separation)
- a new rulake-memory crate (premature; M1 primitives suffice)
- forking into two products (identical properties; no win)
17 federation + 9 bundle + 3 fs_backend = 29 tests passing. Clippy
green.
Co-Authored-By: claude-flow <ruv@ruv.net>
Acts on the 2026-04-22 strategic review. Four changes:
1. Sharpen the one-line decision:
'ruLake is a vector execution cache with deterministic compression
and federated refill.' Federation is the refill mechanism; the
cache is the product surface. Previous framing was correct but
fuzzy on which half was the headline.
2. New M1.5 acceptance test:
'95% of queries return exact top-k without touching the backend.'
Measurable from CacheStats::hit_rate() alone. Replaces the prior
'federation works across 4 shards' gate, which the concurrent
bench showed was a distraction from the real product claim.
3. Strategic questions section — two product choices recorded with
recommendations instead of resolutions:
a) Invisible infrastructure vs user-facing query layer?
→ Recommend invisible first (BQ UDF path).
b) Strict Fresh vs 10× Eventual?
→ Recommend both as a product knob, not a flag.
4. Close per-shard-rerank question (shipped in iter 15) and
cache-first KPI surface question (shipped in iter 14) as
resolved in M1.
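A sketch of how the M1.5 gate could be evaluated from hit/miss counters alone. `CacheStats::hit_rate()` is named above; the `hits` / `misses` fields and `passes_m1_5_gate` are assumptions of this example:

```rust
/// Hypothetical counter layout; only `hit_rate()` is named in the ADR.
pub struct CacheStats {
    pub hits: u64,
    pub misses: u64,
}

impl CacheStats {
    /// Fraction of queries served without touching the backend.
    pub fn hit_rate(&self) -> f64 {
        let total = self.hits + self.misses;
        if total == 0 {
            0.0 // no traffic yet: report zero rather than divide by zero
        } else {
            self.hits as f64 / total as f64
        }
    }
}

/// M1.5 gate: at least 95% of queries answered from cache.
pub fn passes_m1_5_gate(stats: &CacheStats) -> bool {
    stats.hit_rate() >= 0.95
}
```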
Co-Authored-By: claude-flow <ruv@ruv.net>
Iter 12's concurrent benchmark surfaced that K-shard federation pays
~K× rerank work because RaBitQ's rerank runs per-shard on candidates
that can't be globally merged before rerank without an API change.
Fix spec'd precisely so it's easy to land later:
1. ruvector-rabitq: add search_with_rerank(query, k, rerank_factor)
— same body as search() but takes rerank_factor as a parameter.
2. rulake: plumb through VectorCache and RuLake::search_federated
with an optional per_shard_rerank. Default policy: divide by K,
floor 5.
3. Re-bench the concurrent workload to verify; recall@10 should
stay > 85%.
Deferred to M2 because rabitq was just merged and changing its public
API mid-branch is out of scope. Filed as the explicit trigger for
the first rabitq follow-up.
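The default policy in step 2 ("divide by K, floor 5") can be sketched as below. The function name and the handling of `shards == 0` are assumptions:

```rust
/// Per-shard rerank factor: the global factor split across K shards,
/// never dropping below a floor of 5.
pub fn per_shard_rerank(rerank_factor: usize, shards: usize) -> usize {
    const FLOOR: usize = 5;
    // Assumption: treat zero shards as one to avoid dividing by zero.
    let k = shards.max(1);
    (rerank_factor / k).max(FLOOR)
}
```

With `rerank_factor = 20`, two shards get 10 each and four or more shards sit at the floor of 5. Note that the floor also raises any requested factor below 5, which the spec leaves open.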
Co-Authored-By: claude-flow <ruv@ruv.net>
Iter 10 shipped the symmetric publish_bundle / refresh_from_bundle_dir
primitives with witness-authenticated handoff. The protocol is:
publisher → atomic-write table.rulake.json
reader → read, verify witness, compare, invalidate if different
Three-state refresh result (UpToDate / Invalidated / BundleMissing)
covers all the daemon's logging / alerting needs. Tampered sidecars
fail loudly instead of silently corrupting the cache.
Move the question from "still open" to "resolved in M1" and drop the
now-stale M2 placeholder.
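A sketch of the reader side of the protocol with the three-state result. `Sidecar`, its `witness_valid` flag (standing in for real witness verification), and the error type are hypothetical:

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum RefreshResult {
    UpToDate,
    Invalidated,
    BundleMissing,
}

/// Hypothetical parsed sidecar; `witness_valid` stands in for verify_witness().
pub struct Sidecar<'a> {
    pub witness: &'a str,
    pub witness_valid: bool,
}

/// Read, verify, compare, invalidate if different.
pub fn refresh(
    cached: &mut Option<String>,
    sidecar: Option<Sidecar<'_>>,
) -> Result<RefreshResult, &'static str> {
    match sidecar {
        None => Ok(RefreshResult::BundleMissing),
        // Tampered sidecars fail loudly instead of touching the cache.
        Some(s) if !s.witness_valid => Err("witness verification failed"),
        Some(s) if cached.as_deref() == Some(s.witness) => Ok(RefreshResult::UpToDate),
        Some(_) => {
            *cached = None; // drop the stale entry; next query re-primes
            Ok(RefreshResult::Invalidated)
        }
    }
}
```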
Co-Authored-By: claude-flow <ruv@ruv.net>
M1 done and benchmarked. Update status from 'Proposed' → 'Accepted (M1)',
collapse the implementation-plan M1 bullet to reflect everything that
actually shipped on the branch, and move the open-question resolutions
into a dedicated "Resolved in M1" block.
New M1 evidence in the ADR:
- Intermediary tax 1.00× at n=100k on LocalBackend
- Byte-exact parity with direct RaBitQ at same (seed, rerank_factor)
- Rayon fan-out 1.97× (2-shard) / 3.86× (4-shard) prime-time speedup
- Recall@10 > 90% gate passes
- Witness-addressed cache sharing verified
- Send+Sync under 8-thread contention
Remaining open questions rewritten for M2 focus:
- Remote-backend tax measurement (Parquet-on-GCS prime)
- Cache sidecar daemon protocol for bundle handoff
- Push-down negotiation policy
- Cost accounting for pushed-down BQ work
Co-Authored-By: claude-flow <ruv@ruv.net>
Applies the reviewer's architectural feedback (docs/research/ruLake/
chat thread): ruLake is a cache-first vector execution fabric, not a
federation engine. Federation is the cache's refill mechanism.
## Perf fix: cache prime now builds outside the lock
`VectorCache::prime()` previously built a fresh `RabitqPlusIndex`
(~400 ms at n=100k) while holding the cache mutex, serialising all
other queries. Now builds entirely before touching `inner`; the lock
is only taken to swap the finished entry in. No benchmark regression —
intermediary tax still 1.00× on LocalBackend at n=100k.
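The build-then-swap pattern in miniature. `Index` and `build_index` are hypothetical stand-ins for `RabitqPlusIndex` and its constructor, and the `HashMap` layout of `inner` is an assumption:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Stand-in for the expensive index (hypothetical).
pub struct Index {
    pub n: usize,
}

/// Stand-in for the slow build step.
fn build_index(vectors: &[Vec<f32>]) -> Index {
    Index { n: vectors.len() }
}

pub struct VectorCache {
    inner: Mutex<HashMap<String, Index>>,
}

impl VectorCache {
    pub fn new() -> Self {
        Self { inner: Mutex::new(HashMap::new()) }
    }

    pub fn prime(&self, key: &str, vectors: &[Vec<f32>]) {
        // Slow part: runs with no lock held, so other queries proceed.
        let index = build_index(vectors);
        // Fast part: the lock covers only the swap of the finished entry.
        let mut guard = self.inner.lock().unwrap();
        guard.insert(key.to_string(), index);
    }

    pub fn len_of(&self, key: &str) -> Option<usize> {
        self.inner.lock().unwrap().get(key).map(|i| i.n)
    }
}
```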
## New: bundle sidecar (`table.rulake.json`)
`ruvector_rulake::bundle` — the portable unit that defines ruLake's
reproducibility + governance scope. Flagged by the reviewer as more
important than the UDF because it's what travels between teams,
clouds, and backups.
Carries: `data_ref`, `dim`, `rotation_seed`, `rerank_factor`,
`generation`, `rvf_witness` (SHAKE-256 over the preceding fields),
`pii_policy`, `lineage_id`.
`Generation` is a serde-untagged union of `Num(u64)` (Parquet mtime,
Iceberg version, Snowflake offset) and `Opaque(String)` (UUIDs,
hashes, base64 blobs) — fixes the "u64 doesn't fit an Iceberg snapshot
id" open question from the M1 review.
Witness fn is domain-separated, length-prefixed, and verifiable via
`bundle.verify_witness()`. 6 new tests: determinism,
field-change-detection, length-prefix-anti-collision, serde roundtrip,
tamper-detection, format-version-downgrade-rejected.
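A sketch of why length-prefixing matters, showing only the byte preimage that would be fed to the hash. The `DOMAIN` tag, the u64 little-endian prefix, and the function name are assumptions; the SHAKE-256 step is omitted because it is not in the standard library:

```rust
/// Hypothetical domain-separation tag.
pub const DOMAIN: &[u8] = b"example.witness.v1";

/// Builds a domain-separated, length-prefixed preimage over the fields.
pub fn witness_preimage(fields: &[&[u8]]) -> Vec<u8> {
    let mut out = Vec::new();
    // The domain tag goes first, itself length-prefixed.
    out.extend_from_slice(&(DOMAIN.len() as u64).to_le_bytes());
    out.extend_from_slice(DOMAIN);
    for field in fields {
        // The prefix makes field boundaries unambiguous.
        out.extend_from_slice(&(field.len() as u64).to_le_bytes());
        out.extend_from_slice(field);
    }
    out
}
```

Without the prefixes, the field pairs `("ab", "c")` and `("a", "bc")` would concatenate to the same bytes; with them the preimages differ.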
## New: recall-vs-brute-force gate
`rulake_recall_at_10_above_90pct_vs_brute_force` — the missing
correctness test. Builds brute-force L2 truth over 5k clustered
Gaussian vectors, asserts ruLake's top-10 hits ≥ 90% at rerank×20.
Uses the same n + cluster-count + methodology as
`ruvector-rabitq::BENCHMARK.md` so a regression shows up as a
divergence from the known-good estimator baseline.
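A minimal sketch of the recall computation behind such a gate, assuming squared-L2 ground truth and id lists without duplicates. All function names are illustrative:

```rust
/// Squared L2 distance between two equal-length vectors.
pub fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Exact top-k ids by brute-force scan.
pub fn brute_force_top_k(data: &[Vec<f32>], query: &[f32], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = data
        .iter()
        .enumerate()
        .map(|(i, v)| (i, l2_sq(v, query)))
        .collect();
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}

/// Fraction of the true top-k that the approximate result recovered.
pub fn recall_at_k(truth: &[usize], approx: &[usize]) -> f64 {
    if truth.is_empty() {
        return 1.0;
    }
    let hits = approx.iter().filter(|id| truth.contains(id)).count();
    hits as f64 / truth.len() as f64
}
```

The gate then asserts `recall_at_k(&truth, &approx) >= 0.90`, averaged over the query set.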
## ADR-155 v2 — cache-first decision explicit
- Decision opens with "cache-first vector execution fabric; federation
is the refill mechanism", lifts the reviewer's 5-axis decision
matrix (cache-first wins 4/5 axes).
- New Decision §6 declares the bundle sidecar as the portable unit
(not the UDF) and documents how the witness acts as the cache-key
anchor, closing the "cache invalidation drift" failure mode.
## Test + lint status
```
cargo test -p ruvector-rulake --release ✓ 14/0
cargo clippy -p ruvector-rulake --release --all-targets -- -D warnings ✓ clean
cargo fmt -p ruvector-rulake -- --check ✓ clean
cargo run -p ruvector-rulake --release --bin rulake-demo -- --fast ✓ no regression
```
Co-Authored-By: claude-flow <ruv@ruv.net>
Main recently merged ADR-151 (Miller-Rabin prime optimizations, PR #358)
and ADR-152 is reserved for Obsidian Brain Plugin (ADR-SYS-152), so
renumber the kalshi integration ADR to 153 to avoid collision.
- Rename docs/adr/ADR-151-kalshi-neural-trader-integration.md →
docs/adr/ADR-153-kalshi-neural-trader-integration.md
- Update 5 references: workspace Cargo.toml comment, the two kalshi
crate descriptions, the lib.rs doc-comment, and the ADR title line.
- Resolve .gitignore: keep both trailing additions (.kalshi + bench_data/).
Co-Authored-By: claude-flow <ruv@ruv.net>
New crate ruvector-kalshi: RSA-PSS-SHA256 signer (PKCS#1/#8), GCS/local/env
secret loader with 5-min cache, typed REST + WS DTOs, Kalshi→MarketEvent
normalizer (reuses neural-trader-core), transport-free FeedDecoder,
reqwest-backed REST client with live-trade env gate, and an offline
sign+verify example that validates against the real PEM.
New crate neural-trader-strategies: venue-agnostic Strategy trait, Intent
type, RiskGate (position cap, daily-loss kill, concentration, min-edge,
live gate, cash check), and ExpectedValueKelly prior-driven strategy.
36 unit tests pass across both crates. End-to-end offline validation
confirmed against the real Kalshi PEM via both local and GCS sources.
Co-Authored-By: claude-flow <ruv@ruv.net>
Phase 0 implementation revealed that the original PRD §6 targets
(50 ns / 200 ns for is_prime_u64 worst case) were structurally
unachievable in safe Rust on Apple silicon. Apples-to-apples competitor
benchmark in the same binary on the same machine measured num-prime
0.4.4 at 884 ns vs ours at 15.63 µs — ~17.7× headroom recoverable via
Montgomery reduction in Phase 0.1, but not the ~300× the original target
implied. The 50 ns figure was a pre-implementation estimate that did not
survive contact with measured hardware.
ADR-151 (docs/adr/ADR-151-miller-rabin-prime-optimizations.md)
- Status promoted from "Proposed" to "Accepted (Phase 0 landed
2026-04-16; performance targets revised)".
- New "Phase 0 Findings (2026-04-16)" section documenting what landed,
measurements vs original targets, num-prime competitor baseline, the
revised target band, and Phase 0.1 scope (Montgomery only).
- Explicit rejection of swapping to the empirical 7-witness set:
Sinclair-12 is theorem-proven across all u64; the 7-witness sets in
the literature are empirically tested up to 2^64 but not proven, and
swapping invalidates the A014233(11) canary in the pseudoprime test.
PRD §6 (docs/research/miller-rabin-optimizations/PRD.md)
- Revision header noting the relaxation.
- is_prime_u64(p) worst-case row updated to ≤ 1 µs (was 50 ns) M-series
/ ≤ 4 µs (was 200 ns) WASM.
- New §6.1 "Empirical findings (Phase 0)" with the measurement table
and the num-prime baseline data.
GROK-REVIEW-REQUEST.md (new, 424 lines)
- Self-contained briefing used to obtain external Grok review of the
Phase 0 design and Phase 0.1 plan: §1 binding context, §2 implementation
embedded verbatim, §3 measurements + competitor baseline, §4 four-section
ask (correctness, perf plan ranked, architecture, validation
methodology), §5 response format. Constraints block forbids
"just use num-prime" answers and pins the canary witness set.
Adds the binding ADR and full PRD for the Prime-Indexed Acceleration
Layer (PIAL): a single ~250-LoC Miller-Rabin primality utility in
crates/ruvector-collections that unblocks five independent prime-aware
optimizations across hashing, sharding, sketching, and the pi-brain
witness chain.
Use cases:
* Shard-router prime modulus — closes ADR-058 finding #6
* HNSW prime-bucket adjacency — micro-hnsw-wasm, hyperbolic-hnsw
* Certified-prime LSH modulus — sparsifier, attn-mincut
* Witness-chain ephemeral primes — pi-brain brain_share payload
* Anti-aliasing prime strides — sparsifier sampler
Generation strategy combines a compile-time table of primes near 2^k
(fast path, ~1ns) with a Miller-Rabin descent fallback (~250ns). The
table is generated by build.rs from the MR implementation and
cross-checked against MR in CI, so MR remains the source of truth.
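A sketch of the table-plus-descent strategy, assuming the first 12 primes as Miller-Rabin witnesses and plain u128 modular multiplication. The three-entry `TABLE` and all function names except `is_prime_u64` are illustrative; the real table would be generated by build.rs:

```rust
const WITNESSES: [u64; 12] = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37];

/// Tiny sample of "largest prime below 2^k" entries (illustrative subset).
const TABLE: [(u32, u64); 3] = [
    (16, 65_521),
    (32, 4_294_967_291),
    (64, 18_446_744_073_709_551_557),
];

fn mul_mod(a: u64, b: u64, m: u64) -> u64 {
    ((a as u128 * b as u128) % m as u128) as u64
}

fn pow_mod(mut base: u64, mut exp: u64, m: u64) -> u64 {
    let mut acc = 1u64;
    base %= m;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul_mod(acc, base, m);
        }
        base = mul_mod(base, base, m);
        exp >>= 1;
    }
    acc
}

/// Miller-Rabin over u64 with the first 12 primes as witnesses.
pub fn is_prime_u64(n: u64) -> bool {
    if n < 2 {
        return false;
    }
    for &p in &WITNESSES {
        if n % p == 0 {
            return n == p;
        }
    }
    // Here n is odd and > 37; write n - 1 = d * 2^s with d odd.
    let s = (n - 1).trailing_zeros();
    let d = (n - 1) >> s;
    'witness: for &a in &WITNESSES {
        let mut x = pow_mod(a, d, n);
        if x == 1 || x == n - 1 {
            continue;
        }
        for _ in 1..s {
            x = mul_mod(x, x, n);
            if x == n - 1 {
                continue 'witness;
            }
        }
        return false; // a proves n composite
    }
    true
}

/// Descent fallback: largest prime <= n, if any.
pub fn prev_prime(n: u64) -> Option<u64> {
    let mut c = n;
    while c >= 2 {
        if is_prime_u64(c) {
            return Some(c);
        }
        c -= 1;
    }
    None
}

/// Fast path via the table, falling back to Miller-Rabin descent.
pub fn largest_prime_below_pow2(k: u32) -> Option<u64> {
    if let Some(&(_, p)) = TABLE.iter().find(|&&(kk, _)| kk == k) {
        return Some(p);
    }
    match k {
        1..=63 => prev_prime((1u64 << k) - 1),
        _ => None,
    }
}
```

A CI cross-check in this style recomputes each table entry with `prev_prime` and fails on any mismatch, which keeps Miller-Rabin as the source of truth.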
Includes HANDOFF.md with Phase 0 deliverables for the next session.
ADR and PRD pin acceptance criteria, performance targets, and a
six-phase rollout (each phase ships as a separate PR).
The Related field incorrectly referenced ADR-003 as KV Cache and
ADR-005 as LoRA Adapter Loading. In the actual repo:
- ADR-003 is SIMD Optimization Strategy
- ADR-004 is KV Cache Management (correct target)
- ADR-005 is WASM Runtime Integration (correct name)
No LoRA Adapter Loading ADR exists; ADR-005 (WASM) is the genuine
related decision for memory management concerns.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Offload embedding from Cloud Run HashEmbedder (128-dim, hash-based) to
local RuvLtra Q4 transformer (896-dim, ANE-optimized, with SONA learning).
Architecture:
- Mac Mini runs new ruvltra-embed-server binary on :8090
- Tailscale mesh VPN connects Cloud Run brain to Mac Mini
- TailscaleEmbedder variant added to brain embedder chain
- HashEmbedder fallback on unreachable endpoint
- 3-week migration plan for 10K existing memories
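The fallback behaviour of the embedder chain might look like the sketch below. The `Embedder` trait, `RemoteEmbedder` (a stand-in for the Tailscale-backed variant with a simulated `reachable` flag), and the toy hashing scheme are all assumptions:

```rust
pub trait Embedder {
    fn embed(&self, text: &str) -> Result<Vec<f32>, String>;
}

/// Stand-in for the remote 896-dim endpoint; `reachable` simulates the network.
pub struct RemoteEmbedder {
    pub reachable: bool,
}

impl Embedder for RemoteEmbedder {
    fn embed(&self, _text: &str) -> Result<Vec<f32>, String> {
        if self.reachable {
            Ok(vec![0.0; 896]) // placeholder vector, not a real embedding
        } else {
            Err("endpoint unreachable".to_string())
        }
    }
}

/// Toy 128-dim hash embedder (illustrative bucketing only).
pub struct HashEmbedder;

impl Embedder for HashEmbedder {
    fn embed(&self, text: &str) -> Result<Vec<f32>, String> {
        let mut v = vec![0.0f32; 128];
        for (i, b) in text.bytes().enumerate() {
            v[(b as usize + i) % 128] += 1.0;
        }
        Ok(v)
    }
}

/// Tries each embedder in order and returns the first success.
pub fn embed_with_fallback(chain: &[&dyn Embedder], text: &str) -> Result<Vec<f32>, String> {
    let mut last_err = "empty embedder chain".to_string();
    for embedder in chain {
        match embedder.embed(text) {
            Ok(v) => return Ok(v),
            Err(err) => last_err = err,
        }
    }
    Err(last_err)
}
```

Because the two embedders emit different dimensions, callers in this sketch would need to record which one produced each vector.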
Expected: 7x the dimensions per embedding (128 → 896), NDCG@10 0.3→0.85,
$0/month cost (Tailscale free, Mac Mini already on), 50ms per embed
(acceptable on write path).
Co-Authored-By: claude-flow <ruv@ruv.net>
Add deep research into three-axis KV cache compression:
- TriAttention (arXiv:2604.04921): trigonometric RoPE-based token sparsity, 10.7x
- Stacked compression: TriAttention × TurboQuant for ~50x KV reduction
- ADR-147: formal architecture decision with GOAP implementation plan
No published work combines these orthogonal methods. First-mover opportunity
for ruvLLM edge inference (128K context in 175MB on Pi 5).
Co-authored-by: Reuven <cohen@ruv-mac-mini.local>
ADR-145: Fix training pipeline issues across WASM and NAPI bindings.
WASM (ruvector-attention-wasm):
- Replace serde_wasm_bindgen deserialization of negatives param with
explicit js_sys::Float32Array conversion. TypedArrays don't
deserialize via serde — use js_sys::Array iteration instead.
NAPI (ruvector-attention-node):
- Add stepInPlace() to SGD, Adam, AdamW optimizers for zero-copy
in-place parameter mutation via Float32Array's AsMut<[f32]>
- Document that step() returns a NEW array (callers must use return)
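The difference between the two call shapes, sketched as plain SGD on slices. These free functions are illustrative and are not the NAPI binding signatures:

```rust
/// Copying form: leaves `params` untouched and returns a NEW vector.
pub fn step(params: &[f32], grads: &[f32], lr: f32) -> Vec<f32> {
    params.iter().zip(grads).map(|(p, g)| p - lr * g).collect()
}

/// In-place form: mutates `params` directly, with no allocation.
pub fn step_in_place(params: &mut [f32], grads: &[f32], lr: f32) {
    for (p, g) in params.iter_mut().zip(grads) {
        *p -= lr * g;
    }
}
```

A caller that ignores the return value of `step` silently trains nothing, which is the pitfall the documentation note addresses.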
Note: LoRA B=0 initialization in learning-wasm is correct by design
(Hu et al. 2021) — documented in ADR-145, no code change needed.
Co-authored-by: Reuven <cohen@ruv-mac-mini.local>
Full 981-module output too large for git (621MB).
Available as GitHub release download (121MB tar.gz):
https://github.com/ruvnet/rudevolution/releases/tag/v0.1.0-claude-code-v2.1.91
Repo keeps: modules-manifest.json (lists all 661 modules),
witness.json, metrics.json, README.md
Co-Authored-By: claude-flow <ruv@ruv.net>
Full decompile: 885/885 modules parse (100%)
Manifest lists all modules with sizes.
Full source too large for git (419MB) — generate via:
cargo run --release -p ruvector-decompiler --example run_on_cli -- \
$(npm root -g)/@anthropic-ai/claude-code/cli.js --output-dir ./decompiled
Co-Authored-By: claude-flow <ruv@ruv.net>
v2 model trained on 8,201 pairs (5x expansion):
- Val accuracy: 75.7% → 95.7% (+20 points)
- Val loss: 0.914 → 0.149 (6x improvement)
- Beats JSNice (63%), DIRE (65.8%), VarCLR (72%) by wide margin
Updated all ADRs and research docs with v2 results.
Exported weights-v2.bin (2.6MB) for pure Rust inference.
Co-Authored-By: claude-flow <ruv@ruv.net>
SOTA research: added implementation status table, validation results
showing 75.7% accuracy beating JSNice (63%), DIRE (65.8%), VarCLR (72%).
Model weight analysis: added Section 8 with trained model details,
inference backends, training pipeline, and ADR status.
Co-Authored-By: claude-flow <ruv@ruv.net>