ruvector/docs/sdk/06-decision-record.md
ruvnet f6c684aba0 docs(sdk): add deep planning review for ruvector Python SDK
Seven-file design review at docs/sdk/ covering the binding strategy,
API surface, M1-M4 milestones, risks, and a one-page decision record
for shipping a Python SDK.

Recommended path: **PyO3 + maturin, single in-tree
`crates/ruvector-py/` cdylib, abi3-py39 wheel via cibuildwheel,
`pyo3-asyncio` over a singleton tokio runtime.**

Why:
- The existing `*-node` NAPI templates (e.g.
  `crates/ruvector-diskann-node/src/lib.rs`) already prove out the
  opaque-handle + `Arc<RwLock<…>>` shape PyO3 mirrors line-for-line —
  ~70% port, ~30% lifetime gymnastics.
- abi3 collapses the wheel matrix from ~25 (cpython36 × 5 platforms)
  to 5 (one wheel per platform, all py3.9+).
- Singleton tokio runtime avoids the "one runtime per call" overhead
  while remaining compatible with asyncio + uvloop.

Milestone shape (each with explicit scope + acceptance tests):

  M1 — RaBitQ-only Python wheel. Just the published
       `ruvector-rabitq` crate exposed via PyO3. Smallest possible
       useful surface. ~600 LoC, 3 weeks.
  M2 — ruLake. Async via pyo3-asyncio. Witness verify exposed.
       ~900 LoC, 4 weeks.
  M3 — Embeddings + ML helpers. Wrap consumer-facing parts of
       `ruvector-cnn` / `ruvllm`. ~700 LoC, 3 weeks.
  M4 — A2A agent client. Wrap `rvagent-a2a` so Python apps can
       dispatch tasks to A2A peers, including signed AgentCard
       discovery. ~800 LoC, 4 weeks.

Three acceptance gates that gate the whole effort:
  1. A Python user can do RAG over 1 M vectors in <5 lines.
  2. An asyncio user can stream A2A task updates without thread
     fights.
  3. `pip install ruvector` takes <10 s on a stock machine.

Top 3 risks identified:
  R1 — tokio runtime + PyO3 + asyncio/uvloop interop. Mitigation:
       single lazy runtime, `pyo3-asyncio` shim.
  R3 — wheel size. M4 budget is 22 MB; A2A deps (axum + reqwest +
       rustls) could blow it. Mitigation: feature-gate axum/reqwest
       behind `agent` extra; default install is rabitq + rulake only.
  R7 — PyPI name squat on `ruvector`. Mitigation: register placeholder
       before M1 ships.

Nuance discovered: `ruvector-rabitq` has **no** sibling `*-node` or
`*-wasm` crate — unlike most consumer crates. M1 is therefore clean
greenfield: no parity-pressure to match a flaky NAPI signature, and
it confirms rabitq alone is the right starter target rather than the
umbrella `ruvector` crate the npm package wraps.

Planning doc only; no implementation.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-25 20:28:54 -04:00

06 — Decision Record (one-page summary)

The chosen strategy

A new in-tree workspace crate crates/ruvector-py/ exposes the Rust SDK through PyO3, built and distributed as a single abi3-py39 wheel via maturin + cibuildwheel. Async surfaces use pyo3-asyncio over a singleton tokio runtime; vector inputs are accepted as zero-copy np.ndarray[np.float32]; type stubs are hand-written and shipped with py.typed.

Roadmap

| M | Scope | Rust LoC | Wheel cap | Calendar |
|---|---|---|---|---|
| M1 | RaBitQ index (FlatF32, Rabitq, RabitqPlus, RabitqAsym); persistence; CI publishing pipeline. | ~700 | 8 MB | 2 wk |
| M2 | ruLake (RuLake builder, LocalBackend / FsBackend / Python BackendAdapter ABC); witness operations; sync + async search; tokio runtime singleton. | ~680 | 12 MB | 3 wk |
| M3 | Embeddings (Embedder.from_pretrained for MiniLM-text and MobileNetV3-image); HF model cache + SHA-256 verification. | ~600 | 14 MB | 2.5 wk |
| M4 | A2A client (A2aClient.connect/send_task/stream_task/cancel_task); typed AgentCard / Task / Artifact; signed card verify-on-discover. | ~950 | 22 MB | 3.5 wk |
| Total | | ~2,930 | 22 MB | ~11 wk |

(Assumes one full-time engineer with PyO3 experience. Milestones run in sequence, though M3 may proceed in parallel once M1 ships.)
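As a quick sanity check on the Total row, the per-milestone figures can be re-added. This sketch only copies numbers from the roadmap table; it measures nothing, and it assumes the Total wheel cap is the largest per-milestone cap rather than a sum.

```python
# Figures copied from the roadmap table above; nothing here is measured.
milestones = {
    "M1": {"rust_loc": 700, "wheel_cap_mb": 8, "weeks": 2.0},
    "M2": {"rust_loc": 680, "wheel_cap_mb": 12, "weeks": 3.0},
    "M3": {"rust_loc": 600, "wheel_cap_mb": 14, "weeks": 2.5},
    "M4": {"rust_loc": 950, "wheel_cap_mb": 22, "weeks": 3.5},
}

total_loc = sum(m["rust_loc"] for m in milestones.values())
total_weeks = sum(m["weeks"] for m in milestones.values())
# Assumption: the Total row reports the largest cap, since each wheel
# supersedes the previous one instead of shipping alongside it.
final_wheel_cap_mb = max(m["wheel_cap_mb"] for m in milestones.values())

print(total_loc, total_weeks, final_wheel_cap_mb)
```

The three values agree with the ~2,930 LoC, ~11 wk, and 22 MB shown in the table.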

Three acceptance gates that gate the whole effort

G1 — RAG in 5 lines. A user types ≤ 5 significant lines of Python to embed a corpus, build an index, and query it with sub-10-ms p99 latency on 100k D=128 vectors. Concretely:

import ruvector, numpy as np
emb = ruvector.Embedder.from_pretrained("all-MiniLM-L6-v2")
idx = ruvector.RabitqPlusIndex.build(emb.embed_batch(corpus), seed=42, rerank_factor=20)
hits = idx.search(emb.embed("my query"), k=10)
print([(h.id, h.score) for h in hits])

This gate clears at the end of M3.
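The sub-10-ms p99 requirement in G1 could be checked roughly as follows. This is a minimal sketch, not the project's benchmark harness: `p99`, `measure_ms`, and `gate_g1_latency` are hypothetical helper names, and `search` stands for any callable (for example a bound `idx.search`).

```python
import time

def p99(samples_ms):
    """Nearest-rank 99th percentile of latencies given in milliseconds."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n), 1-based rank
    return ordered[rank - 1]

def measure_ms(search, queries):
    """Time each call to `search` and return per-query latencies in ms."""
    latencies = []
    for query in queries:
        start = time.perf_counter()
        search(query)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies

def gate_g1_latency(latencies_ms, budget_ms=10.0):
    """True when the p99 latency is strictly under the budget."""
    return p99(latencies_ms) < budget_ms
```

Nearest-rank is used here because it needs no interpolation and always returns an observed sample; a real harness might prefer a different percentile definition.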

G2 — asyncio without thread fights. A user awaits an A2A SSE stream of 1,000 status updates concurrently with two ruLake search_one_async calls inside a single asyncio event loop, with no event-loop-blocked warnings, no thread-stuck warnings, and total wall time within 1.2× of the maximum of the three workloads in isolation.

This gate clears at the end of M4 and is enforced by tests/test_a2a_stream.py::test_stream_no_thread_fight.
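The 1.2× wall-time rule in G2 can be expressed as a small check. The sketch below uses `asyncio.sleep` as a stand-in for the A2A stream and the two ruLake searches, since neither API exists yet; `within_budget`, `elapsed`, and `measure` are hypothetical helper names, not the contents of the referenced test.

```python
import asyncio
import time

def within_budget(concurrent_s, isolated_s, factor=1.2):
    """G2 timing rule: concurrent wall time <= factor * slowest isolated run."""
    return concurrent_s <= factor * max(isolated_s)

async def elapsed(coros):
    """Await the given coroutines together and return elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*coros)
    return time.perf_counter() - start

async def measure(workloads):
    """workloads: zero-argument callables, each returning a fresh coroutine."""
    isolated = []
    for make in workloads:
        isolated.append(await elapsed([make()]))
    concurrent = await elapsed([make() for make in workloads])
    return concurrent, isolated

# Stand-ins for one SSE stream and two search_one_async calls.
fake_workloads = [
    lambda: asyncio.sleep(0.05),
    lambda: asyncio.sleep(0.03),
    lambda: asyncio.sleep(0.03),
]
```

Calling `asyncio.run(measure(fake_workloads))` returns the concurrent time and the three isolated times, which `within_budget` then compares. For the "no event-loop-blocked warnings" half of the gate, running with `asyncio.run(..., debug=True)` makes asyncio log callbacks that exceed `loop.slow_callback_duration`.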

G3 — pip install ruvector is instant. On a stock Linux x86_64 GitHub Actions runner with a warm pip cache, pip install ruvector from PyPI completes in ≤ 10 s. This is the "we ship a binary wheel, not an sdist" gate. It is enforced as a CI step that fails the release if the timing regresses.

This gate clears at the end of M1 and stays clear forever.
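One way the G3 timing step might look, as a sketch only: the helper names are hypothetical and the real CI step may be a shell one-liner instead. The `--only-binary=:all:` pip flag is included so a missing wheel fails fast rather than silently building from source.

```python
import subprocess
import sys
import time

def timed_run(cmd):
    """Run a command and return (exit_code, elapsed_seconds)."""
    start = time.perf_counter()
    result = subprocess.run(cmd, check=False)
    return result.returncode, time.perf_counter() - start

def check_install_time(cmd, budget_s=10.0):
    """Return (ok, elapsed): ok only if the command succeeded within budget."""
    code, elapsed = timed_run(cmd)
    return (code == 0 and elapsed <= budget_s), elapsed

# Command the gate would time; --only-binary=:all: forbids sdist fallback.
INSTALL_CMD = [
    sys.executable, "-m", "pip", "install", "--only-binary=:all:", "ruvector",
]

def main():
    """Entry point for a CI step; exits non-zero when the gate fails."""
    ok, elapsed = check_install_time(INSTALL_CMD)
    print(f"pip install took {elapsed:.1f}s")
    sys.exit(0 if ok else 1)
```

A CI job would call `main()` after warming the pip cache, so the measurement reflects install time rather than download time.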

Open questions for stakeholders before M1

O1 — PyPI name. Is ruvector available on PyPI? If not, do we negotiate transfer, register ruvector-py, or pick something else? Owner: project lead. Resolution required before M1 PR is opened.

O2 — Python version floor. abi3-py39 covers Python 3.9 through 3.14+. Are we comfortable dropping support for 3.8 (which is EOL but still deployed)? This document assumes yes. Owner: product.
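To make the floor concrete, here is a small sketch of the compatibility rule behind a `cp39-abi3` wheel: it installs on CPython 3.9 and on later standard CPython 3.x releases, and not on 3.8. The helper names are illustrative; real installers make this decision from full wheel tags (see the `packaging` library), and non-standard builds may differ.

```python
import sys

ABI3_FLOOR_TAG = "cp39"  # from the abi3-py39 choice in this document

def parse_cp_tag(tag):
    """Turn a CPython tag such as 'cp39' or 'cp312' into (major, minor)."""
    if len(tag) < 4 or not tag.startswith("cp") or not tag[2:].isdigit():
        raise ValueError(f"not a CPython tag: {tag!r}")
    return int(tag[2]), int(tag[3:])

def abi3_wheel_installs_on(wheel_python_tag, interpreter=None):
    """A cpXY-abi3 wheel is usable on CPython X.Y and later 3.x releases."""
    if interpreter is None:
        interpreter = sys.version_info[:2]
    return tuple(interpreter) >= parse_cp_tag(wheel_python_tag)
```

For example, `abi3_wheel_installs_on("cp39", (3, 8))` is false, which is exactly the 3.8 cut-off that O2 asks stakeholders to confirm.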

O3 — Tokio runtime sizing default. This document picks min(8, os.cpu_count()). Is that right for the typical ruvector user? A serving deployment on a 96-core box might want more. Decision can slide post-M2 (env var override is cheap to add) but the default needs to be picked once. Owner: performance engineer.
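The sizing rule and the suggested override could be sketched as below. The logic is shown in Python for readability; the actual default would live in the Rust crate. The environment variable name `RUVECTOR_TOKIO_THREADS` is hypothetical and is not defined anywhere in this plan.

```python
import os

ENV_VAR = "RUVECTOR_TOKIO_THREADS"  # hypothetical name, an assumption

def default_worker_threads(cpu_count=None, env=None, cap=8):
    """Return the worker-thread count: env override, else min(cap, cores)."""
    env = os.environ if env is None else env
    override = env.get(ENV_VAR)
    if override is not None:
        threads = int(override)
        if threads < 1:
            raise ValueError(f"{ENV_VAR} must be >= 1, got {threads}")
        return threads
    if cpu_count is None:
        # os.cpu_count() may return None when the count is undetermined.
        cpu_count = os.cpu_count() or 1
    return min(cap, cpu_count)
```

On the 96-core box mentioned above, the default yields 8 threads, and setting the override to 32 yields 32, which is the escape hatch a serving deployment would use.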

O4 — ort (ONNX Runtime) coupling for M3. The plan is to not bundle ort and instead expose ruvector[text] as a Python extra that pulls onnxruntime from PyPI. Confirm this is acceptable from a "works out of the box" UX perspective. Owner: product.
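The "works out of the box" concern largely comes down to what a user sees when the extra is missing. A minimal sketch of a lazy import guard follows; `require_extra` and `load_text_backend` are hypothetical helper names, not part of the planned API.

```python
import importlib.util

def require_extra(module_name, extra):
    """Raise a helpful ImportError if an optional dependency is missing."""
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"{module_name!r} is not installed; "
            f"install it with: pip install 'ruvector[{extra}]'"
        )

def load_text_backend():
    """Import onnxruntime only when text embedding is actually requested."""
    require_extra("onnxruntime", "text")
    import onnxruntime  # lazy import keeps the base install small
    return onnxruntime
```

With a guard like this, a plain `pip install ruvector` still imports cleanly, and the first call that needs ONNX Runtime fails with an actionable message instead of a bare `ModuleNotFoundError`.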

O5 — Where does the Python A2A server live? Plan deliberately ships only the client in M4. If/when a Python user wants to host an A2A peer from inside their Python process, do they (a) embed the Rust server via PyO3, (b) run an external rvAgent binary, or (c) re-implement the server in Python? This document says (b). Owner: rvAgent maintainer.

O6 — Stable-ABI commitment. abi3-py39 is a forward commitment: once published, downgrading to "version-specific" wheels is a breaking change for users on niche Python builds. Confirm we're willing to make that commitment. Owner: maintainer.

What "done" looks like

When M4 ships:

  • pip install ruvector works on Linux x86_64/arm64, macOS x86_64/arm64, Windows x86_64.
  • import ruvector exposes vector indexes, ruLake, embedders, and the A2A client.
  • 100% of the public surface has hand-written type stubs.
  • CI enforces all three acceptance gates (G1, G2, G3) on every PR.
  • Four notebooks (docs/sdk/notebooks/01..04) walk a new user from hello-world to multi-agent dispatch.
  • A single PyO3 crate at crates/ruvector-py/ is the only place Python-related Rust code lives.

Rejected alternatives (one-liners)

  • CFFI — strictly worse than PyO3 for this code.
  • wasmtime-py — loses native perf, requires writing missing WASM crates first, drags 6 MB runtime.
  • gRPC service + thin client — wrong architectural shape for a vector index.
  • One-wheel-per-Python-version — abi3 collapses the matrix.
  • Separate ruvnet/ruvector-py repo — breaks the single-PR cross-binding diff workflow that NAPI bindings already enjoy.
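The matrix arithmetic behind rejecting one-wheel-per-Python-version can be worked out directly. The platform list is taken from the "done" criteria above; the set of five CPython versions is an assumption for illustration, since this document fixes only the 3.9 floor.

```python
from itertools import product

# Platforms listed under "What done looks like".
PLATFORMS = [
    "linux-x86_64",
    "linux-arm64",
    "macos-x86_64",
    "macos-arm64",
    "windows-x86_64",
]

# Assumption: five CPython minor versions would each need their own wheel.
PYTHON_VERSIONS = ["3.9", "3.10", "3.11", "3.12", "3.13"]

# Version-specific wheels: one per (version, platform) pair.
per_version_wheels = list(product(PYTHON_VERSIONS, PLATFORMS))

# abi3 wheels: one per platform, shared by every supported version.
abi3_wheels = [("abi3-py39", platform) for platform in PLATFORMS]

print(len(per_version_wheels), len(abi3_wheels))
```

Under these assumptions the build matrix drops from 25 wheels to 5, and the version-specific count grows with every new CPython release while the abi3 count stays fixed.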

Source pointers

  • This plan: docs/sdk/INDEX.md and siblings 01 through 06.
  • Survey of existing ruvector code: docs/sdk/01-survey.md.
  • Strategy defense: docs/sdk/02-strategy.md.
  • API sketch: docs/sdk/03-api-surface.md.
  • Milestone breakdown: docs/sdk/04-milestones.md.
  • Risks: docs/sdk/05-risks-and-tradeoffs.md.
  • Reference Rust APIs: crates/ruvector-rabitq/src/lib.rs, crates/ruvector-rulake/src/lib.rs, crates/rvAgent/rvagent-a2a/src/lib.rs.
  • NAPI binding template (mirror this style in PyO3): crates/ruvector-diskann-node/src/lib.rs.
  • Anchor ADRs: ADR-154 (RaBitQ), ADR-155 (ruLake), ADR-159 (A2A).