# 06 — Decision Record (one-page summary)

## The chosen strategy

**A new in-tree workspace crate, `crates/ruvector-py/`, exposes the Rust SDK through PyO3 and is built and distributed as a single abi3-py39 wheel via maturin + cibuildwheel.** Async surfaces use `pyo3-asyncio` over a singleton tokio runtime; vector inputs are accepted as zero-copy `np.ndarray[np.float32]`; type stubs are hand-written and shipped with `py.typed`.

## Roadmap

| Milestone | Scope | Rust LoC | Wheel size cap | Calendar |
|---|---|---:|---:|---:|
| **M1** | RaBitQ index (`FlatF32`, `Rabitq`, `RabitqPlus`, `RabitqAsym`); persistence; CI publishing pipeline. | ~700 | 8 MB | 2 wk |
| **M2** | ruLake (`RuLake` builder, `LocalBackend` / `FsBackend` / Python `BackendAdapter` ABC); witness operations; sync + async search; tokio runtime singleton. | ~680 | 12 MB | 3 wk |
| **M3** | Embeddings (`Embedder.from_pretrained` for MiniLM-text and MobileNetV3-image); HF model cache + SHA-256 verification. | ~600 | 14 MB | 2.5 wk |
| **M4** | A2A client (`A2aClient.connect/send_task/stream_task/cancel_task`); typed AgentCard / Task / Artifact; signed-card verification on discovery. | ~950 | 22 MB | 3.5 wk |
| **Total** | — | **~2,930** | **22 MB** | **~11 wk** |

Estimates assume one full-time engineer with PyO3 experience. Milestones run sequentially, except that M3 may proceed in parallel once M1 ships.

## Three acceptance gates that gate the whole effort

**G1 — RAG in 5 lines.** A user types ≤ 5 significant lines of Python to embed a corpus, build an index, and query it with sub-10-ms p99 latency on 100k D=128 vectors. Concretely:

```python
import ruvector  # `corpus` is assumed to be a list[str] the user already has
emb = ruvector.Embedder.from_pretrained("all-MiniLM-L6-v2")
idx = ruvector.RabitqPlusIndex.build(emb.embed_batch(corpus), seed=42, rerank_factor=20)
hits = idx.search(emb.embed("my query"), k=10)
print([(h.id, h.score) for h in hits])
```

This gate clears at the end of M3.
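A p99 latency figure is easier to interpret next to an exact-search reference. The following is a minimal NumPy sketch of how the G1 measurement could be set up, assuming unit-normalized float32 vectors (for which inner-product and L2 rankings agree): an exhaustive scan stands in for `FlatF32` as the recall reference, and `p99_latency_ms` times any `search(q, k)` callable. The helper names (`as_f32_contiguous`, `exact_top_k`, `p99_latency_ms`, `recall_at_k`) are hypothetical and not part of the planned `ruvector` API; the idea that the binding can only borrow C-contiguous float32 buffers without copying is likewise an assumption.

```python
import time

import numpy as np


def as_f32_contiguous(x):
    """Return x as a C-contiguous float32 array, copying only when needed.

    Assumption: the binding can borrow a buffer zero-copy only if it is
    already C-contiguous float32, so other inputs are converted once here.
    """
    return np.ascontiguousarray(x, dtype=np.float32)


def exact_top_k(base, query, k):
    """Exhaustive inner-product top-k over `base` (a FlatF32-style reference)."""
    scores = base @ query                        # (n,) inner products
    cand = np.argpartition(-scores, k - 1)[:k]   # the k highest scores, unordered
    order = np.argsort(-scores[cand])            # sort those k by descending score
    return cand[order], scores[cand][order]


def p99_latency_ms(search, queries, k=10):
    """Call search(q, k) once per query; return the 99th-percentile latency in ms."""
    latencies = []
    for q in queries:
        t0 = time.perf_counter()
        search(q, k)
        latencies.append((time.perf_counter() - t0) * 1e3)
    return float(np.percentile(latencies, 99))


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k ids that the approximate search also returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # 100k x 128 random vectors, unit-normalized so inner product ranks like L2.
    base = rng.standard_normal((100_000, 128)).astype(np.float32)
    base /= np.linalg.norm(base, axis=1, keepdims=True)
    base = as_f32_contiguous(base)
    queries = as_f32_contiguous(rng.standard_normal((200, 128)))
    queries /= np.linalg.norm(queries, axis=1, keepdims=True)
    p99 = p99_latency_ms(lambda q, k: exact_top_k(base, q, k), queries)
    print(f"exact scan p99 over {len(queries)} queries: {p99:.2f} ms")
```

To time the real index rather than the exact scan, one could pass `lambda q, k: idx.search(q, k=k)` using `idx` from the G1 snippet, and collect `h.id` from its hits as the `approx_ids` for `recall_at_k`; whether the built index's scores are inner products or distances is not specified here.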
**G2 — asyncio without thread fights.** A user awaits an A2A SSE stream of 1,000 status updates concurrently with two ruLake `search_one_async` calls inside a single asyncio event loop, with no event-loop-blocked warnings, no thread-stuck warnings, and a total wall time within 1.2× of the longest isolated wall time among the three workloads. This gate clears at the end of M4 and is enforced by `tests/test_a2a_stream.py::test_stream_no_thread_fight`.

**G3 — `pip install ruvector` is instant.** On a stock Linux x86_64 GitHub Actions runner with a warm pip cache, `pip install ruvector` from PyPI completes in ≤ 10 s. This is the "we ship a binary wheel, not an sdist" gate, enforced as a CI step that fails the release if the install time regresses. This gate clears at the end of M1 and must stay clear for every release after that.

## Open questions for stakeholders before M1

**O1 — PyPI name.** Is the name `ruvector` available on PyPI? If not, do we negotiate a transfer, register `ruvector-py`, or pick something else? Owner: project lead. Resolution is required before the M1 PR is opened.

**O2 — Python version floor.** abi3-py39 covers Python 3.9–3.14+. Are we comfortable dropping support for 3.8 (which is EOL but still deployed)? This document assumes yes. Owner: product.

**O3 — Tokio runtime sizing default.** This document picks `min(8, os.cpu_count())` worker threads. Is that right for the typical ruvector user? A serving deployment on a 96-core box might want more. The decision can slide past M2 (an environment-variable override is cheap to add), but the default needs to be picked once. Owner: performance engineer.

**O4 — `ort` (ONNX Runtime) coupling for M3.** The plan is **not** to bundle `ort`, and instead to expose `ruvector[text]` as a Python extra that pulls `onnxruntime` from PyPI. Confirm this is acceptable from a "works out of the box" UX perspective. Owner: product.

**O5 — Where does the Python A2A *server* live?** The plan deliberately ships only the client in M4.
If and when a Python user wants to host an A2A peer from inside their Python process, do they (a) embed the Rust server via PyO3, (b) run an external rvAgent binary, or (c) re-implement the server in Python? This document says (b). Owner: rvAgent maintainer.

**O6 — Stable-ABI commitment.** abi3-py39 is a forward commitment: once published, downgrading to version-specific wheels is a breaking change for users on niche Python builds. Confirm we're willing to make that commitment. Owner: maintainer.

## What "done" looks like

When M4 ships:

- `pip install ruvector` works on Linux x86_64/arm64, macOS x86_64/arm64, and Windows x86_64.
- `import ruvector` exposes vector indexes, ruLake, embedders, and the A2A client.
- 100% of the public surface has hand-written type stubs.
- CI enforces all three acceptance gates (G1, G2, G3) on every PR.
- Four notebooks (`docs/sdk/notebooks/01..04`) walk a new user from hello-world to multi-agent dispatch.
- A single PyO3 crate at `crates/ruvector-py/` is the only place Python-related Rust code lives.

## Rejected alternatives (one-liners)

- **CFFI** — strictly worse than PyO3 for this code.
- **wasmtime-py** — loses native performance, requires writing the missing WASM crates first, and drags in a 6 MB runtime.
- **gRPC service + thin client** — wrong architectural shape for a vector index.
- **One wheel per Python version** — abi3 collapses the build matrix.
- **Separate `ruvnet/ruvector-py` repo** — breaks the single-PR cross-binding diff workflow that the NAPI bindings already enjoy.

## Source pointers

- This plan: `docs/sdk/INDEX.md` and siblings 01–06.
- Survey of existing ruvector code: `docs/sdk/01-survey.md`.
- Strategy defense: `docs/sdk/02-strategy.md`.
- API sketch: `docs/sdk/03-api-surface.md`.
- Milestone breakdown: `docs/sdk/04-milestones.md`.
- Risks: `docs/sdk/05-risks-and-tradeoffs.md`.
- Reference Rust APIs: `crates/ruvector-rabitq/src/lib.rs`, `crates/ruvector-rulake/src/lib.rs`, `crates/rvAgent/rvagent-a2a/src/lib.rs`.
- NAPI binding template (mirror this style in PyO3): `crates/ruvector-diskann-node/src/lib.rs`.
- Anchor ADRs: ADR-154 (RaBitQ), ADR-155 (ruLake), ADR-159 (A2A).