Seven-file design review at docs/sdk/ covering the binding strategy,
API surface, M1-M4 milestones, risks, and a one-page decision record
for shipping a Python SDK.
Recommended path: **PyO3 + maturin, single in-tree
`crates/ruvector-py/` cdylib, abi3-py39 wheel via cibuildwheel,
`pyo3-asyncio` over a singleton tokio runtime.**
Why:
- The existing `*-node` NAPI templates (e.g.
`crates/ruvector-diskann-node/src/lib.rs`) already prove out the
opaque-handle + `Arc<RwLock<…>>` shape PyO3 mirrors line-for-line —
~70% port, ~30% lifetime gymnastics.
- abi3 collapses the wheel matrix from ~25 (5 CPython versions × 5 platforms)
to 5 (one wheel per platform, all py3.9+).
- Singleton tokio runtime avoids the "one runtime per call" overhead
while remaining compatible with asyncio + uvloop.
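A quick sketch of the wheel-matrix arithmetic. The platform tags and the list of CPython minors are illustrative assumptions (the review only states the counts), not the actual build matrix.

```python
# Hypothetical build matrix: the names are placeholders, only the counts
# (~25 vs 5) come from the review.
PLATFORMS = ["linux-x86_64", "linux-aarch64", "macos-x86_64", "macos-arm64", "windows-x86_64"]
CPYTHON_MINORS = ["3.9", "3.10", "3.11", "3.12", "3.13"]


def wheel_count(platforms, pythons, abi3):
    """Wheels to build: abi3 needs one per platform, otherwise one per (platform, version)."""
    return len(platforms) if abi3 else len(platforms) * len(pythons)


per_version = wheel_count(PLATFORMS, CPYTHON_MINORS, abi3=False)
stable_abi = wheel_count(PLATFORMS, CPYTHON_MINORS, abi3=True)
print(per_version, stable_abi)
```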
Milestone shape (each with explicit scope + acceptance tests):
M1 — RaBitQ-only Python wheel. Just the published
`ruvector-rabitq` crate exposed via PyO3. Smallest possible
useful surface. ~600 LoC, 3 weeks.
M2 — ruLake. Async via pyo3-asyncio. Witness verify exposed.
~900 LoC, 4 weeks.
M3 — Embeddings + ML helpers. Wrap consumer-facing parts of
`ruvector-cnn` / `ruvllm`. ~700 LoC, 3 weeks.
M4 — A2A agent client. Wrap `rvagent-a2a` so Python apps can
dispatch tasks to A2A peers, including signed AgentCard
discovery. ~800 LoC, 4 weeks.
Three acceptance gates govern the whole effort:
1. A Python user can do RAG over 1 M vectors in <5 lines.
2. An asyncio user can stream A2A task updates without thread
fights.
3. `pip install ruvector` takes <10 s on a stock machine.
Top 3 risks identified:
R1 — tokio runtime + PyO3 + asyncio/uvloop interop. Mitigation:
single lazy runtime, `pyo3-asyncio` shim.
R3 — wheel size. M4 budget is 22 MB; A2A deps (axum + reqwest +
rustls) could blow it. Mitigation: feature-gate axum/reqwest
behind `agent` extra; default install is rabitq + rulake only.
R7 — PyPI name squat on `ruvector`. Mitigation: register placeholder
before M1 ships.
Nuance discovered: `ruvector-rabitq` has **no** sibling `*-node` or
`*-wasm` crate — unlike most consumer crates. M1 is therefore clean
greenfield: no parity-pressure to match a flaky NAPI signature, and
it confirms rabitq alone is the right starter target rather than the
umbrella `ruvector` crate the npm package wraps.
Planning doc only; no implementation.
Co-Authored-By: claude-flow <ruv@ruv.net>
04 — Milestones
Same shape as ADR-159's milestone plan (docs/adr/ADR-159-rvagent-a2a-protocol.md
§ "Implementation plan"). Each milestone has explicit scope, a file list,
a LoC budget, an acceptance test set, the wheel platforms shipped, and the
docs that must land.
The crate `crates/ruvector-py/` is created in M1 and grows by one or two
source modules per milestone.
M1 — RaBitQ-only Python wheel
Scope.
- Create the new workspace crate `crates/ruvector-py/` with `crate-type = ["cdylib"]`, `pyo3 = "0.22"`, `numpy = "0.22"`, `pyo3-asyncio = "0.22"` (or successor `pyo3-async-runtimes` if pinned), `maturin` as the build backend in `pyproject.toml`.
- Bind exactly the four index types from `crates/ruvector-rabitq/src/index.rs`: `FlatF32Index`, `RabitqIndex`, `RabitqPlusIndex`, `RabitqAsymIndex`. All four implement the `AnnIndex` trait there.
- Bind `BinaryCode` accessors for advanced users (`ids`, `norms`, `packed`) even though most users will never touch them; they're cheap to expose and the test suite uses them.
- Bind `RandomRotation` only as an opaque handle returned by `idx.rotation()`; no public constructor in v1.
- Persistence: bind `crates/ruvector-rabitq/src/persist.rs` so `idx.save(path)` and `Index.load(path)` work for `RabitqIndex` and `RabitqPlusIndex` (the `.rbpx` format from `lake.rs` `PERSISTED_INDEX_FILENAME`).
- Hand-write `python/ruvector/__init__.pyi` with full stubs for the M1 surface.
- Set up `pyproject.toml` with cibuildwheel configured for the five platform wheels listed in `02-strategy.md` § "Wheel distribution matrix", abi3-py39.
- CI: GitHub Actions workflow `python-wheels.yml` that builds + tests + uploads to TestPyPI on every PR, PyPI on tag.
- Pure-Python helpers in `python/ruvector/`: `_version.py` (mirrors Cargo version), `_typing.py` (the `AnyIndex` runtime-checkable Protocol).
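A minimal sketch of what a runtime-checkable `AnyIndex` Protocol could look like. The method names (`add`, `search`) and their signatures are assumptions for illustration; the real surface is whatever the M1 stubs define. `ToyFlatIndex` is a pure-Python stand-in, not a bound index.

```python
from typing import Protocol, Sequence, runtime_checkable


@runtime_checkable
class AnyIndex(Protocol):
    """Structural type: any object with these methods counts as an index."""

    def add(self, item_id: int, vector: Sequence[float]) -> None: ...

    def search(self, query: Sequence[float], k: int) -> list[tuple[int, float]]: ...


class ToyFlatIndex:
    """Hypothetical stand-in for a bound index; brute-force squared-L2 search."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self._rows: list[tuple[int, list[float]]] = []

    def add(self, item_id: int, vector: Sequence[float]) -> None:
        self._rows.append((item_id, list(vector)))

    def search(self, query: Sequence[float], k: int) -> list[tuple[int, float]]:
        scored = [
            (row_id, sum((a - b) ** 2 for a, b in zip(vec, query)))
            for row_id, vec in self._rows
        ]
        return sorted(scored, key=lambda pair: pair[1])[:k]


idx = ToyFlatIndex(dim=2)
idx.add(0, [0.0, 0.0])
idx.add(1, [1.0, 1.0])
# isinstance() only checks that the methods exist, not their signatures.
is_index = isinstance(idx, AnyIndex)
nearest = idx.search([0.9, 1.1], k=1)
```

Because the check is structural, the four bound index classes would satisfy `AnyIndex` without inheriting from it.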
File list.
crates/ruvector-py/
  Cargo.toml                  # ~30 LoC
  pyproject.toml              # ~80 LoC (cibuildwheel matrix, project meta)
  README.md                   # ~50 LoC, links docs/sdk
  src/
    lib.rs                    # ~80 LoC: PyModule init, re-exports
    rabitq.rs                 # ~450 LoC: four index types
    error.rs                  # ~80 LoC: exception hierarchy root + IndexError tree
    numpy_util.rs             # ~60 LoC: view & dtype enforcement helpers
  python/ruvector/
    __init__.py               # ~40 LoC: re-exports
    __init__.pyi              # ~200 LoC: typed stubs
    py.typed                  # 0 LoC marker
    _version.py               # ~5 LoC
    _typing.py                # ~30 LoC: AnyIndex Protocol
  tests/
    test_rabitq_basic.py      # ~120 LoC
    test_rabitq_persist.py    # ~80 LoC
    test_numpy_interop.py     # ~80 LoC
    test_errors.py            # ~60 LoC
.github/workflows/
  python-wheels.yml           # ~120 LoC
docs/sdk/notebooks/
  01_rag_in_5_lines.ipynb     # uses M1 index over toy data
LoC budget. ~700 Rust + ~340 Python tests + ~200 stubs + ~120 CI YAML.
Acceptance tests.
- `pip install ruvector` from TestPyPI on each of the five platforms in `02-strategy.md`'s matrix succeeds in ≤ 10 s on a stock GitHub-hosted runner with a warm pip cache.
- `test_rabitq_basic.py::test_100k_search_under_10ms`: build a `RabitqPlusIndex` over 100,000 random D=128 vectors with `rerank_factor=20`, run 100 single-query searches, assert p99 latency < 10 ms (mirrors `ruvector-rabitq/BENCHMARK.md` baseline of 1.05 ms/query for `RabitqPlus rerank×20` with headroom for Python overhead).
- `test_numpy_interop.py::test_zero_copy_search`: build an index, pass a contiguous `np.ndarray[np.float32]` query, assert the call produces no copy via a memory-tracker fixture.
- `test_rabitq_persist.py::test_roundtrip`: save → load → search, assert bit-identical results to a search before save.
- `test_errors.py::test_dim_mismatch`: query of wrong dim raises `ruvector.DimensionMismatch` and the message names both expected and got dim.
- `mypy --strict` on `tests/` finds no errors.
- Wheel size ≤ 8 MB on linux x86_64, ≤ 12 MB on macOS arm64.
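The p99 latency gate can be sketched as follows. This is an assumption about how the test might measure it: `p99_ms`, `measure_ms`, and `fake_search` are hypothetical helpers, and `fake_search` stands in for a real `idx.search(...)` call. The percentile uses the nearest-rank definition; the real test may pick a different one.

```python
import time


def p99_ms(samples_ms):
    """Nearest-rank 99th percentile of a list of latencies in milliseconds."""
    ordered = sorted(samples_ms)
    rank = max(1, -(-99 * len(ordered) // 100))  # ceil(0.99 * n)
    return ordered[rank - 1]


def measure_ms(search, queries):
    """Time each single-query call with a monotonic clock."""
    latencies = []
    for query in queries:
        start = time.perf_counter()
        search(query)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def fake_search(query):
    # Placeholder for the bound index search; does trivial work.
    return sum(query)


latencies = measure_ms(fake_search, [[0.0] * 128 for _ in range(100)])
observed_p99 = p99_ms(latencies)
print(f"p99 = {observed_p99:.4f} ms")
```

With only 100 samples, nearest-rank p99 is the 99th-smallest value, so a single outlier is tolerated but two are not.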
Wheels shipped. All five platforms in 02-strategy.md § "Wheel
distribution matrix". M1 ships nothing else.
Docs.
- `docs/sdk/notebooks/01_rag_in_5_lines.ipynb`, derived from `examples/refrag-pipeline/`. Uses the M1 surface only (no embedder yet; uses pre-computed vectors from a fixture).
- Sphinx rST scaffold under `docs/sdk/api/` is created but only the RaBitQ section is filled.
- Top-level `crates/ruvector-py/README.md`.
M2 — ruLake bindings
Scope.
- Add `crates/ruvector-py/src/rulake.rs`. Bind:
  - `RuLake` with builder (`RuLake::new` + chained `with_*` mapped to a Python builder pattern).
  - `BackendAdapter` exposed as a Python ABC for users to implement; bridges into Rust via a `PyBackendAdapter` impl that calls back into the Python class. (This is the only place we need GIL re-acquisition in M2.)
  - `LocalBackend` and `FsBackend` as concrete classes.
  - `Consistency` as an int-enum.
  - `RuLakeBundle`, `RefreshResult`, `SearchResult`, `CacheStats`, `PerBackendStats`.
  - All `RuLake` methods listed in `crates/ruvector-py-survey` (i.e. `register_backend`, `search_one`, `search_federated`, `search_batch`, `publish_bundle`, `refresh_from_bundle_dir`, `save_cache_to_dir`, `warm_from_dir`, `cache_stats*`, `cache_witness_of`, `invalidate_cache`).
- Add `_async` siblings for `search_one` / `search_federated` / `search_batch` using `pyo3_asyncio::tokio::future_into_py`. The underlying Rust calls are sync today (per `lake.rs`); async siblings exist so we don't have to break the surface when the Rust `async` refactor lands.
- Initialize the singleton tokio runtime here in M2 (M1 doesn't need it).
- Extend `error.rs` with the `LakeError` subtree.
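The sync-plus-`_async`-sibling shape, sketched in pure Python. This is only an analogue: the planned binding would hand the call to the tokio runtime, whereas this sketch uses `asyncio.to_thread`. `ToyLake` and its signatures are hypothetical stand-ins for the bound `RuLake`.

```python
import asyncio


class ToyLake:
    """Hypothetical stand-in for the bound RuLake; search_one is a blocking call."""

    def __init__(self, rows):
        self._rows = rows  # {id: vector}

    def search_one(self, query, k):
        scored = sorted(
            (sum((a - b) ** 2 for a, b in zip(vec, query)), row_id)
            for row_id, vec in self._rows.items()
        )
        return [row_id for _, row_id in scored[:k]]

    async def search_one_async(self, query, k):
        # Same result as search_one, but run off the event-loop thread.
        return await asyncio.to_thread(self.search_one, query, k)


async def main():
    lake = ToyLake({1: [0.0, 0.0], 2: [1.0, 1.0], 3: [5.0, 5.0]})
    results = await asyncio.gather(
        *(lake.search_one_async([0.9, 0.9], k=2) for _ in range(100))
    )
    return lake.search_one([0.9, 0.9], k=2), results


sync_result, async_results = asyncio.run(main())
```

Keeping both names from day one means callers of `search_one_async` see no API change if the underlying call later becomes natively async.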
File list.
crates/ruvector-py/src/
  rulake.rs                   # ~600 LoC
  runtime.rs                  # ~80 LoC: singleton tokio runtime
crates/ruvector-py/python/ruvector/
  __init__.pyi                # +180 LoC for RuLake surface
crates/ruvector-py/tests/
  test_rulake_local.py        # ~150 LoC
  test_rulake_fs_backend.py   # ~120 LoC
  test_rulake_async.py        # ~100 LoC
  test_rulake_witness.py      # ~80 LoC
docs/sdk/notebooks/
  02_warm_restart_with_witness.ipynb
LoC budget. ~680 Rust + ~450 Python tests + ~180 stub additions.
Acceptance tests.
- `test_rulake_local.py::test_register_search_local`: register a `LocalBackend` with 50,000 D=128 vectors, run `search_one`, assert results match a direct `RabitqPlusIndex` search.
- `test_rulake_async.py::test_search_one_async_in_event_loop`: run 100 concurrent `await lake.search_one_async(...)` calls inside a single asyncio event loop, assert they complete in less than 10× the sync time (no thread-fight regression).
- `test_rulake_witness.py::test_publish_refresh_roundtrip`: publish bundle, mutate underlying data, re-publish, refresh, assert `RefreshResult.INVALIDATED`. Mirrors `lake.rs` `refresh_from_bundle_dir` contract.
- `test_rulake_fs_backend.py::test_warm_restart`: prime cache, save to disk, kill process, start a fresh `RuLake`, `warm_from_dir`, assert first search after warmup is < 1.5× steady-state latency.
- `test_rulake_local.py::test_python_backend_adapter`: a user-defined Python class subclasses `ruvector.BackendAdapter`, registers, gets called back by ruLake on cache miss. (This is the GIL re-acquisition round-trip.)
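A sketch of the Python-backend round-trip that `test_python_backend_adapter` describes. Everything here is a stand-in: the `fetch` method, the `ToyLake` class, and the `search_one` signature are assumptions, since the real `BackendAdapter` method set is defined by the Rust trait.

```python
from abc import ABC, abstractmethod


class BackendAdapter(ABC):
    """Hypothetical shape of the ABC; the method name `fetch` is an assumption."""

    @abstractmethod
    def fetch(self, collection: str) -> dict:
        """Return {id: vector} for a collection."""


class ToyLake:
    """Stand-in for RuLake: calls the adapter only on a cache miss."""

    def __init__(self):
        self._backends = {}
        self._cache = {}

    def register_backend(self, name, adapter):
        if not isinstance(adapter, BackendAdapter):
            raise TypeError("adapter must subclass BackendAdapter")
        self._backends[name] = adapter

    def search_one(self, backend, collection, query):
        key = (backend, collection)
        if key not in self._cache:  # cache miss: call back into user code
            self._cache[key] = self._backends[backend].fetch(collection)
        rows = self._cache[key]
        return min(rows, key=lambda i: sum((a - b) ** 2 for a, b in zip(rows[i], query)))


class CountingBackend(BackendAdapter):
    def __init__(self):
        self.calls = 0

    def fetch(self, collection):
        self.calls += 1
        return {10: [0.0, 0.0], 11: [1.0, 1.0]}


backend = CountingBackend()
lake = ToyLake()
lake.register_backend("mem", backend)
first = lake.search_one("mem", "docs", [0.9, 0.8])
second = lake.search_one("mem", "docs", [0.1, 0.0])
```

The second search is served from the cache, so the user-defined class is called exactly once.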
Wheels shipped. Same five platforms. Wheel size budget bumps to ≤ 12 MB linux / ≤ 16 MB macOS arm64 (tokio adds ~3 MB).
Docs.
- `docs/sdk/notebooks/02_warm_restart_with_witness.ipynb`, derived from `crates/ruvector-rulake/examples/warm_restart.rs`.
- Sphinx page for `ruLake` reference complete.
M3 — Embeddings + ML helpers
Scope.
- Add `crates/ruvector-py/src/embed.rs`. Bind a single `Embedder` class with two factory functions:
  - `Embedder.from_pretrained(name)` for text. Implementation calls into `crates/ruvector-cnn/` for image and into a new tiny `crates/ruvector-py/src/onnx_embed.rs` helper for text (ONNX Runtime via `ort` 2.x). Text models: `all-MiniLM-L6-v2` first; `bge-small-en-v1.5` second.
  - `Embedder.from_pretrained` with a `mobilenetv3-*` prefix routes to `ruvector-cnn`'s `MobileNetEmbedder` (gated on the `backbone` feature in `ruvector-cnn/Cargo.toml`).
- Model weights: download once on first use into the standard `~/.cache/ruvector/models/` directory, verify a SHA-256 digest, cache. No bundled weights; the wheel stays small.
- Sync `embed`, sync `embed_batch`, async `embed_batch_async`. Async exists so a notebook user can interleave embedding with ruLake ingestion in the same event loop.
- Extend `error.rs` with `EmbedError`.
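The download-once, verify, cache flow for model weights could look roughly like this. `cached_model_path` and `fake_fetch` are hypothetical names, and the download is injected as a callable so the sketch runs without network access; a real implementation would fetch over HTTP into `~/.cache/ruvector/models/`.

```python
import hashlib
import tempfile
from pathlib import Path


def cached_model_path(name, expected_sha256, fetch, cache_dir):
    """Return the cached file for `name`, fetching and verifying on first use."""
    path = Path(cache_dir) / name
    if path.exists():
        return path  # later calls never touch the network
    data = fetch(name)
    digest = hashlib.sha256(data).hexdigest()
    if digest != expected_sha256:
        raise ValueError(f"SHA-256 mismatch for {name}: got {digest}")
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".part")
    tmp.write_bytes(data)
    tmp.replace(path)  # rename so a partial download is never seen as cached
    return path


# Demo with an in-memory "download" that records how often it is called.
calls = []


def fake_fetch(name):
    calls.append(name)
    return b"fake-weights"


expected = hashlib.sha256(b"fake-weights").hexdigest()
with tempfile.TemporaryDirectory() as cache:
    first = cached_model_path("toy.onnx", expected, fake_fetch, cache)
    second = cached_model_path("toy.onnx", expected, fake_fetch, cache)
    cached_bytes = second.read_bytes()
```

Verifying before the rename means a corrupted or tampered download never lands under the final cache name.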
File list.
crates/ruvector-py/src/
  embed.rs                    # ~350 LoC
  onnx_embed.rs               # ~250 LoC: ort wrapper, model registry
crates/ruvector-py/python/ruvector/
  __init__.pyi                # +120 LoC for Embedder
  _models.py                  # ~80 LoC: model registry, download paths
crates/ruvector-py/tests/
  test_embed_text.py          # ~120 LoC
  test_embed_image.py         # ~100 LoC
  test_embed_to_index.py      # ~80 LoC: end-to-end RAG
docs/sdk/notebooks/
  03_text_to_search.ipynb     # full RAG: text → embed → RabitqPlus → search
LoC budget. ~600 Rust + ~300 Python tests + ~200 helpers/stubs.
Acceptance tests.
- `test_embed_text.py::test_minilm_dim`: embed 100 strings, assert shape `(100, 384)` and dtype `float32`.
- `test_embed_text.py::test_first_use_downloads`: in a fresh cache dir, `from_pretrained("all-MiniLM-L6-v2")` downloads, verifies SHA-256, caches; second call is no-network.
- `test_embed_image.py::test_mobilenetv3_small_dim`: embed a (224, 224, 3) image, assert shape `(576,)` (matches `ruvector-cnn` MobileNetV3-Small dim).
- `test_embed_to_index.py::test_e2e_rag_under_5_lines`: this file is acceptance gate G1 in 06-decision-record. Full pipeline, ≤ 5 significant lines of user code, completes < 30 s on a stock laptop with warm model cache. (Subject to network for the first run only.)
- ONNX Runtime is optional: the wheel ships without `ort` bundled in; image-only users `pip install ruvector` and skip the text path. Calling `Embedder.from_pretrained("all-MiniLM-...")` without `ort` raises `EmbedError("install ruvector[text]")`.
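One way the prefix routing and the optional-dependency error could behave, as a sketch. `pick_backend` and its `has_module` parameter are hypothetical, the model name `mobilenetv3-small` is an assumed example of the `mobilenetv3-*` prefix, and the `EmbedError` class here is a local stand-in for the planned exception.

```python
import importlib.util


class EmbedError(Exception):
    """Local stand-in for the planned ruvector.EmbedError."""


def pick_backend(name, has_module=lambda mod: importlib.util.find_spec(mod) is not None):
    """Route a model name to a backend label; the text path needs onnxruntime."""
    if name.startswith("mobilenetv3-"):
        return "cnn"  # image path, no extra dependency
    if not has_module("onnxruntime"):
        raise EmbedError("install ruvector[text]")
    return "onnx"


image_backend = pick_backend("mobilenetv3-small")

# Simulate an environment without onnxruntime installed.
try:
    pick_backend("all-MiniLM-L6-v2", has_module=lambda mod: False)
    message = None
except EmbedError as exc:
    message = str(exc)
```

Checking for the module at call time, rather than at import time, keeps `import ruvector` working for image-only installs.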
Wheels shipped. Same five platforms; the `ruvector` wheel does
not bundle `ort`. We ship a `ruvector[text]` extra that adds
`onnxruntime` as a Python-side dep (so wheel size of `ruvector` itself
stays ≤ 14 MB).
Docs.
- `docs/sdk/notebooks/03_text_to_search.ipynb`.
- Sphinx page for `Embedder`.
- README example block updated.
M4 — A2A client
Scope.
- Add `crates/ruvector-py/src/a2a.rs`. Bind from `crates/rvAgent/rvagent-a2a/src/`:
  - `A2aClient` with `connect`, `send_task`, `get_task`, `cancel_task`, `stream_task`. All async (the underlying Rust API is async via reqwest). Sync siblings via `pyo3_asyncio::tokio::run_until_complete` for non-async users.
  - `AgentCard`, `AgentCapabilities`, `AgentSkill`, `AgentProvider`, `AuthScheme`, `Task`, `TaskSpec`, `TaskState`, `TaskStatus`, `Message`, `Part`, `Role`, `Artifact`, `TaskArtifactUpdateEvent`, `TaskStatusUpdateEvent`: all from `rvagent-a2a/src/types.rs` and `lib.rs` re-exports.
  - `TaskPolicy` from `rvagent-a2a/src/policy.rs`. Construction-only on the Python side; not modifiable post-send.
  - `TaskUpdate` discriminated dataclass returned by `stream_task`.
- Verify-on-discover (ADR-159 r2) enabled by default; `strict_verify=False` is exposed but documented as for-test-only.
- We do not bind the A2A server. Server-side rvAgent stays Rust-only in v1.
- Extend `error.rs` with `A2aError` subtree (`CardSignatureInvalid`, `PolicyViolation`, `BudgetExceeded`, `TransportError`).
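A sketch of how a discriminated `TaskUpdate` might be consumed from an async stream. The two variant classes, their fields, and the `"working"` state are assumptions for illustration; `fake_stream_task` is a stand-in async generator, not the bound `A2aClient.stream_task`.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Union


@dataclass(frozen=True)
class StatusUpdate:
    task_id: str
    state: str


@dataclass(frozen=True)
class ArtifactUpdate:
    task_id: str
    name: str


# Hypothetical union type: callers discriminate on the concrete class.
TaskUpdate = Union[StatusUpdate, ArtifactUpdate]


async def fake_stream_task(task_id: str) -> AsyncIterator[TaskUpdate]:
    """Stand-in for stream_task: yields updates as they arrive."""
    for update in (
        StatusUpdate(task_id, "working"),
        ArtifactUpdate(task_id, "report.txt"),
        StatusUpdate(task_id, "completed"),
    ):
        await asyncio.sleep(0)  # yield to the event loop between events
        yield update


async def collect(task_id: str):
    states, artifacts = [], []
    async for update in fake_stream_task(task_id):
        if isinstance(update, StatusUpdate):
            states.append(update.state)
        else:
            artifacts.append(update.name)
    return states, artifacts


states, artifacts = asyncio.run(collect("t-1"))
```

Frozen dataclasses match the construction-only intent stated for `TaskPolicy`: values received from the stream cannot be mutated by accident.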
File list.
crates/ruvector-py/src/
  a2a.rs                      # ~700 LoC
  a2a_types.rs                # ~250 LoC: type conversions for AgentCard, Task, Artifact
crates/ruvector-py/python/ruvector/
  __init__.pyi                # +220 LoC for the A2A surface
crates/ruvector-py/tests/
  test_a2a_card.py            # ~120 LoC
  test_a2a_send_task.py       # ~150 LoC
  test_a2a_stream.py          # ~150 LoC
  test_a2a_policy.py          # ~80 LoC
docs/sdk/notebooks/
  04_dispatch_to_python_peer.ipynb
A test fixture stands up a local rvAgent A2A server: a test-only Rust
binary launched from pytest via `subprocess.Popen` (the pytest
equivalent of a `tokio::test` harness). The server lives in
`crates/ruvector-py/tests/a2a_test_server/` and is built once per
test session.
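A minimal sketch of the process lifecycle such a fixture needs. `running_server` is a hypothetical helper, and the child command is a placeholder Python process that just sleeps; the real fixture would launch the built test-server binary and wait for it to accept connections.

```python
import contextlib
import subprocess
import sys


@contextlib.contextmanager
def running_server(cmd):
    """Start `cmd` as a child process and always stop it on exit."""
    proc = subprocess.Popen(cmd)
    try:
        yield proc
    finally:
        proc.terminate()
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()  # escalate if the child ignores terminate()
            proc.wait()


# Placeholder command standing in for the Rust test-server binary.
placeholder = [sys.executable, "-c", "import time; time.sleep(30)"]
with running_server(placeholder) as server:
    alive_inside = server.poll() is None
exited_after = server.poll() is not None
```

Under pytest, the same body would typically sit inside a `@pytest.fixture(scope="session")` function so the server starts once per test session.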
LoC budget. ~950 Rust + ~500 Python tests + ~220 stub additions.
Acceptance tests.
- `test_a2a_card.py::test_fetch_signed_card`: connect to the test server, fetch the AgentCard, assert signature verifies and `agent_id` matches `SHAKE-256(pubkey)`.
- `test_a2a_card.py::test_tampered_card_rejected`: redirect the client to a tampered `/.well-known/agent.json`, assert `CardSignatureInvalid`.
- `test_a2a_send_task.py::test_lifecycle`: send a task, poll until `completed`, assert artifacts present.
- `test_a2a_stream.py::test_stream_no_thread_fight` (acceptance gate G2): consume an SSE stream of 1,000 status updates inside a single asyncio event loop alongside two other concurrent ruLake `search_one_async` calls; assert no event-loop-blocked warnings, no thread-stuck warnings, total time < 1.2 × the maximum of the three workloads in isolation.
- `test_a2a_policy.py::test_budget_exceeded`: send a task that violates `max_cost_usd`, assert `PolicyViolation` raised before any work begins.
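The `agent_id` check in `test_fetch_signed_card` can be sketched with the standard library. The 32-byte output length, the hex encoding, and the card field names (`pubkey_hex`, `agent_id`) are assumptions; ADR-159 defines the actual derivation. This covers only the id derivation, not signature verification.

```python
import hashlib
import hmac


def agent_id_from_pubkey(pubkey: bytes, digest_len: int = 32) -> str:
    """Hex SHAKE-256 digest of the public key; digest_len is an assumption."""
    return hashlib.shake_256(pubkey).hexdigest(digest_len)


def card_id_matches(card: dict) -> bool:
    """Check that the card's agent_id is derived from its own public key."""
    expected = agent_id_from_pubkey(bytes.fromhex(card["pubkey_hex"]))
    return hmac.compare_digest(expected, card["agent_id"])


pubkey = bytes(range(32))  # placeholder key material, not a real key
card = {"pubkey_hex": pubkey.hex(), "agent_id": agent_id_from_pubkey(pubkey)}
# Swapping the key without updating agent_id must fail the check.
tampered = dict(card, pubkey_hex=bytes(32).hex())
```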
Wheels shipped. Same five platforms. Wheel size budget tops out at
≤ 22 MB linux / ≤ 28 MB macOS arm64 (reqwest + rustls + axum-deps).
This is the size red line; if we trip it we ship the A2A bits as a
`ruvector[a2a]` extra with a separate wheel `ruvector-a2a`.
Docs.
- `docs/sdk/notebooks/04_dispatch_to_python_peer.ipynb`, derived from `examples/a2a-swarm/`.
- Sphinx page for `A2aClient`.
- README updated with end-to-end "Python app dispatches to rvAgent" walkthrough.
Total sizing
| Milestone | Rust LoC | Python LoC | Tests LoC | Cum. wheel size | Calendar weeks |
|---|---|---|---|---|---|
| M1 | ~700 | ~75 | ~340 | ≤ 8 MB | 2 |
| M2 | ~680 | ~30 | ~450 | ≤ 12 MB | 3 |
| M3 | ~600 | ~80 | ~300 | ≤ 14 MB | 2.5 |
| M4 | ~950 | ~30 | ~500 | ≤ 22 MB | 3.5 |
| Total | ~2,930 | ~215 | ~1,590 | ≤ 22 MB | ~11 weeks |
Calendar weeks assume one engineer with PyO3 experience working full-time; double if pair-programmed; halve if not done in series (M2 and M3 can parallelize after M1's CI is green).