Seven-file design review at docs/sdk/ covering the binding strategy,
API surface, M1-M4 milestones, risks, and a one-page decision record
for shipping a Python SDK.
Recommended path: **PyO3 + maturin, single in-tree
`crates/ruvector-py/` cdylib, abi3-py39 wheel via cibuildwheel,
`pyo3-asyncio` over a singleton tokio runtime.**
Why:
- The existing `*-node` NAPI templates (e.g.
`crates/ruvector-diskann-node/src/lib.rs`) already prove out the
opaque-handle + `Arc<RwLock<…>>` shape PyO3 mirrors line-for-line —
~70% port, ~30% lifetime gymnastics.
- abi3 collapses the wheel matrix from ~25 (5 CPython versions × 5 platforms)
to 5 (one wheel per platform, all py3.9+).
- Singleton tokio runtime avoids the "one runtime per call" overhead
while remaining compatible with asyncio + uvloop.
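The wheel-matrix arithmetic in the abi3 bullet can be sketched as follows. This is an illustration only: the five CPython minor versions and the five platform tags are assumptions chosen to reproduce the ~25 and 5 figures, not values taken from the project's CI configuration.

```python
from itertools import product

# Assumed for illustration: five CPython minors and five platform tags.
PYTHON_VERSIONS = ["cp39", "cp310", "cp311", "cp312", "cp313"]
PLATFORMS = [
    "manylinux_x86_64",
    "manylinux_aarch64",
    "macosx_x86_64",
    "macosx_arm64",
    "win_amd64",
]


def per_version_wheels(versions, platforms):
    """One wheel per (interpreter, platform) pair: the non-abi3 matrix."""
    return [f"{py}-{py}-{plat}" for py, plat in product(versions, platforms)]


def abi3_wheels(min_version, platforms):
    """One wheel per platform, tagged with the minimum supported interpreter."""
    return [f"{min_version}-abi3-{plat}" for plat in platforms]


full = per_version_wheels(PYTHON_VERSIONS, PLATFORMS)
stable = abi3_wheels("cp39", PLATFORMS)
print(len(full), "->", len(stable))  # 25 -> 5
```

Under these assumptions, adding a new CPython release grows the per-version matrix by one wheel per platform, while the abi3 list stays the same size.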
Milestone shape (each with explicit scope + acceptance tests):
M1 — RaBitQ-only Python wheel. Just the published
`ruvector-rabitq` crate exposed via PyO3. Smallest possible
useful surface. ~600 LoC, 3 weeks.
M2 — ruLake. Async via pyo3-asyncio. Witness verify exposed.
~900 LoC, 4 weeks.
M3 — Embeddings + ML helpers. Wrap consumer-facing parts of
`ruvector-cnn` / `ruvllm`. ~700 LoC, 3 weeks.
M4 — A2A agent client. Wrap `rvagent-a2a` so Python apps can
dispatch tasks to A2A peers, including signed AgentCard
discovery. ~800 LoC, 4 weeks.
Three acceptance gates for the whole effort:
1. A Python user can do RAG over 1 M vectors in <5 lines.
2. An asyncio user can stream A2A task updates without thread
fights.
3. `pip install ruvector` takes <10 s on a stock machine.
Top 3 risks identified:
R1 — tokio runtime + PyO3 + asyncio/uvloop interop. Mitigation:
single lazy runtime, `pyo3-asyncio` shim.
R3 — wheel size. M4 budget is 22 MB; A2A deps (axum + reqwest +
rustls) could blow it. Mitigation: feature-gate axum/reqwest
behind `agent` extra; default install is rabitq + rulake only.
R7 — PyPI name squat on `ruvector`. Mitigation: register placeholder
before M1 ships.
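The "single lazy runtime" mitigation for R1 can be illustrated with a pure-Python analogue. This is a sketch of the pattern only: in the actual binding the shared object would be a tokio runtime on the Rust side, and the names `get_runtime`, `call_blocking`, and `_double` are hypothetical.

```python
import asyncio
import threading

_loop = None
_lock = threading.Lock()


def get_runtime():
    """Return the process-wide event loop, starting it on first use."""
    global _loop
    with _lock:
        if _loop is None:
            loop = asyncio.new_event_loop()
            # Daemon thread so the loop never blocks interpreter shutdown.
            thread = threading.Thread(
                target=loop.run_forever, name="sdk-runtime", daemon=True
            )
            thread.start()
            _loop = loop
        return _loop


async def _double(x):
    """Hypothetical async workload standing in for an SDK call."""
    await asyncio.sleep(0)
    return 2 * x


def call_blocking(x):
    """Submit work to the shared loop and wait for the result."""
    future = asyncio.run_coroutine_threadsafe(_double(x), get_runtime())
    return future.result(timeout=5)


results = [call_blocking(i) for i in range(3)]
same_loop = get_runtime() is get_runtime()
print(results, same_loop)  # [0, 2, 4] True
```

Every call reuses the same loop, so the start-up cost is paid once rather than per call, which is the overhead the singleton is meant to avoid.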
Nuance discovered: `ruvector-rabitq` has **no** sibling `*-node` or
`*-wasm` crate — unlike most consumer crates. M1 is therefore clean
greenfield: no parity-pressure to match a flaky NAPI signature, and
it confirms rabitq alone is the right starter target rather than the
umbrella `ruvector` crate the npm package wraps.
Planning doc only; no implementation.
Co-Authored-By: claude-flow <ruv@ruv.net>
01 — Survey: What ruvector Ships Today
Snapshot taken 2026-04-25 against main at commit 2e68f0c9f.
Workspace shape
- `crates/` contains ~110 directories. The workspace `Cargo.toml` has 96 active `members =` entries (the rest are `exclude`d for env-specific build reasons: `ruvector-postgres` needs `pgrx`, `mcp-brain-server` is private, the hyperbolic-hnsw pair is intentionally out of the default workspace).
- Workspace version pin is `2.2.0` for first-party `ruvector-*` crates; `rvAgent/*` crates are independently versioned at `0.1.0`.
- The two crates that have actual `[package].description` text indicating a consumer-facing v1 are:
  - `ruvector-rabitq`: "RaBitQ: rotation-based 1-bit quantization for ultra-fast approximate nearest-neighbor search with theoretical error bounds." No NAPI/wasm sibling crate. Pure Rust, 9 source files, ~3,700 LoC; the trait surface is `AnnIndex` over four index variants (`FlatF32Index`, `RabitqIndex`, `RabitqPlusIndex`, `RabitqAsymIndex`). Already published on crates.io at `2.2.0` per the workspace version.
  - `ruvector-rulake`: "ruLake — vector-native federation intermediary over heterogeneous backends (ADR-155)." Depends on `ruvector-rabitq`. 7 source files, ~3,100 LoC. Public surface is `RuLake`, `BackendAdapter`, `LocalBackend`, `FsBackend`, `VectorCache`, `RuLakeBundle`. Methods on `RuLake` include `search_one`, `search_federated`, `search_batch`, `publish_bundle`, `refresh_from_bundle_dir`, `save_cache_to_dir`, `warm_from_dir`. All sync (no `async`).

These are the obvious starter targets — they're recent, they're small,
they're the ones the ADR pair (ADR-154 + ADR-155) is shipping behind, and
they're the only crates whose names appear in the workspace member list
ahead of ruvector-core.
Existing FFI surfaces (the templates we copy)
NAPI-RS bindings (Node.js)
The workspace has 14 *-node crates wired through napi-derive 2.16. The
cleanest minimal template is crates/ruvector-diskann-node/src/lib.rs —
one file, ~250 LoC, wraps ruvector-diskann with:
- `#[napi(object)]` config struct (`DiskAnnOptions`).
- `#[napi]` result struct (`DiskAnnSearchResult`).
- `#[napi]` opaque handle holding `Arc<RwLock<CoreIndex>>`.
- Sync methods (`insert`, `insert_batch`, `search`).
- Async methods via `tokio::task::spawn_blocking` + `.await` on the JoinHandle (`build_async`).

This shape — opaque handle, Arc<RwLock<inner>>, sync + spawn_blocking
async pair — is the existing house style. PyO3 bindings should mirror
it module-for-module so reviewers can diff them against each other and so
behaviour is identical across language clients.
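To make the house style concrete, here is a pure-Python mock of the same shape as it might look from the caller's side. It is a sketch under stated assumptions, not the planned binding: `_CoreIndex`, `IndexHandle`, and `search_async` are hypothetical names, a plain `threading.Lock` stands in for `RwLock`, and `asyncio.to_thread` plays the role of `tokio::task::spawn_blocking`.

```python
import asyncio
import threading


class _CoreIndex:
    """Hypothetical stand-in for the wrapped Rust index (brute-force search)."""

    def __init__(self, dim):
        self.dim = dim
        self.rows = []  # list of (id, vector) pairs

    def insert(self, id_, vector):
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")
        self.rows.append((id_, list(vector)))

    def search(self, query, k):
        # Squared Euclidean distance to every stored vector, nearest first.
        scored = [
            (sum((a - b) ** 2 for a, b in zip(vec, query)), id_)
            for id_, vec in self.rows
        ]
        return [id_ for _, id_ in sorted(scored)[:k]]


class IndexHandle:
    """Opaque handle: callers never touch the inner index directly."""

    def __init__(self, dim):
        self._inner = _CoreIndex(dim)
        self._lock = threading.Lock()  # stand-in for RwLock

    def insert(self, id_, vector):
        with self._lock:
            self._inner.insert(id_, vector)

    def search(self, query, k=1):
        with self._lock:
            return self._inner.search(query, k)

    async def search_async(self, query, k=1):
        # Offload the blocking call to a worker thread so the event loop
        # stays responsive while the search runs.
        return await asyncio.to_thread(self.search, query, k)


index = IndexHandle(dim=2)
index.insert(1, [0.0, 0.0])
index.insert(2, [1.0, 1.0])
index.insert(3, [5.0, 5.0])
sync_hits = index.search([0.9, 0.9], k=2)
async_hits = asyncio.run(index.search_async([0.9, 0.9], k=2))
print(sync_hits, async_hits)  # [2, 1] [2, 1]
```

The async method is a thin wrapper over the sync one, so both paths return identical results; that is the property the sync + `spawn_blocking` pairing is meant to preserve across language clients.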
wasm-bindgen modules (browser / Node)
There are ~30 `*-wasm` crates. They use wasm-bindgen 0.2 + js-sys 0.3 + a `getrandom` shim (`features = ["wasm_js"]`) that's the workspace default. The pattern is identical: opaque handle, sync methods only (WASM has no real threads in stable browsers without SharedArrayBuffer gymnastics).
WASM is relevant to the SDK strategy as an alternative-not-taken (see 02-strategy), not as a code-share opportunity.
Raw cbindgen / FFI
crates/ruvector-router-ffi is the only -ffi crate. C ABI. We do not
use it. Mentioning here because someone will ask.
What's published
- `ruvector-rabitq` and `ruvector-rulake`: both at workspace version `2.2.0`. These are the v1 consumer-facing crates.
- npm packages: `npm/packages/` has 57 directories. The flagship `ruvector` npm package is at `0.2.23` and pulls in `@ruvector/core` (`0.1.25`), `@ruvector/attention` (`0.1.3`), `@ruvector/gnn` (`0.1.22`), `@ruvector/sona` (`0.1.4`). In other words, the JS/TS story is fragmented: one umbrella package over four core sub-packages, each backed by a `*-node` crate. The umbrella also bundles a CLI (`bin/cli.js`), WASM artifacts (`wasm/`), and an MCP server (`@modelcontextprotocol/sdk` is a runtime dep).

What the JS/TS SDK actually covers (anchor for parity)
Reading npm/packages/ruvector/package.json keywords + dependencies:
- HNSW search, hybrid search, RaBitQ ("turboquant" appears), Graph RAG, FlashAttention-3, ColBERT, Mamba, hyperbolic geometry, ONNX MiniLM (semantic embeddings), SONA / LoRA / EWC adaptive learning, MCP server, Pi-Brain identity ("pi-key").
The Python SDK does not need to chase parity. The JS package is the everything-bagel; the Python package should be narrow and deliberate (see 02-strategy and 04-milestones).
Examples that map to Python notebooks
examples/ has 60+ directories. The ones that translate naturally:
- `examples/refrag-pipeline/`: RAG pipeline using `compress.rs` / `expand.rs` / `sense.rs`. Becomes the M1 hello-world notebook (`01_rag_in_5_lines.ipynb`).
- `examples/onnx-embeddings/`: MiniLM ONNX embedder. Backs the M3 embedding tutorial.
- `examples/a2a-swarm/`: multi-peer A2A demo. Backs the M4 agent tutorial. Lives at the workspace top level, was added with ADR-159.
- `crates/ruvector-rulake/examples/sidecar_daemon.rs` and `warm_restart.rs`: the "production deployment" patterns. Become the M2 ops notebook.

The notebooks are tracked under 04-milestones.md per milestone, not
checked in here.
What we are deliberately ignoring
These crates exist, are interesting, and will not be in the Python SDK roadmap:
- The 30+ `*-wasm` browser crates. Not Python's market.
- `ruvix/` (cognition kernel, bare-metal AArch64). Out of scope for any host-language SDK.
- `mcp-*` crates. MCP is a coordination protocol; if a Python user wants MCP they use the official MCP SDK.
- `examples/*-consciousness`, `examples/*-boundary-discovery`, `examples/seti-*`, `examples/seizure-*`, etc. These are research demos, not API surfaces.
- `crates/ruQu*`, `crates/ruvix/*`, `crates/cognitum-*`, `crates/prime-radiant`, `crates/thermorust`. Internal R&D.

Net assessment
There is no existing Python work — confirmed by exhaustive search. This
is a clean room. The four crates that matter for v1 of a Python SDK are,
in order: ruvector-rabitq, ruvector-rulake, the embedder
(ruvector-cnn + ONNX glue), and rvagent-a2a. The NAPI template at
crates/ruvector-diskann-node/src/lib.rs is the structural exemplar to
follow for every PyO3 module we write.