mirror of
https://github.com/ruvnet/RuVector.git
synced 2026-05-26 07:44:05 +00:00
Seven-file design review at docs/sdk/ covering the binding strategy,
API surface, M1-M4 milestones, risks, and a one-page decision record
for shipping a Python SDK.
Recommended path: **PyO3 + maturin, single in-tree
`crates/ruvector-py/` cdylib, abi3-py39 wheel via cibuildwheel,
`pyo3-asyncio` over a singleton tokio runtime.**
Why:
- The existing `*-node` NAPI templates (e.g.
`crates/ruvector-diskann-node/src/lib.rs`) already prove out the
opaque-handle + `Arc<RwLock<…>>` shape PyO3 mirrors line-for-line —
~70% port, ~30% lifetime gymnastics.
- abi3 collapses the wheel matrix from ~25 (cpython36 × 5 platforms)
to 5 (one wheel per platform, all py3.9+).
- Singleton tokio runtime avoids the "one runtime per call" overhead
while remaining compatible with asyncio + uvloop.
Milestone shape (each with explicit scope + acceptance tests):
M1 — RaBitQ-only Python wheel. Just the published
`ruvector-rabitq` crate exposed via PyO3. Smallest possible
useful surface. ~600 LoC, 3 weeks.
M2 — ruLake. Async via pyo3-asyncio. Witness verify exposed.
~900 LoC, 4 weeks.
M3 — Embeddings + ML helpers. Wrap consumer-facing parts of
`ruvector-cnn` / `ruvllm`. ~700 LoC, 3 weeks.
M4 — A2A agent client. Wrap `rvagent-a2a` so Python apps can
dispatch tasks to A2A peers, including signed AgentCard
discovery. ~800 LoC, 4 weeks.
Three acceptance gates that gate the whole effort:
1. A Python user can do RAG over 1 M vectors in <5 lines.
2. An asyncio user can stream A2A task updates without thread
fights.
3. `pip install ruvector` takes <10 s on a stock machine.
Top 3 risks identified:
R1 — tokio runtime + PyO3 + asyncio/uvloop interop. Mitigation:
single lazy runtime, `pyo3-asyncio` shim.
R3 — wheel size. M4 budget is 22 MB; A2A deps (axum + reqwest +
rustls) could blow it. Mitigation: feature-gate axum/reqwest
behind `agent` extra; default install is rabitq + rulake only.
R7 — PyPI name squat on `ruvector`. Mitigation: register placeholder
before M1 ships.
Nuance discovered: `ruvector-rabitq` has **no** sibling `*-node` or
`*-wasm` crate — unlike most consumer crates. M1 is therefore clean
greenfield: no parity-pressure to match a flaky NAPI signature, and
it confirms rabitq alone is the right starter target rather than the
umbrella `ruvector` crate the npm package wraps.
Planning doc only; no implementation.
Co-Authored-By: claude-flow <ruv@ruv.net>
169 lines
8.7 KiB
Markdown
# 02 — Binding Strategy

## Decision

**PyO3 + maturin, single extension module, abi3-py39, with `pyo3-asyncio`
for async bridging and a hand-written `.pyi` stub.** Built and distributed
via `cibuildwheel` in CI, published to PyPI as `ruvector`. The crate lives
in-tree at `crates/ruvector-py/`.

The rest of this document defends that choice against the four alternatives
considered, and locks in the supporting decisions (asyncio, GIL, wheels,
stubs).

## The choice space

| Option | Idea | Why we are not picking it |
|---|---|---|
| **A. PyO3 + maturin** *(chosen)* | Native Rust extension exposed as a CPython C-API module via `pyo3`, built with `maturin`. | n/a |
| B. CFFI over a Rust `cdylib` | Hand-roll a C ABI in `ruvector-py/` (or reuse `ruvector-router-ffi`) and let Python call it via `cffi`. | Loses the rich type story PyO3 gives for free (NumPy buffers, `Vec<T>` <-> `list`, `Result<T,E>` <-> exception, `async fn` <-> awaitable). Forces us to maintain a C header. We already maintain NAPI bindings; CFFI is a strictly worse parallel surface. |
| C. ctypes over cbindgen | Same as B, but using the stdlib `ctypes` module instead of `cffi`. | Same loss; less ergonomic; no installer to declare a build dep on; users hit a `ctypes.CDLL` import error if they pip-install on a platform without a wheel. |
| D. wasmtime-py over the existing `*-wasm` crates | Reuse `ruvector-rabitq` via a new `ruvector-rabitq-wasm` crate, run the WASM in `wasmtime-py`. | Requires writing the missing `*-wasm` crate first (rabitq has none; rulake has none). Loses 5–20× perf vs native (no SIMD escape hatch). Tokio doesn't run inside `wasm32-wasi`. Adds a 6 MB+ wasmtime runtime to every wheel. The whole point of going native is to *match* the Rust numbers, not lose half of them at the boundary. |
| E. gRPC / OpenAPI server with thin Python client | Stand up `ruvector-server` over HTTP/gRPC, ship a Python client that hits localhost. | Two-process architecture is the wrong default for a library: the user has to deal with port allocation, server lifecycle, and serialization cost on every call. This is the right shape for a Python *service* SDK, but a vector index isn't a service; it's a data structure. |

## Why PyO3 specifically

1. **Surface area parity with NAPI is automatic.** PyO3's `#[pyclass]`
   maps onto an opaque handle the same way `#[napi]` does, and
   `#[pymethods]` maps onto `#[napi]` impl blocks. Anyone who maintains
   `crates/ruvector-diskann-node` can read and review the PyO3 module
   in `crates/ruvector-py` line-for-line.
2. **NumPy zero-copy.** `pyo3` + `numpy` (the `rust-numpy` crate) lets
   us accept `np.ndarray` and read it as `&[f32]` without a copy when
   the array is contiguous and `dtype=float32`. RaBitQ search loops on
   `&[f32]` already; this is a thin wrap.
3. **abi3 wheels.** PyO3 supports the stable ABI (`abi3-py39`), which
   means **one wheel covers Python 3.9 / 3.10 / 3.11 / 3.12 / 3.13 /
   3.14**. We do not need to ship a wheel per Python version.
   This collapses the matrix from ~25 wheels (5 versions × 5 platforms)
   to 5 wheels.
4. **Mature async.** `pyo3-asyncio` (or its successor `pyo3-async-runtimes`,
   which we should track) lets a Rust `async fn` return a Python
   `awaitable` that `asyncio.run` awaits without spawning a thread per
   call. This is the only practical way to bridge tokio without
   double-runtime-fights.
5. **Maturin is the de-facto Rust-Python build tool.** Used by polars,
   pydantic-core, cryptography (in part), tokenizers. We are not
   pioneering anything; we are taking the well-trodden path.

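The zero-copy condition in item 2 (contiguous memory, 32-bit float elements) can be checked from the Python side through the buffer protocol. The sketch below is a stdlib-only illustration, not the binding's actual code: `can_borrow_as_f32` is a hypothetical helper name, and `array.array` stands in for `np.ndarray`, which exposes the same buffer protocol.

```python
from array import array


def can_borrow_as_f32(obj) -> bool:
    """Return True if obj exposes a C-contiguous buffer of 32-bit floats."""
    try:
        view = memoryview(obj)
    except TypeError:
        return False  # object does not implement the buffer protocol
    # Format "f" is a C float; itemsize 4 confirms it is 32 bits wide.
    return view.c_contiguous and view.format == "f" and view.itemsize == 4


f32 = array("f", [0.1, 0.2, 0.3, 0.4])  # float32, contiguous: borrowable
f64 = array("d", [0.1, 0.2, 0.3, 0.4])  # float64: would need a converting copy
strided = memoryview(f32)[::2]          # every other element: not contiguous

print(can_borrow_as_f32(f32))           # True
print(can_borrow_as_f32(f64))           # False
print(can_borrow_as_f32(strided))       # False
print(can_borrow_as_f32([0.1, 0.2]))    # False (plain list, no buffer)
```

Inputs that fail this kind of check would have to be converted, and therefore copied, before Rust can see them as `&[f32]`.
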
## Async story

**Native asyncio via `pyo3-asyncio`.** Every Rust `async fn` we expose
becomes an `async def` in Python by way of a `pyo3_asyncio::tokio::future_into_py`
wrapper. There is exactly one tokio runtime in the process: a multi-thread
runtime owned by the extension module, lazily initialized on first use,
sized to `min(8, os.cpu_count())` worker threads. We do **not** create a
runtime per call.

We do **not** use `asyncio.to_thread` or `run_in_executor` to wrap a sync
API. That works but breaks cancellation propagation and tracing context.

The main async surfaces are:

- `RuLake.search_async` (M2)
- `A2aClient.send_task` / `stream_task` (M4)
- `Embedder.embed_batch_async` (M3, optional; sync is fine for CPU work)

Sync siblings are kept for every async method (e.g. `search` and
`search_async`). Synchronous calls release the GIL via
`Python::allow_threads`; async calls return an awaitable immediately, and
the work occupies the tokio runtime's worker threads, not the calling
Python thread.

Compatibility: tested against CPython's default asyncio + uvloop. We do
not pin uvloop. We do not invent our own loop policy.

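To make the sync/async sibling convention concrete, here is a minimal sketch of how calling code might look. `FakeRuLake` is a pure-Python stand-in invented for this example; the real `RuLake` constructor, argument names, and return types are not specified in this document and are assumptions here.

```python
import asyncio


class FakeRuLake:
    """Pure-Python stand-in for the planned RuLake binding (hypothetical)."""

    def __init__(self, vectors):
        self._vectors = vectors

    def search(self, query, k=1):
        # Sync sibling: the real binding would release the GIL around this.
        def dist(i):
            return sum((a - b) ** 2 for a, b in zip(self._vectors[i], query))

        return sorted(range(len(self._vectors)), key=dist)[:k]

    async def search_async(self, query, k=1):
        # Async sibling: yielding to the event loop mimics work that runs
        # off the calling thread (on tokio workers in the real binding).
        await asyncio.sleep(0)
        return self.search(query, k)


async def main():
    lake = FakeRuLake([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
    queries = [[0.1, 0.0], [4.0, 4.5]]
    # Several searches are awaited concurrently on a single event loop.
    return await asyncio.gather(*(lake.search_async(q, k=1) for q in queries))


results = asyncio.run(main())
print(results)  # [[0], [2]]
```

The same object serves both styles: scripts call `search` directly, while asyncio applications await `search_async` without wrapping anything in an executor.
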
## GIL story

Every CPU-bound entry point that takes more than ~50 µs releases the
GIL via `py.allow_threads(|| { ... })` around the inner Rust call. The
list as of M3:

| Surface | Releases GIL? | Why |
|---|---|---|
| `RabitqIndex.build` | yes | dominant cost is rotation + popcount, all Rust |
| `RabitqIndex.search` | yes | scan loop, no Python interaction |
| `RabitqIndex.add` | no | one vector per call, overhead < release cost |
| `RuLake.search_*` | yes | scan + cache lookup, all Rust |
| `Embedder.embed` | yes | tensor ops |
| `A2aClient.send_task` | n/a (async) | tokio runs without holding the GIL |

This is the same calculus polars and tokenizers use. We document it
explicitly so that the next person who adds a method knows the rule.

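What releasing the GIL buys the caller is that ordinary Python threads can overlap calls. The sketch below illustrates this with a stand-in: `fake_search` is hypothetical, and `time.sleep` (which releases the GIL) substitutes for Rust work wrapped in `py.allow_threads`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_search(query_id: int) -> int:
    # time.sleep releases the GIL, standing in for a Rust scan loop that
    # runs inside py.allow_threads; other Python threads run meanwhile.
    time.sleep(0.05)
    return query_id * 2


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fake_search, range(4)))
elapsed = time.perf_counter() - start

print(results)  # [0, 2, 4, 6]
print(f"4 overlapping calls took {elapsed:.2f}s")
```

If the stand-in held the GIL for its whole duration, the four calls would serialize; because it releases it, they overlap. That is the behaviour the "yes" rows in the table are meant to provide.
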
## Wheel distribution matrix

We ship five wheels for each release, all `abi3-py39` (works on Python
3.9+):

| Platform | Triple | Built on | Notes |
|---|---|---|---|
| Linux x86_64 | `manylinux_2_28_x86_64` | GitHub Actions ubuntu-latest | AVX2 baseline; runtime detect AVX-512 |
| Linux aarch64 | `manylinux_2_28_aarch64` | GHA ARM runners or QEMU via cibuildwheel | NEON baseline |
| macOS x86_64 | `macosx_10_15_x86_64` | GHA macos-13 | AVX2 baseline; slower performance for M-series users on this wheel is acceptable because they have an arm64 wheel |
| macOS aarch64 | `macosx_11_0_arm64` | GHA macos-14 | NEON baseline |
| Windows x86_64 | `win_amd64` | GHA windows-latest | AVX2 baseline; runtime detect AVX-512 |

We **drop** musllinux, Windows arm64, and 32-bit anything. cibuildwheel
is configured via `[tool.cibuildwheel]` in `pyproject.toml`. For a 32-bit
user, `pip install` falls back to the sdist, which fails to build; that
is the correct outcome.

SIMD is **runtime-detected**, not compiled per-platform. ruvector-rabitq
is pure Rust without explicit AVX-512 paths today (the `kernel.rs`
`VectorKernel` trait is the extension point). We ship one binary per
platform; if/when we add an AVX-512 kernel it lives behind a runtime
CPU-feature check.

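As a quick illustration of what the five-wheel matrix looks like on PyPI, the sketch below builds the expected wheel filenames from the platform tags in the table. The `cp39-abi3` tag pair is how stable-ABI wheels with a Python 3.9 floor are named; the version string `0.1.0` is a placeholder, not a planned release number.

```python
PLATFORM_TAGS = [
    "manylinux_2_28_x86_64",
    "manylinux_2_28_aarch64",
    "macosx_10_15_x86_64",
    "macosx_11_0_arm64",
    "win_amd64",
]


def wheel_filename(version: str, platform_tag: str) -> str:
    # Wheel naming: {distribution}-{version}-{python tag}-{abi tag}-{platform}.whl
    # "cp39-abi3" marks one stable-ABI build usable on CPython 3.9 and newer.
    return f"ruvector-{version}-cp39-abi3-{platform_tag}.whl"


wheels = [wheel_filename("0.1.0", tag) for tag in PLATFORM_TAGS]
for name in wheels:
    print(name)

print(len(wheels))  # 5
```

Without abi3, each filename would carry a per-version tag pair such as `cp312-cp312`, and the list would grow by one entry per supported Python version on every platform.
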
## Type stubs

**Hand-written `.pyi` stubs**, checked in at
`crates/ruvector-py/python/ruvector/__init__.pyi`. Reasons:

- `pyo3-stub-gen` is real and improving but generates noisy stubs that
  need editing anyway (it overstates `Any`, doesn't infer `Optional[...]`
  from `Option<T>` cleanly).
- The stub surface is small enough (≤ 4 modules × ≤ 40 methods) that
  hand-writing is feasible.
- We control the user-visible API shape, e.g. we want NumPy types in
  signatures (`numpy.typing.NDArray[np.float32]`), not `list[float]`.

A CI job runs `mypy --strict tests/` and `pyright tests/` over test code
that imports `ruvector`, to catch stub regressions.

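For a sense of scale, a hand-written stub for the M1 surface might look roughly like the sketch below. Only the class and method names (`RabitqIndex`, `build`, `add`, `search`) come from this document; every parameter name, default, and return type is an assumption made for illustration. The `TYPE_CHECKING` guard is there only so the sketch runs as a plain script without NumPy installed; a real `.pyi` file would import NumPy directly.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # only type checkers need NumPy for these annotations
    import numpy as np
    import numpy.typing as npt


class RabitqIndex:
    # Signatures below are hypothetical; only the method names are sourced.
    def __init__(self, dim: int) -> None: ...

    def build(self, vectors: npt.NDArray[np.float32]) -> None: ...

    def add(self, vector: npt.NDArray[np.float32]) -> int: ...

    def search(
        self, query: npt.NDArray[np.float32], k: int = 10
    ) -> list[tuple[int, float]]: ...
```

Spelling the array type as `npt.NDArray[np.float32]` lets mypy and pyright flag a `float64` array or a plain list at the call site instead of at runtime.
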
## Source layout

```
crates/ruvector-py/
  Cargo.toml            # crate-type = ["cdylib"], pyo3 + numpy + pyo3-asyncio
  pyproject.toml        # maturin backend; cibuildwheel config; project metadata
  README.md             # short; links to docs/sdk
  src/
    lib.rs              # PyModule init, re-exports each submodule
    rabitq.rs           # M1
    rulake.rs           # M2
    embed.rs            # M3
    a2a.rs              # M4
    error.rs            # exception hierarchy
    runtime.rs          # the singleton tokio runtime
  python/ruvector/
    __init__.py         # re-exports from the compiled module + small pure-Py helpers
    __init__.pyi        # hand-written stubs
    py.typed            # marker so mypy/pyright recognize stubs
  tests/                # pytest, runs against the installed wheel
  benches/              # asv (airspeed-velocity) over identical workloads to Rust criterion
```

The `python/ruvector/__init__.py` re-export pattern lets us add pure-Python
helpers (e.g. dataclasses for config) without forcing them through the
extension boundary.

## What this strategy explicitly does NOT do

- Does not wrap every workspace crate. We pick four crates over four
  milestones; everything else stays Rust-only.
- Does not try to be a Pythonic vector DB framework (chromadb, weaviate,
  qdrant). We are a thin, fast, typed binding to a specific Rust stack.
- Does not vendor models. The embedder downloads weights from
  HuggingFace at first use, the same way `ruvector-cnn` does in Rust.
- Does not provide an asyncio-only API. Sync siblings always exist
  for non-network calls.