# 02 — Binding Strategy

## Decision

**PyO3 + maturin, single extension module, abi3-py39, with `pyo3-asyncio` for async bridging and a hand-written `.pyi` stub.** Built and distributed via `cibuildwheel` in CI, published to PyPI as `ruvector`. The crate lives in-tree at `crates/ruvector-py/`.

The rest of this document defends that choice against the four alternatives considered, and locks in the supporting decisions (asyncio, GIL, wheels, stubs).

## The choice space

| Option | Idea | Why we are not picking it |
|---|---|---|
| **A. PyO3 + maturin** *(chosen)* | Native Rust extension exposed as a CPython C-API module via `pyo3`, built with `maturin`. | — |
| B. CFFI over a Rust `cdylib` | Hand-roll a C ABI in `ruvector-py/` (or reuse `ruvector-router-ffi`) and let Python call it via `cffi`. | Loses the rich type story PyO3 gives for free (NumPy buffers, `Vec` <-> `list`, `Result` <-> exception, `async fn` <-> awaitable). Forces us to maintain a C header. We already maintain NAPI bindings; CFFI is a strictly worse parallel surface. |
| C. ctypes over cbindgen | Same as B, but using the stdlib `ctypes` module instead of `cffi`. | Same loss; less ergonomic; no installer to declare a build dep on; users hit a `ctypes.CDLL` import error if they pip-install on a platform without a wheel. |
| D. wasmtime-py over the existing `*-wasm` crates | Reuse `ruvector-rabitq` via a new `ruvector-rabitq-wasm` crate, run the WASM in `wasmtime-py`. | Requires writing the missing `*-wasm` crate first (rabitq has none; rulake has none). Loses 5–20× perf vs native (no SIMD escape hatch). Tokio doesn't run inside `wasm32-wasi`. Adds a 6 MB+ wasmtime runtime to every wheel. The whole point of going native is to *match* the Rust numbers, not lose half of them at the boundary. |
| E. gRPC / OpenAPI server with thin Python client | Stand up `ruvector-server` over HTTP/gRPC, ship a Python client that hits localhost. | Two-process architecture is the wrong default for a library: the user has to deal with port allocation, server lifecycle, and serialization cost on every call. This is the right shape for a Python *service* SDK, but a vector index isn't a service; it's a data structure. |

## Why PyO3 specifically

1. **Surface area parity with NAPI is automatic.** PyO3's `#[pyclass]` maps onto an opaque handle the same way `#[napi]` does, and `#[pymethods]` maps onto `#[napi]` impl blocks. Anyone who maintains `crates/ruvector-diskann-node` can read and review the PyO3 module in `crates/ruvector-py` line-for-line.
2. **NumPy zero-copy.** `pyo3` + `numpy` (the `rust-numpy` crate) lets us accept `np.ndarray` and read it as `&[f32]` without a copy when the array is contiguous and `dtype=float32`. RaBitQ search loops on `&[f32]` already; this is a thin wrap.
3. **abi3 wheels.** PyO3 supports the stable ABI (`abi3-py39`), which means **one wheel per platform covers Python 3.9 / 3.10 / 3.11 / 3.12 / 3.13 / 3.14**. We do not need to ship a wheel per Python version. This collapses the matrix from ~30 wheels (6 versions × 5 platforms) to 5 wheels.
4. **Mature async.** `pyo3-asyncio` (or its successor `pyo3-async-runtimes`, which we should track) lets a Rust `async fn` return a Python awaitable that the running asyncio event loop awaits without spawning a thread per call. This is the only practical way to bridge tokio without the two runtimes fighting each other.
5. **Maturin is the de-facto Rust-Python build tool.** It is used by polars, pydantic-core, cryptography (in part), and tokenizers. We are not pioneering anything; we are taking the well-trodden path.

## Async story

**Native asyncio via `pyo3-asyncio`.** Every Rust `async fn` we expose becomes an `async def` in Python by way of a `pyo3_asyncio::tokio::future_into_py` wrapper. There is exactly one tokio runtime in the process: a multi-thread runtime owned by the extension module, lazily initialized on first use, sized to `min(8, os.cpu_count())` worker threads.
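From the caller's side, an exposed `async fn` behaves like any other coroutine function, so it composes with `asyncio.gather` and friends. The sketch below illustrates that shape only: `search_async` here is a hypothetical pure-Python stand-in (the real method would be backed by the tokio runtime), and its parameters and return type are assumptions rather than the actual `ruvector` API.

```python
import asyncio


async def search_async(query, k):
    """Hypothetical stand-in for a native method such as RuLake.search_async."""
    # The sleep simulates work happening off the event loop (on the Rust side
    # this would be a future running on the shared tokio runtime).
    await asyncio.sleep(0.01)
    # Assumed result shape: a list of (id, distance) pairs.
    return [(i, float(i)) for i in range(k)]


async def main():
    queries = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
    # Fan out all queries at once; no Python thread is created per call.
    return await asyncio.gather(*(search_async(q, k=2) for q in queries))


results = asyncio.run(main())
```

The three awaits overlap on a single Python thread, which is the property the single shared runtime is meant to preserve.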
We do **not** create a runtime per call. We do **not** use `asyncio.to_thread` or `run_in_executor` to wrap a sync API. That works but breaks cancellation propagation and tracing context.

The main async surfaces are:

- `RuLake.search_async` (M2)
- `A2aClient.send_task` / `stream_task` (M4)
- `Embedder.embed_batch_async` (M3, optional — sync is fine for CPU work)

Sync siblings are kept for every async method (e.g. `search` and `search_async`). Synchronous calls release the GIL via `Python::allow_threads`; async calls return an awaitable immediately, and the work occupies the tokio runtime, not the calling Python thread.

Compatibility: tested against CPython's default asyncio event loop and uvloop. We do not pin uvloop. We do not invent our own loop policy.

## GIL story

Every CPU-bound entry point that takes more than ~50 µs releases the GIL via `py.allow_threads(|| { ... })` around the inner Rust call. The list as of M3:

| Surface | Releases GIL? | Why |
|---|---|---|
| `RabitqIndex.build` | yes | dominant cost is rotation + popcount, all Rust |
| `RabitqIndex.search` | yes | scan loop, no Python interaction |
| `RabitqIndex.add` | no | one vector per call, overhead < release cost |
| `RuLake.search_*` | yes | scan + cache lookup, all Rust |
| `Embedder.embed` | yes | tensor ops |
| `A2aClient.send_task` | n/a (async) | tokio runs without holding the GIL |

This is the same calculus polars and tokenizers use. We document it explicitly so that the next person who adds a method knows the rule.
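What releasing the GIL buys the caller is that plain Python threads can run native calls concurrently. The sketch below shows the calling pattern with a stand-in workload: `hashlib.sha256`, which CPython documents as releasing the GIL for inputs larger than 2047 bytes. The `scan` function is a hypothetical placeholder for a GIL-releasing entry point such as `RabitqIndex.search`; it is not part of `ruvector`.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def scan(payload: bytes) -> str:
    """Stand-in for a native call that releases the GIL while it works."""
    return hashlib.sha256(payload).hexdigest()


# Four independent inputs, each large enough for hashlib to drop the GIL.
payloads = [bytes([i]) * 4_000_000 for i in range(4)]

# Baseline: one call after another on the main thread.
sequential = [scan(p) for p in payloads]

# Same calls from a thread pool; the threads can overlap inside the native code.
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(scan, payloads))
```

The results are identical either way; only the wall-clock behaviour differs, and how much it improves depends on core count and workload size.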
## Wheel distribution matrix

We ship five wheels for each release, all `abi3-py39` (works on Python 3.9+):

| Platform | Platform tag | Built on | Notes |
|---|---|---|---|
| Linux x86_64 | `manylinux_2_28_x86_64` | GitHub Actions ubuntu-latest | AVX2 baseline; runtime detect AVX-512 |
| Linux aarch64 | `manylinux_2_28_aarch64` | GHA ARM runners or QEMU via cibuildwheel | NEON baseline |
| macOS x86_64 | `macosx_10_15_x86_64` | GHA macos-13 | AVX2 baseline; suboptimal performance for M-series users is acceptable because they have an arm64 wheel |
| macOS aarch64 | `macosx_11_0_arm64` | GHA macos-14 | NEON baseline |
| Windows x86_64 | `win_amd64` | GHA windows-latest | AVX2 baseline; runtime detect AVX-512 |

We **drop** musllinux, Windows arm64, and 32-bit anything. cibuildwheel is configured via `[tool.cibuildwheel]` in `pyproject.toml`. A 32-bit user gets `pip install` falling back to sdist, which fails to build, which is the correct outcome.

SIMD is **runtime-detected**, not compiled per-platform. ruvector-rabitq is pure Rust without explicit AVX-512 paths today (the `kernel.rs` `VectorKernel` trait is the extension point). We ship one binary per platform; if/when we add an AVX-512 kernel it lives behind a runtime CPU-feature check.

## Type stubs

**Hand-written `.pyi` stubs**, checked in at `crates/ruvector-py/python/ruvector/__init__.pyi`. Reasons:

- `pyo3-stub-gen` is real and improving but generates noisy stubs that need editing anyway (it overuses `Any` and doesn't infer `Optional[...]` from `Option` cleanly).
- The stub surface is small enough (≤ 4 modules × ≤ 40 methods) that hand-writing is feasible.
- We control the user-visible API shape, e.g. we want NumPy types in signatures (`numpy.typing.NDArray[np.float32]`), not `list[float]`.

A CI job runs `mypy --strict tests/` and `pyright tests/` against test code that imports `ruvector` to catch stub regressions.
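To make the stub style concrete, here is a minimal sketch of what one hand-written entry might look like. Only the class and method names come from this document; every parameter, default, and return type is a hypothetical illustration. The NumPy imports are guarded and the array annotations are quoted so the snippet also runs as a plain module without NumPy installed; a real `.pyi` would import NumPy unconditionally.

```python
from typing import TYPE_CHECKING, List, Optional, Tuple

if TYPE_CHECKING:
    # Only needed by type checkers; never imported at runtime in this sketch.
    import numpy as np
    import numpy.typing as npt


class RabitqIndex:
    # Hypothetical signatures: parameters and return types are assumptions.
    def build(
        self,
        vectors: "npt.NDArray[np.float32]",
        seed: Optional[int] = None,  # an Option on the Rust side maps to Optional
    ) -> None: ...

    def add(self, vector: "npt.NDArray[np.float32]") -> int: ...

    def search(
        self,
        query: "npt.NDArray[np.float32]",
        k: int = 10,
    ) -> List[Tuple[int, float]]: ...
```

Spelling out `Optional[int]` and the array dtype by hand is the kind of precision the bullet list above says generated stubs tend to lose.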
## Source layout

```
crates/ruvector-py/
  Cargo.toml            # crate-type = ["cdylib"], pyo3 + numpy + pyo3-asyncio
  pyproject.toml        # maturin backend; cibuildwheel config; project metadata
  README.md             # short — links to docs/sdk
  src/
    lib.rs              # PyModule init, re-exports each submodule
    rabitq.rs           # M1
    rulake.rs           # M2
    embed.rs            # M3
    a2a.rs              # M4
    error.rs            # exception hierarchy
    runtime.rs          # the singleton tokio runtime
  python/ruvector/
    __init__.py         # re-exports from the compiled module + small pure-Py helpers
    __init__.pyi        # hand-written stubs
    py.typed            # marker so mypy/pyright recognize stubs
  tests/                # pytest, runs against the installed wheel
  benches/              # asv (airspeed-velocity) over identical workloads to Rust criterion
```

The `python/ruvector/__init__.py` re-export pattern lets us add pure-Python helpers (e.g. dataclasses for config) without forcing them through the extension boundary.

## What this strategy explicitly does NOT do

- Does not wrap every workspace crate. We pick four crates over four milestones; everything else stays Rust-only.
- Does not try to be a Pythonic vector DB framework (chromadb, weaviate, qdrant). We are a thin, fast, typed binding to a specific Rust stack.
- Does not vendor models. The embedder downloads weights from HuggingFace at first use, the same way `ruvector-cnn` does in Rust.
- Does not provide an asyncio-only API. Sync siblings always exist for non-network calls.
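As an illustration of the re-export pattern described under Source layout, a pure-Python helper living in `__init__.py` might look like the sketch below. `RabitqConfig`, its fields, and the `_native` module name are all hypothetical; the point is only that validation like this can stay in Python and never cross the extension boundary.

```python
from dataclasses import dataclass

# In the real package, __init__.py would first re-export the compiled classes,
# for example:
#   from ._native import RabitqIndex   # "_native" is a hypothetical module name


@dataclass(frozen=True)
class RabitqConfig:
    """Hypothetical config helper; field names are illustrative only."""

    dim: int
    seed: int = 0

    def __post_init__(self) -> None:
        # Cheap argument checks happen in Python, before any native call.
        if self.dim <= 0:
            raise ValueError(f"dim must be positive, got {self.dim}")


cfg = RabitqConfig(dim=128)
```

A frozen dataclass keeps the config hashable and immutable, which makes it safe to share across threads that call into the GIL-releasing entry points.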