mirror of https://github.com/ruvnet/RuVector.git synced 2026-05-26 07:44:05 +00:00

ruvnet f6c684aba0 docs(sdk): add deep planning review for ruvector Python SDK

Seven-file design review at docs/sdk/ covering the binding strategy,
API surface, M1-M4 milestones, risks, and a one-page decision record
for shipping a Python SDK.

Recommended path: **PyO3 + maturin, single in-tree
`crates/ruvector-py/` cdylib, abi3-py39 wheel via cibuildwheel,
`pyo3-asyncio` over a singleton tokio runtime.**

Why:
- The existing `*-node` NAPI templates (e.g.
  `crates/ruvector-diskann-node/src/lib.rs`) already prove out the
  opaque-handle + `Arc<RwLock<…>>` shape PyO3 mirrors line-for-line —
  ~70% port, ~30% lifetime gymnastics.
- abi3 collapses the wheel matrix from ~25 (cpython36 × 5 platforms)
  to 5 (one wheel per platform, all py3.9+).
- Singleton tokio runtime avoids the "one runtime per call" overhead
  while remaining compatible with asyncio + uvloop.

Milestone shape (each with explicit scope + acceptance tests):

  M1 — RaBitQ-only Python wheel. Just the published
       `ruvector-rabitq` crate exposed via PyO3. Smallest possible
       useful surface. ~600 LoC, 3 weeks.
  M2 — ruLake. Async via pyo3-asyncio. Witness verify exposed.
       ~900 LoC, 4 weeks.
  M3 — Embeddings + ML helpers. Wrap consumer-facing parts of
       `ruvector-cnn` / `ruvllm`. ~700 LoC, 3 weeks.
  M4 — A2A agent client. Wrap `rvagent-a2a` so Python apps can
       dispatch tasks to A2A peers, including signed AgentCard
       discovery. ~800 LoC, 4 weeks.

Three acceptance gates that gate the whole effort:
  1. A Python user can do RAG over 1 M vectors in <5 lines.
  2. An asyncio user can stream A2A task updates without thread
     fights.
  3. `pip install ruvector` takes <10 s on a stock machine.

Top 3 risks identified:
  R1 — tokio runtime + PyO3 + asyncio/uvloop interop. Mitigation:
       single lazy runtime, `pyo3-asyncio` shim.
  R3 — wheel size. M4 budget is 22 MB; A2A deps (axum + reqwest +
       rustls) could blow it. Mitigation: feature-gate axum/reqwest
       behind `agent` extra; default install is rabitq + rulake only.
  R7 — PyPI name squat on `ruvector`. Mitigation: register placeholder
       before M1 ships.

Nuance discovered: `ruvector-rabitq` has **no** sibling `*-node` or
`*-wasm` crate — unlike most consumer crates. M1 is therefore clean
greenfield: no parity-pressure to match a flaky NAPI signature, and
it confirms rabitq alone is the right starter target rather than the
umbrella `ruvector` crate the npm package wraps.

Planning doc only; no implementation.

Co-Authored-By: claude-flow <ruv@ruv.net>

2026-04-25 20:28:54 -04:00

15 KiB

Raw Permalink Blame History

04 — Milestones

Same shape as ADR-159's milestone plan (docs/adr/ADR-159-rvagent-a2a-protocol.md § "Implementation plan"). Each milestone has explicit scope, a file list, a LoC budget, an acceptance test set, the wheel platforms shipped, and the docs that must land.

The crate crates/ruvector-py/ is created in M1 and grows by one source module per milestone.

M1 — RaBitQ-only Python wheel

Scope.

Create the new workspace crate crates/ruvector-py/ with crate-type = ["cdylib"], pyo3 = "0.22", numpy = "0.22", pyo3-asyncio = "0.22" (or successor pyo3-async-runtimes if pinned), maturin as the build backend in pyproject.toml.
Bind exactly the four index types from crates/ruvector-rabitq/src/index.rs: FlatF32Index, RabitqIndex, RabitqPlusIndex, RabitqAsymIndex. All four implement the AnnIndex trait there.
Bind BinaryCode accessors for advanced users (ids, norms, packed) even though most users will never touch them — they're cheap to expose and the test suite uses them.
Bind RandomRotation only as an opaque handle returned by idx.rotation() — no public constructor in v1.
Persistence: bind crates/ruvector-rabitq/src/persist.rs so idx.save(path) and Index.load(path) work for RabitqIndex and RabitqPlusIndex (the .rbpx format from lake.rs PERSISTED_INDEX_FILENAME).
Hand-write python/ruvector/__init__.pyi with full stubs for the M1 surface.
Set up pyproject.toml with cibuildwheel configured for the five platform wheels listed in 02-strategy.md § "Wheel distribution matrix", abi3-py39.
CI: GitHub Actions workflow python-wheels.yml that builds + tests
- uploads to TestPyPI on every PR, PyPI on tag.
Pure-Python helpers in python/ruvector/: _version.py (mirrors Cargo version), _typing.py (the AnyIndex runtime-checkable Protocol).

File list.

crates/ruvector-py/
  Cargo.toml                  # ~30 LoC
  pyproject.toml              # ~80 LoC (cibuildwheel matrix, project meta)
  README.md                   # ~50 LoC, links docs/sdk
  src/
    lib.rs                    # ~80 LoC — PyModule init, re-exports
    rabitq.rs                 # ~450 LoC — four index types
    error.rs                  # ~80 LoC — exception hierarchy root + IndexError tree
    numpy_util.rs             # ~60 LoC — view & dtype enforcement helpers
  python/ruvector/
    __init__.py               # ~40 LoC — re-exports
    __init__.pyi              # ~200 LoC — typed stubs
    py.typed                  # 0 LoC marker
    _version.py               # ~5 LoC
    _typing.py                # ~30 LoC — AnyIndex Protocol
  tests/
    test_rabitq_basic.py      # ~120 LoC
    test_rabitq_persist.py    # ~80 LoC
    test_numpy_interop.py     # ~80 LoC
    test_errors.py            # ~60 LoC
.github/workflows/
  python-wheels.yml           # ~120 LoC
docs/sdk/notebooks/
  01_rag_in_5_lines.ipynb     # uses M1 index over toy data

LoC budget. ~700 Rust + ~340 Python tests + ~200 stubs + ~120 CI YAML.

Acceptance tests.

pip install ruvector from TestPyPI on each of the five platforms in 02-strategy.md's matrix succeeds in ≤ 10 s on a stock GitHub-hosted runner with a warm pip cache.
test_rabitq_basic.py::test_100k_search_under_10ms: build a RabitqPlusIndex over 100,000 random D=128 vectors with rerank_factor=20, run 100 single-query searches, assert p99 latency < 10 ms (mirrors ruvector-rabitq/BENCHMARK.md baseline of 1.05 ms/query for RabitqPlus rerank×20 with headroom for Python overhead).
test_numpy_interop.py::test_zero_copy_search: build an index, pass a contiguous np.ndarray[np.float32] query, assert the call produces no copy via a memory-tracker fixture.
test_rabitq_persist.py::test_roundtrip: save → load → search, assert bit-identical results to a search before save.
test_errors.py::test_dim_mismatch: query of wrong dim raises ruvector.DimensionMismatch and the message names both expected and got dim.
mypy --strict on tests/ finds no errors.
Wheel size ≤ 8 MB on linux x86_64, ≤ 12 MB on macOS arm64.

Wheels shipped. All five platforms in 02-strategy.md § "Wheel distribution matrix". M1 ships nothing else.

Docs.

docs/sdk/notebooks/01_rag_in_5_lines.ipynb — derived from examples/refrag-pipeline/. Uses the M1 surface only (no embedder yet — uses pre-computed vectors from a fixture).
Sphinx rST scaffold under docs/sdk/api/ is created but only the RaBitQ section is filled.
Top-level crates/ruvector-py/README.md.

M2 — ruLake bindings

Scope.

Add crates/ruvector-py/src/rulake.rs. Bind:
- RuLake with builder (RuLake::new + chained with_* mapped to a Python builder pattern).
- BackendAdapter exposed as a Python ABC for users to implement; bridges into Rust via a PyBackendAdapter impl that calls back into the Python class. (This is the only place we need GIL re-acquisition in M2.)
- LocalBackend and FsBackend as concrete classes.
- Consistency as an int-enum.
- RuLakeBundle, RefreshResult, SearchResult, CacheStats, PerBackendStats.
- All RuLake methods listed in crates/ruvector-py-survey (i.e. register_backend, search_one, search_federated, search_batch, publish_bundle, refresh_from_bundle_dir, save_cache_to_dir, warm_from_dir, cache_stats*, cache_witness_of, invalidate_cache).
Add _async siblings for search_one / search_federated / search_batch using pyo3_asyncio::tokio::future_into_py. The underlying Rust calls are sync today (per lake.rs); async siblings exist so we don't have to break the surface when the Rust async refactor lands.
Initialize the singleton tokio runtime here in M2 (M1 doesn't need it).
Extend error.rs with the LakeError subtree.

File list.

crates/ruvector-py/src/
  rulake.rs                   # ~600 LoC
  runtime.rs                  # ~80 LoC — singleton tokio runtime
crates/ruvector-py/python/ruvector/
  __init__.pyi                # +180 LoC for RuLake surface
crates/ruvector-py/tests/
  test_rulake_local.py        # ~150 LoC
  test_rulake_fs_backend.py   # ~120 LoC
  test_rulake_async.py        # ~100 LoC
  test_rulake_witness.py      # ~80 LoC
docs/sdk/notebooks/
  02_warm_restart_with_witness.ipynb

LoC budget. ~680 Rust + ~450 Python tests + ~180 stub additions.

Acceptance tests.

test_rulake_local.py::test_register_search_local: register a LocalBackend with 50,000 D=128 vectors, run search_one, assert results match a direct RabitqPlusIndex search.
test_rulake_async.py::test_search_one_async_in_event_loop: run 100 concurrent await lake.search_one_async(...) calls inside a single asyncio event loop, assert they complete in less than 10× the sync time (no thread-fight regression).
test_rulake_witness.py::test_publish_refresh_roundtrip: publish bundle, mutate underlying data, re-publish, refresh, assert RefreshResult.INVALIDATED. Mirrors lake.rs refresh_from_bundle_dir contract.
test_rulake_fs_backend.py::test_warm_restart: prime cache, save to disk, kill process, start a fresh RuLake, warm_from_dir, assert first search after warmup is < 1.5× steady-state latency.
test_rulake_local.py::test_python_backend_adapter: a user-defined Python class subclasses ruvector.BackendAdapter, registers, gets called back by ruLake on cache miss. (This is the GIL re-acquisition round-trip.)

Wheels shipped. Same five platforms. Wheel size budget bumps to ≤ 12 MB linux / ≤ 16 MB macOS arm64 (tokio adds ~3 MB).

Docs.

docs/sdk/notebooks/02_warm_restart_with_witness.ipynb — derived from crates/ruvector-rulake/examples/warm_restart.rs.
Sphinx page for ruLake reference complete.

M3 — Embeddings + ML helpers

Scope.

Add crates/ruvector-py/src/embed.rs. Bind a single Embedder class with two factory functions:
- Embedder.from_pretrained(name) for text. Implementation calls into crates/ruvector-cnn/ for image and into a new tiny crates/ruvector-py/src/onnx_embed.rs helper for text (ONNX Runtime via ort 2.x). Text models: all-MiniLM-L6-v2 first; bge-small-en-v1.5 second.
- Embedder.from_pretrained with a mobilenetv3-* prefix routes to ruvector-cnn's MobileNetEmbedder (gated on the backbone feature in ruvector-cnn/Cargo.toml).
Model weights: download once on first use into the standard ~/.cache/ruvector/models/ directory, verify a SHA-256 digest, cache. No bundled weights — the wheel stays small.
Sync embed, sync embed_batch, async embed_batch_async. Async exists so a notebook user can interleave embedding with ruLake ingestion in the same event loop.
Extend error.rs with EmbedError.

File list.

crates/ruvector-py/src/
  embed.rs                    # ~350 LoC
  onnx_embed.rs               # ~250 LoC — ort wrapper, model registry
crates/ruvector-py/python/ruvector/
  __init__.pyi                # +120 LoC for Embedder
  _models.py                  # ~80 LoC — model registry, download paths
crates/ruvector-py/tests/
  test_embed_text.py          # ~120 LoC
  test_embed_image.py         # ~100 LoC
  test_embed_to_index.py      # ~80 LoC — end-to-end RAG
docs/sdk/notebooks/
  03_text_to_search.ipynb     # full RAG: text → embed → RabitqPlus → search

LoC budget. ~600 Rust + ~300 Python tests + ~200 helpers/stubs.

Acceptance tests.

test_embed_text.py::test_minilm_dim: embed 100 strings, assert shape (100, 384) and dtype float32.
test_embed_text.py::test_first_use_downloads: in a fresh cache dir, from_pretrained("all-MiniLM-L6-v2") downloads, verifies SHA-256, caches; second call is no-network.
test_embed_image.py::test_mobilenetv3_small_dim: embed a (224, 224, 3) image, assert shape (576,) (matches ruvector-cnn MobileNetV3-Small dim).
test_embed_to_index.py::test_e2e_rag_under_5_lines: file is the acceptance gate G1 in 06-decision-record. Full pipeline, ≤ 5 significant lines of user code, completes < 30 s on a stock laptop with warm model cache. (Subject to network for the first run only.)
ONNX Runtime is optional: the wheel ships without ort bundled in; image-only users pip install ruvector and skip the text path. Importing Embedder.from_pretrained("all-MiniLM-...") without ort raises EmbedError("install ruvector[text]").

Wheels shipped. Same five platforms; the ruvector wheel does not bundle ort. We ship a ruvector[text] extra that adds onnxruntime as a Python-side dep (so wheel size of ruvector itself stays ≤ 14 MB).

Docs.

docs/sdk/notebooks/03_text_to_search.ipynb.
Sphinx page for Embedder.
README example block updated.

M4 — A2A client

Scope.

Add crates/ruvector-py/src/a2a.rs. Bind from crates/rvAgent/rvagent-a2a/src/:
- A2aClient with connect, send_task, get_task, cancel_task, stream_task. All async (the underlying Rust API is async via reqwest). Sync siblings via pyo3_asyncio::tokio::run_until_complete for non-async users.
- AgentCard, AgentCapabilities, AgentSkill, AgentProvider, AuthScheme, Task, TaskSpec, TaskState, TaskStatus, Message, Part, Role, Artifact, TaskArtifactUpdateEvent, TaskStatusUpdateEvent — all from rvagent-a2a/src/types.rs and lib.rs re-exports.
- TaskPolicy from rvagent-a2a/src/policy.rs. Construction-only on the Python side; not modifiable post-send.
- TaskUpdate discriminated dataclass returned by stream_task.
Verify-on-discover (ADR-159 r2) enabled by default; strict_verify=False is exposed but documented as for-test-only.
We do not bind the A2A server. Server-side rvAgent stays Rust-only in v1.
Extend error.rs with A2aError subtree (CardSignatureInvalid, PolicyViolation, BudgetExceeded, TransportError).

File list.

crates/ruvector-py/src/
  a2a.rs                      # ~700 LoC
  a2a_types.rs                # ~250 LoC — type conversions for AgentCard, Task, Artifact
crates/ruvector-py/python/ruvector/
  __init__.pyi                # +220 LoC for the A2A surface
crates/ruvector-py/tests/
  test_a2a_card.py            # ~120 LoC
  test_a2a_send_task.py       # ~150 LoC
  test_a2a_stream.py          # ~150 LoC
  test_a2a_policy.py          # ~80 LoC
docs/sdk/notebooks/
  04_dispatch_to_python_peer.ipynb

A test fixture stands up an in-process rvAgent A2A server (using tokio::test-equivalent in pytest via a test-only Rust binary launched in a subprocess.Popen). The server lives in crates/ruvector-py/tests/a2a_test_server/ and is built once per test session.

LoC budget. ~950 Rust + ~500 Python tests + ~220 stub additions.

Acceptance tests.

test_a2a_card.py::test_fetch_signed_card: connect to the test server, fetch the AgentCard, assert signature verifies and agent_id matches SHAKE-256(pubkey).
test_a2a_card.py::test_tampered_card_rejected: redirect the client to a tampered /.well-known/agent.json, assert CardSignatureInvalid.
test_a2a_send_task.py::test_lifecycle: send a task, poll until completed, assert artifacts present.
test_a2a_stream.py::test_stream_no_thread_fight (acceptance gate G2): consume an SSE stream of 1,000 status updates inside a single asyncio event loop alongside two other concurrent ruLake search_one_async calls; assert no event-loop-blocked warnings, no thread-stuck warnings, total time < 1.2 × the maximum of the three workloads in isolation.
test_a2a_policy.py::test_budget_exceeded: send a task that violates max_cost_usd, assert PolicyViolation raised before any work begins.

Wheels shipped. Same five platforms. Wheel size budget tops out at ≤ 22 MB linux / ≤ 28 MB macOS arm64 (reqwest + rustls + axum-deps). This is the size red line; if we trip it we ship the A2A bits as a ruvector[a2a] extra with a separate wheel ruvector-a2a.

Docs.

docs/sdk/notebooks/04_dispatch_to_python_peer.ipynb — derived from examples/a2a-swarm/.
Sphinx page for A2aClient.
README updated with end-to-end "Python app dispatches to rvAgent" walkthrough.

Total sizing

Milestone	Rust LoC	Python LoC	Tests LoC	Cum. wheel size	Calendar weeks
M1	~700	~75	~340	≤ 8 MB	2
M2	~680	~30	~450	≤ 12 MB	3
M3	~600	~80	~300	≤ 14 MB	2.5
M4	~950	~30	~500	≤ 22 MB	3.5
Total	~2,930	~215	~1,590	≤ 22 MB	~11 weeks

Calendar weeks assume one engineer with PyO3 experience working full-time; double if pair-programmed; halve if not done in series (M1 and M3 can parallelize after M1's CI is green).

15 KiB Raw Permalink Blame History Unescape Escape

04 — Milestones

M1 — RaBitQ-only Python wheel

M2 — ruLake bindings

M3 — Embeddings + ML helpers

M4 — A2A client

Total sizing

15 KiB

Raw Permalink Blame History