ruvector/docs/sdk/03-api-surface.md
ruvnet f6c684aba0 docs(sdk): add deep planning review for ruvector Python SDK
Seven-file design review at docs/sdk/ covering the binding strategy,
API surface, M1-M4 milestones, risks, and a one-page decision record
for shipping a Python SDK.

Recommended path: **PyO3 + maturin, single in-tree
`crates/ruvector-py/` cdylib, abi3-py39 wheel via cibuildwheel,
`pyo3-asyncio` over a singleton tokio runtime.**

Why:
- The existing `*-node` NAPI templates (e.g.
  `crates/ruvector-diskann-node/src/lib.rs`) already prove out the
  opaque-handle + `Arc<RwLock<…>>` shape PyO3 mirrors line-for-line —
  ~70% port, ~30% lifetime gymnastics.
- abi3 collapses the wheel matrix from ~25 (cpython36 × 5 platforms)
  to 5 (one wheel per platform, all py3.9+).
- Singleton tokio runtime avoids the "one runtime per call" overhead
  while remaining compatible with asyncio + uvloop.

Milestone shape (each with explicit scope + acceptance tests):

  M1 — RaBitQ-only Python wheel. Just the published
       `ruvector-rabitq` crate exposed via PyO3. Smallest possible
       useful surface. ~600 LoC, 3 weeks.
  M2 — ruLake. Async via pyo3-asyncio. Witness verify exposed.
       ~900 LoC, 4 weeks.
  M3 — Embeddings + ML helpers. Wrap consumer-facing parts of
       `ruvector-cnn` / `ruvllm`. ~700 LoC, 3 weeks.
  M4 — A2A agent client. Wrap `rvagent-a2a` so Python apps can
       dispatch tasks to A2A peers, including signed AgentCard
       discovery. ~800 LoC, 4 weeks.

Three acceptance gates that gate the whole effort:
  1. A Python user can do RAG over 1 M vectors in <5 lines.
  2. An asyncio user can stream A2A task updates without thread
     fights.
  3. `pip install ruvector` takes <10 s on a stock machine.

Top 3 risks identified:
  R1 — tokio runtime + PyO3 + asyncio/uvloop interop. Mitigation:
       single lazy runtime, `pyo3-asyncio` shim.
  R3 — wheel size. M4 budget is 22 MB; A2A deps (axum + reqwest +
       rustls) could blow it. Mitigation: feature-gate axum/reqwest
       behind `agent` extra; default install is rabitq + rulake only.
  R7 — PyPI name squat on `ruvector`. Mitigation: register placeholder
       before M1 ships.

Nuance discovered: `ruvector-rabitq` has **no** sibling `*-node` or
`*-wasm` crate — unlike most consumer crates. M1 is therefore clean
greenfield: no parity-pressure to match a flaky NAPI signature, and
it confirms rabitq alone is the right starter target rather than the
umbrella `ruvector` crate the npm package wraps.

Planning doc only; no implementation.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-25 20:28:54 -04:00


03 — Python API Surface

The user-visible Python API across all four milestones. Everything in this document is what gets typed at a REPL or in a notebook. Implementation details (PyO3 attributes, GIL handling) are in 02-strategy.

Top-level layout

import ruvector

# Vector indexes (M1) — backed by ruvector-rabitq
ruvector.FlatF32Index
ruvector.RabitqIndex
ruvector.RabitqPlusIndex
ruvector.RabitqAsymIndex

# Cache-first execution fabric (M2) — backed by ruvector-rulake
ruvector.RuLake
ruvector.LocalBackend
ruvector.FsBackend
ruvector.Consistency  # enum: FRESH | EVENTUAL | STALE
ruvector.RuLakeBundle

# Embedding (M3) — backed by ruvector-cnn + ONNX glue
ruvector.Embedder

# Agent peer protocol (M4) — backed by rvagent-a2a
ruvector.A2aClient
ruvector.AgentCard
ruvector.TaskSpec
ruvector.Task

# Cross-cutting
ruvector.RuVectorError      # base exception
ruvector.__version__         # mirrors Cargo workspace version
ruvector.cpu_features()      # runtime SIMD probe

Every public name above is exported from the compiled extension and re-exported by python/ruvector/__init__.py.

M1 — RaBitQ vector index

import numpy as np
import ruvector

# Build from an (n, d) float32 array. Dtype is enforced; mismatch raises.
vectors = np.random.randn(100_000, 768).astype(np.float32)
idx = ruvector.RabitqPlusIndex.build(
    vectors,
    seed=42,
    rerank_factor=20,    # value ADR-154 recommends (100% recall@10 measured at D=128)
)

# Search a single query — returns a list of (id, score) named tuples.
query = np.random.randn(768).astype(np.float32)
hits = idx.search(query, k=10)
for h in hits:
    print(h.id, h.score)

# Pythonic conveniences
len(idx)            # n vectors
idx.dim             # 768
idx.memory_bytes    # honest accounting (matches AnnIndex::memory_bytes)
idx.save("index.rbpx")
idx2 = ruvector.RabitqPlusIndex.load("index.rbpx")

# Add after build (mirrors AnnIndex::add — appends, must match dim).
idx.add(id=100_001, vector=np.random.randn(768).astype(np.float32))

build() is a classmethod that takes an np.ndarray directly (no intermediate list copy), releases the GIL, and runs in parallel via rayon. It is ergonomic but not magic: an array that is non-contiguous or not float32 raises immediately with a clear message rather than being silently copied.
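The "raise rather than copy" rule can be sketched without the extension. The helper below is hypothetical (`require_f32_contiguous` is not part of the planned API) and uses only the standard library: it inspects the buffer through `memoryview`, which NumPy arrays also support, and rejects anything that is not C-contiguous float32 of the expected dimension. It is a minimal illustration of the validation order, assuming the real checks live in the Rust layer.

```python
from array import array


def require_f32_contiguous(buf, dim):
    """Return a zero-copy view of `buf`, or raise. Never copies or casts."""
    view = memoryview(buf)
    if view.format != "f":  # "f" is the buffer-protocol code for float32
        raise TypeError(f"expected float32 (format 'f'), got format {view.format!r}")
    if not view.c_contiguous:
        raise ValueError("expected a C-contiguous buffer; refusing to copy silently")
    if view.ndim == 0 or view.shape[-1] != dim:
        raise ValueError(f"expected last dimension {dim}, got shape {view.shape}")
    return view


def rejection(buf, dim):
    """Name of the exception raised for `buf`, or None if it is accepted."""
    try:
        require_f32_contiguous(buf, dim)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return None


ok = require_f32_contiguous(array("f", [0.0] * 8), dim=8)       # accepted, zero-copy
wrong_dtype = rejection(array("d", [0.0] * 8), dim=8)           # float64 input
strided = rejection(memoryview(array("f", [0.0] * 16))[::2], dim=8)  # every other element
```

Checking dtype before layout means a float64 array gets a dtype message even when it is also strided, which tends to be the more actionable of the two errors.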

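The four index types share an AnnIndex-shaped Python protocol, but we do not expose a Python ABC; the four classes are concrete. isinstance(idx, ruvector.AnyIndex) works via a runtime-checkable Protocol. The Protocol is declared in the stub, and it must also exist in the runtime package, because a name that lives only in a .pyi file cannot be used with isinstance.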
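A minimal sketch of how a runtime-checkable Protocol lets isinstance succeed on concrete classes that share no base class. The member list of `AnyIndex` here (`search`, `add`, `__len__`) is an assumption drawn from the M1 example, and `ToyFlatIndex` is a hypothetical stand-in for the compiled classes.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class AnyIndex(Protocol):
    """Structural type: anything with these methods counts as an index."""

    def search(self, query, k): ...
    def add(self, id, vector): ...
    def __len__(self): ...


class ToyFlatIndex:
    """Concrete class that does NOT inherit from AnyIndex."""

    def __init__(self):
        self._rows = {}

    def search(self, query, k):
        return []  # placeholder; a real index would rank stored vectors

    def add(self, id, vector):
        self._rows[id] = vector

    def __len__(self):
        return len(self._rows)


is_index = isinstance(ToyFlatIndex(), AnyIndex)   # has all three methods
is_list_index = isinstance([], AnyIndex)          # has __len__ only
```

The runtime check only tests that the methods are present; it does not compare signatures, so static type checking against the stub remains the stronger guarantee.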

M2 — RuLake (cache-first vector fabric)

import asyncio

import numpy as np
import ruvector

query = np.random.randn(768).astype(np.float32)  # same shape as the M1 query

# Builder pattern mirrors RuLake::new + with_*.
lake = (
    ruvector.RuLake.builder()
    .rerank_factor(20)
    .rotation_seed(42)
    .max_cache_entries(1_000_000)
    .consistency(ruvector.Consistency.FRESH)
    .build()
)

# Backends are first-class Python objects.
backend = ruvector.LocalBackend(name="hot-shard")
backend.upsert("docs", ids=[1, 2, 3], vectors=np.random.randn(3, 768).astype(np.float32))
lake.register_backend(backend)

# Sync search.
hits = lake.search_one(collection="docs", query=query, k=10)

# Async search — no thread fight; runs on the extension's tokio runtime.
async def main():
    hits = await lake.search_one_async(collection="docs", query=query, k=10)
    print([(h.backend, h.id, h.score) for h in hits])

asyncio.run(main())

# Federated search across all backends — fanout + merge by score.
hits = lake.search_federated(collection="docs", query=query, k=10)

# Bundle witness operations — surfaces the SHA3 witness from RuLake::publish_bundle.
witness = lake.publish_bundle("docs", out_dir="/tmp/bundle/")
result = lake.refresh_from_bundle_dir(key=("local", "docs"), dir="/tmp/bundle/")
assert result == ruvector.RefreshResult.UP_TO_DATE  # or INVALIDATED, BUNDLE_MISSING

The (backend_id, collection) tuple that Rust uses as a CacheKey is exposed as a Python tuple — no custom class, no surprise.
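The "fanout + merge by score" step of the federated search in the example above can be sketched in a few lines. This is an illustration, not the RuLake implementation: `merge_by_score` is a hypothetical helper, the `Hit` fields follow the `(h.backend, h.id, h.score)` access in the example, and it assumes the score is a distance, so smaller is better.

```python
import heapq
from typing import NamedTuple


class Hit(NamedTuple):
    backend: str
    id: int
    score: float  # assumed to be a distance: lower means closer


def merge_by_score(per_backend, k):
    """Merge per-backend hit lists into one global top-k, best score first."""
    all_hits = (hit for hits in per_backend.values() for hit in hits)
    return heapq.nsmallest(k, all_hits, key=lambda h: h.score)


# Pretend each backend already returned its own local top-2.
results = {
    "hot-shard": [Hit("hot-shard", 1, 0.12), Hit("hot-shard", 7, 0.40)],
    "cold-shard": [Hit("cold-shard", 3, 0.05), Hit("cold-shard", 9, 0.33)],
}
top3 = merge_by_score(results, k=3)
```

If scores were similarities instead, `heapq.nlargest` would be the matching call. Either way each backend must return at least k local hits for the merged top-k to be exact.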

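Consistency behaves like an enum.Enum (it is implemented as a PyO3 integer enum) with values FRESH, EVENTUAL, and STALE. We do not accept string consistency levels, so a typo is caught by the type checker or fails at attribute lookup instead of being passed through.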
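A pure-Python sketch of the "enum only, no strings" rule. The integer values and the `set_consistency` helper are assumptions for illustration; the real enum would come from the compiled extension.

```python
import enum


class Consistency(enum.IntEnum):
    # Integer values are placeholders, not the extension's actual discriminants.
    FRESH = 0
    EVENTUAL = 1
    STALE = 2


def set_consistency(level):
    """Accept only Consistency members; strings and bare ints are rejected."""
    if not isinstance(level, Consistency):
        raise TypeError(f"expected Consistency, got {type(level).__name__}")
    return level


accepted = set_consistency(Consistency.FRESH)

try:
    set_consistency("fresh")  # a string level is refused
    string_rejected = False
except TypeError:
    string_rejected = True

try:
    Consistency.FRESSH  # a misspelled member fails at attribute lookup
    typo_rejected = False
except AttributeError:
    typo_rejected = True
```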

M3 — Embeddings

import numpy as np
import ruvector

emb = ruvector.Embedder.from_pretrained("all-MiniLM-L6-v2")  # downloads once, caches
vec = emb.embed("hello world")                                # np.ndarray, shape (384,)
batch = emb.embed_batch(["hello", "world", "foo bar"])        # shape (3, 384)
emb.dim   # 384

# CNN-image embedder (ADR-013). Same shape; takes (H, W, 3) uint8.
img_emb = ruvector.Embedder.from_pretrained("mobilenetv3-small")
v = img_emb.embed_image(np.zeros((224, 224, 3), dtype=np.uint8))  # (576,)

One Embedder class with one factory, from_pretrained, serves both text and image models; the two paths are distinguished by model identifier prefix. All results are np.ndarray[np.float32], ready to feed into a RabitqIndex or RuLake. This is the contract that makes "RAG in 5 lines" possible (see acceptance gate G1 in 06-decision-record).

M4 — A2A client

import ruvector
import asyncio

# Discover a peer (verifies signature per ADR-159 r2 identity).
async def main():
    client = await ruvector.A2aClient.connect("https://peer.example.com")
    print(client.card.skills)            # list[AgentSkill]
    print(client.card.agent_id)          # SHAKE-256(pubkey) per ADR-159

    # Send a task.
    spec = ruvector.TaskSpec(
        skill="rag.query",
        input="What is RaBitQ?",
        policy=ruvector.TaskPolicy(
            max_tokens=4_000,
            max_cost_usd=0.10,
            max_duration_ms=30_000,
        ),
    )
    task = await client.send_task(spec)
    print(task.status, task.id)

    # Stream task updates (SSE under the hood).
    async for update in client.stream_task(task.id):
        if update.kind == "artifact":
            print("artifact:", update.artifact)
        elif update.kind == "status":
            print("status:", update.status)

    # Cancel.
    await client.cancel_task(task.id)

asyncio.run(main())

stream_task returns an AsyncIterator[TaskUpdate]. TaskUpdate is a tagged union exposed as a discriminated dataclass-like Python type (kind field).
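The shape of such a stream can be sketched with the standard library alone. Everything below is a stand-in: `fake_stream_task` replaces the network call, the status strings `"working"` and `"completed"` are assumed values, and the real `TaskUpdate` may carry more fields than `kind`, `status`, and `artifact`.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Optional


@dataclass(frozen=True)
class TaskUpdate:
    kind: str                       # discriminant: "status" or "artifact"
    status: Optional[str] = None    # set when kind == "status"
    artifact: Optional[str] = None  # set when kind == "artifact"


async def fake_stream_task(task_id: str) -> AsyncIterator[TaskUpdate]:
    """Async generator standing in for client.stream_task(task_id)."""
    yield TaskUpdate(kind="status", status="working")
    await asyncio.sleep(0)  # hand control back to the event loop between events
    yield TaskUpdate(kind="artifact", artifact="placeholder answer text")
    yield TaskUpdate(kind="status", status="completed")


async def collect(task_id: str):
    """Consume the stream and branch on the discriminant field."""
    seen = []
    async for update in fake_stream_task(task_id):
        if update.kind == "artifact":
            seen.append(("artifact", update.artifact))
        elif update.kind == "status":
            seen.append(("status", update.status))
    return seen


events = asyncio.run(collect("task-1"))
```

Because the consumer branches on `kind` alone, new update kinds can be added later without breaking existing loops that ignore unknown values.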

We do not expose the A2A server in v1 — Python users embed an rvAgent server via the Rust binary; the Python SDK is client-only. This keeps the wheel small and avoids dragging axum + tower into Python's process.

Error hierarchy

A single root, with subclasses that map onto the Rust error variants:

ruvector.RuVectorError                   # root, Exception subclass
├── ruvector.IndexError                   # ruvector_rabitq::RabitqError
│   ├── ruvector.DimensionMismatch        # vector dim != index dim
│   ├── ruvector.EmptyIndex               # search on n=0
│   └── ruvector.PersistError             # save/load IO + format errors
├── ruvector.LakeError                    # ruvector_rulake::RuLakeError
│   ├── ruvector.BackendError             # adapter failure, bubbles backend id
│   ├── ruvector.CacheMissError           # consistency=STRICT and miss happened
│   └── ruvector.WitnessMismatch          # bundle witness != cache witness
├── ruvector.A2aError                     # rvagent_a2a::A2aError
│   ├── ruvector.CardSignatureInvalid     # ADR-159 r2 verify-on-discover failure
│   ├── ruvector.PolicyViolation          # TaskPolicy guard fired
│   ├── ruvector.BudgetExceeded           # GlobalBudget gate fired
│   └── ruvector.TransportError           # HTTP / SSE plumbing
└── ruvector.EmbedError                    # model download / inference failures

Names are stable across milestones. RuVectorError is what users put in their except blocks if they don't care which subsystem failed.
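A small sketch of how the single-root hierarchy behaves in except blocks. Only a slice of the tree is reproduced, and the classes are declared in plain Python for illustration; in the SDK they would be created by the extension.

```python
class RuVectorError(Exception):
    """Root of the hierarchy."""


class LakeError(RuVectorError):
    pass


class WitnessMismatch(LakeError):
    pass


class A2aError(RuVectorError):
    pass


class TransportError(A2aError):
    pass


def classify(exc):
    """Report which handler catches `exc`, most specific first."""
    try:
        raise exc
    except LakeError:
        return "lake"
    except RuVectorError:
        return "ruvector"


lake_case = classify(WitnessMismatch("bundle witness != cache witness"))
root_case = classify(TransportError("SSE connection dropped"))
```

Here `lake_case` lands in the subsystem handler, while `root_case` falls through to the root handler because no A2A-specific clause is listed.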

Pythonic conveniences

| Operation | Behavior |
| --- | --- |
| len(idx) | n vectors |
| idx[id] | returns the original f32 vector if RabitqPlusIndex (which keeps originals); raises LookupError for RabitqIndex (which doesn't) |
| for v in idx | iterates (id, vector) pairs, only on indexes that retain originals |
| idx in lake | __contains__ checks if a RabitqPlusIndex is currently primed in a RuLake cache (used for "did my warmup work?") |
| np.asarray(idx) | only on indexes that retain originals; returns the (n, d) float32 matrix without a copy |
| with lake.session() as s | optional context manager for batched ops; commits caches on exit |
| repr(idx) | shows variant, n, d, memory_bytes; diagnostic-friendly |
| idx == idx2 | structural equality if both come from same data + seed (matches the determinism guarantee in ruvector-rabitq/src/lib.rs §Guarantees) |
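Several of these conveniences map directly onto Python's special methods. The toy class below is a hypothetical pure-Python stand-in, not the compiled index: `keep_originals=True` plays the role of RabitqPlusIndex and `False` the role of RabitqIndex. Raising TypeError when iterating an index without originals is an assumption, since the source does not say what happens in that case.

```python
class ToyIndex:
    def __init__(self, dim, keep_originals):
        self.dim = dim
        self._keep = keep_originals
        self._ids = []
        self._originals = {}

    def add(self, id, vector):
        if len(vector) != self.dim:
            raise ValueError(f"expected dim {self.dim}, got {len(vector)}")
        self._ids.append(id)
        if self._keep:
            self._originals[id] = tuple(vector)

    def __len__(self):  # len(idx)
        return len(self._ids)

    def __getitem__(self, id):  # idx[id]
        if not self._keep:
            raise LookupError("this index does not retain original vectors")
        return self._originals[id]  # an unknown id raises KeyError, a LookupError

    def __iter__(self):  # for id, vector in idx
        if not self._keep:
            raise TypeError("iteration requires retained originals")
        return iter(self._originals.items())

    def __repr__(self):  # repr(idx)
        return f"ToyIndex(n={len(self)}, d={self.dim}, keep_originals={self._keep})"


plus = ToyIndex(dim=2, keep_originals=True)
plus.add(7, [1.0, 2.0])

lossy = ToyIndex(dim=2, keep_originals=False)
lossy.add(7, [1.0, 2.0])

try:
    lossy[7]
    lookup_failed = False
except LookupError:
    lookup_failed = True
```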

NumPy interop is a first-class contract

  • Every vector input accepts np.ndarray[np.float32] directly.
  • list[float] / tuple[float, ...] / Python sequences are accepted for ergonomic one-shot calls but copy through a NumPy buffer internally (documented as slower).
  • Outputs are np.ndarray[np.float32] for vectors and Python int / float scalars for ids and scores.
  • We do not invent a Vector class. NumPy is the lingua franca of Python ML.

Versioning

ruvector.__version__ mirrors the Cargo workspace version; the PyPI release is cut at the same time as the Rust 2.x.y release. We use trailing .postN for Python-only fixes (e.g. stub corrections) without a Rust source change.
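A short sketch of how a Python-only `.postN` release relates to the Rust release it was cut from. The version strings are made-up examples and `split_version` is a hypothetical helper; the `.postN` suffix itself is the standard PEP 440 post-release form.

```python
import re

# Matches "X.Y.Z" with an optional PEP 440 post-release suffix ".postN".
_VERSION = re.compile(r"^(?P<base>\d+\.\d+\.\d+)(?:\.post(?P<post>\d+))?$")


def split_version(version):
    """Return (rust_version, python_only_fix_count) for a wheel version string."""
    match = _VERSION.match(version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return match.group("base"), int(match.group("post") or 0)


release = split_version("2.1.0")        # cut together with the Rust release
hotfix = split_version("2.1.0.post2")   # second Python-only fix on the same Rust source
```

Both strings resolve to the same Rust version, which is the property that lets a stub correction ship without touching the crate.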