ruvector/docs/sdk/03-api-surface.md
ruvnet f6c684aba0 docs(sdk): add deep planning review for ruvector Python SDK
Seven-file design review at docs/sdk/ covering the binding strategy,
API surface, M1-M4 milestones, risks, and a one-page decision record
for shipping a Python SDK.

Recommended path: **PyO3 + maturin, single in-tree
`crates/ruvector-py/` cdylib, abi3-py39 wheel via cibuildwheel,
`pyo3-asyncio` over a singleton tokio runtime.**

Why:
- The existing `*-node` NAPI templates (e.g.
  `crates/ruvector-diskann-node/src/lib.rs`) already prove out the
  opaque-handle + `Arc<RwLock<…>>` shape PyO3 mirrors line-for-line —
  ~70% port, ~30% lifetime gymnastics.
- abi3 collapses the wheel matrix from ~25 (cpython36 × 5 platforms)
  to 5 (one wheel per platform, all py3.9+).
- Singleton tokio runtime avoids the "one runtime per call" overhead
  while remaining compatible with asyncio + uvloop.

Milestone shape (each with explicit scope + acceptance tests):

  M1 — RaBitQ-only Python wheel. Just the published
       `ruvector-rabitq` crate exposed via PyO3. Smallest possible
       useful surface. ~600 LoC, 3 weeks.
  M2 — ruLake. Async via pyo3-asyncio. Witness verify exposed.
       ~900 LoC, 4 weeks.
  M3 — Embeddings + ML helpers. Wrap consumer-facing parts of
       `ruvector-cnn` / `ruvllm`. ~700 LoC, 3 weeks.
  M4 — A2A agent client. Wrap `rvagent-a2a` so Python apps can
       dispatch tasks to A2A peers, including signed AgentCard
       discovery. ~800 LoC, 4 weeks.

Three acceptance gates that gate the whole effort:
  1. A Python user can do RAG over 1 M vectors in <5 lines.
  2. An asyncio user can stream A2A task updates without thread
     fights.
  3. `pip install ruvector` takes <10 s on a stock machine.

Top 3 risks identified:
  R1 — tokio runtime + PyO3 + asyncio/uvloop interop. Mitigation:
       single lazy runtime, `pyo3-asyncio` shim.
  R3 — wheel size. M4 budget is 22 MB; A2A deps (axum + reqwest +
       rustls) could blow it. Mitigation: feature-gate axum/reqwest
       behind `agent` extra; default install is rabitq + rulake only.
  R7 — PyPI name squat on `ruvector`. Mitigation: register placeholder
       before M1 ships.

Nuance discovered: `ruvector-rabitq` has **no** sibling `*-node` or
`*-wasm` crate — unlike most consumer crates. M1 is therefore clean
greenfield: no parity-pressure to match a flaky NAPI signature, and
it confirms rabitq alone is the right starter target rather than the
umbrella `ruvector` crate the npm package wraps.

Planning doc only; no implementation.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-25 20:28:54 -04:00

253 lines
9.2 KiB
Markdown

# 03 — Python API Surface
The user-visible Python API across all four milestones. Everything in this
document is what gets typed at a REPL or in a notebook. Implementation
details (PyO3 attributes, GIL handling) are in 02-strategy.
## Top-level layout
```python
import ruvector
# Vector indexes (M1) — backed by ruvector-rabitq
ruvector.FlatF32Index
ruvector.RabitqIndex
ruvector.RabitqPlusIndex
ruvector.RabitqAsymIndex
# Cache-first execution fabric (M2) — backed by ruvector-rulake
ruvector.RuLake
ruvector.LocalBackend
ruvector.FsBackend
ruvector.Consistency # enum: FRESH | EVENTUAL | STALE
ruvector.RuLakeBundle
# Embedding (M3) — backed by ruvector-cnn + ONNX glue
ruvector.Embedder
# Agent peer protocol (M4) — backed by rvagent-a2a
ruvector.A2aClient
ruvector.AgentCard
ruvector.TaskSpec
ruvector.Task
# Cross-cutting
ruvector.RuVectorError # base exception
ruvector.__version__ # mirrors Cargo workspace version
ruvector.cpu_features() # runtime SIMD probe
```
Every public name above is exported from the compiled extension and
re-exported by `python/ruvector/__init__.py`.
## M1 — RaBitQ vector index
```python
import numpy as np
import ruvector
# Build from an (n, d) float32 array. Dtype is enforced; mismatch raises.
vectors = np.random.randn(100_000, 768).astype(np.float32)
idx = ruvector.RabitqPlusIndex.build(
vectors,
seed=42,
rerank_factor=20, # ADR-154 recommended for 100% recall@10 at D=128
)
# Search a single query — returns a list of (id, score) named tuples.
query = np.random.randn(768).astype(np.float32)
hits = idx.search(query, k=10)
for h in hits:
print(h.id, h.score)
# Pythonic conveniences
len(idx) # n vectors
idx.dim # 768
idx.memory_bytes # honest accounting (matches AnnIndex::memory_bytes)
idx.save("index.rbpx")
idx2 = ruvector.RabitqPlusIndex.load("index.rbpx")
# Add after build (mirrors AnnIndex::add — appends, must match dim).
idx.add(id=100_001, vector=np.random.randn(768).astype(np.float32))
```
`build()` is a classmethod, takes `np.ndarray` directly (no list copy),
releases the GIL, runs in parallel via rayon. Ergonomic but not magic:
non-contiguous, non-`float32` arrays raise immediately with a clear
message rather than silently copying.
The four index types share an `AnnIndex`-shaped Python protocol but we
do **not** expose a Python ABC; the four classes are concrete.
`isinstance(idx, ruvector.AnyIndex)` works via a runtime-checkable
`Protocol` in the stub.
## M2 — RuLake (cache-first vector fabric)
```python
import ruvector
import asyncio
# Builder pattern mirrors RuLake::new + with_*.
lake = (
ruvector.RuLake.builder()
.rerank_factor(20)
.rotation_seed(42)
.max_cache_entries(1_000_000)
.consistency(ruvector.Consistency.FRESH)
.build()
)
# Backends are first-class Python objects.
backend = ruvector.LocalBackend(name="hot-shard")
backend.upsert("docs", ids=[1, 2, 3], vectors=np.random.randn(3, 768).astype(np.float32))
lake.register_backend(backend)
# Sync search.
hits = lake.search_one(collection="docs", query=query, k=10)
# Async search — no thread fight; runs on the extension's tokio runtime.
async def main():
hits = await lake.search_one_async(collection="docs", query=query, k=10)
print([(h.backend, h.id, h.score) for h in hits])
asyncio.run(main())
# Federated search across all backends — fanout + merge by score.
hits = lake.search_federated(collection="docs", query=query, k=10)
# Bundle witness operations — surfaces the SHA3 witness from RuLake::publish_bundle.
witness = lake.publish_bundle("docs", out_dir="/tmp/bundle/")
result = lake.refresh_from_bundle_dir(key=("local", "docs"), dir="/tmp/bundle/")
assert result == ruvector.RefreshResult.UP_TO_DATE # or INVALIDATED, BUNDLE_MISSING
```
The `(backend_id, collection)` tuple that Rust uses as a `CacheKey` is
exposed as a Python tuple — no custom class, no surprise.
`Consistency` is `enum.Enum`-like (actually `pyo3` int enum) with values
`FRESH`, `EVENTUAL`, `STALE`. We do **not** accept string consistency
levels; the type system catches typos.
## M3 — Embeddings
```python
import ruvector
emb = ruvector.Embedder.from_pretrained("all-MiniLM-L6-v2") # downloads once, caches
vec = emb.embed("hello world") # np.ndarray, shape (384,)
batch = emb.embed_batch(["hello", "world", "foo bar"]) # shape (3, 384)
emb.dim # 384
# CNN-image embedder (ADR-013). Same shape; takes (H, W, 3) uint8.
img_emb = ruvector.Embedder.from_pretrained("mobilenetv3-small")
v = img_emb.embed_image(np.zeros((224, 224, 3), dtype=np.uint8)) # (576,)
```
One `Embedder` class, two factory paths (`from_pretrained` for text,
same name for image — distinguished by model identifier prefix). All
results are `np.ndarray[np.float32]` ready to feed into a `RabitqIndex`
or `RuLake`. This is the contract that makes "RAG in 5 lines" possible
(see acceptance gate G1 in 06-decision-record).
## M4 — A2A client
```python
import ruvector
import asyncio
# Discover a peer (verifies signature per ADR-159 r2 identity).
async def main():
client = await ruvector.A2aClient.connect("https://peer.example.com")
print(client.card.skills) # list[AgentSkill]
print(client.card.agent_id) # SHAKE-256(pubkey) per ADR-159
# Send a task.
spec = ruvector.TaskSpec(
skill="rag.query",
input="What is RaBitQ?",
policy=ruvector.TaskPolicy(
max_tokens=4_000,
max_cost_usd=0.10,
max_duration_ms=30_000,
),
)
task = await client.send_task(spec)
print(task.status, task.id)
# Stream task updates (SSE under the hood).
async for update in client.stream_task(task.id):
if update.kind == "artifact":
print("artifact:", update.artifact)
elif update.kind == "status":
print("status:", update.status)
# Cancel.
await client.cancel_task(task.id)
asyncio.run(main())
```
`stream_task` returns an `AsyncIterator[TaskUpdate]`. `TaskUpdate` is a
tagged union exposed as a discriminated dataclass-like Python type
(`kind` field).
We do **not** expose the A2A *server* in v1 — Python users embed an
rvAgent server via the Rust binary; the Python SDK is client-only. This
keeps the wheel small and avoids dragging axum + tower into Python's
process.
## Error hierarchy
A single root, with subclasses that map onto the Rust error variants:
```
ruvector.RuVectorError # root, Exception subclass
├── ruvector.IndexError # ruvector_rabitq::RabitqError
│ ├── ruvector.DimensionMismatch # vector dim != index dim
│ ├── ruvector.EmptyIndex # search on n=0
│ └── ruvector.PersistError # save/load IO + format errors
├── ruvector.LakeError # ruvector_rulake::RuLakeError
│ ├── ruvector.BackendError # adapter failure, bubbles backend id
│ ├── ruvector.CacheMissError # consistency=STRICT and miss happened
│ └── ruvector.WitnessMismatch # bundle witness != cache witness
├── ruvector.A2aError # rvagent_a2a::A2aError
│ ├── ruvector.CardSignatureInvalid # ADR-159 r2 verify-on-discover failure
│ ├── ruvector.PolicyViolation # TaskPolicy guard fired
│ ├── ruvector.BudgetExceeded # GlobalBudget gate fired
│ └── ruvector.TransportError # HTTP / SSE plumbing
└── ruvector.EmbedError # model download / inference failures
```
Names are stable across milestones. `RuVectorError` is what users put
in their `except` blocks if they don't care which subsystem failed.
## Pythonic conveniences
| Operation | Behavior |
|---|---|
| `len(idx)` | n vectors |
| `idx[id]` | returns the original f32 vector if `RabitqPlusIndex` (which keeps originals); raises `LookupError` for `RabitqIndex` (which doesn't) |
| `for v in idx` | iterates `(id, vector)` pairs, only on indexes that retain originals |
| `idx in lake` | `__contains__` checks if a `RabitqPlusIndex` is currently primed in a `RuLake` cache (used for "did my warmup work?") |
| `np.asarray(idx)` | only on indexes that retain originals; returns the (n, d) float32 matrix without a copy |
| `with lake.session() as s` | optional context manager for batched ops; commits caches on exit |
| `repr(idx)` | shows variant, n, d, memory_bytes — diagnostic-friendly |
| `idx == idx2` | structural equality if both come from same data + seed (matches the determinism guarantee in `ruvector-rabitq/src/lib.rs` §Guarantees) |
## NumPy interop is a first-class contract
- Every vector input accepts `np.ndarray[np.float32]` directly.
- `list[float]` / `tuple[float, ...]` / Python sequences are accepted
for ergonomic one-shot calls but copy through a NumPy buffer
internally (documented as slower).
- Outputs are `np.ndarray[np.float32]` for vectors and Python `int` /
`float` scalars for ids and scores.
- We do not invent a `Vector` class. NumPy is the lingua franca of
Python ML.
## Versioning
`ruvector.__version__` mirrors the Cargo workspace version; the PyPI
release is cut at the same time as the Rust 2.x.y release. We use
trailing `.postN` for Python-only fixes (e.g. stub corrections) without
a Rust source change.