mirror of
https://github.com/ruvnet/RuVector.git
synced 2026-05-25 15:03:46 +00:00
Seven-file design review at docs/sdk/ covering the binding strategy,
API surface, M1-M4 milestones, risks, and a one-page decision record
for shipping a Python SDK.
Recommended path: **PyO3 + maturin, single in-tree
`crates/ruvector-py/` cdylib, abi3-py39 wheel via cibuildwheel,
`pyo3-asyncio` over a singleton tokio runtime.**
Why:
- The existing `*-node` NAPI templates (e.g.
`crates/ruvector-diskann-node/src/lib.rs`) already prove out the
opaque-handle + `Arc<RwLock<…>>` shape PyO3 mirrors line-for-line —
~70% port, ~30% lifetime gymnastics.
- abi3 collapses the wheel matrix from ~25 (~5 CPython versions × 5 platforms)
to 5 (one wheel per platform, covering all of py3.9+).
- Singleton tokio runtime avoids the "one runtime per call" overhead
while remaining compatible with asyncio + uvloop.
Milestone shape (each with explicit scope + acceptance tests):
M1 — RaBitQ-only Python wheel. Just the published
`ruvector-rabitq` crate exposed via PyO3. Smallest possible
useful surface. ~600 LoC, 3 weeks.
M2 — ruLake. Async via pyo3-asyncio. Witness verify exposed.
~900 LoC, 4 weeks.
M3 — Embeddings + ML helpers. Wrap consumer-facing parts of
`ruvector-cnn` / `ruvllm`. ~700 LoC, 3 weeks.
M4 — A2A agent client. Wrap `rvagent-a2a` so Python apps can
dispatch tasks to A2A peers, including signed AgentCard
discovery. ~800 LoC, 4 weeks.
Three acceptance gates govern the whole effort:
1. A Python user can do RAG over 1 M vectors in <5 lines.
2. An asyncio user can stream A2A task updates without thread
fights.
3. `pip install ruvector` takes <10 s on a stock machine.
Top 3 risks identified:
R1 — tokio runtime + PyO3 + asyncio/uvloop interop. Mitigation:
single lazy runtime, `pyo3-asyncio` shim.
R3 — wheel size. M4 budget is 22 MB; A2A deps (axum + reqwest +
rustls) could blow it. Mitigation: feature-gate axum/reqwest
behind `agent` extra; default install is rabitq + rulake only.
R7 — PyPI name squat on `ruvector`. Mitigation: register placeholder
before M1 ships.
Nuance discovered: `ruvector-rabitq` has **no** sibling `*-node` or
`*-wasm` crate, unlike most consumer crates. M1 is therefore a clean
greenfield: there is no parity pressure to match a flaky NAPI signature,
and the absence confirms that rabitq alone, rather than the umbrella
`ruvector` crate the npm package wraps, is the right starter target.
Planning doc only; no implementation.
Co-Authored-By: claude-flow <ruv@ruv.net>
253 lines
9.2 KiB
Markdown
# 03 — Python API Surface

The user-visible Python API across all four milestones. Everything in this
document is what gets typed at a REPL or in a notebook. Implementation
details (PyO3 attributes, GIL handling) are in 02-strategy.

## Top-level layout

```python
import ruvector

# Vector indexes (M1) — backed by ruvector-rabitq
ruvector.FlatF32Index
ruvector.RabitqIndex
ruvector.RabitqPlusIndex
ruvector.RabitqAsymIndex

# Cache-first execution fabric (M2) — backed by ruvector-rulake
ruvector.RuLake
ruvector.LocalBackend
ruvector.FsBackend
ruvector.Consistency      # enum: FRESH | EVENTUAL | STALE
ruvector.RefreshResult    # enum: UP_TO_DATE | INVALIDATED | BUNDLE_MISSING
ruvector.RuLakeBundle

# Embedding (M3) — backed by ruvector-cnn + ONNX glue
ruvector.Embedder

# Agent peer protocol (M4) — backed by rvagent-a2a
ruvector.A2aClient
ruvector.AgentCard
ruvector.TaskSpec
ruvector.TaskPolicy
ruvector.Task

# Cross-cutting
ruvector.RuVectorError    # base exception; subclasses under "Error hierarchy"
ruvector.__version__      # mirrors Cargo workspace version
ruvector.cpu_features()   # runtime SIMD probe
```

Every public name above is exported from the compiled extension and
re-exported by `python/ruvector/__init__.py`.

## M1 — RaBitQ vector index

```python
import numpy as np
import ruvector

# Build from an (n, d) float32 array. Dtype is enforced; mismatch raises.
vectors = np.random.randn(100_000, 768).astype(np.float32)
idx = ruvector.RabitqPlusIndex.build(
    vectors,
    seed=42,
    rerank_factor=20,  # ADR-154 recommended for 100% recall@10 at D=128
)

# Search a single query — returns a list of (id, score) named tuples.
query = np.random.randn(768).astype(np.float32)
hits = idx.search(query, k=10)
for h in hits:
    print(h.id, h.score)

# Pythonic conveniences
len(idx)           # n vectors
idx.dim            # 768
idx.memory_bytes   # honest accounting (matches AnnIndex::memory_bytes)
idx.save("index.rbpx")
idx2 = ruvector.RabitqPlusIndex.load("index.rbpx")

# Add after build (mirrors AnnIndex::add — appends, must match dim).
idx.add(id=100_001, vector=np.random.randn(768).astype(np.float32))
```

`build()` is a classmethod that takes an `np.ndarray` directly (no list
copy), releases the GIL, and runs in parallel via rayon. It is ergonomic
but not magic: arrays that are non-contiguous or not `float32` raise
immediately with a clear message rather than being silently copied.
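
The fail-fast input rule can be sketched in pure Python with the standard `memoryview` buffer interface. NumPy arrays implement the same interface, and a native-endian `float32` array reports format `'f'`. This is a hypothetical stand-in for the check the extension would perform on the Rust side. `check_build_input` is an illustrative name, not part of the planned API.

```python
from array import array


def check_build_input(buf) -> tuple:
    """Return (n, d) for a 2-D, C-contiguous float32 buffer; raise otherwise.

    Hypothetical helper mirroring the fail-fast rule for build(): no silent
    dtype conversion and no silent copy of strided data.
    """
    view = memoryview(buf)
    if view.format != "f":
        raise TypeError(f"expected float32 data (format 'f'), got format {view.format!r}")
    if not view.c_contiguous:
        raise ValueError("expected a C-contiguous array; call np.ascontiguousarray() first")
    if view.ndim != 2:
        raise ValueError(f"expected a 2-D (n, d) array, got ndim={view.ndim}")
    n, d = view.shape
    return n, d


# Stand-in for np.zeros((4, 3), dtype=np.float32): 12 floats viewed as shape (4, 3).
flat = array("f", [0.0] * 12)
matrix = memoryview(flat).cast("B").cast("f", [4, 3])
print(check_build_input(matrix))  # (4, 3)

# float64 input (NumPy's default dtype) is rejected rather than converted.
try:
    check_build_input(array("d", [0.0] * 12))
except TypeError as exc:
    print("rejected:", exc)
```

Validating up front keeps `build()` zero-copy for well-formed input. It also turns a silent float64-to-float32 copy of a large matrix into an immediate, explicit error.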
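
The four index types share an `AnnIndex`-shaped Python protocol, but we
do **not** expose a Python ABC; the four classes are concrete.
`isinstance(idx, ruvector.AnyIndex)` works via a `typing.runtime_checkable`
`Protocol` that is defined in the pure-Python package and mirrored in the
type stub. A protocol declared only in a `.pyi` stub would not exist at
runtime, so `isinstance` could not use it.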

## M2 — RuLake (cache-first vector fabric)

```python
import asyncio

import numpy as np
import ruvector

# Same 768-d query shape as in the M1 example.
query = np.random.randn(768).astype(np.float32)

# Builder pattern mirrors RuLake::new + with_*.
lake = (
    ruvector.RuLake.builder()
    .rerank_factor(20)
    .rotation_seed(42)
    .max_cache_entries(1_000_000)
    .consistency(ruvector.Consistency.FRESH)
    .build()
)

# Backends are first-class Python objects.
backend = ruvector.LocalBackend(name="hot-shard")
backend.upsert("docs", ids=[1, 2, 3], vectors=np.random.randn(3, 768).astype(np.float32))
lake.register_backend(backend)

# Sync search.
hits = lake.search_one(collection="docs", query=query, k=10)

# Async search — no thread fight; runs on the extension's tokio runtime.
async def main():
    hits = await lake.search_one_async(collection="docs", query=query, k=10)
    print([(h.backend, h.id, h.score) for h in hits])

asyncio.run(main())

# Federated search across all backends — fanout + merge by score.
hits = lake.search_federated(collection="docs", query=query, k=10)

# Bundle witness operations — surfaces the SHA3 witness from RuLake::publish_bundle.
witness = lake.publish_bundle("docs", out_dir="/tmp/bundle/")
result = lake.refresh_from_bundle_dir(key=("local", "docs"), dir="/tmp/bundle/")
assert result == ruvector.RefreshResult.UP_TO_DATE  # or INVALIDATED, BUNDLE_MISSING
```

The `(backend_id, collection)` tuple that Rust uses as a `CacheKey` is
exposed as a plain Python tuple: no custom class, no surprise.

`Consistency` is `enum.Enum`-like (actually a `pyo3` int enum) with the
values `FRESH`, `EVENTUAL` and `STALE`. We do **not** accept string
consistency levels, so a typo fails loudly instead of being accepted as
an unknown level: `Consistency.FRSH` raises `AttributeError` at the call
site, and type checkers flag it through the stub.
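
The no-strings rule can be illustrated with the standard library's `enum.IntEnum` as a stand-in for the PyO3 enum. `set_consistency` is a hypothetical validator, and the integer values are assumptions.

```python
import enum


class Consistency(enum.IntEnum):
    """Stand-in for ruvector.Consistency; the numeric values are illustrative."""

    FRESH = 0
    EVENTUAL = 1
    STALE = 2


def set_consistency(level: Consistency) -> Consistency:
    # Reject strings and bare ints instead of guessing what the caller meant.
    if not isinstance(level, Consistency):
        raise TypeError(f"expected Consistency, got {type(level).__name__}")
    return level


print(set_consistency(Consistency.EVENTUAL).name)  # EVENTUAL

for bad in ("fresh", 1):
    try:
        set_consistency(bad)
    except TypeError as exc:
        print(exc)  # expected Consistency, got str / ... got int

try:
    Consistency.FRSH  # misspelled member name
except AttributeError:
    print("typo caught at the call site")
```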

## M3 — Embeddings

```python
import numpy as np
import ruvector

emb = ruvector.Embedder.from_pretrained("all-MiniLM-L6-v2")  # downloads once, caches
vec = emb.embed("hello world")                            # np.ndarray, shape (384,)
batch = emb.embed_batch(["hello", "world", "foo bar"])    # shape (3, 384)
emb.dim                                                   # 384

# CNN-image embedder (ADR-013). Same API; takes (H, W, 3) uint8.
img_emb = ruvector.Embedder.from_pretrained("mobilenetv3-small")
v = img_emb.embed_image(np.zeros((224, 224, 3), dtype=np.uint8))  # (576,)
```

There is one `Embedder` class with a single factory, `from_pretrained`,
for both text and image models. The model identifier's prefix decides
which kind is loaded. Text models are then used through `embed` /
`embed_batch`, and image models through `embed_image`. All results are
`np.ndarray[np.float32]`, ready to feed into a `RabitqIndex` or `RuLake`.
This is the contract that makes "RAG in 5 lines" possible (see acceptance
gate G1 in 06-decision-record).
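
One way the single `from_pretrained` entry point could route identifiers by prefix is shown below, using a hypothetical in-memory registry. The two identifiers and their output widths (384 and 576) come from the example above. The prefix table, the `EmbedderInfo` record and the use of `ValueError` (for simplicity, in place of whatever error the SDK actually raises) are assumptions.

```python
from dataclasses import dataclass

# Hypothetical routing table: identifier prefix -> modality.
_MODALITY_BY_PREFIX = {"all-MiniLM-": "text", "mobilenetv3-": "image"}
# Output widths quoted in the M3 example; a real build would read model metadata.
_DIM_BY_MODEL = {"all-MiniLM-L6-v2": 384, "mobilenetv3-small": 576}


@dataclass(frozen=True)
class EmbedderInfo:
    model: str
    modality: str
    dim: int


def resolve(model: str) -> EmbedderInfo:
    """Pick text vs image handling from the identifier prefix (illustrative only)."""
    for prefix, modality in _MODALITY_BY_PREFIX.items():
        if model.startswith(prefix):
            if model not in _DIM_BY_MODEL:
                raise ValueError(f"unknown {modality} model: {model!r}")
            return EmbedderInfo(model, modality, _DIM_BY_MODEL[model])
    raise ValueError(f"no embedder family matches {model!r}")


print(resolve("all-MiniLM-L6-v2"))
# EmbedderInfo(model='all-MiniLM-L6-v2', modality='text', dim=384)
print(resolve("mobilenetv3-small").modality)  # image
```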

## M4 — A2A client

```python
import asyncio

import ruvector

# Discover a peer (verifies signature per ADR-159 r2 identity).
async def main():
    client = await ruvector.A2aClient.connect("https://peer.example.com")
    print(client.card.skills)    # list[AgentSkill]
    print(client.card.agent_id)  # SHAKE-256(pubkey) per ADR-159

    # Send a task.
    spec = ruvector.TaskSpec(
        skill="rag.query",
        input="What is RaBitQ?",
        policy=ruvector.TaskPolicy(
            max_tokens=4_000,
            max_cost_usd=0.10,
            max_duration_ms=30_000,
        ),
    )
    task = await client.send_task(spec)
    print(task.status, task.id)

    # Stream task updates (SSE under the hood).
    async for update in client.stream_task(task.id):
        if update.kind == "artifact":
            print("artifact:", update.artifact)
        elif update.kind == "status":
            print("status:", update.status)

    # Cancel.
    await client.cancel_task(task.id)

asyncio.run(main())
```

`stream_task` returns an `AsyncIterator[TaskUpdate]`. `TaskUpdate` is a
tagged union exposed as a dataclass-like Python type whose `kind` field
(for example `"artifact"` or `"status"`) says which other fields are set.
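
A stand-in for the discriminated `TaskUpdate` type and its consuming loop follows. Frozen dataclasses replace the PyO3 type, and an async generator replaces the SSE stream. Field names follow the M4 example (`kind`, `artifact`, `status`). The `fake_stream` generator, the `collect` helper and the status strings are hypothetical.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Optional


@dataclass(frozen=True)
class TaskUpdate:
    kind: str                       # discriminator: "status" or "artifact"
    status: Optional[str] = None    # set when kind == "status"
    artifact: Optional[str] = None  # set when kind == "artifact"


async def fake_stream(task_id: str) -> AsyncIterator[TaskUpdate]:
    """Hypothetical stand-in for client.stream_task(task_id)."""
    yield TaskUpdate(kind="status", status="working")
    yield TaskUpdate(kind="artifact", artifact=f"answer for {task_id}")
    yield TaskUpdate(kind="status", status="completed")


async def collect(task_id: str) -> list:
    seen = []
    async for update in fake_stream(task_id):
        if update.kind == "artifact":
            seen.append(("artifact", update.artifact))
        elif update.kind == "status":
            seen.append(("status", update.status))
        # Any other kind is skipped, so unknown variants do not crash the loop.
    return seen


print(asyncio.run(collect("t-1")))
# [('status', 'working'), ('artifact', 'answer for t-1'), ('status', 'completed')]
```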

We do **not** expose the A2A *server* in v1. Python users who need to
serve A2A run an rvAgent server from the Rust binary, and the Python SDK
is client-only. This keeps the wheel small and avoids dragging axum +
tower into Python's process.

## Error hierarchy

A single root, with subclasses that map onto the Rust error variants:

```
ruvector.RuVectorError                  # root, Exception subclass
├── ruvector.IndexError                 # ruvector_rabitq::RabitqError
│   ├── ruvector.DimensionMismatch      # vector dim != index dim
│   ├── ruvector.EmptyIndex             # search on n=0
│   └── ruvector.PersistError           # save/load IO + format errors
├── ruvector.LakeError                  # ruvector_rulake::RuLakeError
│   ├── ruvector.BackendError           # adapter failure, bubbles backend id
│   ├── ruvector.CacheMissError         # consistency=STRICT and miss happened
│   └── ruvector.WitnessMismatch        # bundle witness != cache witness
├── ruvector.A2aError                   # rvagent_a2a::A2aError
│   ├── ruvector.CardSignatureInvalid   # ADR-159 r2 verify-on-discover failure
│   ├── ruvector.PolicyViolation        # TaskPolicy guard fired
│   ├── ruvector.BudgetExceeded         # GlobalBudget gate fired
│   └── ruvector.TransportError         # HTTP / SSE plumbing
└── ruvector.EmbedError                 # model download / inference failures
```

Names are stable across milestones. `RuVectorError` is what users put
in their `except` blocks if they don't care which subsystem failed.
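
The practical effect of a single root is shown below with plain-Python stand-ins for three classes from the tree. The real classes are created by the extension; only their subclass relationships come from the tree above. The `backend_id` attribute is an assumed way of "bubbling" the backend id, and `lookup` is a hypothetical failing call.

```python
class RuVectorError(Exception):
    """Stand-in for ruvector.RuVectorError (root)."""


class LakeError(RuVectorError):
    """Stand-in for ruvector.LakeError."""


class BackendError(LakeError):
    """Stand-in for ruvector.BackendError; carries the failing backend id."""

    def __init__(self, backend_id: str, message: str) -> None:
        super().__init__(f"[{backend_id}] {message}")
        self.backend_id = backend_id


def lookup() -> None:
    # Hypothetical call that fails inside one backend.
    raise BackendError("hot-shard", "connection refused")


# Callers can catch as narrowly or as broadly as they need.
try:
    lookup()
except BackendError as exc:
    print("backend failed:", exc.backend_id)  # backend failed: hot-shard

try:
    lookup()
except RuVectorError as exc:
    print("ruvector failed:", exc)  # ruvector failed: [hot-shard] connection refused
```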

## Pythonic conveniences

| Operation | Behavior |
|---|---|
| `len(idx)` | n vectors |
| `idx[id]` | returns the original f32 vector if `RabitqPlusIndex` (which keeps originals); raises `LookupError` for `RabitqIndex` (which doesn't) |
| `for v in idx` | iterates `(id, vector)` pairs, only on indexes that retain originals |
| `idx in lake` | `__contains__` checks if a `RabitqPlusIndex` is currently primed in a `RuLake` cache (used for "did my warmup work?") |
| `np.asarray(idx)` | only on indexes that retain originals; returns the (n, d) float32 matrix without a copy |
| `with lake.session() as s` | optional context manager for batched ops; commits caches on exit |
| `repr(idx)` | shows variant, n, d, memory_bytes — diagnostic-friendly |
| `idx == idx2` | structural equality if both come from same data + seed (matches the determinism guarantee in `ruvector-rabitq/src/lib.rs` §Guarantees) |
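
Most rows of the table map onto standard Python special methods. The toy class below is a hypothetical pure-Python stand-in that keeps originals, as `RabitqPlusIndex` does. It shows the mapping for `len`, `[]`, iteration, `repr` and `==`. `__array__` (for `np.asarray`), `memory_bytes` and the `RuLake` rows are omitted to keep it stdlib-only.

```python
from typing import Dict, Iterator, List, Tuple


class ToyPlusIndex:
    """Hypothetical index that retains original vectors, keyed by id."""

    def __init__(self, dim: int, seed: int) -> None:
        self.dim = dim
        self.seed = seed
        self._vectors: Dict[int, List[float]] = {}

    def add(self, id: int, vector: List[float]) -> None:
        if len(vector) != self.dim:
            raise ValueError(f"dimension mismatch: {len(vector)} != {self.dim}")
        self._vectors[id] = list(vector)

    def __len__(self) -> int:  # len(idx)
        return len(self._vectors)

    def __getitem__(self, id: int) -> List[float]:  # idx[id]; KeyError is a LookupError
        return self._vectors[id]

    def __iter__(self) -> Iterator[Tuple[int, List[float]]]:  # (id, vector) pairs
        return iter(self._vectors.items())

    def __repr__(self) -> str:
        return f"ToyPlusIndex(n={len(self)}, d={self.dim})"

    def __eq__(self, other: object) -> bool:  # same data + same seed
        if not isinstance(other, ToyPlusIndex):
            return NotImplemented
        return (self.seed, self.dim, self._vectors) == (other.seed, other.dim, other._vectors)


a, b = ToyPlusIndex(2, seed=42), ToyPlusIndex(2, seed=42)
for idx in (a, b):
    idx.add(7, [0.5, 1.0])
print(len(a), a[7], repr(a), a == b)  # 1 [0.5, 1.0] ToyPlusIndex(n=1, d=2) True
print(list(a))                        # [(7, [0.5, 1.0])]
```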

## NumPy interop is a first-class contract

- Every vector input accepts `np.ndarray[np.float32]` directly.
- `list[float]` / `tuple[float, ...]` / Python sequences are accepted
  for ergonomic one-shot calls but copy through a NumPy buffer
  internally (documented as slower).
- Outputs are `np.ndarray[np.float32]` for vectors and Python `int` /
  `float` scalars for ids and scores.
- We do not invent a `Vector` class. NumPy is the lingua franca of
  Python ML.
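
The two input paths, zero-copy for buffers and copy-once for plain sequences, can be sketched with the standard `array` module and `memoryview`. A NumPy `float32` array would take the buffer path because it exposes the same buffer protocol. `as_f32_vector` is a hypothetical helper, and its `copied` flag exists only to make the two paths observable.

```python
from array import array
from typing import Any, Tuple


def as_f32_vector(x: Any) -> Tuple[memoryview, bool]:
    """Return (float32 view, copied?) following the interop rules (sketch)."""
    try:
        view = memoryview(x)
    except TypeError:
        # Not a buffer (list, tuple, other sequence): accepted, but copied once.
        return memoryview(array("f", x)), True
    if view.format != "f" or not view.c_contiguous:
        # Buffers are never silently converted; the caller fixes dtype or layout.
        raise TypeError(f"expected a contiguous float32 buffer, got format {view.format!r}")
    return view, False


buf = array("f", [1.0, 2.0, 3.0])  # stand-in for np.array([...], dtype=np.float32)
view, copied = as_f32_vector(buf)
print(copied, view.tolist())       # False [1.0, 2.0, 3.0]

view, copied = as_f32_vector([1.0, 2.5])
print(copied, view.tolist())       # True [1.0, 2.5]
```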

## Versioning

`ruvector.__version__` mirrors the Cargo workspace version, and the PyPI
release is cut at the same time as the Rust 2.x.y release. We use a
trailing `.postN` suffix (a PEP 440 post-release) for Python-only fixes,
such as stub corrections, that involve no Rust source change.
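
The version rule fits in two small functions. One derives the PyPI version from the Cargo version plus an optional post-release counter. The other recovers the Rust release a wheel was built from. This is a sketch with hypothetical names and an illustrative version number. It handles only plain `X.Y.Z` and `X.Y.Z.postN` strings, not the full PEP 440 grammar.

```python
import re

_PYPI_VERSION = re.compile(r"^(\d+\.\d+\.\d+)(?:\.post(\d+))?$")


def pypi_version(cargo_version: str, post: int = 0) -> str:
    """Cargo '2.1.3' -> PyPI '2.1.3', or '2.1.3.post1' for a Python-only fix."""
    if not re.fullmatch(r"\d+\.\d+\.\d+", cargo_version):
        raise ValueError(f"not a plain X.Y.Z Cargo version: {cargo_version!r}")
    return cargo_version if post == 0 else f"{cargo_version}.post{post}"


def cargo_version_of(pypi: str) -> str:
    """Strip any .postN suffix to get the Rust release the wheel was built from."""
    match = _PYPI_VERSION.match(pypi)
    if match is None:
        raise ValueError(f"unexpected PyPI version: {pypi!r}")
    return match.group(1)


print(pypi_version("2.1.3"))            # 2.1.3
print(pypi_version("2.1.3", post=1))    # 2.1.3.post1
print(cargo_version_of("2.1.3.post1"))  # 2.1.3
```

PEP 440 orders `2.1.3.post1` after `2.1.3` and before `2.1.4`. A stub-only fix is therefore picked up by `pip install -U` without implying a new Rust release.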