ruvector/crates/ruvector-py/tests/test_smoke.py
ruvnet e7f5a391f8 feat(ruvector-py): Python SDK M1 — RaBitQ wheel via PyO3 + maturin
First milestone of the ruvector Python SDK per
docs/sdk/04-milestones.md § "M1 — RaBitQ-only Python wheel". A new
workspace crate `crates/ruvector-py/` exposes ruvector-rabitq as a
Python extension module via PyO3 + maturin with an abi3-py39 wheel
target.

## Surface

```python
import numpy as np
import ruvector

vectors = np.random.randn(10_000, 768).astype(np.float32)
idx = ruvector.RabitqIndex.build(vectors, rerank_factor=20)
results = idx.search(vectors[0], k=10)  # → list[(id, distance)]

idx.save("vectors.rbpx")
idx2 = ruvector.RabitqIndex.load("vectors.rbpx")
```
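
For checking results against ground truth, an exact brute-force baseline is handy. The sketch below is not part of `ruvector`; `brute_force_search` is a hypothetical helper, and it assumes the distance is squared L2 with lower meaning closer, as the smoke test comments state. It returns the same `list[(id, distance)]` shape as `search`.

```python
import numpy as np


def brute_force_search(
    vectors: np.ndarray, query: np.ndarray, k: int
) -> list[tuple[int, float]]:
    """Exact top-k by squared L2 distance as (id, distance), closest first."""
    if query.shape != (vectors.shape[1],):
        raise ValueError(
            f"query shape {query.shape} does not match dim {vectors.shape[1]}"
        )
    diffs = vectors - query  # broadcast the query across all rows
    dists = np.einsum("ij,ij->i", diffs, diffs)  # row-wise squared L2
    order = np.argsort(dists, kind="stable")[:k]
    return [(int(i), float(dists[i])) for i in order]


rng = np.random.default_rng(42)
vectors = rng.standard_normal((1_000, 64), dtype=np.float32)
exact = brute_force_search(vectors, vectors[0], k=10)
```

Comparing the ids in `exact` with those from `idx.search(vectors[0], k=10)` would give a quick recall check at a chosen `rerank_factor`.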

## What ships

- `Cargo.toml`: cdylib crate, pyo3 0.22 with `extension-module` +
  `abi3-py39`, numpy 0.22, path dep on `ruvector-rabitq`.
- `pyproject.toml`: maturin build backend, `python-source = "python"`,
  `module-name = "ruvector._native"`. PyPI name: `ruvector`.
- `src/lib.rs`: defines the `_native` Python module, registers the
  `RabitqIndex` class and `RuVectorError` exception.
- `src/rabitq.rs`: `RabitqIndex` wrapping `RabitqPlusIndex` with
  `build` / `search` / `save` / `load` / `__len__` / `__repr__`.
  All hot paths release the GIL via `py.allow_threads`.
- `src/error.rs`: maps `RabitqError` → `RuVectorError(PyException)`.
- `python/ruvector/__init__.py`: thin re-export shim from `_native`.
- `python/ruvector/py.typed`: PEP 561 marker.
- Type stubs: `python/ruvector/__init__.pyi` + `stubs/ruvector/__init__.pyi`.
- `tests/test_smoke.py`: pytest coverage of build/search/save/load,
  dimension-mismatch error, len/repr, abi3 marker.
- `README.md`: install instructions + 30-second example.

## Real ruvector-rabitq API used

The plan's M1 sketch closely matched the real crate API. Concrete surface used:
- `RabitqPlusIndex::from_vectors_parallel(dim, seed, rerank_factor, items)`
  — used in `build()`. Added `seed` kwarg (default 42) since the ctor
  requires it.
- `idx.search_with_rerank(query, k, rerank_factor) -> Vec<SearchResult>`
  — used in `search()`.
- `persist::save_index` / `persist::load_index` / `persist::MAGIC`
  — `.rbpx` v1 wire format. `load()` peeks the 24-byte header to
  recover the seed before calling `load_index`.
- `idx.export_items()` — used in `save()` because the seed-based
  format needs the items handed back; `RabitqPlusIndex` doesn't
  expose `originals_flat` directly.
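
Why persisting `(dim, seed, items)` is enough can be illustrated with a toy. The sketch below is a stand-in and not the `ruvector-rabitq` algorithm or the `.rbpx` layout: `toy_build` is a hypothetical function that derives a seeded orthogonal matrix and 1-bit sign codes. It only shows that a rebuild from the same seed and items reproduces the codes bit for bit.

```python
import numpy as np


def toy_build(dim: int, seed: int, items: np.ndarray) -> np.ndarray:
    """Toy seeded build: seeded orthogonal transform, then 1-bit sign codes."""
    rng = np.random.default_rng(seed)
    # QR of a seeded Gaussian matrix gives an orthogonal matrix fixed by the seed.
    rotation, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    rotated = items @ rotation.astype(np.float32)
    # Pack one sign bit per dimension: 64 dims become 8 bytes per row.
    return np.packbits(rotated > 0, axis=1)


items = np.random.default_rng(0).standard_normal((200, 64), dtype=np.float32)
original = toy_build(64, 1234, items)
rebuilt = toy_build(64, 1234, items)     # "load": same dim, seed, items
other_seed = toy_build(64, 4321, items)  # a different seed changes the codes
```

In this toy, losing the seed would make the stored items impossible to re-encode identically, which is the reason `load()` has to recover it from the header first.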

## Verification

  cargo build -p ruvector-py            → clean
  cargo clippy -p ruvector-py --all-targets --no-deps -- -D warnings  → exit 0
  cargo test -p ruvector-py             → 0 tests, 0 failed (no Rust unit
                                          tests yet; logic is in PyO3
                                          methods that need the Python
                                          interpreter)

`maturin develop` + `pytest` + `mypy --strict` not run — the
sandbox doesn't have those binaries. The Python tests are written
to the M1 acceptance shape and will run as soon as maturin is
present in the dev env.

## Deviations from the M1 plan (docs/sdk/04-milestones.md)

1. One `RabitqIndex` class instead of the plan's four
   (`FlatF32Index`, `RabitqIndex`, `RabitqPlusIndex`, `RabitqAsymIndex`).
   Adding the others is mechanical follow-up — same register pattern.
2. Single `RuVectorError` exception instead of the subclass tree
   (`DimensionMismatch`, `EmptyIndex`, `PersistError`). Subclasses
   are M2+ scope per the plan.
3. No `_typing.py`, no `_version.py`. `__version__` sourced from
   `env!("CARGO_PKG_VERSION")` via the compiled module.
4. No CI workflow, no Sphinx, no notebook — deferred. Scoped to
   "everything needed for pip install to work".
5. `build()` takes a `seed` kwarg (default 42) — not in the M1
   sketch but required by the underlying ctor.
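
On deviation 2: a pure-Python sketch of how the deferred subclass tree could sit under `RuVectorError`. This is an illustration only. The real classes would be created from Rust via PyO3, the subclass names are taken from the plan, and `check_dim` is a hypothetical helper.

```python
class RuVectorError(Exception):
    """Base class; the only exception M1 ships."""


class DimensionMismatch(RuVectorError):
    """Planned for M2+: vector or query dim differs from the index dim."""


class EmptyIndex(RuVectorError):
    """Planned for M2+: operation needs a non-empty index."""


class PersistError(RuVectorError):
    """Planned for M2+: save/load failure."""


def check_dim(expected: int, got: int) -> None:
    # Hypothetical helper standing in for the check done on the Rust side.
    if expected != got:
        raise DimensionMismatch(f"expected dim {expected}, got {got}")


try:
    check_dim(64, 32)
except RuVectorError as exc:  # M1-style handler still catches the subclass
    caught = exc
```

Callers that catch `RuVectorError` today would keep working if the subclasses land later, so shipping the single base class first should not be a breaking choice.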

## Two pyo3 0.22 quirks worth flagging

- The `pyo3::create_exception!` macro emits `unexpected_cfgs` warnings
  for `cfg(feature = "gil-refs")`. Worked around with
  `#![allow(unexpected_cfgs)]` at crate root; a comment explains the
  upstream issue.
- The `#[pymethods]` macro expansion triggers
  `clippy::useless_conversion` false positives where `?` is applied to
  a `PyResult`. Suppressed at crate root, with a comment.

LoC total: 881 (Cargo.lock excluded; 768 source + 113 lockfile drift).
M1 plan budgeted ~1300 — under because we shipped the user-requested
single-class scope, not the plan's full surface.

Refs: docs/sdk/04-milestones.md M1, docs/sdk/02-strategy.md

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-25 20:41:52 -04:00


"""M1 smoke tests for ``ruvector``.

These exercise the user-visible surface of the wheel:

- ``ruvector.__version__`` is set
- ``RabitqIndex.build`` accepts an ``(n, dim)`` float32 NumPy array
- ``RabitqIndex.search`` returns ``k`` ``(id, score)`` tuples
- first-result self-search returns id 0 at distance ~0
- dimension mismatch raises ``RuVectorError``
- save/load roundtrip preserves search results

Run via ``pytest tests/`` after ``maturin develop`` (see README).
"""

from __future__ import annotations

import os
import tempfile

import numpy as np
import pytest

import ruvector


def test_version() -> None:
    assert ruvector.__version__
    # Cargo.toml ships 0.1.0; if you bump there, bump here.
    assert ruvector.__version__ == "0.1.0"


def test_build_and_search() -> None:
    rng = np.random.default_rng(42)
    n, dim = 1000, 128
    vectors = rng.standard_normal((n, dim), dtype=np.float32)

    idx = ruvector.RabitqIndex.build(vectors)
    assert len(idx) == n
    assert idx.dim == dim
    assert idx.memory_bytes > 0
    assert idx.rerank_factor == 20  # default

    query = vectors[0]
    results = idx.search(query, k=10)
    assert len(results) == 10

    # First hit must be the query vector itself: id 0, distance ~0 after
    # exact f32 rerank.
    top_id, top_dist = results[0]
    assert top_id == 0, f"expected self-match at id 0, got id {top_id}"
    assert top_dist < 1e-3, f"self-distance {top_dist} should be ~0"

    # Scores must be ascending (squared L2; lower is closer).
    scores = [s for _, s in results]
    assert scores == sorted(scores), f"scores not ascending: {scores}"


def test_repr_is_diagnostic() -> None:
    rng = np.random.default_rng(0)
    vectors = rng.standard_normal((50, 32), dtype=np.float32)
    idx = ruvector.RabitqIndex.build(vectors)
    r = repr(idx)
    assert "RabitqIndex" in r
    assert "n=50" in r
    assert "dim=32" in r


def test_error_on_dim_mismatch() -> None:
    rng = np.random.default_rng(42)
    vectors = rng.standard_normal((100, 64), dtype=np.float32)
    idx = ruvector.RabitqIndex.build(vectors)

    bad_query = rng.standard_normal(32, dtype=np.float32)
    with pytest.raises(ruvector.RuVectorError):
        idx.search(bad_query, k=10)


def test_error_on_wrong_dtype() -> None:
    # float64 must not be silently coerced; it should be rejected at the
    # boundary by the PyO3 numpy crate's strict dtype check.
    rng = np.random.default_rng(0)
    vectors = rng.standard_normal((10, 8))  # float64
    with pytest.raises((TypeError, ValueError)):
        ruvector.RabitqIndex.build(vectors)  # type: ignore[arg-type]


def test_save_load_roundtrip() -> None:
    rng = np.random.default_rng(7)
    n, dim = 200, 64
    vectors = rng.standard_normal((n, dim), dtype=np.float32)
    idx = ruvector.RabitqIndex.build(vectors, rerank_factor=5, seed=1234)

    query = rng.standard_normal(dim, dtype=np.float32)
    before = idx.search(query, k=5)

    with tempfile.TemporaryDirectory() as td:
        path = os.path.join(td, "idx.rbpx")
        idx.save(path)
        loaded = ruvector.RabitqIndex.load(path)

    assert len(loaded) == n
    assert loaded.dim == dim
    assert loaded.rerank_factor == 5

    after = loaded.search(query, k=5)
    # `(dim, seed, items)` deterministic rebuild → bit-identical search.
    assert before == after, f"roundtrip changed results: {before} vs {after}"


def test_search_with_per_call_rerank() -> None:
    rng = np.random.default_rng(99)
    n, dim = 500, 64
    vectors = rng.standard_normal((n, dim), dtype=np.float32)
    idx = ruvector.RabitqIndex.build(vectors, rerank_factor=2)

    query = vectors[10]
    # Override per call; should still self-match at id 10 with distance ~0.
    results = idx.search(query, k=3, rerank_factor=20)
    assert results[0][0] == 10
    assert results[0][1] < 1e-3