mirror of
https://github.com/ruvnet/RuVector.git
synced 2026-05-30 03:53:34 +00:00
feat(adr): update ADR-032 with invariants, contracts, failure modes, and decision matrix
Adds: single writer rule, crash ordering with epoch reconciliation, explicit backend selection (no silent fallback), cross-platform compat rule, phase contracts with success metrics, failure mode test matrix, hybrid persistence decision matrix, implementation checklist. Closes #169 Co-Authored-By: claude-flow <ruv@ruv.net>
This commit is contained in:
parent
1be62d3429
commit
ad477ed11b
1 changed files with 175 additions and 15 deletions
|
|
@ -1,6 +1,6 @@
|
|||
# ADR-032: RVF WASM Integration into npx ruvector and rvlite
|
||||
|
||||
**Status**: Proposed
|
||||
**Status**: Accepted
|
||||
**Date**: 2026-02-14
|
||||
**Deciders**: ruv.io Team
|
||||
**Supersedes**: None
|
||||
|
|
@ -25,12 +25,60 @@ Two existing packages would benefit from RVF integration:
|
|||
|
||||
2. **`rvlite`** -- A lightweight multi-query vector database (SQL, SPARQL, Cypher) running entirely in WASM. It uses `ruvector-core` for vectors and IndexedDB for browser persistence. A Rust adapter already exists at `crates/rvf/rvf-adapters/rvlite/` wrapping `RvfStore` as `RvliteCollection`.
|
||||
|
||||
The main gap is operational truth: what happens on crash, partial migrate, concurrent writers, browser refresh, and mixed backends. This ADR locks the invariants that keep the integration boring and durable.
|
||||
|
||||
---
|
||||
|
||||
## Key Invariants
|
||||
|
||||
### 1. Single writer rule
|
||||
|
||||
Any open store has exactly one writer lease. Node uses a file lock (`flock`). Browser uses a lock record with heartbeat in IndexedDB. Readers are unlimited. A stale lease (heartbeat older than 30 seconds) is recoverable by a new writer.
|
||||
|
||||
### 2. Crash ordering rule (rvlite hybrid mode)
|
||||
|
||||
RVF is the source of truth for vectors. IndexedDB is a rebuildable cache for metadata.
|
||||
|
||||
**Write order:**
|
||||
1. Write vectors to RVF (append-only, crash-safe)
|
||||
2. Write metadata to IndexedDB
|
||||
3. Commit a shared monotonic epoch value in both stores
|
||||
|
||||
**On startup:** Compare epochs. If RVF epoch > IndexedDB epoch, rebuild metadata from RVF. If IndexedDB epoch > RVF epoch (should not happen), log warning and trust RVF.
|
||||
|
||||
### 3. Backend selection rule
|
||||
|
||||
Explicit override beats auto detection. If user passes `--backend rvf`, do not silently fall back to `core` or `memory`. Fail loud with a clear install hint. This prevents data going to the wrong place.
|
||||
|
||||
```
|
||||
Error: @ruvector/rvf is not installed.
|
||||
Run: npm install @ruvector/rvf
|
||||
The --backend rvf flag requires this package.
|
||||
```
|
||||
|
||||
### 4. Cross-platform compatibility rule
|
||||
|
||||
Every `.rvf` file written by WASM must be readable by Node N-API and vice versa for the same RVF wire version. If a file uses features from a newer version, the header must declare it and the CLI must refuse with an upgrade path:
|
||||
|
||||
```
|
||||
Error: vectors.rvf requires RVF wire version 2, but this CLI supports version 1.
|
||||
Run: npm update @ruvector/rvf
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Decision
|
||||
|
||||
Integrate `@ruvector/rvf` (and its WASM backend) into both packages in three phases:
|
||||
|
||||
### Phase 1: npx ruvector -- Add RVF as optional dependency + CLI command group
|
||||
|
||||
**Contract:**
|
||||
- **Input**: path, dimension, vectors
|
||||
- **Output**: deterministic `.rvf` file and status metadata
|
||||
- **Failure**: missing `@ruvector/rvf` package gives error with install instruction (never silent fallback)
|
||||
- **Success metric**: hooks memory persists across process restart
|
||||
|
||||
**Changes:**
|
||||
|
||||
1. **package.json** -- Add `@ruvector/rvf` as an optional dependency:
|
||||
|
|
@ -47,6 +95,7 @@ Integrate `@ruvector/rvf` (and its WASM backend) into both packages in three pha
|
|||
2. @ruvector/rvf (RVF store -- persistent, file-backed)
|
||||
3. Stub fallback (in-memory, testing only)
|
||||
```
|
||||
If `--backend rvf` is explicit, skip detection and fail if unavailable.
|
||||
|
||||
3. **bin/cli.js** -- Add `rvf` command group before the `mcp` command (~line 7010):
|
||||
```
|
||||
|
|
@ -60,12 +109,18 @@ Integrate `@ruvector/rvf` (and its WASM backend) into both packages in three pha
|
|||
ruvector rvf export <path> Export store
|
||||
```
|
||||
|
||||
4. **src/core/rvf-wrapper.ts** -- Create wrapper module exposing `RvfDatabase` through the existing core interface pattern. Exports added to `src/core/index.ts`.
|
||||
4. **src/core/rvf-wrapper.ts** -- Create wrapper module exposing `RvfDatabase` through the existing core interface pattern. Must match the core interface exactly so callers are backend-agnostic. Exports added to `src/core/index.ts`.
|
||||
|
||||
5. **Hooks integration** -- Add `ruvector hooks rvf-backend` subcommand to use `.rvf` files as persistent vector memory backend for the hooks/intelligence system (replacing in-memory storage).
|
||||
5. **Hooks integration** -- Add `ruvector hooks rvf-backend` subcommand to use `.rvf` files as persistent vector memory backend. The `--backend rvf` flag requires explicit selection; recall is read-only by default.
|
||||
|
||||
### Phase 2: rvlite -- RVF as storage backend for vector data
|
||||
|
||||
**Contract:**
|
||||
- **Input**: existing rvlite database state (vectors + metadata + graphs)
|
||||
- **Output**: `.rvf` file for vectors plus IndexedDB metadata cache
|
||||
- **Failure**: crash mid-sync triggers epoch reconciliation on next open (self-healing)
|
||||
- **Success metric**: migrate tool is idempotent and safe to rerun
|
||||
|
||||
**Changes:**
|
||||
|
||||
1. **Rust crate (`crates/rvlite`)** -- Add optional `rvf-runtime` dependency behind a feature flag:
|
||||
|
|
@ -74,11 +129,13 @@ Integrate `@ruvector/rvf` (and its WASM backend) into both packages in three pha
|
|||
default = []
|
||||
rvf-backend = ["rvf-runtime", "rvf-types"]
|
||||
```
|
||||
Default stays unchanged. No behavior change unless feature is enabled.
|
||||
|
||||
2. **Hybrid persistence model:**
|
||||
- **Vectors**: Stored in `.rvf` file via `RvliteCollection` adapter (already exists at `rvf-adapters/rvlite/`)
|
||||
- **Metadata/Graphs**: Continue using IndexedDB JSON state (SQL tables, Cypher nodes/edges, SPARQL triples)
|
||||
- **Rationale**: RVF is optimized for vector storage with SIMD-aligned slabs and HNSW indexing. Graph and relational data are better served by the existing serialization.
|
||||
- **Epoch reconciliation**: Both stores share a monotonic epoch. On startup, compare and rebuild the lagging side.
|
||||
- RVF vector IDs map directly to rvlite SQL primary keys (no internal mapping layer -- IDs are u64 in both systems).
|
||||
|
||||
3. **npm package (`npm/packages/rvlite`)** -- Add `@ruvector/rvf-wasm` as optional dependency. Extend `RvLite` TypeScript class:
|
||||
```typescript
|
||||
|
|
@ -90,13 +147,22 @@ Integrate `@ruvector/rvf` (and its WASM backend) into both packages in three pha
|
|||
async loadFromRvf(path: string): Promise<void>
|
||||
```
|
||||
|
||||
4. **Migration utility** -- `rvlite rvf-migrate` CLI command to convert existing IndexedDB vector data into `.rvf` files.
|
||||
4. **Migration utility** -- `rvlite rvf-migrate` CLI command to convert existing IndexedDB vector data into `.rvf` files. Supports `--dry-run` and `--verify` modes. Idempotent: rerunning on an already-migrated store is a no-op.
|
||||
|
||||
5. **Rebuild command** -- `rvlite rvf-rebuild` reconstructs IndexedDB metadata from RVF when cache is missing or corrupted.
|
||||
|
||||
### Phase 3: Shared WASM backend unification
|
||||
|
||||
**Contract:**
|
||||
- **Input**: browser environment with both `ruvector` and `rvlite` installed
|
||||
- **Output**: one shared WASM engine instance resolved through a single import path
|
||||
- **Success metric**: bundle diff shows zero duplicate WASM; CI check enforces this
|
||||
|
||||
**Changes:**
|
||||
|
||||
1. **Single WASM build** -- Both `rvlite` and `ruvector` share `@ruvector/rvf-wasm` as the vector computation engine in browser environments, eliminating duplicate WASM binaries.
|
||||
|
||||
2. **MCP bridge** -- The existing `@ruvector/rvf-mcp-server` exposes all RVF operations to AI agents. Extend with rvlite-specific tools:
|
||||
2. **MCP bridge** -- The existing `@ruvector/rvf-mcp-server` exposes all RVF operations to AI agents. Extend with rvlite-specific tools (read-only by default unless `--write` flag is set):
|
||||
```
|
||||
rvlite_sql(storeId, query) Execute SQL over RVF-backed store
|
||||
rvlite_cypher(storeId, query) Execute Cypher query
|
||||
|
|
@ -108,6 +174,10 @@ Integrate `@ruvector/rvf` (and its WASM backend) into both packages in three pha
|
|||
import { RvfDatabase } from 'ruvector';
|
||||
```
|
||||
|
||||
4. **CI duplicate check** -- Build step that fails if two copies of the WASM artifact are present in the bundle.
|
||||
|
||||
---
|
||||
|
||||
## API Mapping
|
||||
|
||||
### ruvector hooks system -> RVF
|
||||
|
|
@ -115,7 +185,7 @@ Integrate `@ruvector/rvf` (and its WASM backend) into both packages in three pha
|
|||
| Hooks Operation | Current Implementation | RVF Equivalent |
|
||||
|----------------|----------------------|----------------|
|
||||
| `hooks remember` | In-memory vector store | `RvfDatabase.ingestBatch()` |
|
||||
| `hooks recall` | In-memory k-NN | `RvfDatabase.query()` |
|
||||
| `hooks recall` | In-memory k-NN | `RvfDatabase.query()` (read-only) |
|
||||
| `hooks export` | JSON dump | `RvfDatabase.segments()` + file copy |
|
||||
| `hooks stats` | Runtime counters | `RvfDatabase.status()` |
|
||||
|
||||
|
|
@ -142,6 +212,8 @@ Integrate `@ruvector/rvf` (and its WASM backend) into both packages in three pha
|
|||
| `rvf_store_count` | Both | Live vector count |
|
||||
| `rvf_store_status` | ruvector | Store statistics |
|
||||
|
||||
---
|
||||
|
||||
## Consequences
|
||||
|
||||
### Positive
|
||||
|
|
@ -151,17 +223,95 @@ Integrate `@ruvector/rvf` (and its WASM backend) into both packages in three pha
|
|||
- **Reduced bundle size** -- Sharing `@ruvector/rvf-wasm` (~46 KB) between packages eliminates duplicate vector engines.
|
||||
- **Lineage tracking** -- `RvfDatabase.derive()` brings COW branching and provenance to both packages.
|
||||
- **Cross-platform** -- RVF auto-selects N-API (Node.js) or WASM (browser) without user configuration.
|
||||
- **Self-healing** -- Epoch reconciliation means crashes never corrupt data permanently.
|
||||
|
||||
### Negative
|
||||
|
||||
- **Optional dependency complexity** -- Both packages must gracefully handle missing `@ruvector/rvf` at runtime.
|
||||
- **Dual persistence in rvlite** -- Vectors in `.rvf` files + metadata in IndexedDB adds a split-brain risk if one store is modified without the other.
|
||||
- **Dual persistence in rvlite** -- Vectors in `.rvf` files + metadata in IndexedDB adds a split-brain risk. Mitigated by epoch reconciliation and treating IndexedDB as rebuildable cache.
|
||||
- **API surface growth** -- `npx ruvector` gains 8 new CLI subcommands.
|
||||
|
||||
### Risks
|
||||
|
||||
- **IndexedDB + RVF sync** -- In rvlite's hybrid mode, crash between RVF write and IndexedDB write could leave metadata inconsistent. Mitigated by writing RVF first (append-only, crash-safe) and treating IndexedDB as rebuildable cache.
|
||||
- **WASM size budget** -- Adding RVF WASM (~46 KB) to rvlite's existing WASM bundle (~850 KB) is acceptable (<6% increase).
|
||||
| Risk | Severity | Mitigation |
|
||||
|------|----------|------------|
|
||||
| IndexedDB + RVF sync crash | High | Write RVF first (append-only, crash-safe). IndexedDB is rebuildable. Epoch reconciliation on startup. |
|
||||
| WASM size budget | Low | Adding ~46 KB to rvlite's ~850 KB bundle is <6% increase. |
|
||||
| Concurrent open in two tabs | Medium | Writer lease with heartbeat in IndexedDB. Stale lease (>30s) is recoverable. Second writer gets clear error. |
|
||||
| Version skew across packages | Medium | RVF header version gate. CI compatibility test matrix: WASM-written files must be readable by Node and vice versa. |
|
||||
| Migration data loss | Medium | Migrate tool has `--dry-run` and `--verify` modes. Idempotent. Never deletes source data. |
|
||||
|
||||
---
|
||||
|
||||
## Decision Matrix: Hybrid Persistence
|
||||
|
||||
| Criteria | Option A: Vectors in RVF, metadata in IndexedDB | Option B: Everything in IndexedDB |
|
||||
|----------|----|----|
|
||||
| **Durability** | High (RVF is append-only, crash-safe) | Medium (IndexedDB has no crash ordering guarantee) |
|
||||
| **Simplicity** | Medium (two stores, epoch sync) | High (single store) |
|
||||
| **Performance** | High (SIMD-aligned slabs, HNSW indexing) | Medium (JSON serialization) |
|
||||
| **Recoverability** | High (rebuild metadata from RVF) | Medium (no independent source of truth) |
|
||||
| **User surprise** | Medium (two persistence targets) | Low (familiar single-store model) |
|
||||
|
||||
**Decision**: Option A wins if we implement epoch reconciliation and writer leases (both specified in this ADR).
|
||||
|
||||
---
|
||||
|
||||
## Failure Modes to Test
|
||||
|
||||
| # | Scenario | Expected Behavior |
|
||||
|---|----------|-------------------|
|
||||
| 1 | Power loss during ingest | Reopen succeeds. Last committed epoch is consistent. Partial append is invisible. |
|
||||
| 2 | Crash between RVF write and metadata write | Next open reconciles by epoch. Metadata rebuilt from RVF. |
|
||||
| 3 | Two writers attempting to open same store | Second writer gets `ELOCK` error with clear message. |
|
||||
| 4 | Migration rerun on already-migrated store | No-op. No duplication. Exit code 0. |
|
||||
| 5 | Write in Node, read in browser, write, read back in Node | Top-10 nearest neighbors match within 1e-6 distance tolerance. |
|
||||
| 6 | Browser refresh during write | Writer lease expires. Next open acquires fresh lease. No corruption. |
|
||||
| 7 | Mixed RVF versions (v1 file opened by v2 reader) | Forward-compatible read succeeds. v1 file opened by v0 reader fails with upgrade hint. |
|
||||
|
||||
---
|
||||
|
||||
## Implementation Checklist
|
||||
|
||||
### npx ruvector (Phase 1)
|
||||
|
||||
- [ ] Add backend adapter matching existing core interface exactly
|
||||
- [ ] Add `rvf` CLI group with create, ingest, query, status, segments, derive, compact, export
|
||||
- [ ] Add hooks `--backend rvf` flag requiring explicit selection (no silent fallback)
|
||||
- [ ] Smoke test: create, ingest, query, restart process, query again -- same results
|
||||
- [ ] Error messages for missing `@ruvector/rvf` include install command
|
||||
|
||||
### rvlite (Phase 2)
|
||||
|
||||
- [ ] Feature-flag RVF backend in Rust; default stays unchanged
|
||||
- [ ] Define and implement epoch reconciliation algorithm
|
||||
- [ ] Add `rvf-migrate` command with `--dry-run` and `--verify` modes
|
||||
- [ ] Add `rvf-rebuild` command to reconstruct metadata from RVF
|
||||
- [ ] Writer lease implementation (file lock on Node, heartbeat on browser)
|
||||
- [ ] Direct ID mapping: RVF vector IDs = SQL primary keys (no mapping layer)
|
||||
|
||||
### Shared (Phase 3)
|
||||
|
||||
- [ ] Both packages import same WASM module entry point
|
||||
- [ ] CI build step fails if two copies of WASM artifact are present
|
||||
- [ ] MCP server rvlite tools are read-only by default, write requires flag
|
||||
- [ ] Cross-platform compatibility test: WASM write -> Node read -> WASM read
|
||||
|
||||
---
|
||||
|
||||
## Acceptance Test
|
||||
|
||||
A clean machine with no prior data can:
|
||||
1. `ruvector rvf create test.rvf --dimension 384`
|
||||
2. `ruvector rvf ingest test.rvf --input vectors.json`
|
||||
3. `ruvector rvf query test.rvf --vector "..." --k 10` -- returns results
|
||||
4. Restart the process
|
||||
5. `ruvector rvf query test.rvf --vector "..." --k 10` -- same results (persistence verified)
|
||||
6. `rvlite rvf-migrate` converts an existing rvlite store
|
||||
7. Open the migrated store in a browser via WASM
|
||||
8. Top-10 nearest neighbors match Node results within 1e-6 distance tolerance
|
||||
|
||||
---
|
||||
|
||||
## Implementation Files
|
||||
|
||||
|
|
@ -170,8 +320,8 @@ Integrate `@ruvector/rvf` (and its WASM backend) into both packages in three pha
|
|||
| File | Action |
|
||||
|------|--------|
|
||||
| `npm/packages/ruvector/package.json` | Edit -- add `@ruvector/rvf` optional dep |
|
||||
| `npm/packages/ruvector/src/index.ts` | Edit -- add RVF to platform detection |
|
||||
| `npm/packages/ruvector/src/core/rvf-wrapper.ts` | Create -- RVF wrapper module |
|
||||
| `npm/packages/ruvector/src/index.ts` | Edit -- add RVF to platform detection with explicit backend support |
|
||||
| `npm/packages/ruvector/src/core/rvf-wrapper.ts` | Create -- RVF wrapper matching core interface |
|
||||
| `npm/packages/ruvector/src/core/index.ts` | Edit -- export rvf-wrapper |
|
||||
| `npm/packages/ruvector/bin/cli.js` | Edit -- add `rvf` command group (~line 7010) |
|
||||
|
||||
|
|
@ -179,16 +329,19 @@ Integrate `@ruvector/rvf` (and its WASM backend) into both packages in three pha
|
|||
|
||||
| File | Action |
|
||||
|------|--------|
|
||||
| `crates/rvlite/Cargo.toml` | Edit -- add optional `rvf-runtime` dep |
|
||||
| `crates/rvlite/Cargo.toml` | Edit -- add optional `rvf-runtime` dep behind feature flag |
|
||||
| `crates/rvlite/src/lib.rs` | Edit -- add RVF backend behind feature flag |
|
||||
| `crates/rvlite/src/storage/epoch.rs` | Create -- epoch reconciliation algorithm |
|
||||
| `npm/packages/rvlite/package.json` | Edit -- add `@ruvector/rvf-wasm` optional dep |
|
||||
| `npm/packages/rvlite/src/index.ts` | Edit -- add `createWithRvf()` factory |
|
||||
| `npm/packages/rvlite/src/index.ts` | Edit -- add `createWithRvf()` factory, migrate, rebuild |
|
||||
|
||||
### Shared (Phase 3)
|
||||
|
||||
| File | Action |
|
||||
|------|--------|
|
||||
| `npm/packages/rvf-mcp-server/src/server.ts` | Edit -- add rvlite query tools |
|
||||
| `npm/packages/rvf-mcp-server/src/server.ts` | Edit -- add rvlite query tools (read-only default) |
|
||||
|
||||
---
|
||||
|
||||
## Verification
|
||||
|
||||
|
|
@ -208,4 +361,11 @@ cargo test -p rvlite --features rvf-backend
|
|||
# Phase 3: Shared WASM
|
||||
# Verify single @ruvector/rvf-wasm instance in node_modules
|
||||
npm ls @ruvector/rvf-wasm
|
||||
|
||||
# Failure mode tests
|
||||
cargo test --test rvf_crash_recovery
|
||||
cargo test --test rvf_writer_lease
|
||||
cargo test --test rvf_epoch_reconciliation
|
||||
cargo test --test rvf_cross_platform_compat
|
||||
cargo test --test rvf_migration_idempotent
|
||||
```
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue