ruvector/docs/research/rvf/benchmarks/acceptance-tests.md
rUv f8870b3c71 feat(rvf): RuVector Format — Universal Cognitive Container SDK (#166)
2026-02-14 13:14:49 -05:00

RVF Acceptance Tests and Performance Targets

1. Primary Acceptance Test

Cold start on a 10-million-vector file: the reader must load the file and answer the first query with a useful result (recall@10 >= 0.70) without reading more than the last 4 MB, then converge to full quality (recall@10 >= 0.95) as it progressively maps more segments.

Test Parameters

Dataset:         10 million vectors
Dimensions:      384 (sentence embedding size)
Base dtype:      fp16 (768 bytes per vector)
Raw file size:   ~7.2 GB (vectors only)
With index:      ~10-12 GB total
Query set:       1000 queries from held-out test set
Ground truth:    Brute-force exact k-NN (k=10)
Metric:          L2 distance
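
The recall figures used throughout this document compare an index's answers with the exact brute-force neighbours listed above. A minimal sketch of that computation, assuming vectors are plain Python sequences and ids are integer positions in the dataset (the function names are illustrative, not part of any RVF API):

```python
from typing import Sequence

Vector = Sequence[float]

def l2_squared(a: Vector, b: Vector) -> float:
    """Squared L2 distance; it ranks neighbours the same way L2 does."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def brute_force_knn(query: Vector, dataset: Sequence[Vector], k: int) -> list[int]:
    """Exact k-NN by scanning every vector; this is the ground truth."""
    order = sorted(range(len(dataset)), key=lambda i: l2_squared(query, dataset[i]))
    return order[:k]

def recall_at_k(returned: Sequence[int], truth: Sequence[int], k: int) -> float:
    """Fraction of the true top-k ids that appear in the returned top-k."""
    return len(set(returned[:k]) & set(truth[:k])) / k

def mean_recall_at_k(all_returned, all_truth, k: int) -> float:
    """Average recall@k over a query set (1000 queries in this test)."""
    scores = [recall_at_k(r, t, k) for r, t in zip(all_returned, all_truth)]
    return sum(scores) / len(scores)
```

At 10M vectors the brute-force pass would normally be vectorized and computed once, then cached alongside the query set; the scan here only shows the definition.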

Success Criteria

| Phase | Time Budget | Data Read | Min Recall@10 | Description |
|---|---|---|---|---|
| Boot | < 5 ms | 4 KB (Level 0) | N/A | Parse root manifest |
| First query | < 50 ms | <= 4 MB | >= 0.70 | Layer A + hot cache |
| Working quality | < 500 ms | <= 200 MB | >= 0.85 | Layer A + B |
| Full quality | < 5 s | <= 4 GB | >= 0.95 | Layers A + B + C |
| Optimized | < 30 s | Full file | >= 0.98 | All layers + hot tier |

Measurement Methodology

1. Create RVF file from 10M vector dataset
   - Build full HNSW index (M=16, ef_construction=200)
   - Compute temperature tiers (default: all warm initially)
   - Write with all segment types

2. Cold start measurement
   - Drop filesystem cache (as root): sync; echo 3 > /proc/sys/vm/drop_caches
   - Open file, start timer
   - Read Level 0 (4 KB), record time T_boot
   - Read hotset data, record time T_hotset
   - Execute first query, record time T_first_query and recall@10
   - Continue progressive loading
   - At each milestone: record time, data read, recall@10

3. Throughput measurement (warm)
   - After full load, execute 1000 queries
   - Measure queries per second (QPS)
   - Measure p50, p95, p99 latency
   - Measure recall@10 average

4. Streaming ingest measurement
   - Start with empty file
   - Ingest 10M vectors in streaming mode
   - Measure ingest rate (vectors/second)
   - Measure file size over time
   - Verify crash safety (kill -9 at random points, verify recovery)
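
The cold-start step (open, start timer, read Level 0, record T_boot) can be sketched as a timed tail read. This assumes Level 0 sits in the last 4 KB of the file, as the primary acceptance test implies; `read_tail` is a hypothetical helper, and parsing the manifest is out of scope here.

```python
import os
import time

LEVEL0_BYTES = 4 * 1024                 # Level 0 read size from the success criteria
FIRST_QUERY_BUDGET = 4 * 1024 * 1024    # first-query data budget (<= 4 MB)

def read_tail(path: str, nbytes: int) -> tuple[bytes, float]:
    """Read the last nbytes of the file; return (data, elapsed milliseconds)."""
    with open(path, "rb") as f:
        start = time.perf_counter()      # timer starts once the file is open
        size = f.seek(0, os.SEEK_END)    # seek returns the new position: the file size
        f.seek(max(0, size - nbytes))    # clamp for files shorter than nbytes
        data = f.read(nbytes)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
    return data, elapsed_ms
```

A harness would call `read_tail(path, LEVEL0_BYTES)` right after dropping the page cache, record the elapsed time as T_boot, and keep a running total of bytes read to check against `FIRST_QUERY_BUDGET`.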

2. Performance Targets

Query Latency (10M vectors, 384 dim, fp16)

| Hardware | QPS (single thread) | p50 Latency | p95 Latency | p99 Latency |
|---|---|---|---|---|
| Desktop (AVX-512) | 5,000-15,000 | 0.1 ms | 0.3 ms | 1.0 ms |
| Desktop (AVX2) | 3,000-8,000 | 0.2 ms | 0.5 ms | 2.0 ms |
| Laptop (NEON) | 2,000-5,000 | 0.3 ms | 1.0 ms | 3.0 ms |
| WASM (browser) | 500-2,000 | 1.0 ms | 3.0 ms | 10.0 ms |
| Cognitum tile | 100-500 | 2.0 ms | 5.0 ms | 15.0 ms |
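
One way to turn per-query timings into the columns above is sketched below. It assumes latencies are collected in milliseconds from a single thread and uses the nearest-rank percentile definition; a statistics-aware harness such as criterion may use a different estimator.

```python
import math

def percentile(samples_ms: list[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% at or below it."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def summarize(samples_ms: list[float]) -> dict[str, float]:
    """Single-thread QPS plus p50/p95/p99 latency from back-to-back queries."""
    total_seconds = sum(samples_ms) / 1000.0
    return {
        "qps": len(samples_ms) / total_seconds,
        "p50_ms": percentile(samples_ms, 50),
        "p95_ms": percentile(samples_ms, 95),
        "p99_ms": percentile(samples_ms, 99),
    }
```

QPS is derived from the summed latencies, so it reflects single-thread throughput only and ignores any harness overhead between queries.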

Streaming Ingest Rate

| Hardware | Vectors/Second | Bytes/Second | Notes |
|---|---|---|---|
| NVMe SSD | 200K-500K | 150-380 MB/s | fsync every 1000 vectors |
| SATA SSD | 50K-100K | 38-76 MB/s | fsync every 1000 vectors |
| HDD | 10K-30K | 7-23 MB/s | Sequential append |
| Network (1 Gbps) | 50K-100K | 38-76 MB/s | Streaming over network |
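
The Bytes/Second column follows from the vector rate multiplied by the fp16 payload of 768 bytes per vector. The arithmetic, as a small sketch that ignores segment and index overhead:

```python
BYTES_PER_VECTOR = 384 * 2  # 384 dimensions, 2 bytes each (fp16)

def ingest_mb_per_s(vectors_per_s: int, bytes_per_vector: int = BYTES_PER_VECTOR) -> float:
    """Payload bandwidth in MB/s (10^6 bytes) for a given ingest rate."""
    return vectors_per_s * bytes_per_vector / 1e6

def link_ceiling_vectors_per_s(link_bits_per_s: float,
                               bytes_per_vector: int = BYTES_PER_VECTOR) -> float:
    """Upper bound on vectors/second a link can carry, with no protocol overhead."""
    return link_bits_per_s / 8 / bytes_per_vector
```

For example, 200K vectors/second is 153.6 MB/s and 500K is 384 MB/s, which the table rounds to 150-380 MB/s. A 1 Gbps link tops out near 162K vectors/second before protocol overhead, so the 50K-100K network target leaves headroom.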

Progressive Load Times

| Phase | NVMe SSD | SATA SSD | HDD | Network |
|---|---|---|---|---|
| Boot (4 KB) | < 0.1 ms | < 0.5 ms | < 10 ms | < 50 ms |
| First query (4 MB) | < 2 ms | < 10 ms | < 100 ms | < 500 ms |
| Working quality (200 MB) | < 100 ms | < 500 ms | < 5 s | < 20 s |
| Full quality (4 GB) | < 2 s | < 10 s | < 120 s | < 400 s |

Space Efficiency

| Configuration | Bytes/Vector | File Size (10M) | Ratio vs Raw |
|---|---|---|---|
| Raw fp32 | 1,536 | 14.3 GB | 1.0x |
| RVF uniform fp16 | 768 + overhead | 8.0 GB | 0.56x |
| RVF adaptive (equilibrium) | ~300 avg | 3.2 GB | 0.22x |
| RVF aggressive (binary cold) | ~100 avg | 1.1 GB | 0.08x |
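
The raw baseline and the ratio column can be reproduced with a few lines of arithmetic. Note that 10M x 1,536 bytes is 15.36 x 10^9 bytes, which equals 14.3 only when expressed in binary gigabytes (2^30 bytes), so this sketch assumes the table's sizes are in that unit. The RVF rows include index and manifest overhead, so their file sizes are taken from the table rather than derived.

```python
GIB = 1024 ** 3
VECTORS = 10_000_000

def raw_size_gib(bytes_per_vector: int, count: int = VECTORS) -> float:
    """Size of the bare vector payload in units of 2^30 bytes."""
    return count * bytes_per_vector / GIB

def ratio_vs_raw(file_size: float, raw_size: float) -> float:
    """Ratio column: file size relative to the raw fp32 baseline."""
    return round(file_size / raw_size, 2)

raw_fp32 = raw_size_gib(384 * 4)   # about 14.3
raw_fp16 = raw_size_gib(384 * 2)   # about 7.15, before overhead
```

With the table's figures, `ratio_vs_raw(8.0, 14.3)`, `ratio_vs_raw(3.2, 14.3)` and `ratio_vs_raw(1.1, 14.3)` give 0.56, 0.22 and 0.08.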

3. Crash Safety Tests

Test 1: Kill During Vector Ingest

1. Start ingesting 1M vectors
2. After 500K vectors: kill -9 the writer
3. Verify: file is readable
4. Verify: latest valid manifest is found
5. Verify: all vectors referenced by latest manifest are intact
6. Verify: no data corruption (all segment hashes valid)

Pass criteria: Zero data loss for committed segments. At most the last incomplete segment is lost (bounded by fsync interval).
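
The pass criterion amounts to saying that a reader scanning an append-only file recovers every complete record and stops cleanly at a torn tail. The sketch below illustrates that property with a deliberately simplified record layout (4-byte length, 4-byte CRC32, payload); this layout is an assumption for illustration and is not the RVF segment format. Truncating the buffer stands in for a kill -9 that lands mid-write.

```python
import struct
import zlib

HEADER = struct.Struct("<II")  # payload length, CRC32 of payload (hypothetical layout)

def append_record(buf: bytearray, payload: bytes) -> None:
    """Append one length-prefixed, checksummed record."""
    buf += HEADER.pack(len(payload), zlib.crc32(payload))
    buf += payload

def recover(data: bytes) -> list[bytes]:
    """Return every complete, checksum-valid record; stop at the first torn one."""
    records, offset = [], 0
    while offset + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        end = start + length
        if end > len(data):                  # payload cut off by the crash
            break
        payload = data[start:end]
        if zlib.crc32(payload) != crc:       # partially written or corrupted payload
            break
        records.append(payload)
        offset = end
    return records
```

A crash harness can apply the same check after a real kill -9: the recovered records must equal a prefix of what the writer reported as committed.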

Test 2: Kill During Manifest Write

1. Create file with 1M vectors
2. Trigger manifest rewrite (add metadata, trigger compaction)
3. kill -9 the writer during manifest write
4. Verify: file falls back to previous valid manifest
5. Verify: all queries work correctly with previous manifest

Pass criteria: Automatic fallback to previous manifest. No manual recovery needed.

Test 3: Kill During Compaction

1. Create file with 1M vectors across 100 small VEC_SEGs
2. Trigger compaction
3. kill -9 the writer during compaction
4. Verify: file is readable (old segments still valid)
5. Verify: partial compaction output is safely ignored

Pass criteria: Old segments remain valid. Incomplete compaction output has no manifest reference and is safely orphaned.

Test 4: Bit Flip Detection

1. Create valid RVF file
2. Flip random bits in various locations
3. Verify: corruption detected by hash/CRC checks
4. Verify: specific corrupted segment identified
5. Verify: other segments still readable

Pass criteria: 100% detection of single-bit flips. Corruption isolated to affected segment.
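
A sketch of the bit-flip check, using CRC32 from Python's zlib as a stand-in for whichever checksum the segments actually carry (an assumption; the algorithm is not specified here). CRC32 detects every single-bit error, and keeping one checksum per segment isolates the damage to the segment that was hit.

```python
import zlib

def checksum_segments(segments: list[bytes]) -> list[int]:
    """Record one CRC32 per segment at write time."""
    return [zlib.crc32(s) for s in segments]

def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)

def corrupted_segments(segments: list[bytes], checksums: list[int]) -> list[int]:
    """Indices of segments whose current CRC32 no longer matches the stored one."""
    return [i for i, (s, c) in enumerate(zip(segments, checksums))
            if zlib.crc32(s) != c]
```

Flipping each bit of one segment in turn and asserting that `corrupted_segments` reports only that segment's index covers verification steps 3 to 5.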

4. Scalability Tests

Test: 1 Billion Vectors

Dataset:     1B vectors, 384 dimensions, fp16
File size:   ~700 GB (raw) -> ~200 GB (adaptive RVF)
Hardware:    Server with 256 GB RAM, NVMe array

Verify:
  - Boot time < 10 ms
  - First query < 100 ms
  - Full quality convergence < 60 s
  - Recall@10 >= 0.95 at full quality
  - Streaming ingest sustained at 100K+ vectors/second

Test: High Dimensionality

Dataset:     1M vectors, 4096 dimensions (LLM embeddings)
File size:   ~8 GB (fp16)

Verify:
  - PQ compression to 5-bit achieves >= 10x compression
  - Recall@10 >= 0.90 with PQ
  - Query latency < 5 ms (p95) with PQ + HNSW

Test: Multi-File Sharding

Dataset:     100M vectors across 10 shard files
Verify:
  - Transparent query across all shards
  - Shard addition without full rebuild
  - Individual shard compaction
  - Shard removal with manifest update only
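
Transparent querying across shards reduces to merging per-shard top-k lists. A minimal sketch, assuming each shard returns `(distance, vector_id)` pairs already sorted by ascending distance (the result shape is hypothetical):

```python
import heapq
from itertools import islice

Hit = tuple[float, str]  # (distance, vector id)

def merge_shard_results(per_shard: list[list[Hit]], k: int) -> list[Hit]:
    """Global top-k from per-shard lists that are each sorted by distance."""
    return list(islice(heapq.merge(*per_shard), k))
```

Because the merge needs only each shard's own top-k, adding or removing a shard changes the input list and requires no rebuild of the other shards.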

5. WASM Performance Tests

Browser Environment

Runtime:     Chrome V8 / Firefox SpiderMonkey
SIMD:        WASM v128
Memory:      Limited to 4 GB WASM heap

Test: Load 1M vector RVF file via fetch()
  - Boot time < 50 ms
  - First query < 200 ms (after boot)
  - QPS >= 500 (single thread)
  - Memory usage < 500 MB

Cognitum Tile Simulation

Runtime:     wasmtime with memory limits
Code limit:  8 KB
Data limit:  8 KB
Scratch:     64 KB

Test: Process 1000 blocks via hub protocol
  - Distance computation matches reference implementation
  - Top-K results match brute-force within quantization tolerance
  - No memory access out of bounds
  - Tile recovers from simulated faults

6. Interoperability Tests

Round-Trip Test

1. Create RVF file from numpy arrays
2. Read back with independent implementation
3. Verify: all vectors bit-identical
4. Verify: all metadata preserved
5. Verify: index produces same results

Profile Compatibility Test

1. Create RVDNA file with genomic data
2. Create RVText file with text embeddings
3. Read both with generic RVF reader
4. Verify: generic reader can access vectors and metadata
5. Verify: profile-specific features degrade gracefully

Version Forward Compatibility Test

1. Create RVF file with version 1
2. Add segments with hypothetical version 2 features (unknown tags)
3. Read with version 1 reader
4. Verify: version 1 reader skips unknown segments/tags
5. Verify: version 1 data is fully accessible
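
Forward compatibility depends on every segment carrying its own length, so a reader can step over types it does not recognize. The sketch below uses a simplified framing (1-byte type, 4-byte length, payload) and made-up type codes. Both are assumptions for illustration; the real RVF segment header is larger and is defined in the wire format spec.

```python
import struct

FRAME = struct.Struct("<BI")  # segment type, payload length (hypothetical framing)
KNOWN_TYPES = {0x01: "vectors", 0x02: "metadata"}  # illustrative codes only

def write_segment(buf: bytearray, seg_type: int, payload: bytes) -> None:
    """Append one framed segment."""
    buf += FRAME.pack(seg_type, len(payload))
    buf += payload

def read_known_segments(data: bytes) -> tuple[list[tuple[str, bytes]], int]:
    """Return (known segments, count of skipped unknown segments)."""
    known, skipped, offset = [], 0, 0
    while offset + FRAME.size <= len(data):
        seg_type, length = FRAME.unpack_from(data, offset)
        start = offset + FRAME.size
        if seg_type in KNOWN_TYPES:
            known.append((KNOWN_TYPES[seg_type], data[start:start + length]))
        else:
            skipped += 1              # unknown type: step over it by length
        offset = start + length
    return known, skipped
```

The test then asserts that a file mixing version 1 segments with an unknown type yields all version 1 payloads intact and a non-zero skip count.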

7. Security Tests

Signature Verification

1. Create signed RVF file (ML-DSA-65)
2. Verify all segment signatures
3. Modify one byte in a signed segment
4. Verify: modification detected
5. Verify: other segments still valid

Encryption Round-Trip

1. Create encrypted RVF file (ML-KEM-768 + AES-256-GCM)
2. Decrypt with correct key
3. Verify: plaintext matches original
4. Attempt decrypt with wrong key
5. Verify: decryption fails (GCM auth tag mismatch)

Key Rotation

1. Create file signed with key A
2. Rotate to key B (write CRYPTO_SEG rotation record)
3. Write new segments signed with key B
4. Verify: old segments valid with key A
5. Verify: new segments valid with key B
6. Verify: cross-signature in rotation record is valid

8. Benchmark Harness

| Purpose | Tool | Notes |
|---|---|---|
| Latency measurement | criterion (Rust) / benchmark.js | Statistical rigor |
| Recall measurement | Custom recall@K computation | Against brute-force ground truth |
| Memory profiling | valgrind massif / Chrome DevTools | Peak and sustained |
| I/O profiling | blktrace / iostat | Verify read patterns |
| SIMD verification | Intel SDE / ARM emulator | Correct SIMD codegen |
| Crash testing | Custom harness with kill -9 | Random timing |

Report Format

Each benchmark run produces a report:

{
  "test_name": "cold_start_10m",
  "dataset": {
    "vector_count": 10000000,
    "dimensions": 384,
    "dtype": "fp16",
    "file_size_bytes": 10737418240
  },
  "hardware": {
    "cpu": "Intel Xeon w5-3435X",
    "simd": "AVX-512",
    "ram_gb": 256,
    "storage": "NVMe Samsung 990 Pro"
  },
  "results": {
    "boot_ms": 0.08,
    "first_query_ms": 12.3,
    "first_query_recall_at_10": 0.73,
    "working_quality_ms": 340,
    "working_quality_recall_at_10": 0.87,
    "full_quality_ms": 3200,
    "full_quality_recall_at_10": 0.96,
    "steady_state_qps": 8500,
    "steady_state_p50_ms": 0.12,
    "steady_state_p95_ms": 0.28,
    "steady_state_p99_ms": 0.85,
    "data_read_first_query_mb": 3.2,
    "data_read_working_quality_mb": 180
  }
}
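
A report in this shape can be checked mechanically against the Section 1 success criteria. The sketch below assumes the field names shown in the sample report and maps each one to its threshold; the `CRITERIA` table and `failed_criteria` function are illustrative and not part of any existing harness.

```python
import operator

OPS = {"<": operator.lt, "<=": operator.le, ">=": operator.ge}

# Field name -> (comparison, limit), taken from the Section 1 success criteria.
CRITERIA = {
    "boot_ms": ("<", 5),
    "first_query_ms": ("<", 50),
    "first_query_recall_at_10": (">=", 0.70),
    "data_read_first_query_mb": ("<=", 4),
    "working_quality_ms": ("<", 500),
    "working_quality_recall_at_10": (">=", 0.85),
    "data_read_working_quality_mb": ("<=", 200),
    "full_quality_ms": ("<", 5000),
    "full_quality_recall_at_10": (">=", 0.95),
}

def failed_criteria(report: dict) -> list[str]:
    """Names of result fields that are missing or miss their threshold."""
    results = report["results"]
    failures = []
    for field, (op, limit) in CRITERIA.items():
        if field not in results or not OPS[op](results[field], limit):
            failures.append(field)
    return failures
```

Loading the report with `json.load` and asserting that `failed_criteria(report)` is empty gives a single pass/fail gate per benchmark run.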