ruvector/docs/research/rvf/microkernel/wasm-runtime.md
rUv f8870b3c71 feat(rvf): RuVector Format — Universal Cognitive Container SDK (#166)
2026-02-14 13:14:49 -05:00


RVF WASM Microkernel and Cognitum Hardware Mapping

1. Design Philosophy

RVF must run on hardware ranging from a 64 KB WASM tile to a petabyte cluster. The WASM microkernel is the minimal runtime that makes a tile a first-class RVF citizen — capable of answering queries, ingesting streams, and participating in distributed search.

The microkernel is not a shrunken version of the full runtime. It is a purpose-built execution core that exposes the exact set of operations a tile needs, and nothing more.

2. Cognitum Tile Architecture

Hardware Constraints

+-----------------------------------+
| Cognitum Tile                     |
|                                   |
|  Code Memory:    8 KB             |
|  Data Memory:    8 KB             |
|  SIMD Scratch:   64 KB            |
|  Registers:      v128 (WASM SIMD) |
|  Clock:          ~1 GHz           |
|  Interconnect:   Mesh to hub      |
|                                   |
|  No filesystem. No mmap.          |
|  No allocator beyond scratch.     |
|  All I/O through hub messages.    |
+-----------------------------------+

Memory Map

Code (8 KB):
  0x0000 - 0x0FFF   Microkernel WASM bytecode (4 KB)
  0x1000 - 0x17FF   Distance function hot path (2 KB)
  0x1800 - 0x1FFF   Decode / quantization stubs (2 KB)

Data (8 KB):
  0x0000 - 0x003F   Tile configuration (64 B)
  0x0040 - 0x00FF   Query scratch (192 B: query vector fp16)
  0x0100 - 0x01FF   Result buffer (256 B: top-K candidates)
  0x0200 - 0x03FF   Routing table (512 B: entry points + centroids)
  0x0400 - 0x07FF   Decode workspace (1 KB)
  0x0800 - 0x0FFF   Message I/O buffer (2 KB)
  0x1000 - 0x1FFF   Neighbor list cache (4 KB)

SIMD Scratch (64 KB):
  0x0000 - 0x7FFF   Vector block (32 KB: up to 42 vectors @ 384-dim fp16, or 85 @ 384-dim i8)
  0x8000 - 0xBFFF   Distance accumulator / PQ tables (16 KB)
  0xC000 - 0xEFFF   Hot cache subset (12 KB)
  0xF000 - 0xFFFF   Temporary / spill (4 KB)

Tile Budget

For 384-dim fp16 vectors:

  • One vector: 768 bytes
  • SIMD scratch holds: 64 KB / 768 B = ~85 vectors if the whole scratch is used; the 32 KB vector block region alone holds 42
  • Top-K result buffer: 16 candidates * 16 B = 256 B
  • Query vector: 768 B
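
The arithmetic behind these figures can be checked with a few constants. This is a minimal sketch: the constant and function names are illustrative and not part of the RVF SDK, and the 32 KB figure is the size of the vector block region (0x0000 - 0x7FFF) in the memory map.

```rust
/// Sizes taken from the tile memory map and budget list.
pub const SIMD_SCRATCH_BYTES: usize = 64 * 1024;
pub const VECTOR_BLOCK_BYTES: usize = 32 * 1024; // 0x0000 - 0x7FFF
pub const RESULT_BUFFER_BYTES: usize = 256;
pub const TOPK_ENTRY_BYTES: usize = 16;

/// Bytes occupied by one vector of `dim` elements of `elem_bytes` each.
pub const fn vector_bytes(dim: usize, elem_bytes: usize) -> usize {
    dim * elem_bytes
}

/// Whole vectors that fit in a region of `region_bytes`.
pub const fn vectors_that_fit(region_bytes: usize, dim: usize, elem_bytes: usize) -> usize {
    region_bytes / vector_bytes(dim, elem_bytes)
}

/// Top-K candidates that fit in the result buffer.
pub const fn topk_capacity() -> usize {
    RESULT_BUFFER_BYTES / TOPK_ENTRY_BYTES
}
```

With 384-dim fp16 (2 bytes per element), `vectors_that_fit` can be evaluated for either the whole scratch or the vector block region to see how the block size depends on which region is counted.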

A tile processes one scratch-resident block of vectors at a time, computing distances and maintaining a top-K heap entirely within tile-local memory.
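
The bounded top-K maintenance can be sketched without an allocator. The sketch below swaps the heap for a sorted fixed-size array with insertion, which is simpler and comparable in cost for K = 16. The `Candidate` layout (u64 id plus f32 distance) and all names are assumptions; the source only states that an entry is 16 B.

```rust
/// Capacity of the result buffer: 256 B / 16 B per entry.
pub const MAX_K: usize = 16;

/// Hypothetical candidate entry (id + distance).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Candidate {
    pub id: u64,
    pub dist: f32,
}

/// Fixed-capacity top-K set, kept sorted by ascending distance.
pub struct TopK {
    items: [Candidate; MAX_K],
    len: usize,
    k: usize,
}

impl TopK {
    pub fn new(k: usize) -> Self {
        let empty = Candidate { id: 0, dist: f32::INFINITY };
        TopK { items: [empty; MAX_K], len: 0, k: k.min(MAX_K) }
    }

    /// Merge one block's distances and ids into the running top-K.
    pub fn merge(&mut self, dists: &[f32], ids: &[u64]) {
        for (&dist, &id) in dists.iter().zip(ids) {
            self.push(Candidate { id, dist });
        }
    }

    fn push(&mut self, c: Candidate) {
        if self.k == 0 || c.dist.is_nan() {
            return;
        }
        if self.len == self.k {
            // Full: reject anything no better than the current worst.
            if c.dist >= self.items[self.len - 1].dist {
                return;
            }
            self.len -= 1; // drop the current worst
        }
        // Shift larger entries right, then insert in sorted position.
        let mut i = self.len;
        while i > 0 && self.items[i - 1].dist > c.dist {
            self.items[i] = self.items[i - 1];
            i -= 1;
        }
        self.items[i] = c;
        self.len += 1;
    }

    /// Current results, nearest first.
    pub fn results(&self) -> &[Candidate] {
        &self.items[..self.len]
    }
}
```

Calling `merge` once per loaded block keeps memory use constant regardless of how many blocks the hub streams.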

3. Microkernel Exports

The WASM microkernel exports exactly these functions:

;; === Core Query Path ===

;; Initialize tile with configuration
;; config_ptr: pointer to 64B tile config in data memory
(export "rvf_init" (func $rvf_init (param $config_ptr i32) (result i32)))

;; Load query vector into query scratch
;; query_ptr: pointer to fp16 vector in data memory
;; dim: vector dimensionality
(export "rvf_load_query" (func $rvf_load_query
    (param $query_ptr i32) (param $dim i32) (result i32)))

;; Load a block of vectors into SIMD scratch
;; block_ptr: pointer to vector block in SIMD scratch
;; count: number of vectors
;; dtype: data type enum
(export "rvf_load_block" (func $rvf_load_block
    (param $block_ptr i32) (param $count i32)
    (param $dtype i32) (result i32)))

;; Compute distances between query and loaded block
;; metric: 0=L2, 1=IP, 2=cosine, 3=hamming
;; result_ptr: pointer to write distances
(export "rvf_distances" (func $rvf_distances
    (param $metric i32) (param $result_ptr i32) (result i32)))

;; Merge distances into top-K heap
;; dist_ptr: pointer to distance array
;; id_ptr: pointer to vector ID array
;; count: number of candidates
;; k: top-K to maintain
(export "rvf_topk_merge" (func $rvf_topk_merge
    (param $dist_ptr i32) (param $id_ptr i32)
    (param $count i32) (param $k i32) (result i32)))

;; Read current top-K results
;; out_ptr: pointer to write results (id, distance pairs)
(export "rvf_topk_read" (func $rvf_topk_read
    (param $out_ptr i32) (result i32)))

;; === Quantization ===

;; Load scalar quantization parameters (min/max per dim)
(export "rvf_load_sq_params" (func $rvf_load_sq_params
    (param $params_ptr i32) (param $dim i32) (result i32)))

;; Dequantize int8 block to fp16 in SIMD scratch
(export "rvf_dequant_i8" (func $rvf_dequant_i8
    (param $src_ptr i32) (param $dst_ptr i32)
    (param $count i32) (result i32)))

;; Load PQ codebook subset
(export "rvf_load_pq_codebook" (func $rvf_load_pq_codebook
    (param $codebook_ptr i32) (param $M i32)
    (param $K i32) (result i32)))

;; Compute PQ asymmetric distances
(export "rvf_pq_distances" (func $rvf_pq_distances
    (param $codes_ptr i32) (param $count i32)
    (param $result_ptr i32) (result i32)))

;; === HNSW Navigation ===

;; Load neighbor list for a node
(export "rvf_load_neighbors" (func $rvf_load_neighbors
    (param $node_id i64) (param $layer i32)
    (param $out_ptr i32) (result i32)))

;; Greedy search step: given current node, find nearest neighbor
(export "rvf_greedy_step" (func $rvf_greedy_step
    (param $current_id i64) (param $layer i32) (result i64)))

;; === Segment Verification ===

;; Verify segment header hash
(export "rvf_verify_header" (func $rvf_verify_header
    (param $header_ptr i32) (result i32)))

;; Compute CRC32C of a data region
(export "rvf_crc32c" (func $rvf_crc32c
    (param $data_ptr i32) (param $len i32) (result i32)))

Export Count

14 exports. Each maps to a tight inner loop that fits in the 8 KB code budget. The host (hub) is responsible for all I/O, segment parsing, and orchestration.
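
Of the exports listed, `rvf_crc32c` is small enough to sketch in full. The version below is a bit-at-a-time CRC32C (Castagnoli, reflected polynomial 0x82F63B78) written as a host-side reference in Rust. It is not the microkernel's implementation, which the source does not show; a table-driven or hardware-assisted variant would be faster.

```rust
/// Reference CRC32C (Castagnoli), processed one bit at a time.
/// Uses the reflected polynomial 0x82F63B78, initial value 0xFFFFFFFF,
/// and a final bitwise inversion.
pub fn crc32c(data: &[u8]) -> u32 {
    let mut crc: u32 = !0;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // mask is all ones when the low bit is set, else zero
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}
```

The standard check value for the ASCII string `123456789` is `0xE3069283`, which is a convenient test when comparing a tile build against a host implementation.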

4. Host-Tile Protocol

Communication between the hub and tile uses messages with a fixed 8-byte header and a variable-length payload, exchanged through the 2 KB I/O buffer:

Message Format

Offset  Size  Field        Description
------  ----  -----        -----------
0x00    2     msg_type     Message type enum
0x02    2     msg_length   Payload length
0x04    4     msg_id       Correlation ID
0x08    var   payload      Type-specific payload
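
A minimal sketch of encoding and decoding this header follows. The source does not state byte order; little-endian is assumed here because WebAssembly linear memory is little-endian. `MsgHeader` and its methods are illustrative names, not part of the RVF SDK.

```rust
pub const HEADER_LEN: usize = 8;
pub const IO_BUFFER_LEN: usize = 2048; // message I/O buffer in data memory

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MsgHeader {
    pub msg_type: u16,   // offset 0x00
    pub msg_length: u16, // offset 0x02, payload length in bytes
    pub msg_id: u32,     // offset 0x04, correlation ID
}

impl MsgHeader {
    /// Serialize the header (assumed little-endian).
    pub fn encode(&self) -> [u8; HEADER_LEN] {
        let mut out = [0u8; HEADER_LEN];
        out[0..2].copy_from_slice(&self.msg_type.to_le_bytes());
        out[2..4].copy_from_slice(&self.msg_length.to_le_bytes());
        out[4..8].copy_from_slice(&self.msg_id.to_le_bytes());
        out
    }

    /// Parse a header and return it with its payload slice.
    /// Returns None if the frame is truncated or exceeds the I/O buffer.
    pub fn decode(buf: &[u8]) -> Option<(MsgHeader, &[u8])> {
        if buf.len() < HEADER_LEN {
            return None;
        }
        let header = MsgHeader {
            msg_type: u16::from_le_bytes([buf[0], buf[1]]),
            msg_length: u16::from_le_bytes([buf[2], buf[3]]),
            msg_id: u32::from_le_bytes([buf[4], buf[5], buf[6], buf[7]]),
        };
        let end = HEADER_LEN + header.msg_length as usize;
        if end > IO_BUFFER_LEN || end > buf.len() {
            return None;
        }
        Some((header, &buf[HEADER_LEN..end]))
    }
}
```

Checking `msg_length` against both the received length and the buffer size keeps a malformed frame from reading past the I/O region.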

Message Types

Hub -> Tile:
  0x01  LOAD_QUERY       Send query vector (768 B for 384-dim fp16)
  0x02  LOAD_BLOCK       Send vector block (up to ~1.5 KB compressed)
  0x03  LOAD_NEIGHBORS   Send neighbor list for a node
  0x04  LOAD_PARAMS      Send quantization parameters
  0x05  COMPUTE          Trigger distance computation
  0x06  READ_TOPK        Request current top-K results
  0x07  RESET            Clear tile state for new query

Tile -> Hub:
  0x81  TOPK_RESULT      Top-K results (id, distance pairs)
  0x82  NEED_BLOCK       Request a specific vector block
  0x83  NEED_NEIGHBORS   Request neighbor list for a node
  0x84  DONE             Computation complete
  0x85  ERROR            Error with code
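
The message type codes map naturally onto an enum. This is a sketch with illustrative names; the direction test relies only on the pattern visible in the table above, where every tile-to-hub code has bit 0x80 set.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u16)]
pub enum MsgType {
    // Hub -> Tile
    LoadQuery = 0x01,
    LoadBlock = 0x02,
    LoadNeighbors = 0x03,
    LoadParams = 0x04,
    Compute = 0x05,
    ReadTopk = 0x06,
    Reset = 0x07,
    // Tile -> Hub
    TopkResult = 0x81,
    NeedBlock = 0x82,
    NeedNeighbors = 0x83,
    Done = 0x84,
    Error = 0x85,
}

impl MsgType {
    /// Decode a wire value; unknown codes yield None.
    pub fn from_u16(v: u16) -> Option<MsgType> {
        use MsgType::*;
        Some(match v {
            0x01 => LoadQuery,
            0x02 => LoadBlock,
            0x03 => LoadNeighbors,
            0x04 => LoadParams,
            0x05 => Compute,
            0x06 => ReadTopk,
            0x07 => Reset,
            0x81 => TopkResult,
            0x82 => NeedBlock,
            0x83 => NeedNeighbors,
            0x84 => Done,
            0x85 => Error,
            _ => return None,
        })
    }

    /// True for messages sent from tile to hub (codes 0x81 - 0x85).
    pub fn is_tile_to_hub(self) -> bool {
        (self as u16) & 0x80 != 0
    }
}
```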

Execution Flow

Hub                                 Tile
 |                                    |
 |--- LOAD_QUERY (768B) ------------->|
 |                                    | rvf_load_query()
 |--- LOAD_PARAMS (SQ params) ------->|
 |                                    | rvf_load_sq_params()
 |--- LOAD_BLOCK (block 0) ---------->|
 |                                    | rvf_load_block()
 |                                    | rvf_distances()
 |                                    | rvf_topk_merge()
 |--- LOAD_BLOCK (block 1) ---------->|
 |                                    | rvf_load_block()
 |                                    | rvf_distances()
 |                                    | rvf_topk_merge()
 |    ...                             |
 |--- READ_TOPK --------------------->|
 |                                    | rvf_topk_read()
 |<--- TOPK_RESULT -------------------|
 |                                    |

Pull Mode

For HNSW search, the tile drives the traversal:

Hub                                 Tile
 |                                    |
 |--- LOAD_QUERY -------------------->|
 |--- LOAD_NEIGHBORS (entry point) -->|
 |                                    | rvf_greedy_step()
 |<--- NEED_NEIGHBORS (next node) ----|
 |--- LOAD_NEIGHBORS (next node) ---->|
 |                                    | rvf_greedy_step()
 |<--- NEED_BLOCK (for candidate) ----|
 |--- LOAD_BLOCK -------------------->|
 |                                    | rvf_distances()
 |                                    | rvf_topk_merge()
 |<--- DONE --------------------------|
 |--- READ_TOPK --------------------->|
 |<--- TOPK_RESULT -------------------|
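
The tile-driven traversal can be sketched as a greedy descent: at each step the tile moves to the neighbor closest to the query, and stops when no neighbor improves on the current node. In the sketch below, `Step::MoveTo` corresponds to sending NEED_NEIGHBORS for the next node and `Step::LocalMinimum` to sending DONE; that mapping is a reading of the diagram, and all names and signatures are hypothetical rather than the actual `rvf_greedy_step` behavior.

```rust
/// Outcome of one greedy step.
#[derive(Debug, PartialEq)]
pub enum Step {
    /// A neighbor is strictly closer; continue from it.
    MoveTo(u64),
    /// No neighbor is closer than the current node; stop.
    LocalMinimum,
}

/// Pick the neighbor nearest to the query if it beats the current node.
/// `neighbors` holds (node id, distance to query) pairs.
pub fn greedy_step(current_dist: f32, neighbors: &[(u64, f32)]) -> Step {
    let mut best: Option<(u64, f32)> = None;
    for &(id, d) in neighbors {
        if d < current_dist && best.map_or(true, |(_, bd)| d < bd) {
            best = Some((id, d));
        }
    }
    match best {
        Some((id, _)) => Step::MoveTo(id),
        None => Step::LocalMinimum,
    }
}

/// Host-side simulation of the pull loop over an in-memory graph.
/// Terminates because the distance strictly decreases on every move.
pub fn greedy_search(
    entry: u64,
    dist: &dyn Fn(u64) -> f32,
    neighbors_of: &dyn Fn(u64) -> Vec<u64>,
) -> u64 {
    let mut current = entry;
    loop {
        let scored: Vec<(u64, f32)> = neighbors_of(current)
            .into_iter()
            .map(|n| (n, dist(n)))
            .collect();
        match greedy_step(dist(current), &scored) {
            Step::MoveTo(next) => current = next,
            Step::LocalMinimum => return current,
        }
    }
}
```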

5. Three Hardware Profiles

RVF Core Profile (Tile)

Target:         Cognitum tile (8KB + 8KB + 64KB)
Features:       Distance compute, top-K, SQ dequant, CRC32C verify
Max vectors:    ~85 per block load
Max dimensions: 384 (fp16) or 768 (i8)
Index:          None (hub routes, tile computes)
Streaming:      Receive blocks from hub
Quantization:   i8 scalar only (no PQ on tile)
Compression:    None (hub decompresses before sending)

RVF Hot Profile (Chip)

Target:         Cognitum chip (multiple tiles + shared memory)
Features:       Core + PQ distance, HNSW navigation, parallel tiles
Max vectors:    Limited by shared memory (~10K in shared cache)
Max dimensions: 1024
Index:          Layer A in shared memory
Streaming:      Block streaming across tiles
Quantization:   i8 scalar + PQ (6-bit)
Compression:    LZ4 decompress in shared memory

RVF Full Profile (Hub/Desktop)

Target:         Desktop CPU, server, hub controller
Features:       All features, all segment types, all quantization
Max vectors:    Billions (limited by storage)
Max dimensions: Unlimited
Index:          Full HNSW (Layers A + B + C)
Streaming:      Full append-only segment model
Quantization:   All tiers (fp16, i8, PQ, binary)
Compression:    All (LZ4, ZSTD, custom)
Crypto:         Full (ML-DSA-65 signatures, SHAKE-256)
Temperature:    Full adaptive tiering
Overlay:        Full epoch model with compaction

Profile Detection

The root manifest's profile_id field declares the minimum profile needed:

0x00    generic     Requires Full Profile features
0x01    core        Fully usable with Core Profile
0x02    hot         Requires Hot Profile minimum
0x03    full        Requires Full Profile

A Full Profile reader can read files of any profile. A Hot Profile reader can read Core and Hot files and rejects the rest. A Core Profile reader can read only Core files and rejects Hot, Full, and generic files.
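
These compatibility rules reduce to an ordering on profiles. The sketch below uses illustrative names; treating an unknown `profile_id` as unreadable is an assumption, since the source does not say how readers handle undefined values.

```rust
/// Reader capability levels, ordered from least to most capable.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Profile {
    Core = 1,
    Hot = 2,
    Full = 3,
}

/// Minimum reader profile required by a manifest's profile_id.
/// 0x00 (generic) and 0x03 (full) both require Full Profile.
pub fn required_profile(profile_id: u8) -> Option<Profile> {
    match profile_id {
        0x00 | 0x03 => Some(Profile::Full),
        0x01 => Some(Profile::Core),
        0x02 => Some(Profile::Hot),
        _ => None, // assumption: unknown ids are rejected
    }
}

/// True if a reader at `reader` level can open a file with `profile_id`.
pub fn can_read(reader: Profile, profile_id: u8) -> bool {
    required_profile(profile_id).map_or(false, |need| reader >= need)
}
```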

6. SIMD Strategy by Platform

WASM v128 (Tile/Browser)

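;; L2 distance: fp16 vectors, 384 dimensions
;; Process 8 fp16 values (one 16-byte load per operand) per iteration.
;; Baseline WASM SIMD has no fp16 lanes: $widen_lo / $widen_hi stand in for a
;; software step (not shown) that converts 4 packed fp16 values to f32x4.

(func $l2_fp16_384 (param $a_ptr i32) (param $b_ptr i32) (result f32)
    (local $acc v128)
    (local $d v128)
    (local $i i32)
    (local.set $acc (v128.const f32x4 0 0 0 0))

    (block $done
        (loop $loop
            ;; Exit after 48 iterations (8 values each, 384 dims)
            (br_if $done (i32.ge_u (local.get $i) (i32.const 384)))

            ;; Low 4 values: widen, subtract, square, accumulate
            (local.set $d (f32x4.sub
                (call $widen_lo (v128.load (local.get $a_ptr)))
                (call $widen_lo (v128.load (local.get $b_ptr)))))
            (local.set $acc (f32x4.add (local.get $acc)
                (f32x4.mul (local.get $d) (local.get $d))))
            ;; High 4 values: same steps
            (local.set $d (f32x4.sub
                (call $widen_hi (v128.load (local.get $a_ptr)))
                (call $widen_hi (v128.load (local.get $b_ptr)))))
            (local.set $acc (f32x4.add (local.get $acc)
                (f32x4.mul (local.get $d) (local.get $d))))

            ;; Advance both pointers by 16 bytes and the index by 8 values
            (local.set $a_ptr (i32.add (local.get $a_ptr) (i32.const 16)))
            (local.set $b_ptr (i32.add (local.get $b_ptr) (i32.const 16)))
            (local.set $i (i32.add (local.get $i) (i32.const 8)))
            (br $loop)
        )
    )
    ;; Horizontal sum of the accumulator: squared L2 distance
    ;; (apply f32.sqrt if the true distance is needed)
    (f32.add
        (f32.add (f32x4.extract_lane 0 (local.get $acc))
                 (f32x4.extract_lane 1 (local.get $acc)))
        (f32.add (f32x4.extract_lane 2 (local.get $acc))
                 (f32x4.extract_lane 3 (local.get $acc))))
)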

AVX-512 (Desktop/Server)

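; Process 32 fp16 values per iteration: VCVTPH2PS widens 16 fp16 lanes to
; 16 f32 lanes, so each iteration runs two 16-lane widen + VFMADD231PS groups
; 384 dims = 12 iterations of 32 values
; ~12 cycles per distance computation (best-case estimate)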

ARM NEON (Mobile/Edge)

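; Process 8 fp16 values per iteration with FMLA
; (8 fp16 lanes per 128-bit register; requires the ARMv8.2-A FP16 extension)
; 384 dims = 48 iterations of 8 values
; ~48 cycles per distance computation (best-case estimate)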
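
A scalar reference is useful for validating any of the SIMD kernels above. The sketch below converts IEEE 754 binary16 bit patterns to f32 in software and computes the squared L2 distance. It is a host-side test aid under that assumption, not the tile code, and the function names are illustrative.

```rust
/// Convert an IEEE 754 binary16 bit pattern to f32.
pub fn f16_bits_to_f32(h: u16) -> f32 {
    let sign = if h & 0x8000 != 0 { -1.0f32 } else { 1.0 };
    let exp = ((h >> 10) & 0x1F) as u32;
    let frac = (h & 0x03FF) as u32;
    match exp {
        // Zero and subnormals: value = frac * 2^-24
        0 => sign * (frac as f32) * (1.0 / 16_777_216.0),
        // Infinity and NaN
        0x1F => {
            if frac == 0 {
                sign * f32::INFINITY
            } else {
                f32::NAN
            }
        }
        // Normal numbers: rebias exponent (127 - 15 = 112), widen mantissa
        _ => sign * f32::from_bits(((exp + 112) << 23) | (frac << 13)),
    }
}

/// Squared L2 distance between two fp16 vectors given as raw bit patterns.
pub fn l2_squared_fp16(a: &[u16], b: &[u16]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have equal dimension");
    a.iter()
        .zip(b)
        .map(|(&x, &y)| {
            let d = f16_bits_to_f32(x) - f16_bits_to_f32(y);
            d * d
        })
        .sum()
}
```

Accumulating in f32 mirrors the widen-then-accumulate approach described for each platform, so results should agree with a SIMD kernel up to summation order.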

7. Microkernel Size Budget

Function                    Estimated Size
--------                    --------------
rvf_init                    128 B
rvf_load_query              64 B
rvf_load_block              256 B
rvf_distances (L2 fp16)     512 B
rvf_distances (L2 i8)       384 B
rvf_distances (IP fp16)     512 B
rvf_distances (hamming)     256 B
rvf_topk_merge              384 B
rvf_topk_read               64 B
rvf_load_sq_params          64 B
rvf_dequant_i8              256 B
rvf_load_pq_codebook        128 B
rvf_pq_distances            512 B
rvf_load_neighbors          128 B
rvf_greedy_step             512 B
rvf_verify_header           128 B
rvf_crc32c                  256 B
Message dispatch loop       384 B
Utility functions           256 B
WASM overhead               512 B
                            ----------
Total                       5,696 B (< 8 KB code budget)

The remaining ~2.4 KB (2,496 B) of code space is available for domain-specific extensions (e.g., codon distance for RVDNA profile, token overlap for RVText profile).
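
The budget can be re-derived mechanically whenever an estimate changes. The sketch below transcribes the per-function estimates from the table and sums them; the constant and function names are illustrative.

```rust
pub const CODE_BUDGET_BYTES: u32 = 8 * 1024;

/// Per-function size estimates in bytes, transcribed from the table.
pub const SIZE_ESTIMATES: &[(&str, u32)] = &[
    ("rvf_init", 128),
    ("rvf_load_query", 64),
    ("rvf_load_block", 256),
    ("rvf_distances (L2 fp16)", 512),
    ("rvf_distances (L2 i8)", 384),
    ("rvf_distances (IP fp16)", 512),
    ("rvf_distances (hamming)", 256),
    ("rvf_topk_merge", 384),
    ("rvf_topk_read", 64),
    ("rvf_load_sq_params", 64),
    ("rvf_dequant_i8", 256),
    ("rvf_load_pq_codebook", 128),
    ("rvf_pq_distances", 512),
    ("rvf_load_neighbors", 128),
    ("rvf_greedy_step", 512),
    ("rvf_verify_header", 128),
    ("rvf_crc32c", 256),
    ("Message dispatch loop", 384),
    ("Utility functions", 256),
    ("WASM overhead", 512),
];

/// Sum of all estimates.
pub fn total_estimate() -> u32 {
    SIZE_ESTIMATES.iter().map(|&(_, bytes)| bytes).sum()
}

/// Code space left for extensions, or None if the budget is exceeded.
pub fn headroom() -> Option<u32> {
    CODE_BUDGET_BYTES.checked_sub(total_estimate())
}
```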

8. Fault Isolation

Each tile runs in a WASM sandbox. A tile cannot:

  • Access hub memory directly
  • Communicate with other tiles except through the hub
  • Allocate memory beyond its 8 KB data + 64 KB scratch
  • Execute code beyond its 8 KB code space
  • Trap without the hub catching and recovering

If a tile traps (out-of-bounds, unreachable, stack overflow):

  1. Hub catches the trap
  2. Hub marks tile as faulted
  3. Hub reassigns the tile's work to another tile (or processes locally)
  4. Hub optionally restarts the faulted tile with fresh state

This makes the system resilient to individual tile failures — important for large tile arrays where hardware faults are inevitable.
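
The recovery steps above can be sketched as a small state machine on the hub. `TileState`, `Hub`, and the policy of falling back to a local queue when no tile is idle are illustrative assumptions; the source describes the steps but not an API.

```rust
/// Hub-side view of a tile. `Busy` carries a hypothetical work item id.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TileState {
    Idle,
    Busy(u32),
    Faulted,
}

pub struct Hub {
    pub tiles: Vec<TileState>,
    /// Work the hub processes itself when no tile is available.
    pub local_queue: Vec<u32>,
}

impl Hub {
    pub fn new(tile_count: usize) -> Self {
        Hub { tiles: vec![TileState::Idle; tile_count], local_queue: Vec::new() }
    }

    /// Give work to the first idle tile, or process it locally.
    pub fn assign(&mut self, work: u32) {
        match self.tiles.iter().position(|t| *t == TileState::Idle) {
            Some(i) => self.tiles[i] = TileState::Busy(work),
            None => self.local_queue.push(work),
        }
    }

    /// Steps 1-3: the trap is caught, the tile is marked faulted,
    /// and any in-flight work is reassigned.
    pub fn on_trap(&mut self, tile: usize) {
        let lost = match self.tiles[tile] {
            TileState::Busy(work) => Some(work),
            _ => None,
        };
        self.tiles[tile] = TileState::Faulted;
        if let Some(work) = lost {
            self.assign(work);
        }
    }

    /// Step 4 (optional): restart a faulted tile with fresh state.
    pub fn restart(&mut self, tile: usize) {
        if self.tiles[tile] == TileState::Faulted {
            self.tiles[tile] = TileState::Idle;
        }
    }
}
```

Because a faulted tile is never `Idle`, `assign` skips it until `restart` is called, so reassigned work cannot land back on the tile that just trapped.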