mirror of https://github.com/ruvnet/RuVector.git synced 2026-05-22 19:56:25 +00:00

rUv f8870b3c71 feat(rvf): RuVector Format — Universal Cognitive Container SDK (#166)

* feat(rvf): add RuVector Format universal substrate specification

Research and design for RVF — a streaming, progressive, adaptive, quantum-secure
binary format for vector intelligence. Covers append-only segment model, two-level
tail manifests, temperature tiering, progressive HNSW indexing, epoch-based overlay
system, SIMD-optimized query paths, WASM microkernel for Cognitum tiles, domain
profiles (RVDNA, RVText, RVGraph, RVVision), and post-quantum cryptography.

https://claude.ai/code/session_01DDqjGE51JpsRE3DgUjFyjW

* feat(rvf): add deletion, filtered search, concurrency, and operations specs

Fill four specification gaps in the RVF format design:
- spec/07: Vector deletion lifecycle, JOURNAL_SEG wire format, deletion bitmaps
- spec/08: Filtered search with META_SEG, METAIDX_SEG, filter expression language
- spec/09: Writer locking, reader-writer coordination, versioning, space reclamation
- spec/10: Batch operations API, error codes, network streaming protocol

Also fixes the segment header field conflict between spec/01 and wire/binary-layout.md
(checksum_algo/compression now u8, adds uncompressed_len at 0x38).

https://claude.ai/code/session_01DDqjGE51JpsRE3DgUjFyjW

* feat(rvf): add RuVector Format SDK, 40 examples, MCP server, and documentation

Complete RVF implementation including:
- 12 Rust crates (rvf-types, rvf-wire, rvf-manifest, rvf-index, rvf-quant,
  rvf-crypto, rvf-runtime, rvf-import, rvf-wasm, rvf-node, rvf-server,
  plus integration tests)
- 40 runnable examples covering core storage, agentic AI, production
  patterns, vertical domains, exotic capabilities, runtime targets,
  network/security, POSIX/systems, and network operations
- TypeScript SDK (npm/packages/rvf) with RvfDatabase class
- MCP server (npm/packages/rvf-mcp-server) with stdio and SSE transports
- Node.js N-API bindings (npm/packages/rvf-node)
- WASM package (npm/packages/rvf-wasm)
- ADR-029 (canonical format), ADR-030 (computational container),
  ADR-031 (example repository)
- DNA-style lineage provenance, computational containers (KERNEL_SEG,
  EBPF_SEG), witness chains, TEE attestation, domain profiles
- Superseded ADR annotations for ADR-001, ADR-005, ADR-006, ADR-018-021

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(rvf): add CLI, WASM store, generate_all, and 46 output .rvf files

- Add rvf-cli crate (665 lines, 9 subcommands: create/ingest/query/delete/status/inspect/compact/derive/serve)
- Add WASM control plane store (alloc_setup, segment, store modules) for ~46 KB binary
- Add generate_all.rs example producing 46 persistent .rvf files in output/
- Add Node.js N-API bindings for lineage, kernel/eBPF, and inspection
- Add npm TypeScript backend/database/types for RVF integration
- Update READMEs with CLI sections, MCP server docs, and crate map (13 crates)
- All 40 examples verified passing

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(rvf): add Claude Code appliance, improve Quick Start, fix API docs

- Add claude_code_appliance.rs: self-booting RVF with SSH + Claude Code
  install (curl -fsSL https://claude.ai/install.sh | bash), 3 SSH users,
  eBPF filter, 20-package manifest, witness chain, lineage snapshot
- Improve Quick Start: Install section (crate/CLI/npm/WASM/MCP), WASM
  browser example, generate_all reference, expanded Rust crate deps
- Fix embed_kernel/embed_ebpf API docs to match actual signatures
  (u8 params with `as u8` cast, 6-param kernel, Option<&[u8]> btf)
- Update generate_all.rs: add claude_code_appliance generator (47 files)
- Regenerate all 47 output .rvf files

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(rvf): add RVCOW branching, real kernel/eBPF/launcher, 795 tests

Vector-native copy-on-write branching (ADR-031) with four new segment
types (COW_MAP 0x20, REFCOUNT 0x21, MEMBERSHIP 0x22, DELTA 0x23),
real Linux microkernel builder, QEMU microVM launcher, real eBPF
programs, and 128-byte KernelBinding for tamper-evident kernel-manifest
linkage.

New crates:
- rvf-kernel: Docker-based kernel build, real cpio/newc initramfs builder,
  SHA3-256 verification, prebuilt kernel support (37 tests)
- rvf-launch: QEMU microVM launcher with QMP shutdown, KVM/TCG detection,
  virtio-blk/net port forwarding, kernel extraction (8 tests)
- rvf-ebpf: 3 real BPF C programs (xdp_distance, socket_filter,
  tc_query_route) with clang compilation support (17 tests)

RVCOW runtime:
- CowEngine with read/write paths, write coalescing, snapshot-freeze
- CowMap (flat-array), MembershipFilter (bitmap), CowCompactor
- 3x read performance via pread optimization (1.3us/vector)
- Branch creation: 2.6ms for 10K vectors, child = 162 bytes

Security: 20-finding audit, 7 fixes applied including division-by-zero
guards, integer overflow checks, and KernelBinding::from_bytes_validated().

CLI: 8 new commands (launch, embed-kernel, embed-ebpf, filter, freeze,
verify-witness, verify-attestation, rebuild-refcounts), serve wired to
real rvf-server.

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(rvf): update README, add crate/npm READMEs, publish to crates.io and npm

- Rewrite README with cognitive container terminology, grouped features,
  4 comparison tables (vs Docker, Vector DBs, Git LFS, SQLite), updated
  benchmarks, architecture diagram, and 45 examples
- Add READMEs for rvf-kernel, rvf-launch, rvf-ebpf, rvf-import crates
- Add READMEs for @ruvector/rvf, rvf-node, rvf-wasm, rvf-mcp-server npm packages
- Fix Cargo.toml metadata (homepage, readme, categories, keywords) and
  add version specs to all path dependencies for crates.io publishing
- Fix clippy warnings in rvf-kernel/initramfs.rs and rvf-launch/lib.rs
- Published to crates.io: rvf-types, rvf-wire, rvf-manifest, rvf-quant,
  rvf-index, rvf-crypto (remaining crates pending rate limit)
- Published to npm: @ruvector/rvf, @ruvector/rvf-node, @ruvector/rvf-wasm,
  @ruvector/rvf-mcp-server

Co-Authored-By: claude-flow <ruv@ruv.net>

* chore: add rvf-kernel, rvf-ebpf, rvf-launch, rvf-server, rvf-import, rvf-cli to workspace

Include all 15 RVF crates plus integration tests and benchmarks in the
root workspace members list so cargo publish can resolve them by name.

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(rvf): add published packages, cognitive container branding, grouped capabilities

- Add Published Packages section with 13 crates.io + 4 npm tables
- Add Platform Support table (Linux, macOS, Windows, WASM, no_std)
- Expand capability table from 9 to 15 rows in 4 groups
- Rewrite all "How" descriptions in plain language
- Update .rvf diagram to show all 20 segment types
- Rename ADRs: computational container -> cognitive container
- Add emojis to all section headers

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat: update root README with RVF cognitive containers, expanded capabilities

- Update intro: "gets smarter + ships as cognitive container"
- Add self-booting microservice row to Pinecone comparison table
- Expand capabilities from 34 to 42 features with dedicated RVF section
- Update "Think of it as" to include Docker comparison and RVF explanation
- Add RVF collapsed group to Ecosystem (13 crates, 4 npm, install commands)
- Add RVF to Platform & Edge section with install commands
- Add RVF npm packages (4) and Rust crates (13) to package reference
- Add RVF rows to feature comparison table (6 new rows)
- Add ADR-030/031 to ADR list
- Add RVF to Installation table, Project Structure
- Update attention mechanisms count from 39 to 40+
- Update npm count to 49+, Rust crates to 83
- Update footer with crates.io and RVF links

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat: expand comparison table with emojis, cost, audit, branching, single-file

Co-Authored-By: claude-flow <ruv@ruv.net>

* docs: rewrite comparison table in plain language

Co-Authored-By: claude-flow <ruv@ruv.net>

* chore: clean up empty code change sections in the changes log

---------

Co-authored-by: Claude <noreply@anthropic.com>

2026-02-14 13:14:49 -05:00


ADR-005: WASM Runtime Integration

Field	Value
Status	Proposed
Date	2026-01-18
Authors	RuvLLM Architecture Team
Reviewers	-
Supersedes	-
Superseded by	-

Note: The WASM runtime approach described here is complemented by ADR-029. The RVF WASM microkernel (rvf-wasm) provides a <8 KB Cognitum tile target that replaces ad-hoc WASM builds for vector operations.

1. Context

1.1 Problem Statement

RuvLLM requires a mechanism for executing user-provided and community-contributed compute kernels in a secure, sandboxed environment. These kernels implement performance-critical operations such as:

Rotary Position Embeddings (RoPE)
RMS Normalization (RMSNorm)
SwiGLU activation functions
KV cache quantization/dequantization
LoRA delta application

Without proper isolation, malicious or buggy kernels could:

Access unauthorized memory regions
Consume unbounded compute resources
Compromise the host system
Corrupt model state

1.2 Requirements

Requirement	Priority	Rationale
Sandboxed execution	Critical	Prevent kernel code from accessing host resources
Execution budgets	Critical	Prevent runaway code and DoS conditions
Low overhead	High	Kernels are in the inference hot path
Cross-platform	High	Support x86, ARM, embedded devices
Framework agnostic	Medium	Enable ML inference without vendor lock-in
Hot-swappable kernels	Medium	Update kernels without service restart

1.3 Constraints

Memory: Embedded targets have as little as 256KB RAM
Latency: Kernel invocation overhead must be <10us for small tensors
Compatibility: Must support existing Rust/C kernel implementations
Security: Kernel supply chain must be verifiable

2. Decision

We will adopt WebAssembly (WASM) as the sandboxed execution environment for compute kernels, with the following architecture:

2.1 Runtime Selection

Device Class	Runtime	Rationale
Edge servers (x86/ARM64)	Wasmtime	Mature, well-optimized, excellent tooling
Embedded/MCU (<1MB RAM)	WAMR	<85KB footprint, AOT compilation support
Browser/WASI Preview 2	wasmtime/browser	Future consideration

2.2 Interruption Strategy: Epoch-Based (Not Fuel)

We choose epoch-based interruption over fuel-based metering:

Aspect	Epoch	Fuel
Overhead	~2-5%	~15-30%
Granularity	Coarse (polling points)	Fine (per instruction)
Determinism	Non-deterministic	Deterministic
Implementation	Store-level epoch counter	Instruction instrumentation

Rationale: For inference workloads, coarse-grained interruption is acceptable. Avoiding fuel metering saves roughly 10-25 percentage points of overhead (the gap between the two ranges in the table above), which is significant for latency-sensitive operations.

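// Epoch configuration example
use wasmtime::{Config, Engine, Store};

let mut config = Config::new();
config.epoch_interruption(true);

let engine = Engine::new(&config)?;
let mut store = Store::new(&engine, ());

// Set the epoch deadline: 100 ticks beyond the current epoch.
// Ticks are not milliseconds; this is a 100ms budget only if the
// host increments the epoch once per millisecond.
store.set_epoch_deadline(100);

// Increment the epoch from a timer (thread or async task), once per tick
engine.increment_epoch();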
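
Because an epoch deadline is counted in ticks, the wall-clock budget depends on how often the host increments the epoch. The following std-only sketch shows that arithmetic and the shape of the deadline check. `EpochClock` and `ticks_for_budget` are hypothetical names used for illustration; they are not Wasmtime APIs.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

/// Hypothetical stand-in for the engine-wide epoch counter.
pub struct EpochClock {
    epoch: AtomicU64,
}

impl EpochClock {
    pub fn new() -> Self {
        Self { epoch: AtomicU64::new(0) }
    }

    /// Called by the host timer once per tick interval.
    pub fn increment(&self) {
        self.epoch.fetch_add(1, Ordering::Relaxed);
    }

    /// Absolute deadline that lies `ticks` beyond the current epoch.
    pub fn deadline_after(&self, ticks: u64) -> u64 {
        self.epoch.load(Ordering::Relaxed).saturating_add(ticks)
    }

    /// The check made at polling points: one atomic load and a compare.
    pub fn expired(&self, deadline: u64) -> bool {
        self.epoch.load(Ordering::Relaxed) >= deadline
    }
}

/// Number of ticks needed to cover `budget`, rounded up.
pub fn ticks_for_budget(budget: Duration, tick: Duration) -> u64 {
    assert!(!tick.is_zero(), "tick interval must be non-zero");
    let (b, t) = (budget.as_nanos(), tick.as_nanos());
    ((b + t - 1) / t) as u64
}
```

With the 10ms tick interval used in section 6.1, a 100ms budget corresponds to 10 ticks, and the 1000-tick default there corresponds to 10 seconds.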

2.3 WASI-NN Integration

WASI-NN provides framework-agnostic ML inference capabilities:

+-------------------+
|   RuvLLM Host     |
+-------------------+
         |
         v
+-------------------+
|   WASI-NN API     |
+-------------------+
         |
    +----+----+
    |         |
    v         v
+-------+ +--------+
| ONNX  | | Custom |
| RT    | | Kernel |
+-------+ +--------+

WASI-NN Backends:

ONNX Runtime (portable)
Native kernels (performance-critical paths)
Custom quantized formats (memory efficiency)

3. WASM Boundary Design

3.1 ABI Strategy: Raw ABI (Not Component Model)

We use raw WASM ABI rather than the Component Model:

Aspect	Raw ABI	Component Model
Maturity	Stable	Evolving (Preview 2)
Overhead	Minimal	Higher (canonical ABI)
Tooling	Excellent	Improving
Adoption	Universal	Growing

Migration Path: Design interfaces to be Component Model-compatible for future migration.

3.2 Memory Layout

Host Linear Memory
+--------------------------------------------------+
| Tensor A    | Tensor B    | Output    | Scratch  |
| (read-only) | (read-only) | (write)   | (r/w)    |
+--------------------------------------------------+
     ^              ^            ^           ^
     |              |            |           |
   offset_a     offset_b    offset_out   offset_scratch

Shared Memory Protocol:

/// Kernel invocation descriptor passed to WASM
#[repr(C)]
pub struct KernelDescriptor {
    /// Input tensor A offset in linear memory
    pub input_a_offset: u32,
    /// Input tensor A size in bytes
    pub input_a_size: u32,
    /// Input tensor B offset (0 if unused)
    pub input_b_offset: u32,
    /// Input tensor B size in bytes
    pub input_b_size: u32,
    /// Output tensor offset
    pub output_offset: u32,
    /// Output tensor size in bytes
    pub output_size: u32,
    /// Scratch space offset
    pub scratch_offset: u32,
    /// Scratch space size in bytes
    pub scratch_size: u32,
    /// Kernel-specific parameters offset
    pub params_offset: u32,
    /// Kernel-specific parameters size
    pub params_size: u32,
}
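
Since the descriptor carries raw offsets and sizes, the host would normally validate them before each invocation. The sketch below is one possible check, assuming the read-only and write regions must not overlap; `Region`, `LayoutError`, and `validate_layout` are hypothetical names, not part of the ABI above.

```rust
/// Byte range in linear memory (offset and size as in `KernelDescriptor`).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Region {
    pub offset: u32,
    pub size: u32,
}

#[derive(Debug, PartialEq)]
pub enum LayoutError {
    OutOfBounds { offset: u32, size: u32 },
    OutputOverlapsInput,
}

impl Region {
    /// Exclusive end offset, widened to u64 so offset + size cannot wrap.
    fn end(self) -> u64 {
        self.offset as u64 + self.size as u64
    }

    /// Two non-empty ranges overlap when each starts before the other ends.
    fn overlaps(self, other: Region) -> bool {
        self.size != 0
            && other.size != 0
            && (self.offset as u64) < other.end()
            && (other.offset as u64) < self.end()
    }
}

/// Host-side check: every region fits in memory, and the output
/// region does not alias any input region.
pub fn validate_layout(
    memory_len: u64,
    inputs: &[Region],
    output: Region,
    scratch: Region,
) -> Result<(), LayoutError> {
    for r in inputs.iter().copied().chain([output, scratch]) {
        if r.end() > memory_len {
            return Err(LayoutError::OutOfBounds { offset: r.offset, size: r.size });
        }
    }
    if inputs.iter().any(|i| output.overlaps(*i)) {
        return Err(LayoutError::OutputOverlapsInput);
    }
    Ok(())
}
```

Widening to `u64` before adding matters here: with `u32` arithmetic, an offset near `u32::MAX` could wrap around and pass the bounds check.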

3.3 Trap Handling

WASM traps are handled as non-fatal errors:

pub enum KernelError {
    /// Execution budget exceeded
    EpochDeadline,
    /// Out of bounds memory access
    MemoryAccessViolation {
        offset: u32,
        size: u32,
    },
    /// Integer overflow/underflow
    IntegerOverflow,
    /// Unreachable code executed
    Unreachable,
    /// Stack overflow
    StackOverflow,
    /// Invalid function call
    IndirectCallTypeMismatch,
    /// Custom trap from kernel
    KernelTrap {
        code: u32,
        message: Option<String>,
    },
}

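// `wasmtime::Trap` is a plain (non-exhaustive) enum in current Wasmtime
// releases, so match on the variants directly.
impl From<wasmtime::Trap> for KernelError {
    fn from(trap: wasmtime::Trap) -> Self {
        use wasmtime::Trap;
        match trap {
            Trap::Interrupt => KernelError::EpochDeadline,
            Trap::MemoryOutOfBounds => KernelError::MemoryAccessViolation {
                offset: 0, // Not carried by the trap; fill in from host-side context
                size: 0,
            },
            Trap::IntegerOverflow => KernelError::IntegerOverflow,
            Trap::UnreachableCodeReached => KernelError::Unreachable,
            Trap::StackOverflow => KernelError::StackOverflow,
            Trap::BadSignature => KernelError::IndirectCallTypeMismatch,
            // Remaining trap kinds are reported as a generic kernel trap
            other => KernelError::KernelTrap {
                code: 0,
                message: Some(other.to_string()),
            },
        }
    }
}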

Recovery Strategy:

Log trap with full context
Release kernel resources
Fall back to reference implementation (if available)
Report degraded performance to metrics
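
The recovery steps above could be wired together roughly as follows. This is a std-only sketch: the error enum is trimmed to two variants, and `Metrics` and `run_with_fallback` are hypothetical names introduced for illustration.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Trimmed-down error type for the sketch.
#[derive(Debug, PartialEq)]
pub enum KernelError {
    EpochDeadline,
    Unreachable,
}

/// Hypothetical metrics sink counting calls served by the fallback path.
#[derive(Default)]
pub struct Metrics {
    pub degraded_calls: AtomicU64,
}

/// Run the sandboxed kernel; on a trap, log it and use the reference
/// implementation when one is registered.
pub fn run_with_fallback<T>(
    metrics: &Metrics,
    sandboxed: impl FnOnce() -> Result<T, KernelError>,
    reference: Option<impl FnOnce() -> T>,
) -> Result<T, KernelError> {
    match sandboxed() {
        Ok(value) => Ok(value),
        Err(trap) => {
            // Step 1: log the trap (a real host would attach kernel id and inputs).
            eprintln!("kernel trapped: {:?}", trap);
            // Step 2: the sandboxed closure has been consumed, so anything
            // it owned has already been dropped at this point.
            match reference {
                Some(fallback) => {
                    // Steps 3 and 4: fall back and record the degraded call.
                    metrics.degraded_calls.fetch_add(1, Ordering::Relaxed);
                    Ok(fallback())
                }
                None => Err(trap),
            }
        }
    }
}
```

When no reference implementation exists, the trap is returned to the caller unchanged, so the error stays non-fatal for the host either way.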

4. Kernel Pack System

4.1 Kernel Pack Structure

kernel-pack-v1.0.0/
├── kernels.json          # Manifest
├── kernels.json.sig      # Ed25519 signature
├── rope/
│   ├── rope_f32.wasm
│   ├── rope_f16.wasm
│   └── rope_q8.wasm
├── rmsnorm/
│   ├── rmsnorm_f32.wasm
│   └── rmsnorm_f16.wasm
├── swiglu/
│   ├── swiglu_f32.wasm
│   └── swiglu_f16.wasm
├── kv/
│   ├── kv_pack_q4.wasm
│   ├── kv_pack_q8.wasm
│   ├── kv_unpack_q4.wasm
│   └── kv_unpack_q8.wasm
└── lora/
    ├── lora_apply_f32.wasm
    └── lora_apply_f16.wasm

4.2 Manifest Schema (kernels.json)

{
  "$schema": "https://ruvllm.dev/schemas/kernel-pack-v1.json",
  "version": "1.0.0",
  "name": "ruvllm-core-kernels",
  "description": "Core compute kernels for RuvLLM inference",
  "min_runtime_version": "0.5.0",
  "max_runtime_version": "1.0.0",
  "created_at": "2026-01-18T00:00:00Z",
  "author": {
    "name": "RuvLLM Team",
    "email": "kernels@ruvllm.dev",
    "signing_key": "ed25519:AAAA..."
  },
  "kernels": [
    {
      "id": "rope_f32",
      "name": "Rotary Position Embedding (FP32)",
      "category": "positional_encoding",
      "path": "rope/rope_f32.wasm",
      "hash": "sha256:abc123...",
      "entry_point": "rope_forward",
      "inputs": [
        {
          "name": "x",
          "dtype": "f32",
          "shape": ["batch", "seq", "heads", "dim"]
        },
        {
          "name": "freqs",
          "dtype": "f32",
          "shape": ["seq", "dim_half"]
        }
      ],
      "outputs": [
        {
          "name": "y",
          "dtype": "f32",
          "shape": ["batch", "seq", "heads", "dim"]
        }
      ],
      "params": {
        "theta": {
          "type": "f32",
          "default": 10000.0
        }
      },
      "resource_limits": {
        "max_memory_pages": 256,
        "max_epoch_ticks": 1000,
        "max_table_elements": 1024
      },
      "platforms": {
        "wasmtime": {
          "min_version": "15.0.0",
          "features": ["simd", "bulk-memory"]
        },
        "wamr": {
          "min_version": "1.3.0",
          "aot_available": true
        }
      },
      "benchmarks": {
        "seq_512_dim_128": {
          "latency_us": 45,
          "throughput_gflops": 2.1
        }
      }
    }
  ],
  "fallbacks": {
    "rope_f32": "rope_reference",
    "rmsnorm_f32": "rmsnorm_reference"
  }
}
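
To make the `resource_limits` values concrete, the sketch below converts them into bytes and wall-clock time. WebAssembly pages are 64 KiB; the Rust struct and the helper `tensor_bytes` are hypothetical, and dense row-major tensors are an assumption.

```rust
/// Size of one WebAssembly linear-memory page (64 KiB).
pub const WASM_PAGE_BYTES: u64 = 64 * 1024;

/// Mirrors the `resource_limits` object of a manifest entry.
pub struct ResourceLimits {
    pub max_memory_pages: u32,
    pub max_epoch_ticks: u64,
    pub max_table_elements: u32,
}

impl ResourceLimits {
    /// Upper bound on linear memory, in bytes.
    pub fn max_memory_bytes(&self) -> u64 {
        u64::from(self.max_memory_pages) * WASM_PAGE_BYTES
    }

    /// Wall-clock budget in milliseconds for a given host tick interval.
    pub fn max_runtime_ms(&self, tick_interval_ms: u64) -> u64 {
        self.max_epoch_ticks.saturating_mul(tick_interval_ms)
    }
}

/// Bytes needed for a dense tensor; `None` if the product overflows.
pub fn tensor_bytes(shape: &[u64], elem_bytes: u64) -> Option<u64> {
    shape.iter().try_fold(elem_bytes, |acc, &dim| acc.checked_mul(dim))
}
```

Under these assumptions, 256 pages is 16 MiB and 1000 ticks at a 10ms interval is 10 seconds. An f32 tensor of shape [1, 512, 32, 128] (the shape used in Appendix C) takes 8 MiB, so an input and an output of that shape would together fill the 16 MiB limit if both lived in the kernel's own linear memory.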

4.3 Included Kernel Packs

Category	Kernels	Notes
Positional	RoPE (f32, f16, q8)	Rotary embeddings
Normalization	RMSNorm (f32, f16)	Pre-attention normalization
Activation	SwiGLU (f32, f16)	Gated activation
KV Cache	pack_q4, pack_q8, unpack_q4, unpack_q8	Quantize/dequantize
Adapter	LoRA apply (f32, f16)	Delta weight application

Attention Note: Attention kernels remain native initially due to:

Complex memory access patterns
Heavy reliance on hardware-specific optimizations (Flash Attention, xformers)
Significant overhead from WASM boundary crossing for large tensors

5. Supply Chain Security

5.1 Signature Verification

use ed25519_dalek::{Signature, VerifyingKey, Verifier};

pub struct KernelPackVerifier {
    trusted_keys: Vec<VerifyingKey>,
}

impl KernelPackVerifier {
    /// Verify kernel pack signature
    pub fn verify(&self, manifest: &[u8], signature: &[u8]) -> Result<(), VerifyError> {
        let sig = Signature::try_from(signature)?;

        for key in &self.trusted_keys {
            if key.verify(manifest, &sig).is_ok() {
                return Ok(());
            }
        }

        Err(VerifyError::NoTrustedKey)
    }

    /// Verify individual kernel hash
    pub fn verify_kernel(&self, kernel_bytes: &[u8], expected_hash: &str) -> Result<(), VerifyError> {
        use sha2::{Sha256, Digest};

        let mut hasher = Sha256::new();
        hasher.update(kernel_bytes);
        let hash = format!("sha256:{:x}", hasher.finalize());

        if hash == expected_hash {
            Ok(())
        } else {
            Err(VerifyError::HashMismatch {
                expected: expected_hash.to_string(),
                actual: hash,
            })
        }
    }
}

5.2 Version Compatibility Gates

pub struct CompatibilityChecker {
    runtime_version: Version,
}

impl CompatibilityChecker {
    pub fn check(&self, manifest: &KernelManifest) -> CompatibilityResult {
        // Check runtime version bounds
        if self.runtime_version < manifest.min_runtime_version {
            return CompatibilityResult::RuntimeTooOld {
                required: manifest.min_runtime_version.clone(),
                actual: self.runtime_version.clone(),
            };
        }

        if self.runtime_version > manifest.max_runtime_version {
            return CompatibilityResult::RuntimeTooNew {
                max_supported: manifest.max_runtime_version.clone(),
                actual: self.runtime_version.clone(),
            };
        }

        // Check WASM feature requirements
        for kernel in &manifest.kernels {
            if let Some(platform) = kernel.platforms.get("wasmtime") {
                for feature in &platform.features {
                    if !self.has_feature(feature) {
                        return CompatibilityResult::MissingFeature {
                            kernel: kernel.id.clone(),
                            feature: feature.clone(),
                        };
                    }
                }
            }
        }

        CompatibilityResult::Compatible
    }
}
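
The version gate relies on an ordered `Version` type. A minimal std-only sketch of such a type is shown below, assuming plain `major.minor.patch` strings with no pre-release or build metadata; `Gate` and `gate` are hypothetical names.

```rust
/// Minimal `major.minor.patch` version. Field order matters: the derived
/// ordering compares major, then minor, then patch.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Version {
    pub major: u32,
    pub minor: u32,
    pub patch: u32,
}

impl Version {
    /// Parse exactly three dot-separated numeric components.
    pub fn parse(s: &str) -> Option<Version> {
        let mut parts = s.split('.').map(|p| p.parse::<u32>().ok());
        let v = Version {
            major: parts.next()??,
            minor: parts.next()??,
            patch: parts.next()??,
        };
        // Reject trailing components such as "1.0.0.0".
        if parts.next().is_some() { None } else { Some(v) }
    }
}

#[derive(Debug, PartialEq)]
pub enum Gate {
    Compatible,
    RuntimeTooOld,
    RuntimeTooNew,
}

/// Both bounds are inclusive, matching the `<` and `>` checks above.
pub fn gate(runtime: Version, min: Version, max: Version) -> Gate {
    if runtime < min {
        Gate::RuntimeTooOld
    } else if runtime > max {
        Gate::RuntimeTooNew
    } else {
        Gate::Compatible
    }
}
```

Comparing parsed components rather than strings is the point of the exercise: as text, "0.10.0" sorts before "0.5.0", while as a version it is newer.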

5.3 Safe Rollback Protocol

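pub struct KernelManager {
    active_pack: Arc<RwLock<KernelPack>>,
    previous_pack: Arc<RwLock<Option<KernelPack>>>,
    // Used by upgrade() below
    verifier: KernelPackVerifier,
    compatibility: CompatibilityChecker,
    metrics: KernelMetrics,
}

impl KernelManager {
    /// Upgrade to new kernel pack with automatic rollback on failure
    pub async fn upgrade(&self, new_pack: KernelPack) -> Result<(), UpgradeError> {
        // Step 1: Verify new pack
        self.verifier.verify(&new_pack)?;
        // check() returns a CompatibilityResult, not a Result, so `?` does not apply.
        // `UpgradeError::Incompatible` is an assumed variant.
        match self.compatibility.check(&new_pack.manifest) {
            CompatibilityResult::Compatible => {}
            other => return Err(UpgradeError::Incompatible(other)),
        }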

        // Step 2: Compile kernels (AOT if supported)
        let compiled = self.compile_pack(&new_pack).await?;

        // Step 3: Atomic swap with rollback capability
        {
            let mut active = self.active_pack.write().await;
            let mut previous = self.previous_pack.write().await;

            // Store current as rollback target
            *previous = Some(std::mem::replace(&mut *active, compiled));
        }

        // Step 4: Health check with new kernels
        if let Err(e) = self.health_check().await {
            tracing::error!("Kernel health check failed: {}", e);
            self.rollback().await?;
            return Err(UpgradeError::HealthCheckFailed(e));
        }

        // Step 5: Clear rollback after grace period
        tokio::spawn({
            let previous = self.previous_pack.clone();
            async move {
                tokio::time::sleep(Duration::from_secs(300)).await;
                *previous.write().await = None;
            }
        });

        Ok(())
    }

    /// Rollback to previous kernel pack
    pub async fn rollback(&self) -> Result<(), RollbackError> {
        let mut active = self.active_pack.write().await;
        let mut previous = self.previous_pack.write().await;

        if let Some(prev) = previous.take() {
            *active = prev;
            tracing::info!("Rolled back to previous kernel pack");
            Ok(())
        } else {
            Err(RollbackError::NoPreviousPack)
        }
    }
}
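
The swap-and-rollback idea can be shown in isolation with a synchronous, std-only sketch. It differs from the async version above in one respect: both slots sit behind a single lock and the health check runs while that lock is held, so readers never observe a pack that later fails the check. `PackSlots` is a hypothetical type.

```rust
use std::sync::Mutex;

/// Active pack and optional rollback target behind one lock.
pub struct PackSlots<P> {
    inner: Mutex<(P, Option<P>)>,
}

impl<P> PackSlots<P> {
    pub fn new(initial: P) -> Self {
        Self { inner: Mutex::new((initial, None)) }
    }

    /// Install `next` and run `healthy` against it. On failure the old
    /// pack is restored and the rejected pack is handed back to the caller.
    pub fn upgrade(&self, next: P, healthy: impl FnOnce(&P) -> bool) -> Result<(), P> {
        let mut slots = self.inner.lock().unwrap();
        let old = std::mem::replace(&mut slots.0, next);
        if healthy(&slots.0) {
            slots.1 = Some(old);
            Ok(())
        } else {
            Err(std::mem::replace(&mut slots.0, old))
        }
    }

    /// Restore the previous pack; returns false when there is none.
    pub fn rollback(&self) -> bool {
        let mut slots = self.inner.lock().unwrap();
        match slots.1.take() {
            Some(prev) => {
                slots.0 = prev;
                true
            }
            None => false,
        }
    }

    /// Run `f` against the currently active pack.
    pub fn with_active<R>(&self, f: impl FnOnce(&P) -> R) -> R {
        f(&self.inner.lock().unwrap().0)
    }
}
```

Holding the lock across the health check trades availability for simplicity; a long-running check would stall invocations, which is presumably why the design above swaps first and checks afterwards.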

6. Device Class Configurations

6.1 Edge Server Configuration (Wasmtime + Epoch)

pub fn create_server_runtime() -> Result<WasmRuntime, RuntimeError> {
    let mut config = Config::new();

    // Performance optimizations
    config.cranelift_opt_level(OptLevel::Speed);
    config.cranelift_nan_canonicalization(false);
    config.parallel_compilation(true);

    // SIMD support for vectorized operations
    config.wasm_simd(true);
    config.wasm_bulk_memory(true);
    config.wasm_multi_value(true);

    // Memory configuration
    config.static_memory_maximum_size(1 << 32); // 4GB max
    config.dynamic_memory_guard_size(1 << 16);  // 64KB guard

    // Epoch-based interruption
    config.epoch_interruption(true);

    let engine = Engine::new(&config)?;

    Ok(WasmRuntime {
        engine,
        epoch_tick_interval: Duration::from_millis(10),
        default_epoch_budget: 1000, // 10 seconds max
    })
}

6.2 Embedded Configuration (WAMR AOT)

pub fn create_embedded_runtime() -> Result<WamrRuntime, RuntimeError> {
    let mut config = WamrConfig::new();

    // Minimal footprint configuration
    config.set_stack_size(32 * 1024);        // 32KB stack
    config.set_heap_size(128 * 1024);        // 128KB heap
    config.enable_aot(true);                  // Pre-compiled modules
    config.enable_simd(false);                // Often unavailable on MCU
    config.enable_bulk_memory(true);

    // Interpreter fallback for debugging
    config.enable_interp(cfg!(debug_assertions));

    // Execution limits
    config.set_exec_timeout_ms(100);          // 100ms max per invocation

    Ok(WamrRuntime::new(config)?)
}

6.3 WASI Threads (Optional)

For platforms supporting WASI threads:

pub fn create_threaded_runtime() -> Result<WasmRuntime, RuntimeError> {
    let mut config = Config::new();

    // Enable the threads proposal (shared memories and atomics)
    config.wasm_threads(true);

    // Async host calls; the worker count is a host-side setting kept on
    // WasmRuntime rather than an engine option
    config.async_support(true);

    let engine = Engine::new(&config)?;

    Ok(WasmRuntime {
        engine,
        thread_pool_size: 4,
    })
}

Platform Support Matrix:

Platform	WASI Threads	Notes
Linux x86_64	Yes	Full support
Linux ARM64	Yes	Full support
macOS	Yes	Full support
Windows	Yes	Full support
WAMR	No	Single-threaded only
Browser	Yes	Via SharedArrayBuffer

7. Performance Considerations

7.1 Invocation Overhead

Operation	Latency	Notes
Kernel lookup	~100ns	Hash table lookup
Instance creation	~1us	Pre-compiled module
Memory setup	~500ns	Shared memory mapping
Epoch check	~2ns	Single atomic read
Return value	~100ns	Register transfer
Total	~2us	Per invocation (the rows above sum to ~1.7us)
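
The budget in the table can be checked with a few lines of arithmetic. The sketch below sums the per-step figures and estimates what share of a call they represent; the function names are hypothetical, and whether a measured kernel latency already includes this overhead is not stated in this document.

```rust
/// Per-step latencies from the table above, in nanoseconds.
pub const STEPS_NS: [(&str, u64); 5] = [
    ("kernel lookup", 100),
    ("instance creation", 1_000),
    ("memory setup", 500),
    ("epoch check", 2),
    ("return value", 100),
];

/// Sum of all steps except those named in `skip`.
pub fn total_ns(skip: &[&str]) -> u64 {
    STEPS_NS
        .iter()
        .filter(|(name, _)| !skip.contains(name))
        .map(|(_, ns)| ns)
        .sum()
}

/// Share of a call spent on invocation overhead, given the kernel's own
/// run time in nanoseconds (assumed to exclude the overhead).
pub fn overhead_fraction(kernel_ns: u64) -> f64 {
    let overhead = total_ns(&[]) as f64;
    overhead / (overhead + kernel_ns as f64)
}
```

The steps add up to 1702ns. Skipping instance creation, as pooled instances would, leaves 702ns. For a kernel that runs for 45us, the full overhead is between 3% and 4% of the call.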

7.2 Optimization Strategies

Module Caching: Pre-compile and cache WASM modules
Instance Pooling: Reuse instances across invocations
Memory Sharing: Map host tensors directly into WASM linear memory
Batch Invocations: Process multiple requests per kernel call
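
Instance pooling, the second strategy above, can be sketched with a small generic pool. This is a std-only illustration with hypothetical names; a real pool would also need to reset per-instance state (linear memory contents, epoch deadline) before reuse.

```rust
use std::sync::Mutex;

/// Bounded pool of idle instances of type `I`.
pub struct InstancePool<I> {
    idle: Mutex<Vec<I>>,
    max_idle: usize,
}

impl<I> InstancePool<I> {
    pub fn new(max_idle: usize) -> Self {
        Self { idle: Mutex::new(Vec::new()), max_idle }
    }

    /// Take an idle instance, or build one with `create` if the pool is empty.
    pub fn acquire(&self, create: impl FnOnce() -> I) -> I {
        // The lock is released before `create` runs, so slow instantiation
        // does not block other callers.
        let reused = self.idle.lock().unwrap().pop();
        reused.unwrap_or_else(create)
    }

    /// Return an instance; it is dropped if the pool is already full.
    pub fn release(&self, instance: I) {
        let mut idle = self.idle.lock().unwrap();
        if idle.len() < self.max_idle {
            idle.push(instance);
        }
    }

    pub fn idle_len(&self) -> usize {
        self.idle.lock().unwrap().len()
    }
}
```

The bound on idle instances keeps memory use predictable, which matters on the embedded targets described in section 1.3.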

7.3 When to Bypass WASM

WASM sandboxing should be bypassed (with explicit opt-in) for:

Attention kernels (complex memory patterns)
Large matrix multiplications (>1000x1000)
Operations with <1ms latency requirements
Trusted, verified native kernels

8. Alternatives Considered

8.1 eBPF

Aspect	eBPF	WASM
Platform	Linux only	Cross-platform
Verification	Static, strict	Dynamic, flexible
Memory model	Constrained	Linear memory
Tooling	Improving	Mature

Decision: WASM chosen for cross-platform support.

8.2 Lua/LuaJIT

Aspect	Lua	WASM
Performance	Good (JIT)	Excellent (AOT)
Sandboxing	Manual effort	Built-in
Type safety	Dynamic	Static
Ecosystem	Large	Growing

Decision: WASM chosen for type safety and native compilation.

8.3 Native Plugins with seccomp

Aspect	seccomp	WASM
Isolation	Process-level	In-process
Overhead	IPC cost	Minimal
Portability	Linux only	Cross-platform
Complexity	High	Moderate

Decision: WASM chosen for in-process efficiency and portability.

9. Consequences

9.1 Positive

Security: Strong isolation prevents kernel code from compromising host
Portability: Same kernels run on servers and embedded devices
Hot Updates: Kernels can be updated without service restart
Ecosystem: Large WASM toolchain and community support
Auditability: WASM modules can be inspected and verified

9.2 Negative

Overhead: ~2us per invocation vs. native direct call
Complexity: Additional abstraction layer to maintain
Tooling: WASM debugging tools less mature than native
Learning Curve: Team needs WASM expertise

9.3 Risks

Risk	Likelihood	Impact	Mitigation
Performance regression	Medium	High	Benchmark suite, native fallbacks
WASI-NN instability	Low	Medium	Abstract behind internal API
Supply chain attack	Low	Critical	Signature verification, trusted keys
Epoch timing variability	Medium	Low	Generous budgets, monitoring

10. Implementation Plan

Phase 1: Foundation (Weeks 1-2)

Set up Wasmtime integration
Implement kernel descriptor ABI
Create basic kernel loader

Phase 2: Core Kernels (Weeks 3-4)

Implement RoPE kernel
Implement RMSNorm kernel
Implement SwiGLU kernel

Phase 3: KV Cache (Weeks 5-6)

Implement quantization kernels
Implement dequantization kernels
Integration with cache manager

Phase 4: Security (Weeks 7-8)

Implement signature verification
Create version compatibility checker
Build rollback system

Phase 5: Embedded (Weeks 9-10)

WAMR integration
AOT compilation pipeline
Resource-constrained testing

11. References

12. Appendix

A. Kernel Interface Definition

/// Standard kernel interface. Each kernel module exports these five symbols
/// (defined there as `#[no_mangle] pub extern "C" fn ...`); the block below
/// lists only their signatures.
extern "C" {
    /// Initialize kernel with parameters
    fn kernel_init(params_ptr: *const u8, params_len: u32) -> i32;

    /// Execute kernel forward pass
    fn kernel_forward(desc_ptr: *const KernelDescriptor) -> i32;

    /// Execute kernel backward pass (optional)
    fn kernel_backward(desc_ptr: *const KernelDescriptor) -> i32;

    /// Get kernel metadata
    fn kernel_info(info_ptr: *mut KernelInfo) -> i32;

    /// Cleanup kernel resources
    fn kernel_cleanup() -> i32;
}

B. Error Codes

Code	Name	Description
0	OK	Success
1	INVALID_INPUT	Invalid input tensor
2	INVALID_OUTPUT	Invalid output tensor
3	INVALID_PARAMS	Invalid kernel parameters
4	OUT_OF_MEMORY	Insufficient memory
5	NOT_IMPLEMENTED	Operation not supported
6	INTERNAL_ERROR	Internal kernel error

C. Benchmark Template

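// Place in a bench target, e.g. benches/rope.rs, with `harness = false`
// set for it in Cargo.toml. `criterion_main!` expands to `fn main`, so it
// cannot live inside a `#[cfg(test)]` module.
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn bench_rope_f32(c: &mut Criterion) {
    let runtime = create_server_runtime().unwrap();
    let kernel = runtime.load_kernel("rope_f32").unwrap();

    let input = Tensor::random([1, 512, 32, 128], DType::F32);
    let freqs = Tensor::random([512, 64], DType::F32);

    c.bench_function("rope_f32_seq512", |b| {
        // black_box keeps the optimizer from hoisting the inputs out of the loop
        b.iter(|| kernel.forward(black_box(&input), black_box(&freqs)).unwrap())
    });
}

criterion_group!(benches, bench_rope_f32);
criterion_main!(benches);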

Related Decisions

ADR-001: Ruvector Core Architecture
ADR-002: RuvLLM Integration
ADR-003: SIMD Optimization Strategy
ADR-007: Security Review & Technical Debt

Security Status (v2.1)

Component	Status	Notes
SharedArrayBuffer	✅ Secure	Safety documentation for race conditions
WASM Memory	✅ Secure	Bounds checking via WASM sandbox
Kernel Loading	⚠️ Planned	Signature verification pending

Fixes Applied:

Added comprehensive safety comments documenting race condition prevention in shared.rs
JavaScript/WASM coordination patterns documented

Outstanding Items:

TD-007 (P2): Embedded JavaScript should be extracted to separate files

See ADR-007 for full security audit trail.

Revision History

Version	Date	Author	Changes
1.0	2026-01-18	RuVector Architecture Team	Initial version
1.1	2026-01-19	Security Review Agent	Added security status, related decisions
