# ADR-005: WASM Runtime Integration

| Field | Value |
|-------|-------|
| **Status** | Proposed |
| **Date** | 2026-01-18 |
| **Authors** | RuvLLM Architecture Team |
| **Reviewers** | - |
| **Supersedes** | - |
| **Superseded by** | - |

**Note**: The WASM runtime approach described here is complemented by ADR-029. The RVF WASM microkernel (rvf-wasm) provides a <8 KB Cognitum tile target that replaces ad-hoc WASM builds for vector operations.

## 1. Context

### 1.1 Problem Statement

RuvLLM requires a mechanism for executing user-provided and community-contributed compute kernels in a secure, sandboxed environment. These kernels implement performance-critical operations such as:

- Rotary Position Embeddings (RoPE)
- RMS Normalization (RMSNorm)
- SwiGLU activation functions
- KV cache quantization/dequantization
- LoRA delta application

Without proper isolation, malicious or buggy kernels could:
- Access unauthorized memory regions
- Consume unbounded compute resources
- Compromise the host system
- Corrupt model state

### 1.2 Requirements

| Requirement | Priority | Rationale |
|-------------|----------|-----------|
| Sandboxed execution | Critical | Prevent kernel code from accessing host resources |
| Execution budgets | Critical | Prevent runaway code and DoS conditions |
| Low overhead | High | Kernels are in the inference hot path |
| Cross-platform | High | Support x86, ARM, embedded devices |
| Framework agnostic | Medium | Enable ML inference without vendor lock-in |
| Hot-swappable kernels | Medium | Update kernels without service restart |

### 1.3 Constraints

- **Memory**: Embedded targets have as little as 256KB RAM
- **Latency**: Kernel invocation overhead must be <10us for small tensors
- **Compatibility**: Must support existing Rust/C kernel implementations
- **Security**: Kernel supply chain must be verifiable

## 2. Decision

We will adopt **WebAssembly (WASM)** as the sandboxed execution environment for compute kernels, with the following architecture:

### 2.1 Runtime Selection

| Device Class | Runtime | Rationale |
|--------------|---------|-----------|
| Edge servers (x86/ARM64) | **Wasmtime** | Mature, well-optimized, excellent tooling |
| Embedded/MCU (<1MB RAM) | **WAMR** | <85KB footprint, AOT compilation support |
| Browser/WASI Preview 2 | **wasmtime/browser** | Future consideration |

### 2.2 Interruption Strategy: Epoch-Based (Not Fuel)

We choose **epoch-based interruption** over fuel-based metering:

| Aspect | Epoch | Fuel |
|--------|-------|------|
| Overhead | ~2-5% | ~15-30% |
| Granularity | Coarse (polling points) | Fine (per instruction) |
| Determinism | Non-deterministic | Deterministic |
| Implementation | Store-level epoch counter | Instruction instrumentation |

**Rationale**: For inference workloads, coarse-grained interruption is acceptable. Going by the estimates above, avoiding fuel metering removes roughly 10-25 percentage points of overhead, which is significant for latency-sensitive operations.

```rust
use wasmtime::{Config, Engine, Store};

// Epoch configuration example
let mut config = Config::new();
config.epoch_interruption(true);

let engine = Engine::new(&config)?;
let mut store = Store::new(&engine, ());

// Set epoch deadline: 100 ticks past the current epoch
// (a 100ms budget only if the epoch is incremented once per millisecond)
store.set_epoch_deadline(100);

// Increment epoch from a timer (background thread or async task); one call is one tick
engine.increment_epoch();
```

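The link between a wall-clock budget and the tick count passed to `set_epoch_deadline` can be sketched as below. This is a minimal, std-only illustration: `budget_to_ticks` is a hypothetical helper (not part of Wasmtime), and it assumes the host increments the epoch at a fixed interval.

```rust
use std::time::Duration;

/// Convert a wall-clock budget into a number of epoch ticks, rounding up so a
/// kernel is not interrupted before its budget has elapsed.
/// Hypothetical helper; assumes the host bumps the epoch every `tick_interval`.
pub fn budget_to_ticks(budget: Duration, tick_interval: Duration) -> u64 {
    // Guard against a zero interval to avoid dividing by zero
    let tick_ns = tick_interval.as_nanos().max(1);
    let budget_ns = budget.as_nanos();
    // Ceiling division: a partial tick still needs one full tick to be observed
    let ticks = (budget_ns + tick_ns - 1) / tick_ns;
    ticks.min(u64::MAX as u128) as u64
}
```

With a 10ms tick, a 100ms budget maps to 10 ticks and a 105ms budget to 11. Because interruption only happens at polling points, the effective cutoff is still approximate, which is the coarse granularity the table describes.
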
### 2.3 WASI-NN Integration

WASI-NN provides framework-agnostic ML inference capabilities:

```
+-------------------+
|    RuvLLM Host    |
+-------------------+
          |
          v
+-------------------+
|    WASI-NN API    |
+-------------------+
          |
     +----+----+
     |         |
     v         v
 +-------+ +--------+
 | ONNX  | | Custom |
 |  RT   | | Kernel |
 +-------+ +--------+
```

**WASI-NN Backends**:
- ONNX Runtime (portable)
- Native kernels (performance-critical paths)
- Custom quantized formats (memory efficiency)

## 3. WASM Boundary Design

### 3.1 ABI Strategy: Raw ABI (Not Component Model)

We use **raw WASM ABI** rather than the Component Model:

| Aspect | Raw ABI | Component Model |
|--------|---------|-----------------|
| Maturity | Stable | Evolving (Preview 2) |
| Overhead | Minimal | Higher (canonical ABI) |
| Tooling | Excellent | Improving |
| Adoption | Universal | Growing |

**Migration Path**: Design interfaces to be Component Model-compatible for future migration.

### 3.2 Memory Layout

```
Host Linear Memory
+--------------------------------------------------+
|  Tensor A   |  Tensor B   |  Output  |  Scratch  |
|  (read-only)|  (read-only)|  (write) |  (r/w)    |
+--------------------------------------------------+
^             ^             ^          ^
|             |             |          |
offset_a      offset_b      offset_out offset_scratch
```

**Shared Memory Protocol**:

```rust
/// Kernel invocation descriptor passed to WASM
#[repr(C)]
pub struct KernelDescriptor {
    /// Input tensor A offset in linear memory
    pub input_a_offset: u32,
    /// Input tensor A size in bytes
    pub input_a_size: u32,
    /// Input tensor B offset (0 if unused)
    pub input_b_offset: u32,
    /// Input tensor B size in bytes
    pub input_b_size: u32,
    /// Output tensor offset
    pub output_offset: u32,
    /// Output tensor size in bytes
    pub output_size: u32,
    /// Scratch space offset
    pub scratch_offset: u32,
    /// Scratch space size in bytes
    pub scratch_size: u32,
    /// Kernel-specific parameters offset
    pub params_offset: u32,
    /// Kernel-specific parameters size
    pub params_size: u32,
}
```

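Before a descriptor is handed to a kernel, the host would plausibly check that every (offset, size) pair stays inside linear memory and that writable regions do not alias anything else. The sketch below shows one way to do that with the standard library only; `Region` and `validate_regions` are hypothetical names and are not taken from the RuvLLM code base.

```rust
/// One (offset, size) pair taken from a kernel descriptor.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Region {
    pub offset: u32,
    pub size: u32,
}

impl Region {
    /// Exclusive end of the region, widened to u64 so it cannot overflow.
    fn end(self) -> u64 {
        self.offset as u64 + self.size as u64
    }
}

/// Check that all regions fit in linear memory and that each writable region
/// (output, scratch) is disjoint from every other region.
/// Hypothetical host-side check, shown for illustration only.
pub fn validate_regions(
    memory_len: u64,
    read_only: &[Region],
    writable: &[Region],
) -> Result<(), String> {
    let all: Vec<Region> = read_only.iter().chain(writable).copied().collect();
    for r in &all {
        if r.end() > memory_len {
            return Err(format!("{:?} exceeds linear memory of {} bytes", r, memory_len));
        }
    }
    for (i, w) in writable.iter().enumerate() {
        for (j, other) in all.iter().enumerate() {
            // Skip the writable region itself and any empty (unused) region
            if j == read_only.len() + i || w.size == 0 || other.size == 0 {
                continue;
            }
            let overlaps = (w.offset as u64) < other.end() && (other.offset as u64) < w.end();
            if overlaps {
                return Err(format!("{:?} overlaps {:?}", w, other));
            }
        }
    }
    Ok(())
}
```

Read-only inputs are allowed to overlap each other in this sketch (for example, when the same tensor is passed twice); only writes are required to be exclusive.
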
### 3.3 Trap Handling

WASM traps are handled as **non-fatal errors**:

```rust
use wasmtime::Trap;

pub enum KernelError {
    /// Execution budget exceeded
    EpochDeadline,
    /// Out of bounds memory access
    MemoryAccessViolation {
        offset: u32,
        size: u32,
    },
    /// Integer overflow/underflow
    IntegerOverflow,
    /// Unreachable code executed
    Unreachable,
    /// Stack overflow
    StackOverflow,
    /// Invalid function call
    IndirectCallTypeMismatch,
    /// Custom trap from kernel
    KernelTrap {
        code: u32,
        message: Option<String>,
    },
}

impl From<Trap> for KernelError {
    fn from(trap: Trap) -> Self {
        // Assumes a Wasmtime release in which `Trap` is itself an enum of
        // trap codes (true of the 15.0.0 minimum named in the manifest)
        match trap {
            Trap::Interrupt => KernelError::EpochDeadline,
            Trap::MemoryOutOfBounds => KernelError::MemoryAccessViolation {
                offset: 0, // Not carried by `Trap`; fill in from host-side context if known
                size: 0,
            },
            Trap::IntegerOverflow => KernelError::IntegerOverflow,
            Trap::UnreachableCodeReached => KernelError::Unreachable,
            Trap::StackOverflow => KernelError::StackOverflow,
            Trap::BadSignature => KernelError::IndirectCallTypeMismatch,
            // Remaining trap codes are surfaced as a generic kernel trap
            other => KernelError::KernelTrap {
                code: 0,
                message: Some(other.to_string()),
            },
        }
    }
}
```

**Recovery Strategy**:

1. Log trap with full context
2. Release kernel resources
3. Fall back to reference implementation (if available)
4. Report degraded performance to metrics

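The four recovery steps can be sketched as a small dispatch function. This is an illustrative, std-only outline under stated assumptions: `Outcome` and `run_with_fallback` are hypothetical names, the error enum is reduced to two variants to keep the sketch short, and `eprintln!` stands in for structured logging.

```rust
/// Reduced error type for this sketch; the full enum appears in the section above.
#[derive(Debug, PartialEq)]
pub enum KernelError {
    EpochDeadline,
    Unreachable,
}

/// Result of one invocation, recording whether the fallback path was taken.
#[derive(Debug, PartialEq)]
pub enum Outcome<T> {
    Wasm(T),
    Fallback { value: T, cause: KernelError },
}

/// Run the WASM kernel; on a trap, log it, then use the reference
/// implementation if one exists and count the degraded invocation.
pub fn run_with_fallback<T>(
    wasm_kernel: impl FnOnce() -> Result<T, KernelError>,
    reference: Option<impl FnOnce() -> T>,
    degraded_count: &mut u64,
) -> Result<Outcome<T>, KernelError> {
    match wasm_kernel() {
        Ok(value) => Ok(Outcome::Wasm(value)),
        Err(cause) => {
            // Step 1: log the trap
            eprintln!("kernel trapped: {:?}", cause);
            // Step 2: the kernel closure was consumed by the call above, so
            // whatever it captured has already been dropped
            match reference {
                Some(fallback) => {
                    // Step 4: record the degraded path for metrics
                    *degraded_count += 1;
                    // Step 3: compute the result with the reference implementation
                    Ok(Outcome::Fallback { value: fallback(), cause })
                }
                None => Err(cause),
            }
        }
    }
}
```

Returning the cause alongside the fallback value keeps the trap visible to callers even when the request as a whole succeeds.
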
## 4. Kernel Pack System

### 4.1 Kernel Pack Structure

```
kernel-pack-v1.0.0/
├── kernels.json              # Manifest
├── kernels.json.sig          # Ed25519 signature
├── rope/
│   ├── rope_f32.wasm
│   ├── rope_f16.wasm
│   └── rope_q8.wasm
├── rmsnorm/
│   ├── rmsnorm_f32.wasm
│   └── rmsnorm_f16.wasm
├── swiglu/
│   ├── swiglu_f32.wasm
│   └── swiglu_f16.wasm
├── kv/
│   ├── kv_pack_q4.wasm
│   ├── kv_pack_q8.wasm
│   ├── kv_unpack_q4.wasm
│   └── kv_unpack_q8.wasm
└── lora/
    ├── lora_apply_f32.wasm
    └── lora_apply_f16.wasm
```

### 4.2 Manifest Schema (kernels.json)

```json
{
  "$schema": "https://ruvllm.dev/schemas/kernel-pack-v1.json",
  "version": "1.0.0",
  "name": "ruvllm-core-kernels",
  "description": "Core compute kernels for RuvLLM inference",
  "min_runtime_version": "0.5.0",
  "max_runtime_version": "1.0.0",
  "created_at": "2026-01-18T00:00:00Z",
  "author": {
    "name": "RuvLLM Team",
    "email": "kernels@ruvllm.dev",
    "signing_key": "ed25519:AAAA..."
  },
  "kernels": [
    {
      "id": "rope_f32",
      "name": "Rotary Position Embedding (FP32)",
      "category": "positional_encoding",
      "path": "rope/rope_f32.wasm",
      "hash": "sha256:abc123...",
      "entry_point": "rope_forward",
      "inputs": [
        {
          "name": "x",
          "dtype": "f32",
          "shape": ["batch", "seq", "heads", "dim"]
        },
        {
          "name": "freqs",
          "dtype": "f32",
          "shape": ["seq", "dim_half"]
        }
      ],
      "outputs": [
        {
          "name": "y",
          "dtype": "f32",
          "shape": ["batch", "seq", "heads", "dim"]
        }
      ],
      "params": {
        "theta": {
          "type": "f32",
          "default": 10000.0
        }
      },
      "resource_limits": {
        "max_memory_pages": 256,
        "max_epoch_ticks": 1000,
        "max_table_elements": 1024
      },
      "platforms": {
        "wasmtime": {
          "min_version": "15.0.0",
          "features": ["simd", "bulk-memory"]
        },
        "wamr": {
          "min_version": "1.3.0",
          "aot_available": true
        }
      },
      "benchmarks": {
        "seq_512_dim_128": {
          "latency_us": 45,
          "throughput_gflops": 2.1
        }
      }
    }
  ],
  "fallbacks": {
    "rope_f32": "rope_reference",
    "rmsnorm_f32": "rmsnorm_reference"
  }
}
```

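To make the `resource_limits` values concrete: WebAssembly linear-memory pages are 64 KiB each, so `max_memory_pages: 256` caps a kernel at 16 MiB, and `max_epoch_ticks: 1000` is a wall-clock ceiling only once a tick interval is chosen. The sketch below mirrors the manifest object in a hypothetical `ResourceLimits` struct; the struct and its method names are assumptions for illustration.

```rust
use std::time::Duration;

/// Size of one WebAssembly linear-memory page (fixed by the core specification).
pub const WASM_PAGE_BYTES: u64 = 64 * 1024;

/// Mirror of the manifest's `resource_limits` object (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub struct ResourceLimits {
    pub max_memory_pages: u32,
    pub max_epoch_ticks: u32,
    pub max_table_elements: u32,
}

impl ResourceLimits {
    /// Upper bound on linear memory in bytes.
    pub fn max_memory_bytes(&self) -> u64 {
        self.max_memory_pages as u64 * WASM_PAGE_BYTES
    }

    /// Wall-clock ceiling implied by the tick budget for a given tick interval.
    pub fn max_runtime(&self, tick_interval: Duration) -> Duration {
        tick_interval.saturating_mul(self.max_epoch_ticks)
    }
}
```

For the `rope_f32` entry above and a 10ms tick, this gives 16 MiB of memory and a 10 second runtime ceiling.
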
### 4.3 Included Kernel Packs

| Category | Kernels | Notes |
|----------|---------|-------|
| **Positional** | RoPE (f32, f16, q8) | Rotary embeddings |
| **Normalization** | RMSNorm (f32, f16) | Pre-attention normalization |
| **Activation** | SwiGLU (f32, f16) | Gated activation |
| **KV Cache** | pack_q4, pack_q8, unpack_q4, unpack_q8 | Quantize/dequantize |
| **Adapter** | LoRA apply (f32, f16) | Delta weight application |

**Attention Note**: Attention kernels remain **native** initially due to:
- Complex memory access patterns
- Heavy reliance on hardware-specific optimizations (Flash Attention, xformers)
- Significant overhead from WASM boundary crossing for large tensors

## 5. Supply Chain Security

### 5.1 Signature Verification

```rust
use ed25519_dalek::{Signature, VerifyingKey, Verifier};

pub struct KernelPackVerifier {
    trusted_keys: Vec<VerifyingKey>,
}

impl KernelPackVerifier {
    /// Verify kernel pack signature
    pub fn verify(&self, manifest: &[u8], signature: &[u8]) -> Result<(), VerifyError> {
        let sig = Signature::try_from(signature)?;

        for key in &self.trusted_keys {
            if key.verify(manifest, &sig).is_ok() {
                return Ok(());
            }
        }

        Err(VerifyError::NoTrustedKey)
    }

    /// Verify individual kernel hash
    pub fn verify_kernel(&self, kernel_bytes: &[u8], expected_hash: &str) -> Result<(), VerifyError> {
        use sha2::{Sha256, Digest};

        let mut hasher = Sha256::new();
        hasher.update(kernel_bytes);
        let hash = format!("sha256:{:x}", hasher.finalize());

        if hash == expected_hash {
            Ok(())
        } else {
            Err(VerifyError::HashMismatch {
                expected: expected_hash.to_string(),
                actual: hash,
            })
        }
    }
}
```

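Since `verify_kernel` compares formatted strings, a manifest hash with the wrong prefix, uppercase hex, or a truncated value would surface as a plain mismatch. A loader might parse the manifest's `hash` field up front so malformed entries can be told apart from genuine mismatches. The following is a std-only sketch; `parse_sha256_digest` is a hypothetical helper and no hashing is performed here.

```rust
/// Parse a manifest digest of the form `sha256:<64 lowercase-or-uppercase hex chars>`
/// into its 32 raw bytes. Hypothetical helper for illustration.
pub fn parse_sha256_digest(s: &str) -> Result<[u8; 32], String> {
    let hex = s
        .strip_prefix("sha256:")
        .ok_or_else(|| format!("unsupported digest format: {}", s))?;
    // A SHA-256 digest is 32 bytes, i.e. exactly 64 hex characters
    if hex.len() != 64 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err(format!("expected 64 hex characters, got {:?}", hex));
    }
    let mut out = [0u8; 32];
    for (i, byte) in out.iter_mut().enumerate() {
        // Safe to slice by byte index: the string was checked to be ASCII hex
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).map_err(|e| e.to_string())?;
    }
    Ok(out)
}
```

Comparing parsed bytes rather than strings also makes the check insensitive to hex letter case.
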
### 5.2 Version Compatibility Gates

```rust
pub struct CompatibilityChecker {
    runtime_version: Version,
}

impl CompatibilityChecker {
    pub fn check(&self, manifest: &KernelManifest) -> CompatibilityResult {
        // Check runtime version bounds
        if self.runtime_version < manifest.min_runtime_version {
            return CompatibilityResult::RuntimeTooOld {
                required: manifest.min_runtime_version.clone(),
                actual: self.runtime_version.clone(),
            };
        }

        if self.runtime_version > manifest.max_runtime_version {
            return CompatibilityResult::RuntimeTooNew {
                max_supported: manifest.max_runtime_version.clone(),
                actual: self.runtime_version.clone(),
            };
        }

        // Check WASM feature requirements
        for kernel in &manifest.kernels {
            if let Some(platform) = kernel.platforms.get("wasmtime") {
                for feature in &platform.features {
                    if !self.has_feature(feature) {
                        return CompatibilityResult::MissingFeature {
                            kernel: kernel.id.clone(),
                            feature: feature.clone(),
                        };
                    }
                }
            }
        }

        CompatibilityResult::Compatible
    }
}
```

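The version gate above relies on versions being ordered component by component rather than as strings (so `0.10.0` sorts after `0.9.0`). A minimal std-only sketch of that comparison follows; this `Version` type is a simplified stand-in written for illustration, and the real code may well use a semver library instead.

```rust
/// Simplified three-part version. Deriving `Ord` compares fields in declaration
/// order, which gives major, then minor, then patch precedence.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Version {
    major: u32,
    minor: u32,
    patch: u32,
}

impl Version {
    /// Parse `MAJOR.MINOR.PATCH`; pre-release and build suffixes are not handled.
    pub fn parse(s: &str) -> Option<Version> {
        let mut parts = s.split('.').map(|p| p.parse::<u32>().ok());
        let version = Version {
            major: parts.next()??,
            minor: parts.next()??,
            patch: parts.next()??,
        };
        // Reject trailing components such as "1.0.0.1"
        if parts.next().is_some() {
            None
        } else {
            Some(version)
        }
    }
}

/// Inclusive bounds check, matching the `<` / `>` comparisons in `check`.
pub fn in_bounds(runtime: Version, min: Version, max: Version) -> bool {
    min <= runtime && runtime <= max
}
```

With the manifest's bounds of `0.5.0` and `1.0.0`, runtime `1.0.0` is accepted and `1.0.1` is rejected, since both ends are inclusive.
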
### 5.3 Safe Rollback Protocol

```rust
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::RwLock;

pub struct KernelManager {
    active_pack: Arc<RwLock<KernelPack>>,
    previous_pack: Arc<RwLock<Option<KernelPack>>>,
    // Used by `upgrade` below (types from sections 5.1 and 5.2)
    verifier: KernelPackVerifier,
    compatibility: CompatibilityChecker,
    metrics: KernelMetrics,
}

impl KernelManager {
    /// Upgrade to new kernel pack with automatic rollback on failure
    pub async fn upgrade(&self, new_pack: KernelPack) -> Result<(), UpgradeError> {
        // Step 1: Verify new pack
        // (assumes a pack-level wrapper around the 5.1 verifier, and that
        // `check` is adapted to return a `Result`; neither is shown here)
        self.verifier.verify(&new_pack)?;
        self.compatibility.check(&new_pack.manifest)?;

        // Step 2: Compile kernels (AOT if supported)
        let compiled = self.compile_pack(&new_pack).await?;

        // Step 3: Atomic swap with rollback capability
        {
            let mut active = self.active_pack.write().await;
            let mut previous = self.previous_pack.write().await;

            // Store current as rollback target
            *previous = Some(std::mem::replace(&mut *active, compiled));
        } // Both write locks are released here, before the health check

        // Step 4: Health check with new kernels
        if let Err(e) = self.health_check().await {
            tracing::error!("Kernel health check failed: {}", e);
            self.rollback().await?;
            return Err(UpgradeError::HealthCheckFailed(e));
        }

        // Step 5: Clear rollback after grace period
        tokio::spawn({
            let previous = self.previous_pack.clone();
            async move {
                tokio::time::sleep(Duration::from_secs(300)).await;
                *previous.write().await = None;
            }
        });

        Ok(())
    }

    /// Rollback to previous kernel pack
    pub async fn rollback(&self) -> Result<(), RollbackError> {
        // Same lock order as `upgrade` (active, then previous) to avoid deadlock
        let mut active = self.active_pack.write().await;
        let mut previous = self.previous_pack.write().await;

        if let Some(prev) = previous.take() {
            *active = prev;
            tracing::info!("Rolled back to previous kernel pack");
            Ok(())
        } else {
            Err(RollbackError::NoPreviousPack)
        }
    }
}
```

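Stripped of async locking and timers, the swap-then-verify logic reduces to a two-slot state machine. The sketch below is a synchronous, std-only illustration of that core; `PackSlots` and its methods are hypothetical names, and `commit` stands in for the grace-period timer.

```rust
/// Active pack plus an optional rollback target (hypothetical type).
pub struct PackSlots<P> {
    active: P,
    previous: Option<P>,
}

impl<P> PackSlots<P> {
    pub fn new(initial: P) -> Self {
        PackSlots { active: initial, previous: None }
    }

    pub fn active(&self) -> &P {
        &self.active
    }

    pub fn has_rollback_target(&self) -> bool {
        self.previous.is_some()
    }

    /// Swap in `next`, run `health_check` against it, and restore the old
    /// pack if the check fails.
    pub fn upgrade<E>(
        &mut self,
        next: P,
        health_check: impl FnOnce(&P) -> Result<(), E>,
    ) -> Result<(), E> {
        // Keep the outgoing pack as the rollback target
        let old = std::mem::replace(&mut self.active, next);
        self.previous = Some(old);

        if let Err(e) = health_check(&self.active) {
            // Roll back: the failed pack is dropped, the old one becomes active
            if let Some(prev) = self.previous.take() {
                self.active = prev;
            }
            return Err(e);
        }
        Ok(())
    }

    /// Drop the rollback target once the grace period has passed.
    pub fn commit(&mut self) {
        self.previous = None;
    }
}
```

Note that each successful upgrade overwrites the rollback target, so only one generation of history is retained, which matches the single `previous_pack` slot in the design above.
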
## 6. Device Class Configurations

### 6.1 Edge Server Configuration (Wasmtime + Epoch)

```rust
use std::time::Duration;
use wasmtime::{Config, Engine, OptLevel};

pub fn create_server_runtime() -> Result<WasmRuntime, RuntimeError> {
    let mut config = Config::new();

    // Performance optimizations
    config.cranelift_opt_level(OptLevel::Speed);
    config.cranelift_nan_canonicalization(false);
    config.parallel_compilation(true);

    // SIMD support for vectorized operations
    config.wasm_simd(true);
    config.wasm_bulk_memory(true);
    config.wasm_multi_value(true);

    // Memory configuration
    config.static_memory_maximum_size(1 << 32); // 4GB max
    config.dynamic_memory_guard_size(1 << 16); // 64KB guard

    // Epoch-based interruption
    config.epoch_interruption(true);

    let engine = Engine::new(&config)?;

    Ok(WasmRuntime {
        engine,
        epoch_tick_interval: Duration::from_millis(10),
        default_epoch_budget: 1000, // 1000 ticks x 10ms = 10 seconds max
    })
}
```

### 6.2 Embedded Configuration (WAMR AOT)

```rust
pub fn create_embedded_runtime() -> Result<WamrRuntime, RuntimeError> {
    let mut config = WamrConfig::new();

    // Minimal footprint configuration
    config.set_stack_size(32 * 1024); // 32KB stack
    config.set_heap_size(128 * 1024); // 128KB heap
    config.enable_aot(true); // Pre-compiled modules
    config.enable_simd(false); // Often unavailable on MCU
    config.enable_bulk_memory(true);

    // Interpreter fallback for debugging
    config.enable_interp(cfg!(debug_assertions));

    // Execution limits
    config.set_exec_timeout_ms(100); // 100ms max per invocation

    Ok(WamrRuntime::new(config)?)
}
```

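The stack and heap sizes above can be checked against the 256KB RAM floor from section 1.3 and the <85KB WAMR footprint from section 2.1. The sketch below does that arithmetic with a hypothetical `EmbeddedBudget` struct; it treats 85KB as the worst-case runtime footprint and ignores the AOT module image and host firmware, which would also need to fit.

```rust
pub const KIB: usize = 1024;

/// Static memory budget for an embedded target (hypothetical type).
pub struct EmbeddedBudget {
    pub ram: usize,
    pub runtime_footprint: usize,
    pub stack: usize,
    pub heap: usize,
}

impl EmbeddedBudget {
    /// Bytes claimed by the runtime itself plus its stack and heap.
    pub fn total_used(&self) -> usize {
        self.runtime_footprint + self.stack + self.heap
    }

    /// Remaining RAM, or `None` if the configuration does not fit at all.
    pub fn headroom(&self) -> Option<usize> {
        self.ram.checked_sub(self.total_used())
    }
}

/// The figures quoted in this ADR: 256KB RAM, 85KB runtime, 32KB stack, 128KB heap.
pub fn adr_budget() -> EmbeddedBudget {
    EmbeddedBudget {
        ram: 256 * KIB,
        runtime_footprint: 85 * KIB,
        stack: 32 * KIB,
        heap: 128 * KIB,
    }
}
```

Under these assumptions the configuration uses 245KB and leaves about 11KB of headroom on the smallest target, so the heap size is likely the first knob to tune on 256KB devices.
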
### 6.3 WASI Threads (Optional)

For platforms supporting WASI threads:

```rust
use wasmtime::{Config, Engine};

pub fn create_threaded_runtime() -> Result<WasmRuntime, RuntimeError> {
    let mut config = Config::new();

    // Enable the WebAssembly threads proposal (shared memories and atomics)
    config.wasm_threads(true);

    // Async support; the thread count itself is host-side policy,
    // recorded in `thread_pool_size` below rather than in `Config`
    config.async_support(true);

    let engine = Engine::new(&config)?;

    Ok(WasmRuntime {
        engine,
        thread_pool_size: 4,
    })
}
```

**Platform Support Matrix**:

| Platform | WASI Threads | Notes |
|----------|--------------|-------|
| Linux x86_64 | Yes | Full support |
| Linux ARM64 | Yes | Full support |
| macOS | Yes | Full support |
| Windows | Yes | Full support |
| WAMR | No | Single-threaded only |
| Browser | Yes | Via SharedArrayBuffer |

## 7. Performance Considerations

### 7.1 Invocation Overhead

| Operation | Latency | Notes |
|-----------|---------|-------|
| Kernel lookup | ~100ns | Hash table lookup |
| Instance creation | ~1us | Pre-compiled module |
| Memory setup | ~500ns | Shared memory mapping |
| Epoch check | ~2ns | Single atomic read |
| Return value | ~100ns | Register transfer |
| **Total** | **~2us** | Per invocation |

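As a quick check of the table, the individual steps add up to about 1.7us, which the **Total** row rounds to ~2us and which sits well under the <10us constraint from section 1.3. The sketch below reproduces that sum and expresses the overhead as a fraction of a kernel's own runtime; the constant and function names are hypothetical, and the step latencies are the estimates from the table, not measurements.

```rust
/// Per-step overhead estimates from the table, in nanoseconds.
pub const STEPS_NS: [(&str, u64); 5] = [
    ("kernel lookup", 100),
    ("instance creation", 1_000),
    ("memory setup", 500),
    ("epoch check", 2),
    ("return value", 100),
];

/// Sum of all per-invocation overhead steps.
pub fn total_overhead_ns() -> u64 {
    STEPS_NS.iter().map(|(_, ns)| ns).sum()
}

/// Share of total wall time spent on invocation overhead for a kernel that
/// itself runs for `kernel_ns` nanoseconds.
pub fn overhead_fraction(kernel_ns: u64) -> f64 {
    let overhead = total_overhead_ns();
    overhead as f64 / (overhead + kernel_ns) as f64
}
```

For the 45us `rope_f32` benchmark figure quoted in the manifest, the overhead comes to roughly 3.6% of the total invocation time.
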
### 7.2 Optimization Strategies

1. **Module Caching**: Pre-compile and cache WASM modules
2. **Instance Pooling**: Reuse instances across invocations
3. **Memory Sharing**: Map host tensors directly into WASM linear memory
4. **Batch Invocations**: Process multiple requests per kernel call

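Instance pooling (strategy 2) is the one most easily shown in isolation. The following std-only sketch keeps idle instances in a stack and only creates a new one when the pool is empty; `Pool` is a hypothetical generic type, with `T` standing in for whatever instance handle the runtime uses.

```rust
/// Minimal instance pool: reuse idle instances, create on demand otherwise.
pub struct Pool<T> {
    idle: Vec<T>,
    created: usize,
}

impl<T> Pool<T> {
    pub fn new() -> Self {
        Pool { idle: Vec::new(), created: 0 }
    }

    /// Take an idle instance if one exists; otherwise build one with `create`.
    pub fn acquire(&mut self, create: impl FnOnce() -> T) -> T {
        match self.idle.pop() {
            Some(instance) => instance,
            None => {
                self.created += 1;
                create()
            }
        }
    }

    /// Return an instance to the pool for later reuse.
    pub fn release(&mut self, instance: T) {
        self.idle.push(instance);
    }

    /// Number of instances created so far (pool misses).
    pub fn created(&self) -> usize {
        self.created
    }
}
```

A real pool would also need to reset per-invocation state (scratch memory, epoch deadline) before reuse, since a pooled instance otherwise carries data over from the previous call.
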
### 7.3 When to Bypass WASM

WASM sandboxing should be bypassed (with explicit opt-in) for:

- Attention kernels (complex memory patterns)
- Large matrix multiplications (>1000x1000)
- Operations with <1ms latency requirements
- Trusted, verified native kernels

## 8. Alternatives Considered

### 8.1 eBPF

| Aspect | eBPF | WASM |
|--------|------|------|
| Platform | Linux only | Cross-platform |
| Verification | Static, strict | Dynamic, flexible |
| Memory model | Constrained | Linear memory |
| Tooling | Improving | Mature |

**Decision**: WASM chosen for cross-platform support.

### 8.2 Lua/LuaJIT

| Aspect | Lua | WASM |
|--------|-----|------|
| Performance | Good (JIT) | Excellent (AOT) |
| Sandboxing | Manual effort | Built-in |
| Type safety | Dynamic | Static |
| Ecosystem | Large | Growing |

**Decision**: WASM chosen for type safety and native compilation.

### 8.3 Native Plugins with seccomp

| Aspect | seccomp | WASM |
|--------|---------|------|
| Isolation | Process-level | In-process |
| Overhead | IPC cost | Minimal |
| Portability | Linux only | Cross-platform |
| Complexity | High | Moderate |

**Decision**: WASM chosen for in-process efficiency and portability.

## 9. Consequences

### 9.1 Positive

- **Security**: Strong isolation prevents kernel code from compromising host
- **Portability**: Same kernels run on servers and embedded devices
- **Hot Updates**: Kernels can be updated without service restart
- **Ecosystem**: Large WASM toolchain and community support
- **Auditability**: WASM modules can be inspected and verified

### 9.2 Negative

- **Overhead**: ~2us per invocation vs. native direct call
- **Complexity**: Additional abstraction layer to maintain
- **Tooling**: WASM debugging tools less mature than native
- **Learning Curve**: Team needs WASM expertise

### 9.3 Risks

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Performance regression | Medium | High | Benchmark suite, native fallbacks |
| WASI-NN instability | Low | Medium | Abstract behind internal API |
| Supply chain attack | Low | Critical | Signature verification, trusted keys |
| Epoch timing variability | Medium | Low | Generous budgets, monitoring |

## 10. Implementation Plan

### Phase 1: Foundation (Weeks 1-2)
- [ ] Set up Wasmtime integration
- [ ] Implement kernel descriptor ABI
- [ ] Create basic kernel loader

### Phase 2: Core Kernels (Weeks 3-4)
- [ ] Implement RoPE kernel
- [ ] Implement RMSNorm kernel
- [ ] Implement SwiGLU kernel

### Phase 3: KV Cache (Weeks 5-6)
- [ ] Implement quantization kernels
- [ ] Implement dequantization kernels
- [ ] Integration with cache manager

### Phase 4: Security (Weeks 7-8)
- [ ] Implement signature verification
- [ ] Create version compatibility checker
- [ ] Build rollback system

### Phase 5: Embedded (Weeks 9-10)
- [ ] WAMR integration
- [ ] AOT compilation pipeline
- [ ] Resource-constrained testing

## 11. References

- [Wasmtime Documentation](https://docs.wasmtime.dev/)
- [WAMR Documentation](https://github.com/bytecodealliance/wasm-micro-runtime)
- [WASI-NN Specification](https://github.com/WebAssembly/wasi-nn)
- [WebAssembly Security Model](https://webassembly.org/docs/security/)
- [Component Model Proposal](https://github.com/WebAssembly/component-model)

## 12. Appendix

### A. Kernel Interface Definition

```rust
/// Standard kernel interface.
/// Note: as written, this `extern` block declares *imports* from a module
/// named "ruvllm"; a kernel module that exports these symbols would instead
/// define them with `#[no_mangle] pub extern "C" fn`.
#[link(wasm_import_module = "ruvllm")]
extern "C" {
    /// Initialize kernel with parameters
    fn kernel_init(params_ptr: *const u8, params_len: u32) -> i32;

    /// Execute kernel forward pass
    fn kernel_forward(desc_ptr: *const KernelDescriptor) -> i32;

    /// Execute kernel backward pass (optional)
    fn kernel_backward(desc_ptr: *const KernelDescriptor) -> i32;

    /// Get kernel metadata
    fn kernel_info(info_ptr: *mut KernelInfo) -> i32;

    /// Cleanup kernel resources
    fn kernel_cleanup() -> i32;
}
```

### B. Error Codes

| Code | Name | Description |
|------|------|-------------|
| 0 | OK | Success |
| 1 | INVALID_INPUT | Invalid input tensor |
| 2 | INVALID_OUTPUT | Invalid output tensor |
| 3 | INVALID_PARAMS | Invalid kernel parameters |
| 4 | OUT_OF_MEMORY | Insufficient memory |
| 5 | NOT_IMPLEMENTED | Operation not supported |
| 6 | INTERNAL_ERROR | Internal kernel error |

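On the host side, the `i32` returned by each kernel entry point could be decoded into a typed status so that unknown codes are not silently treated as success. The sketch below maps the table one-to-one; the `KernelStatus` name and the choice to return the raw code as the error value are assumptions for illustration.

```rust
use std::convert::TryFrom;

/// Typed view of the kernel return codes listed in the table above.
#[repr(i32)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KernelStatus {
    Ok = 0,
    InvalidInput = 1,
    InvalidOutput = 2,
    InvalidParams = 3,
    OutOfMemory = 4,
    NotImplemented = 5,
    InternalError = 6,
}

impl TryFrom<i32> for KernelStatus {
    /// Unknown codes are handed back unchanged so the caller can log them.
    type Error = i32;

    fn try_from(code: i32) -> Result<Self, Self::Error> {
        match code {
            0 => Ok(KernelStatus::Ok),
            1 => Ok(KernelStatus::InvalidInput),
            2 => Ok(KernelStatus::InvalidOutput),
            3 => Ok(KernelStatus::InvalidParams),
            4 => Ok(KernelStatus::OutOfMemory),
            5 => Ok(KernelStatus::NotImplemented),
            6 => Ok(KernelStatus::InternalError),
            other => Err(other),
        }
    }
}
```

A caller would typically write `KernelStatus::try_from(rc)` right after the call and treat an `Err` as an ABI mismatch between the host and the kernel pack.
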
### C. Benchmark Template

```rust
// Criterion benchmarks live in their own bench target (a file under `benches/`
// with `harness = false` in Cargo.toml), because `criterion_main!` defines `main`.
use criterion::{criterion_group, criterion_main, Criterion};

fn bench_rope_f32(c: &mut Criterion) {
    let runtime = create_server_runtime().unwrap();
    let kernel = runtime.load_kernel("rope_f32").unwrap();

    // Shapes follow the manifest: x is [batch, seq, heads, dim], freqs is [seq, dim_half]
    let input = Tensor::random([1, 512, 32, 128], DType::F32);
    let freqs = Tensor::random([512, 64], DType::F32);

    c.bench_function("rope_f32_seq512", |b| {
        b.iter(|| {
            kernel.forward(&input, &freqs).unwrap()
        })
    });
}

criterion_group!(benches, bench_rope_f32);
criterion_main!(benches);
```

---

## Related Decisions

- **ADR-001**: Ruvector Core Architecture
- **ADR-002**: RuvLLM Integration
- **ADR-003**: SIMD Optimization Strategy
- **ADR-007**: Security Review & Technical Debt

---

## Security Status (v2.1)

| Component | Status | Notes |
|-----------|--------|-------|
| SharedArrayBuffer | ✅ Secure | Safety documentation for race conditions |
| WASM Memory | ✅ Secure | Bounds checking via WASM sandbox |
| Kernel Loading | ⚠️ Planned | Signature verification pending |

**Fixes Applied:**
- Added comprehensive safety comments documenting race condition prevention in `shared.rs`
- JavaScript/WASM coordination patterns documented

**Outstanding Items:**
- TD-007 (P2): Embedded JavaScript should be extracted to separate files

See ADR-007 for full security audit trail.

---

## Revision History

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2026-01-18 | RuVector Architecture Team | Initial version |
| 1.1 | 2026-01-19 | Security Review Agent | Added security status, related decisions |