# ADR-005: WASM Runtime Integration

| Field | Value |
|-------|-------|
| **Status** | Proposed |
| **Date** | 2026-01-18 |
| **Authors** | RuvLLM Architecture Team |
| **Reviewers** | - |
| **Supersedes** | - |
| **Superseded by** | - |

**Note**: The WASM runtime approach described here is complemented by ADR-029. The RVF WASM microkernel (rvf-wasm) provides a <8 KB Cognitum tile target that replaces ad-hoc WASM builds for vector operations.

## 1. Context

### 1.1 Problem Statement

RuvLLM requires a mechanism for executing user-provided and community-contributed compute kernels in a secure, sandboxed environment. These kernels implement performance-critical operations such as:

- Rotary Position Embeddings (RoPE)
- RMS Normalization (RMSNorm)
- SwiGLU activation functions
- KV cache quantization/dequantization
- LoRA delta application

Without proper isolation, malicious or buggy kernels could:
- Access unauthorized memory regions
- Consume unbounded compute resources
- Compromise the host system
- Corrupt model state

### 1.2 Requirements

| Requirement | Priority | Rationale |
|-------------|----------|-----------|
| Sandboxed execution | Critical | Prevent kernel code from accessing host resources |
| Execution budgets | Critical | Prevent runaway code and DoS conditions |
| Low overhead | High | Kernels are in the inference hot path |
| Cross-platform | High | Support x86, ARM, embedded devices |
| Framework agnostic | Medium | Enable ML inference without vendor lock-in |
| Hot-swappable kernels | Medium | Update kernels without service restart |

### 1.3 Constraints

- **Memory**: Embedded targets have as little as 256KB RAM
- **Latency**: Kernel invocation overhead must be <10us for small tensors
- **Compatibility**: Must support existing Rust/C kernel implementations
- **Security**: Kernel supply chain must be verifiable

## 2. Decision

We will adopt **WebAssembly (WASM)** as the sandboxed execution environment for compute kernels, with the following architecture:

### 2.1 Runtime Selection

| Device Class | Runtime | Rationale |
|--------------|---------|-----------|
| Edge servers (x86/ARM64) | **Wasmtime** | Mature, well-optimized, excellent tooling |
| Embedded/MCU (<1MB RAM) | **WAMR** | <85KB footprint, AOT compilation support |
| Browser/WASI Preview 2 | **wasmtime/browser** | Future consideration |

### 2.2 Interruption Strategy: Epoch-Based (Not Fuel)

We choose **epoch-based interruption** over fuel-based metering:

| Aspect | Epoch | Fuel |
|--------|-------|------|
| Overhead | ~2-5% | ~15-30% |
| Granularity | Coarse (polling points) | Fine (per instruction) |
| Determinism | Non-deterministic | Deterministic |
| Implementation | Engine-level epoch counter, per-store deadline | Instruction instrumentation |

**Rationale**: For inference workloads, coarse-grained interruption is acceptable. The 10-25% overhead reduction from avoiding fuel metering is significant for latency-sensitive operations.

```rust
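// Epoch configuration example
use wasmtime::{Config, Engine, Store};

let mut config = Config::new();
config.epoch_interruption(true);

let engine = Engine::new(&config)?;
let mut store = Store::new(&engine, ());

// The deadline is counted in epoch ticks beyond the current epoch, not in
// milliseconds: 100 ticks is a 100ms budget only if the epoch advances every 1ms.
store.set_epoch_deadline(100);

// Advance the epoch by one tick; call this periodically from a timer thread or async task
engine.increment_epoch();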
```
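
To make the mechanism concrete, the following std-only sketch models epoch interruption without Wasmtime. A shared atomic counter stands in for the engine epoch, and the loop polls it before each step, much as compiled WASM code polls at its check points. `EpochClock`, `Interrupted`, and `run_with_deadline` are hypothetical names used only for illustration, and the model interrupts as soon as the epoch reaches the deadline.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Stand-in for the engine-wide epoch counter (hypothetical, not Wasmtime's type).
#[derive(Clone, Default)]
pub struct EpochClock(Arc<AtomicU64>);

impl EpochClock {
    /// Advance the epoch by one tick; a timer would call this periodically.
    pub fn increment(&self) {
        self.0.fetch_add(1, Ordering::Relaxed);
    }

    /// Read the current epoch.
    pub fn now(&self) -> u64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Returned when the budget runs out before all steps have completed.
#[derive(Debug, PartialEq)]
pub struct Interrupted {
    pub completed_steps: u64,
}

/// Run `steps` iterations of `work`, polling the epoch before each one.
/// The deadline is relative: current epoch + `budget_ticks`.
pub fn run_with_deadline(
    clock: &EpochClock,
    budget_ticks: u64,
    steps: u64,
    mut work: impl FnMut(u64),
) -> Result<(), Interrupted> {
    let deadline = clock.now().saturating_add(budget_ticks);
    for step in 0..steps {
        // Polling point: a single atomic load, which keeps the overhead low.
        if clock.now() >= deadline {
            return Err(Interrupted { completed_steps: step });
        }
        work(step);
    }
    Ok(())
}
```

Because the check only happens at polling points, a kernel can overrun its budget by up to one step plus one tick, which is the coarse granularity noted in the table.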

### 2.3 WASI-NN Integration

WASI-NN provides framework-agnostic ML inference capabilities:

```
+-------------------+
|   RuvLLM Host     |
+-------------------+
         |
         v
+-------------------+
|   WASI-NN API     |
+-------------------+
         |
    +----+----+
    |         |
    v         v
+-------+ +--------+
| ONNX  | | Custom |
| RT    | | Kernel |
+-------+ +--------+
```

**WASI-NN Backends**:
- ONNX Runtime (portable)
- Native kernels (performance-critical paths)
- Custom quantized formats (memory efficiency)

## 3. WASM Boundary Design

### 3.1 ABI Strategy: Raw ABI (Not Component Model)

We use **raw WASM ABI** rather than the Component Model:

| Aspect | Raw ABI | Component Model |
|--------|---------|-----------------|
| Maturity | Stable | Evolving (Preview 2) |
| Overhead | Minimal | Higher (canonical ABI) |
| Tooling | Excellent | Improving |
| Adoption | Universal | Growing |

**Migration Path**: Design interfaces to be Component Model-compatible for future migration.

### 3.2 Memory Layout

```
WASM Linear Memory (regions laid out by the host)
+--------------------------------------------------+
| Tensor A    | Tensor B    | Output    | Scratch  |
| (read-only) | (read-only) | (write)   | (r/w)    |
+--------------------------------------------------+
     ^              ^            ^           ^
     |              |            |           |
   offset_a     offset_b    offset_out   offset_scratch
```
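
As a rough illustration of how the host might compute the four offsets in the diagram, the sketch below places the regions back to back with a common alignment. The 16-byte alignment used in the example and the names `RegionOffsets`, `align_up`, and `plan_layout` are assumptions, not part of the ABI defined in this ADR.

```rust
/// Byte offsets of the four regions shown in the diagram.
#[derive(Debug, PartialEq)]
pub struct RegionOffsets {
    pub offset_a: u32,
    pub offset_b: u32,
    pub offset_out: u32,
    pub offset_scratch: u32,
    /// Total bytes needed, i.e. the end of the scratch region.
    pub total: u32,
}

/// Round `value` up to the next multiple of `align` (must be a power of two).
/// Returns `None` on `u32` overflow.
fn align_up(value: u32, align: u32) -> Option<u32> {
    debug_assert!(align.is_power_of_two());
    value.checked_add(align - 1).map(|v| v & !(align - 1))
}

/// Place tensor A, tensor B, output, and scratch back to back starting at
/// `base`, aligning the start of each region to `align` bytes.
/// `sizes` is `[a, b, output, scratch]` in bytes.
pub fn plan_layout(base: u32, sizes: [u32; 4], align: u32) -> Option<RegionOffsets> {
    let offset_a = align_up(base, align)?;
    let offset_b = align_up(offset_a.checked_add(sizes[0])?, align)?;
    let offset_out = align_up(offset_b.checked_add(sizes[1])?, align)?;
    let offset_scratch = align_up(offset_out.checked_add(sizes[2])?, align)?;
    let total = offset_scratch.checked_add(sizes[3])?;
    Some(RegionOffsets { offset_a, offset_b, offset_out, offset_scratch, total })
}
```

For example, sizes of 100, 40, 100, and 10 bytes with 16-byte alignment yield offsets 0, 112, 160, and 272, for a total of 282 bytes.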

**Shared Memory Protocol**:

```rust
/// Kernel invocation descriptor passed to WASM
#[repr(C)]
pub struct KernelDescriptor {
    /// Input tensor A offset in linear memory
    pub input_a_offset: u32,
    /// Input tensor A size in bytes
    pub input_a_size: u32,
    /// Input tensor B offset (0 if unused)
    pub input_b_offset: u32,
    /// Input tensor B size in bytes
    pub input_b_size: u32,
    /// Output tensor offset
    pub output_offset: u32,
    /// Output tensor size in bytes
    pub output_size: u32,
    /// Scratch space offset
    pub scratch_offset: u32,
    /// Scratch space size in bytes
    pub scratch_size: u32,
    /// Kernel-specific parameters offset
    pub params_offset: u32,
    /// Kernel-specific parameters size
    pub params_size: u32,
}
```
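
Before invoking a kernel, the host would presumably validate the descriptor against the size of linear memory. The following is a minimal sketch of such a check, assuming each (offset, size) pair of `KernelDescriptor` is first copied into a hypothetical `Region` value. `DescriptorError` and `validate_regions` are likewise illustrative names. The rule that the output must not overlap a read-only input is inferred from the layout diagram rather than stated by the ADR.

```rust
/// One (offset, size) pair taken from the descriptor.
#[derive(Clone, Copy, Debug)]
pub struct Region {
    pub offset: u32,
    pub size: u32,
}

#[derive(Debug, PartialEq)]
pub enum DescriptorError {
    /// The region extends past the end of linear memory.
    OutOfBounds { offset: u32, size: u32 },
    /// The writable output overlaps a read-only input.
    OutputOverlapsInput,
}

impl Region {
    /// Exclusive end offset, computed in u64 so it cannot overflow.
    fn end(self) -> u64 {
        self.offset as u64 + self.size as u64
    }

    /// True if the two non-empty regions share at least one byte.
    fn overlaps(self, other: Region) -> bool {
        self.size > 0
            && other.size > 0
            && (self.offset as u64) < other.end()
            && (other.offset as u64) < self.end()
    }
}

/// Check that every region fits in `memory_len` bytes and that the output
/// does not alias any input.
pub fn validate_regions(
    inputs: &[Region],
    output: Region,
    memory_len: u64,
) -> Result<(), DescriptorError> {
    for r in inputs.iter().copied().chain(std::iter::once(output)) {
        if r.end() > memory_len {
            return Err(DescriptorError::OutOfBounds { offset: r.offset, size: r.size });
        }
    }
    if inputs.iter().any(|r| r.overlaps(output)) {
        return Err(DescriptorError::OutputOverlapsInput);
    }
    Ok(())
}
```

Doing the arithmetic in `u64` avoids the wraparound that `offset + size` could produce in `u32`, which would otherwise let a crafted descriptor pass the bounds check.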

### 3.3 Trap Handling

WASM traps are handled as **non-fatal errors**:

```rust
pub enum KernelError {
    /// Execution budget exceeded
    EpochDeadline,
    /// Out of bounds memory access
    MemoryAccessViolation {
        offset: u32,
        size: u32,
    },
    /// Integer overflow/underflow
    IntegerOverflow,
    /// Unreachable code executed
    Unreachable,
    /// Stack overflow
    StackOverflow,
    /// Invalid function call
    IndirectCallTypeMismatch,
    /// Custom trap from kernel
    KernelTrap {
        code: u32,
        message: Option<String>,
    },
}

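// Assumes a Wasmtime version in which `Trap` is itself an enum
// (the manifest in Section 4.2 requires >= 15.0.0).
impl From<wasmtime::Trap> for KernelError {
    fn from(trap: wasmtime::Trap) -> Self {
        use wasmtime::Trap;
        match trap {
            // Epoch deadline reached
            Trap::Interrupt => KernelError::EpochDeadline,
            Trap::MemoryOutOfBounds => KernelError::MemoryAccessViolation {
                offset: 0, // Placeholder: the trap itself does not carry the address
                size: 0,
            },
            Trap::IntegerOverflow => KernelError::IntegerOverflow,
            Trap::UnreachableCodeReached => KernelError::Unreachable,
            Trap::StackOverflow => KernelError::StackOverflow,
            Trap::BadSignature => KernelError::IndirectCallTypeMismatch,
            // `Trap` is non-exhaustive; report anything else as a generic trap
            other => KernelError::KernelTrap {
                code: 0,
                message: Some(other.to_string()),
            },
        }
    }
}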
```

**Recovery Strategy**:

1. Log trap with full context
2. Release kernel resources
3. Fall back to reference implementation (if available)
4. Report degraded performance to metrics
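
The four recovery steps could be wired together roughly as follows. This is a std-only sketch: `Outcome` and `run_with_fallback` are hypothetical names, logging goes to stderr instead of a tracing backend, and a plain counter stands in for the metrics system.

```rust
/// Which implementation produced the result.
#[derive(Debug, PartialEq)]
pub enum Outcome<T> {
    /// The WASM kernel completed normally.
    Primary(T),
    /// The kernel trapped and the reference implementation was used instead.
    Fallback(T),
}

/// Run `primary`; on error, follow the recovery strategy and try `reference`.
pub fn run_with_fallback<T, E: std::fmt::Debug>(
    kernel_id: &str,
    primary: impl FnOnce() -> Result<T, E>,
    reference: Option<&dyn Fn() -> Result<T, E>>,
    degraded_runs: &mut u64,
) -> Result<Outcome<T>, E> {
    let err = match primary() {
        Ok(value) => return Ok(Outcome::Primary(value)),
        Err(err) => err,
    };
    // 1. Log the trap with context.
    eprintln!("kernel {} trapped: {:?}", kernel_id, err);
    // 2. Kernel resources would be released here, e.g. by dropping the instance.
    // 3. Fall back to the reference implementation, if one is registered.
    match reference {
        Some(reference) => {
            let value = reference()?;
            // 4. Record the degraded run for metrics.
            *degraded_runs += 1;
            Ok(Outcome::Fallback(value))
        }
        None => Err(err),
    }
}
```

Returning `Outcome::Fallback` rather than a bare value lets callers distinguish degraded results without inspecting logs.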

## 4. Kernel Pack System

### 4.1 Kernel Pack Structure

```
kernel-pack-v1.0.0/
├── kernels.json          # Manifest
├── kernels.json.sig      # Ed25519 signature
├── rope/
│   ├── rope_f32.wasm
│   ├── rope_f16.wasm
│   └── rope_q8.wasm
├── rmsnorm/
│   ├── rmsnorm_f32.wasm
│   └── rmsnorm_f16.wasm
├── swiglu/
│   ├── swiglu_f32.wasm
│   └── swiglu_f16.wasm
├── kv/
│   ├── kv_pack_q4.wasm
│   ├── kv_pack_q8.wasm
│   ├── kv_unpack_q4.wasm
│   └── kv_unpack_q8.wasm
└── lora/
    ├── lora_apply_f32.wasm
    └── lora_apply_f16.wasm
```

### 4.2 Manifest Schema (kernels.json)

```json
{
  "$schema": "https://ruvllm.dev/schemas/kernel-pack-v1.json",
  "version": "1.0.0",
  "name": "ruvllm-core-kernels",
  "description": "Core compute kernels for RuvLLM inference",
  "min_runtime_version": "0.5.0",
  "max_runtime_version": "1.0.0",
  "created_at": "2026-01-18T00:00:00Z",
  "author": {
    "name": "RuvLLM Team",
    "email": "kernels@ruvllm.dev",
    "signing_key": "ed25519:AAAA..."
  },
  "kernels": [
    {
      "id": "rope_f32",
      "name": "Rotary Position Embedding (FP32)",
      "category": "positional_encoding",
      "path": "rope/rope_f32.wasm",
      "hash": "sha256:abc123...",
      "entry_point": "rope_forward",
      "inputs": [
        {
          "name": "x",
          "dtype": "f32",
          "shape": ["batch", "seq", "heads", "dim"]
        },
        {
          "name": "freqs",
          "dtype": "f32",
          "shape": ["seq", "dim_half"]
        }
      ],
      "outputs": [
        {
          "name": "y",
          "dtype": "f32",
          "shape": ["batch", "seq", "heads", "dim"]
        }
      ],
      "params": {
        "theta": {
          "type": "f32",
          "default": 10000.0
        }
      },
      "resource_limits": {
        "max_memory_pages": 256,
        "max_epoch_ticks": 1000,
        "max_table_elements": 1024
      },
      "platforms": {
        "wasmtime": {
          "min_version": "15.0.0",
          "features": ["simd", "bulk-memory"]
        },
        "wamr": {
          "min_version": "1.3.0",
          "aot_available": true
        }
      },
      "benchmarks": {
        "seq_512_dim_128": {
          "latency_us": 45,
          "throughput_gflops": 2.1
        }
      }
    }
  ],
  "fallbacks": {
    "rope_f32": "rope_reference",
    "rmsnorm_f32": "rmsnorm_reference"
  }
}
```
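
The `resource_limits` values are expressed in WASM pages and epoch ticks, so it can help to convert them into bytes and wall-clock time. The sketch below does that conversion. `ResourceLimits` is a hypothetical mirror of the manifest object, and the 10ms tick interval used in the example is the one configured in Section 6.1.

```rust
use std::time::Duration;

/// WebAssembly page size in bytes (fixed at 64 KiB by the core specification).
pub const WASM_PAGE_BYTES: u64 = 64 * 1024;

/// Mirror of the manifest's `resource_limits` object.
pub struct ResourceLimits {
    pub max_memory_pages: u32,
    pub max_epoch_ticks: u32,
    pub max_table_elements: u32,
}

impl ResourceLimits {
    /// Upper bound on linear memory in bytes.
    pub fn max_memory_bytes(&self) -> u64 {
        self.max_memory_pages as u64 * WASM_PAGE_BYTES
    }

    /// Upper bound on wall-clock time, given how often the host ticks the epoch.
    pub fn max_wall_time(&self, tick_interval: Duration) -> Duration {
        tick_interval.saturating_mul(self.max_epoch_ticks)
    }
}
```

With the values in the manifest above, 256 pages correspond to 16 MiB of linear memory, and 1000 ticks at a 10ms interval correspond to a 10 second budget.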

### 4.3 Included Kernel Packs

| Category | Kernels | Notes |
|----------|---------|-------|
| **Positional** | RoPE (f32, f16, q8) | Rotary embeddings |
| **Normalization** | RMSNorm (f32, f16) | Pre-attention normalization |
| **Activation** | SwiGLU (f32, f16) | Gated activation |
| **KV Cache** | pack_q4, pack_q8, unpack_q4, unpack_q8 | Quantize/dequantize |
| **Adapter** | LoRA apply (f32, f16) | Delta weight application |

**Attention Note**: Attention kernels remain **native** initially due to:
- Complex memory access patterns
- Heavy reliance on hardware-specific optimizations (Flash Attention, xformers)
- Significant overhead from WASM boundary crossing for large tensors

## 5. Supply Chain Security

### 5.1 Signature Verification

```rust
use ed25519_dalek::{Signature, VerifyingKey, Verifier};

pub struct KernelPackVerifier {
    trusted_keys: Vec<VerifyingKey>,
}

impl KernelPackVerifier {
    /// Verify kernel pack signature
    pub fn verify(&self, manifest: &[u8], signature: &[u8]) -> Result<(), VerifyError> {
        let sig = Signature::try_from(signature)?;

        for key in &self.trusted_keys {
            if key.verify(manifest, &sig).is_ok() {
                return Ok(());
            }
        }

        Err(VerifyError::NoTrustedKey)
    }

    /// Verify individual kernel hash
    pub fn verify_kernel(&self, kernel_bytes: &[u8], expected_hash: &str) -> Result<(), VerifyError> {
        use sha2::{Sha256, Digest};

        let mut hasher = Sha256::new();
        hasher.update(kernel_bytes);
        let hash = format!("sha256:{:x}", hasher.finalize());

        if hash == expected_hash {
            Ok(())
        } else {
            Err(VerifyError::HashMismatch {
                expected: expected_hash.to_string(),
                actual: hash,
            })
        }
    }
}
```
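
`verify_kernel` compares hash strings directly, so a malformed or upper-case `hash` field in the manifest would surface as a mismatch. One possible refinement, sketched here with std only, is to parse the `sha256:<hex>` field into raw bytes first so that format errors are reported separately. `parse_sha256` and `HashFormatError` are hypothetical names and are not part of the verifier above.

```rust
#[derive(Debug, PartialEq)]
pub enum HashFormatError {
    /// The field does not start with `sha256:`.
    UnsupportedAlgorithm,
    /// The hex part is not exactly 64 characters; carries the actual length.
    WrongLength(usize),
    /// The hex part contains a non-hex character.
    InvalidHex,
}

/// Parse a manifest hash of the form `sha256:<64 hex chars>` into raw bytes.
pub fn parse_sha256(field: &str) -> Result<[u8; 32], HashFormatError> {
    let hex = field
        .strip_prefix("sha256:")
        .ok_or(HashFormatError::UnsupportedAlgorithm)?;
    if hex.len() != 64 {
        return Err(HashFormatError::WrongLength(hex.len()));
    }
    let mut digest = [0u8; 32];
    for (i, pair) in hex.as_bytes().chunks(2).enumerate() {
        // `to_digit(16)` accepts both upper- and lower-case hex digits.
        let hi = (pair[0] as char).to_digit(16).ok_or(HashFormatError::InvalidHex)?;
        let lo = (pair[1] as char).to_digit(16).ok_or(HashFormatError::InvalidHex)?;
        digest[i] = (hi * 16 + lo) as u8;
    }
    Ok(digest)
}
```

The parsed digest can then be compared byte-for-byte with the output of `Sha256::finalize()`, which makes the comparison independent of hex casing.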

### 5.2 Version Compatibility Gates

```rust
pub struct CompatibilityChecker {
    runtime_version: Version,
}

impl CompatibilityChecker {
    pub fn check(&self, manifest: &KernelManifest) -> CompatibilityResult {
        // Check runtime version bounds
        if self.runtime_version < manifest.min_runtime_version {
            return CompatibilityResult::RuntimeTooOld {
                required: manifest.min_runtime_version.clone(),
                actual: self.runtime_version.clone(),
            };
        }

        if self.runtime_version > manifest.max_runtime_version {
            return CompatibilityResult::RuntimeTooNew {
                max_supported: manifest.max_runtime_version.clone(),
                actual: self.runtime_version.clone(),
            };
        }

        // Check WASM feature requirements
        for kernel in &manifest.kernels {
            if let Some(platform) = kernel.platforms.get("wasmtime") {
                for feature in &platform.features {
                    if !self.has_feature(feature) {
                        return CompatibilityResult::MissingFeature {
                            kernel: kernel.id.clone(),
                            feature: feature.clone(),
                        };
                    }
                }
            }
        }

        CompatibilityResult::Compatible
    }
}
```
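
The version gate relies on `Version` values being ordered numerically rather than as strings. The checker above does not define `Version`, so the following is a minimal std-only stand-in that shows the intended ordering and the inclusive bounds. It handles only plain `major.minor.patch` strings; pre-release and build tags are out of scope for this sketch. `Gate` and `check_runtime` are hypothetical names.

```rust
/// Minimal `major.minor.patch` version. Field order matters: the derived
/// `Ord` compares major first, then minor, then patch.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Version {
    pub major: u32,
    pub minor: u32,
    pub patch: u32,
}

impl Version {
    /// Parse exactly three dot-separated non-negative integers.
    pub fn parse(s: &str) -> Option<Version> {
        let mut parts = s.split('.').map(|p| p.parse::<u32>().ok());
        let version = Version {
            major: parts.next()??,
            minor: parts.next()??,
            patch: parts.next()??,
        };
        // Reject trailing components such as "1.0.0.1".
        if parts.next().is_some() {
            None
        } else {
            Some(version)
        }
    }
}

#[derive(Debug, PartialEq)]
pub enum Gate {
    Compatible,
    RuntimeTooOld,
    RuntimeTooNew,
}

/// Both bounds are inclusive, matching the `<` and `>` tests in `check`.
pub fn check_runtime(runtime: Version, min: Version, max: Version) -> Gate {
    if runtime < min {
        Gate::RuntimeTooOld
    } else if runtime > max {
        Gate::RuntimeTooNew
    } else {
        Gate::Compatible
    }
}
```

Numeric comparison matters here: as strings, `"0.10.0"` sorts before `"0.9.0"`, whereas the struct ordering treats 0.10.0 as the newer version.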

### 5.3 Safe Rollback Protocol

```rust
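pub struct KernelManager {
    active_pack: Arc<RwLock<KernelPack>>,
    previous_pack: Arc<RwLock<Option<KernelPack>>>,
    /// Signature and hash verifier (Section 5.1)
    verifier: KernelPackVerifier,
    /// Runtime version and feature gate (Section 5.2)
    compatibility: CompatibilityChecker,
    metrics: KernelMetrics,
}

impl KernelManager {
    /// Upgrade to new kernel pack with automatic rollback on failure
    pub async fn upgrade(&self, new_pack: KernelPack) -> Result<(), UpgradeError> {
        // Step 1: Verify new pack. Assumes `KernelPack` exposes the raw manifest
        // bytes and detached signature (hypothetical fields), and that
        // `UpgradeError` implements `From<VerifyError>` and has an
        // `Incompatible` variant (also hypothetical).
        self.verifier.verify(&new_pack.manifest_bytes, &new_pack.signature)?;
        match self.compatibility.check(&new_pack.manifest) {
            CompatibilityResult::Compatible => {}
            other => return Err(UpgradeError::Incompatible(other)),
        }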

        // Step 2: Compile kernels (AOT if supported)
        let compiled = self.compile_pack(&new_pack).await?;

        // Step 3: Atomic swap with rollback capability
        {
            let mut active = self.active_pack.write().await;
            let mut previous = self.previous_pack.write().await;

            // Store current as rollback target
            *previous = Some(std::mem::replace(&mut *active, compiled));
        }

        // Step 4: Health check with new kernels
        if let Err(e) = self.health_check().await {
            tracing::error!("Kernel health check failed: {}", e);
            self.rollback().await?;
            return Err(UpgradeError::HealthCheckFailed(e));
        }

        // Step 5: Clear rollback after grace period
        tokio::spawn({
            let previous = self.previous_pack.clone();
            async move {
                tokio::time::sleep(Duration::from_secs(300)).await;
                *previous.write().await = None;
            }
        });

        Ok(())
    }

    /// Rollback to previous kernel pack
    pub async fn rollback(&self) -> Result<(), RollbackError> {
        let mut active = self.active_pack.write().await;
        let mut previous = self.previous_pack.write().await;

        if let Some(prev) = previous.take() {
            *active = prev;
            tracing::info!("Rolled back to previous kernel pack");
            Ok(())
        } else {
            Err(RollbackError::NoPreviousPack)
        }
    }
}
```
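
The swap-and-rollback logic can be exercised in isolation with a synchronous, std-only model. The sketch below is a simplified variation rather than the manager above: it keeps both slots behind a single `RwLock` so readers never observe a half-updated pair, omits verification, compilation, and the grace-period timer, and uses the hypothetical names `PackSlots`, `upgrade`, and `rollback`.

```rust
use std::sync::RwLock;

/// Active pack plus an optional rollback target, guarded by one lock.
pub struct PackSlots<P> {
    slots: RwLock<(P, Option<P>)>,
}

impl<P: Clone> PackSlots<P> {
    pub fn new(initial: P) -> Self {
        PackSlots { slots: RwLock::new((initial, None)) }
    }

    /// Snapshot of the currently active pack.
    pub fn active(&self) -> P {
        self.slots.read().unwrap().0.clone()
    }

    /// Swap in `new_pack`, run `health_check` on it, and roll back on failure.
    pub fn upgrade(
        &self,
        new_pack: P,
        health_check: impl FnOnce(&P) -> bool,
    ) -> Result<(), &'static str> {
        {
            let mut slots = self.slots.write().unwrap();
            // Keep the current pack as the rollback target.
            let old = std::mem::replace(&mut slots.0, new_pack);
            slots.1 = Some(old);
        } // Write lock released before the health check runs.
        if !health_check(&self.active()) {
            self.rollback()?;
            return Err("health check failed; rolled back");
        }
        Ok(())
    }

    /// Restore the previous pack, if one is still retained.
    pub fn rollback(&self) -> Result<(), &'static str> {
        let mut slots = self.slots.write().unwrap();
        match slots.1.take() {
            Some(prev) => {
                slots.0 = prev;
                Ok(())
            }
            None => Err("no previous pack"),
        }
    }
}
```

Note that each upgrade overwrites the rollback target, so only one level of rollback is available at a time, as in the manager above.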

## 6. Device Class Configurations

### 6.1 Edge Server Configuration (Wasmtime + Epoch)

```rust
pub fn create_server_runtime() -> Result<WasmRuntime, RuntimeError> {
    let mut config = Config::new();

    // Performance optimizations
    config.cranelift_opt_level(OptLevel::Speed);
    config.cranelift_nan_canonicalization(false);
    config.parallel_compilation(true);

    // SIMD support for vectorized operations
    config.wasm_simd(true);
    config.wasm_bulk_memory(true);
    config.wasm_multi_value(true);

    // Memory configuration
    config.static_memory_maximum_size(1 << 32); // 4GB max
    config.dynamic_memory_guard_size(1 << 16);  // 64KB guard

    // Epoch-based interruption
    config.epoch_interruption(true);

    let engine = Engine::new(&config)?;

    Ok(WasmRuntime {
        engine,
        epoch_tick_interval: Duration::from_millis(10),
        default_epoch_budget: 1000, // 10 seconds max
    })
}
```
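
Since epoch deadlines are counted in ticks, a time budget has to be divided by the tick interval before it is passed to `set_epoch_deadline`. A small helper along these lines could do the conversion. `ticks_for_budget` is a hypothetical name, and rounding up is an assumption made so that a kernel is never given less time than requested.

```rust
use std::convert::TryFrom;
use std::time::Duration;

/// Number of epoch ticks that cover `budget` when the epoch advances once
/// per `tick_interval`. Rounds up; returns `None` for a zero tick interval
/// or if the result does not fit in `u64`.
pub fn ticks_for_budget(budget: Duration, tick_interval: Duration) -> Option<u64> {
    let tick_ns = tick_interval.as_nanos();
    if tick_ns == 0 {
        return None;
    }
    // Ceiling division in u128 nanoseconds.
    let ticks = (budget.as_nanos() + tick_ns - 1) / tick_ns;
    u64::try_from(ticks).ok()
}
```

With the 10ms interval configured above, a 10 second budget maps to the default of 1000 ticks, while a 100ms budget maps to only 10 ticks.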

### 6.2 Embedded Configuration (WAMR AOT)

```rust
pub fn create_embedded_runtime() -> Result<WamrRuntime, RuntimeError> {
    let mut config = WamrConfig::new();

    // Minimal footprint configuration
    config.set_stack_size(32 * 1024);        // 32KB stack
    config.set_heap_size(128 * 1024);        // 128KB heap
    config.enable_aot(true);                  // Pre-compiled modules
    config.enable_simd(false);                // Often unavailable on MCU
    config.enable_bulk_memory(true);

    // Interpreter fallback for debugging
    config.enable_interp(cfg!(debug_assertions));

    // Execution limits
    config.set_exec_timeout_ms(100);          // 100ms max per invocation

    Ok(WamrRuntime::new(config)?)
}
```
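
It is worth checking that these sizes fit the 256KB RAM constraint from Section 1.3. The sketch below adds up the stack, the heap, and the runtime footprint, under the pessimistic assumption that the runtime's full footprint (taken as the 85KB upper bound from Section 2.1) also resides in RAM. `EmbeddedBudget` and `headroom` are hypothetical names.

```rust
pub const KIB: usize = 1024;

/// RAM budget for an embedded target, all values in bytes.
pub struct EmbeddedBudget {
    pub ram: usize,
    pub runtime_footprint: usize,
    pub stack: usize,
    pub heap: usize,
}

impl EmbeddedBudget {
    /// Bytes left after the runtime, stack, and heap are accounted for,
    /// or `None` if the configuration does not fit in RAM.
    pub fn headroom(&self) -> Option<usize> {
        let used = self
            .runtime_footprint
            .checked_add(self.stack)?
            .checked_add(self.heap)?;
        self.ram.checked_sub(used)
    }
}
```

With the configuration above, 85KB + 32KB + 128KB = 245KB, which leaves roughly 11KB of headroom on a 256KB device under this assumption.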

### 6.3 WASI Threads (Optional)

For platforms supporting WASI threads:

```rust
pub fn create_threaded_runtime() -> Result<WasmRuntime, RuntimeError> {
    let mut config = Config::new();

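    // Enable the threads proposal (shared memories and atomics)
    config.wasm_threads(true);

    // Async host support. Thread spawning is provided by the host
    // (for example via the wasmtime-wasi-threads crate), not by a `Config`
    // option, so the pool size is tracked on the runtime struct below.
    config.async_support(true);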

    let engine = Engine::new(&config)?;

    Ok(WasmRuntime {
        engine,
        thread_pool_size: 4,
    })
}
```

**Platform Support Matrix**:

| Platform | WASI Threads | Notes |
|----------|--------------|-------|
| Linux x86_64 | Yes | Full support |
| Linux ARM64 | Yes | Full support |
| macOS | Yes | Full support |
| Windows | Yes | Full support |
| WAMR | No | Single-threaded only |
| Browser | Yes | Via SharedArrayBuffer |

## 7. Performance Considerations

### 7.1 Invocation Overhead

| Operation | Latency | Notes |
|-----------|---------|-------|
| Kernel lookup | ~100ns | Hash table lookup |
| Instance creation | ~1us | Pre-compiled module |
| Memory setup | ~500ns | Shared memory mapping |
| Epoch check | ~2ns | Single atomic read |
| Return value | ~100ns | Register transfer |
| **Total** | **~2us** | Per invocation |
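
The total can be reproduced from the individual rows. The sketch below sums the table's estimates and compares the result with the 10us invocation constraint from Section 1.3. The constant and function names are hypothetical.

```rust
/// Per-invocation overhead estimates from the table, in nanoseconds.
pub const OVERHEAD_NS: [(&str, u64); 5] = [
    ("kernel lookup", 100),
    ("instance creation", 1_000),
    ("memory setup", 500),
    ("epoch check", 2),
    ("return value", 100),
];

/// Latency constraint for small tensors (Section 1.3), in nanoseconds.
pub const INVOCATION_BUDGET_NS: u64 = 10_000;

/// Sum of all overhead components.
pub fn total_overhead_ns() -> u64 {
    OVERHEAD_NS.iter().map(|(_, ns)| ns).sum()
}

/// True if the summed overhead stays within the invocation budget.
pub fn within_budget() -> bool {
    total_overhead_ns() < INVOCATION_BUDGET_NS
}
```

The components add up to 1702ns, which the table rounds to ~2us. Instance creation accounts for more than half of that, which is why instance pooling is the first candidate for reducing overhead.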

### 7.2 Optimization Strategies

1. **Module Caching**: Pre-compile and cache WASM modules
2. **Instance Pooling**: Reuse instances across invocations
3. **Memory Sharing**: Map host tensors directly into WASM linear memory
4. **Batch Invocations**: Process multiple requests per kernel call
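
Instance pooling (strategy 2) can be sketched with a small generic pool. This is a std-only illustration with hypothetical names (`InstancePool`, `acquire`, `release`). A real pool would also need to reset each instance's linear memory and globals before reuse, which is omitted here.

```rust
use std::sync::Mutex;

/// Bounded pool of idle instances of type `T`.
pub struct InstancePool<T> {
    idle: Mutex<Vec<T>>,
    max_idle: usize,
}

impl<T> InstancePool<T> {
    pub fn new(max_idle: usize) -> Self {
        InstancePool { idle: Mutex::new(Vec::new()), max_idle }
    }

    /// Take an idle instance, or build a new one with `create` if none is idle.
    pub fn acquire(&self, create: impl FnOnce() -> T) -> T {
        // The lock guard is dropped at the end of this statement, so the
        // potentially slow `create` call runs without holding the lock.
        let popped = self.idle.lock().unwrap().pop();
        popped.unwrap_or_else(create)
    }

    /// Return an instance to the pool; it is dropped if the pool is full.
    pub fn release(&self, instance: T) {
        let mut idle = self.idle.lock().unwrap();
        if idle.len() < self.max_idle {
            idle.push(instance);
        }
    }

    /// Number of instances currently idle.
    pub fn idle_len(&self) -> usize {
        self.idle.lock().unwrap().len()
    }
}
```

Capping the idle list bounds the memory held by unused instances, which matters when each instance reserves its own linear memory.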

### 7.3 When to Bypass WASM

WASM sandboxing should be bypassed (with explicit opt-in) for:

- Attention kernels (complex memory patterns)
- Large matrix multiplications (>1000x1000)
- Operations with <1ms latency requirements
- Trusted, verified native kernels

## 8. Alternatives Considered

### 8.1 eBPF

| Aspect | eBPF | WASM |
|--------|------|------|
| Platform | Linux only | Cross-platform |
| Verification | Static, strict | Dynamic, flexible |
| Memory model | Constrained | Linear memory |
| Tooling | Improving | Mature |

**Decision**: WASM chosen for cross-platform support.

### 8.2 Lua/LuaJIT

| Aspect | Lua | WASM |
|--------|-----|------|
| Performance | Good (JIT) | Excellent (AOT) |
| Sandboxing | Manual effort | Built-in |
| Type safety | Dynamic | Static |
| Ecosystem | Large | Growing |

**Decision**: WASM chosen for type safety and native compilation.

### 8.3 Native Plugins with seccomp

| Aspect | seccomp | WASM |
|--------|---------|------|
| Isolation | Process-level | In-process |
| Overhead | IPC cost | Minimal |
| Portability | Linux only | Cross-platform |
| Complexity | High | Moderate |

**Decision**: WASM chosen for in-process efficiency and portability.

## 9. Consequences

### 9.1 Positive

- **Security**: Strong isolation prevents kernel code from compromising host
- **Portability**: Same kernels run on servers and embedded devices
- **Hot Updates**: Kernels can be updated without service restart
- **Ecosystem**: Large WASM toolchain and community support
- **Auditability**: WASM modules can be inspected and verified

### 9.2 Negative

- **Overhead**: ~2us per invocation vs. native direct call
- **Complexity**: Additional abstraction layer to maintain
- **Tooling**: WASM debugging tools less mature than native
- **Learning Curve**: Team needs WASM expertise

### 9.3 Risks

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Performance regression | Medium | High | Benchmark suite, native fallbacks |
| WASI-NN instability | Low | Medium | Abstract behind internal API |
| Supply chain attack | Low | Critical | Signature verification, trusted keys |
| Epoch timing variability | Medium | Low | Generous budgets, monitoring |

## 10. Implementation Plan

### Phase 1: Foundation (Weeks 1-2)
- [ ] Set up Wasmtime integration
- [ ] Implement kernel descriptor ABI
- [ ] Create basic kernel loader

### Phase 2: Core Kernels (Weeks 3-4)
- [ ] Implement RoPE kernel
- [ ] Implement RMSNorm kernel
- [ ] Implement SwiGLU kernel

### Phase 3: KV Cache (Weeks 5-6)
- [ ] Implement quantization kernels
- [ ] Implement dequantization kernels
- [ ] Integration with cache manager

### Phase 4: Security (Weeks 7-8)
- [ ] Implement signature verification
- [ ] Create version compatibility checker
- [ ] Build rollback system

### Phase 5: Embedded (Weeks 9-10)
- [ ] WAMR integration
- [ ] AOT compilation pipeline
- [ ] Resource-constrained testing

## 11. References

- [Wasmtime Documentation](https://docs.wasmtime.dev/)
- [WAMR Documentation](https://github.com/bytecodealliance/wasm-micro-runtime)
- [WASI-NN Specification](https://github.com/WebAssembly/wasi-nn)
- [WebAssembly Security Model](https://webassembly.org/docs/security/)
- [Component Model Proposal](https://github.com/WebAssembly/component-model)

## 12. Appendix

### A. Kernel Interface Definition

```rust
// Standard kernel interface (exported by WASM modules).
// Bodies are placeholders; return values follow the codes in Appendix B.

/// Initialize kernel with parameters
#[no_mangle]
pub extern "C" fn kernel_init(params_ptr: *const u8, params_len: u32) -> i32 {
    0 // OK
}

/// Execute kernel forward pass
#[no_mangle]
pub extern "C" fn kernel_forward(desc_ptr: *const KernelDescriptor) -> i32 {
    0 // OK
}

/// Execute kernel backward pass (optional)
#[no_mangle]
pub extern "C" fn kernel_backward(desc_ptr: *const KernelDescriptor) -> i32 {
    5 // NOT_IMPLEMENTED
}

/// Get kernel metadata
#[no_mangle]
pub extern "C" fn kernel_info(info_ptr: *mut KernelInfo) -> i32 {
    0 // OK
}

/// Cleanup kernel resources
#[no_mangle]
pub extern "C" fn kernel_cleanup() -> i32 {
    0 // OK
}
```

### B. Error Codes

| Code | Name | Description |
|------|------|-------------|
| 0 | OK | Success |
| 1 | INVALID_INPUT | Invalid input tensor |
| 2 | INVALID_OUTPUT | Invalid output tensor |
| 3 | INVALID_PARAMS | Invalid kernel parameters |
| 4 | OUT_OF_MEMORY | Insufficient memory |
| 5 | NOT_IMPLEMENTED | Operation not supported |
| 6 | INTERNAL_ERROR | Internal kernel error |
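
On the host side, the raw `i32` return value can be mapped onto a typed status. The sketch below mirrors the table in a Rust enum. `KernelStatus` is a hypothetical name, and unknown codes are returned unchanged as the error value.

```rust
use std::convert::TryFrom;

/// Typed view of the kernel return codes listed in the table.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(i32)]
pub enum KernelStatus {
    Ok = 0,
    InvalidInput = 1,
    InvalidOutput = 2,
    InvalidParams = 3,
    OutOfMemory = 4,
    NotImplemented = 5,
    InternalError = 6,
}

impl TryFrom<i32> for KernelStatus {
    /// The unrecognized raw code.
    type Error = i32;

    fn try_from(code: i32) -> Result<Self, Self::Error> {
        match code {
            0 => Ok(KernelStatus::Ok),
            1 => Ok(KernelStatus::InvalidInput),
            2 => Ok(KernelStatus::InvalidOutput),
            3 => Ok(KernelStatus::InvalidParams),
            4 => Ok(KernelStatus::OutOfMemory),
            5 => Ok(KernelStatus::NotImplemented),
            6 => Ok(KernelStatus::InternalError),
            other => Err(other),
        }
    }
}
```

Treating unknown codes as errors keeps the host from silently accepting values that a newer or misbehaving kernel might return.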

### C. Benchmark Template

```rust
// Criterion benchmark; place in a bench target with `harness = false`.
// `criterion_main!` generates `fn main`, so it must sit at the crate root
// rather than inside a `#[cfg(test)]` module.
use criterion::{criterion_group, criterion_main, Criterion};

fn bench_rope_f32(c: &mut Criterion) {
    let runtime = create_server_runtime().unwrap();
    let kernel = runtime.load_kernel("rope_f32").unwrap();

    // Shapes match the manifest: x is [batch, seq, heads, dim], freqs is [seq, dim_half]
    let input = Tensor::random([1, 512, 32, 128], DType::F32);
    let freqs = Tensor::random([512, 64], DType::F32);

    c.bench_function("rope_f32_seq512", |b| {
        b.iter(|| {
            kernel.forward(&input, &freqs).unwrap()
        })
    });
}

criterion_group!(benches, bench_rope_f32);
criterion_main!(benches);
```

---

## Related Decisions

- **ADR-001**: Ruvector Core Architecture
- **ADR-002**: RuvLLM Integration
- **ADR-003**: SIMD Optimization Strategy
- **ADR-007**: Security Review & Technical Debt

---

## Security Status (v2.1)

| Component | Status | Notes |
|-----------|--------|-------|
| SharedArrayBuffer | ✅ Secure | Safety documentation for race conditions |
| WASM Memory | ✅ Secure | Bounds checking via WASM sandbox |
| Kernel Loading | ⚠️ Planned | Signature verification pending |

**Fixes Applied:**
- Added comprehensive safety comments documenting race condition prevention in `shared.rs`
- JavaScript/WASM coordination patterns documented

**Outstanding Items:**
- TD-007 (P2): Embedded JavaScript should be extracted to separate files

See ADR-007 for full security audit trail.

---

## Revision History

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2026-01-18 | RuVector Architecture Team | Initial version |
| 1.1 | 2026-01-19 | Security Review Agent | Added security status, related decisions |