mirror of https://github.com/ruvnet/RuVector.git synced 2026-05-23 12:55:26 +00:00

Reuven a929fde654 feat(rvm): RVM — Coherence-Native Microhypervisor for the Agentic Age

Complete implementation of the RVM microhypervisor:

13 Rust crates (all #![no_std], #![forbid(unsafe_code)]):
- rvm-types: Foundation types (64-byte WitnessRecord, ~40 ActionKind variants)
- rvm-hal: AArch64 EL2 HAL (stage-2 page tables, PL011 UART, GICv2, timer)
- rvm-cap: Capability system (P1/P2 proof verification, derivation trees)
- rvm-witness: Witness logging (FNV-1a hash chain, ring buffer, replay)
- rvm-proof: Proof engine (3-tier, constant-time P2 evaluation)
- rvm-partition: Partition model (lifecycle, split/merge, IPC, device leases)
- rvm-sched: Scheduler (2-signal priority, SMP coordinator, switch hot path)
- rvm-memory: Memory tiers (buddy allocator, 4-tier, RLE compression)
- rvm-coherence: Coherence engine (Stoer-Wagner mincut, adaptive frequency)
- rvm-boot: Bare-metal boot (7-phase measured, EL2 entry, linker script)
- rvm-wasm: Agent runtime (7-state lifecycle, migration, quotas)
- rvm-security: Security gate (validation, attestation, DMA budget)
- rvm-kernel: Integration kernel (boot/tick/create/destroy)

602 tests, 0 failures, 0 clippy warnings.
21 criterion benchmarks (all ADR targets exceeded).
9 ADRs (132-140), 15 design constraints (DC-1 through DC-15).
11 security findings addressed.

Co-Authored-By: claude-flow <ruv@ruv.net>

2026-04-04 12:10:19 -04:00

56 KiB

Raw Permalink Blame History

RVM Security Model

Status

Draft -- Research document for RVM bare-metal microhypervisor security architecture.

Date

2026-04-04

Scope

This document specifies the security model for RVM as a standalone, bare-metal, Rust-first microhypervisor for agents and edge computing. RVM does NOT depend on Linux or KVM. It boots directly on hardware (AArch64 primary, x86_64 secondary) and enforces all isolation through its own MMU page tables, capability system, and proof-gated mutation protocol.

The security model builds on the primitives already implemented in Phase A (ruvix-types, ruvix-cap, ruvix-proof, ruvix-region, ruvix-queue, ruvix-vecgraph, ruvix-nucleus) and extends them for bare-metal operation with hardware-enforced isolation.

1. Capability-Based Authority

1.1 Design Philosophy

RVM enforces the principle of least authority through capabilities. There is no ambient authority anywhere in the system. Every syscall requires an explicit capability handle that authorizes the operation. This means:

No global namespaces (no filesystem paths, no PIDs, no network ports accessible by name)
No superuser or root -- the root task holds initial capabilities but cannot bypass the model
No default permissions -- a newly spawned task has exactly the capabilities its parent explicitly grants via cap_grant
No ambient access to hardware -- device MMIO regions, interrupt lines, and DMA channels are all gated by capabilities

1.2 Capability Structure

Capabilities are kernel-resident objects. User-space code never sees the raw capability; it holds an opaque CapHandle that the kernel resolves through a per-task capability table.

/// The kernel-side capability. User space never sees this directly.
/// File: crates/ruvix/crates/types/src/capability.rs
#[repr(C)]
pub struct Capability {
    pub object_id: u64,          // Kernel object being referenced
    pub object_type: ObjectType, // Region, Queue, VectorStore, Task, etc.
    pub rights: CapRights,       // Bitmap of permitted operations
    pub badge: u64,              // Caller-visible demux identifier
    pub epoch: u64,              // Revocation epoch (stale handles detected)
}

Rights bitmap (from crates/ruvix/crates/types/src/capability.rs):

Right	Bit	Authorizes
`READ`	0	`vector_get`, `queue_recv`, region read
`WRITE`	1	`queue_send`, region append/slab write
`GRANT`	2	`cap_grant` to another task
`REVOKE`	3	Revoke capabilities derived from this one
`EXECUTE`	4	Task entry point, RVF component execution
`PROVE`	5	Generate proof tokens (`vector_put_proved`, `graph_apply_proved`)
`GRANT_ONCE`	6	Non-transitive grant (derived cap cannot re-grant)

1.3 Capability Delegation and Attenuation

Delegation follows strict monotonic attenuation: a task can only grant capabilities it holds, and the granted rights must be a subset of the held rights. This is enforced at the type level in Capability::derive():

/// Derive a capability with equal or fewer rights.
/// Returns None if rights escalation is attempted or GRANT right is absent.
pub fn derive(&self, new_rights: CapRights, new_badge: u64) -> Option<Self> {
    if !self.has_rights(CapRights::GRANT) { return None; }
    if !new_rights.is_subset_of(self.rights) { return None; }
    // GRANT_ONCE strips GRANT from the derived cap
    let final_rights = if self.rights.contains(CapRights::GRANT_ONCE) {
        new_rights.difference(CapRights::GRANT).difference(CapRights::GRANT_ONCE)
    } else {
        new_rights
    };
    Some(Self { rights: final_rights, badge: new_badge, ..*self })
}

Delegation depth limit: Maximum 8 levels (configurable per RVF manifest). The derivation tree tracks the full chain, and audit flags chains deeper than 4 (AUDIT_DEPTH_WARNING_THRESHOLD).

1.4 Capability Revocation

Revocation propagates through the derivation tree. When a capability is revoked:

The capability's epoch is incremented in the kernel's object table
All entries in the derivation tree rooted at the revoked capability are invalidated
Any held CapHandle referencing the old epoch returns KernelError::StaleCapability

This is O(d) where d is the number of derived capabilities, bounded by the delegation depth limit and the per-task capability table size (1024 entries max).

1.5 How This Differs from DAC/MAC

Property	DAC (Unix)	MAC (SELinux)	Capability (RVM)
Authority source	User identity	System-wide policy labels	Explicit token per object
Ambient authority	Yes (UID 0)	Yes (unconfined domain)	None
Confused deputy	Possible	Mitigated by labels	Prevented by design
Delegation	chmod/chown	Policy reload	`cap_grant` with attenuation
Revocation	File permission change	Policy reload	Tree-propagating, epoch-based
Granularity	File/directory	Type/role/level	Per-object, per-right

The critical difference: in RVM, authority is carried by the message, not the sender's identity. A task cannot access a resource simply because of "who it is" -- it must present a valid capability handle that was explicitly granted to it through a traceable delegation chain.

2. Proof-Gated Mutation

2.1 Invariant

No state mutation without a valid proof token. This is a kernel invariant, not a policy. The kernel physically prevents mutation of vector stores, graph stores, and RVF mounts without a ProofToken that passes all verification steps. Read operations (vector_get, queue_recv) do not require proofs.

2.2 What Constitutes a Valid Proof

A proof token must pass six verification steps (implemented in crates/ruvix/crates/vecgraph/src/proof_policy.rs ProofVerifier::verify()):

Capability check: The calling task must hold a capability with PROVE right on the target object
Hash match: proof.mutation_hash == expected_mutation_hash -- the proof authorizes exactly the mutation being applied
Tier satisfaction: proof.tier >= policy.required_tier -- higher tiers satisfy lower requirements (Deep satisfies Standard satisfies Reflex)
Time bound: current_time_ns <= proof.valid_until_ns -- proofs expire
Validity window: The window proof.valid_until_ns - current_time_ns must not exceed policy.max_validity_window_ns (prevents pre-computing proofs far in advance)
Nonce uniqueness: Each nonce can be consumed exactly once (ring buffer of 64 recent nonces prevents replay)

2.3 Proof Tiers

Three tiers provide a latency/assurance tradeoff:

Tier	Name	Latency Budget	Payload	Use Case
0	Reflex	<1 us	SHA-256 hash	High-frequency vector updates
1	Standard	<100 us	Merkle witness (root + path)	Graph mutations
2	Deep	<10 ms	Coherence certificate (scores + partition + signature)	Structural changes

2.4 Proof Lifecycle

  Task                    Proof Engine (RVF component)              Kernel
    |                              |                                  |
    |-- prepare mutation --------->|                                  |
    |   (compute mutation_hash)    |                                  |
    |                              |-- evaluate coherence state ----->|
    |                              |<-- current state ----------------|
    |                              |                                  |
    |<-- ProofToken ---------------|                                  |
    |   (hash, tier, payload,      |                                  |
    |    expiry, nonce)            |                                  |
    |                              |                                  |
    |-- syscall (token) ------------------------------------------>|
    |                                                               |
    |                              Kernel verifies 6 steps:         |
    |                              1. PROVE right on cap            |
    |                              2. Hash match                    |
    |                              3. Tier >= policy                |
    |                              4. Not expired                   |
    |                              5. Window not too wide           |
    |                              6. Nonce not reused              |
    |                                                               |
    |<-- ProofAttestation (82 bytes) -------------------------------|
    |                              |                                  |
    |                              |   Witness record appended       |

2.5 What Requires a Proof

Operation	Proof Required	Minimum Tier
`region_map`	Yes (capability proof)	N/A -- capability check only
`vector_put_proved`	Yes	Per-store ProofPolicy
`graph_apply_proved`	Yes	Per-store ProofPolicy
`rvf_mount`	Yes	Deep (signature verification)
`vector_get`	No	N/A
`queue_send` / `queue_recv`	No	N/A (capability-gated only)
`task_spawn`	No	N/A (capability-gated only)
`cap_grant`	No	N/A (GRANT right required)
`timer_wait`	No	N/A
`attest_emit`	Yes (proof consumed)	Per-operation
`sensor_subscribe`	No	N/A (capability-gated only)

2.6 Proof-Gated Device Mapping (Bare-Metal Extension)

On bare metal, device MMIO regions are mapped into a task's address space through region_map with a RegionPolicy::DeviceMmio variant (new for Phase B). This mapping requires:

A capability with READ and/or WRITE rights on the device object
A ProofToken with tier >= Standard proving the task's intent matches the device mapping
The device must not already be mapped to another partition (exclusive lease)

/// Extended region policy for bare-metal device access.
/// New for Phase B -- extends the existing RegionPolicy enum.
pub enum RegionPolicy {
    Immutable,
    AppendOnly { max_size: usize },
    Slab { slot_size: usize, slot_count: usize },
    /// Device MMIO region. Mapped as uncacheable, device memory.
    /// Requires proof-gated capability for mapping.
    DeviceMmio {
        phys_base: u64,    // Physical base address of MMIO range
        size: usize,       // Size in bytes
        device_id: u32,    // Kernel-assigned device identifier
    },
}

2.7 Proof-Gated Migration

Partition migration (moving a task and its state from one physical node to another in a RVM mesh) requires a Deep-tier proof containing:

Coherence certificate showing the partition's state is consistent
Source and destination node attestation (both nodes are trusted)
Hash of the serialized partition state

Without this proof, the kernel refuses to serialize or deserialize partition state.

/// Trait for migration authorization. Implemented by the migration subsystem.
pub trait MigrationAuthority {
    /// Verify that migration of this partition is authorized.
    /// Returns the serialized partition state only if proof validates.
    fn authorize_migration(
        &mut self,
        partition_id: u32,
        destination_attestation: &ProofAttestation,
        proof: &ProofToken,
    ) -> Result<SerializedPartition, KernelError>;

    /// Accept an incoming migrated partition.
    /// Verifies the source attestation and proof before instantiating.
    fn accept_migration(
        &mut self,
        serialized: &SerializedPartition,
        source_attestation: &ProofAttestation,
        proof: &ProofToken,
    ) -> Result<PartitionHandle, KernelError>;
}

2.8 Proof-Gated Partition Merge/Split

Graph partitions (mincut boundaries in the vecgraph store) can only be merged or split with a Deep-tier proof that includes the coherence impact analysis:

pub enum GraphMutationKind {
    AddNode { /* ... */ },
    RemoveNode { /* ... */ },
    AddEdge { /* ... */ },
    RemoveEdge { /* ... */ },
    UpdateWeight { /* ... */ },
    /// Merge two partitions. Requires Deep-tier proof with coherence cert.
    MergePartitions {
        source_partition: u32,
        target_partition: u32,
    },
    /// Split a partition at a mincut boundary. Requires Deep-tier proof.
    SplitPartition {
        partition: u32,
        cut_specification: MinCutSpec,
    },
}

3. Witness-Native Audit

3.1 Design Principle

Every privileged action in RVM emits a witness record to the kernel's append-only witness log. "Privileged action" means any syscall that mutates kernel state: vector writes, graph mutations, RVF mounts, task spawns, capability grants, region mappings.

3.2 Witness Record Format

Each record is 96 bytes, compact enough to sustain thousands of records per second on embedded hardware without blocking the syscall path:

/// 96-byte witness record.
/// File: crates/ruvix/crates/nucleus/src/witness_log.rs
#[repr(C)]
pub struct WitnessRecord {
    pub sequence: u64,            // Monotonically increasing (8 bytes)
    pub kind: WitnessRecordKind,  // Boot, Mount, VectorMutation, etc. (1 byte)
    pub timestamp_ns: u64,        // Nanoseconds since boot (8 bytes)
    pub mutation_hash: [u8; 32],  // SHA-256 of the mutation data (32 bytes)
    pub attestation_hash: [u8; 32], // Hash of the proof attestation (32 bytes)
    pub resource_id: u64,         // Object identifier (8 bytes)
    // 7 bytes padding to 96
}

Record kinds:

Kind	Value	Emitted By
`Boot`	0	`kernel_entry` at boot completion
`Mount`	1	`rvf_mount` syscall
`VectorMutation`	2	`vector_put_proved` syscall
`GraphMutation`	3	`graph_apply_proved` syscall
`Checkpoint`	4	Periodic state snapshots
`ReplayComplete`	5	After replaying from checkpoint
`CapGrant`	6	`cap_grant` syscall (proposed extension)
`CapRevoke`	7	Capability revocation (proposed extension)
`TaskSpawn`	8	`task_spawn` syscall (proposed extension)
`DeviceMap`	9	Device MMIO mapping (proposed extension)

3.3 Tamper Evidence

The witness log must be tamper-evident. The current Phase A implementation uses simple append-only semantics with FNV-1a hashing. For bare-metal, the following extensions are required:

Hash chaining: Each witness record includes the hash of the previous record, forming a Merkle-like chain. Tampering with any record invalidates all subsequent records.

/// Extended witness record with hash chaining for tamper evidence.
pub struct ChainedWitnessRecord {
    /// The base witness record (96 bytes).
    pub record: WitnessRecord,
    /// SHA-256 hash of the previous record's serialized bytes.
    /// For the first record (sequence 0), this is all zeros.
    pub prev_hash: [u8; 32],
    /// SHA-256(serialize(record) || prev_hash). Computed by the kernel.
    pub chain_hash: [u8; 32],
}

TEE signing (when available): On hardware with TrustZone (Raspberry Pi 4/5), witness records can be signed by the Secure World using a device-unique key. This means even a compromised kernel (EL1) cannot forge witness entries:

/// Trait for hardware-backed witness signing.
pub trait WitnessSigner {
    /// Sign a chained witness record using hardware-bound key.
    /// On AArch64 with TrustZone, this issues an SMC to Secure World.
    /// On platforms without TEE, returns None (software chain only).
    fn sign_witness(&self, record: &ChainedWitnessRecord) -> Option<[u8; 64]>;

    /// Verify a signed witness record.
    fn verify_witness_signature(
        &self,
        record: &ChainedWitnessRecord,
        signature: &[u8; 64],
    ) -> bool;
}

3.4 Replayability and Forensics

The witness log, combined with periodic checkpoints, enables deterministic replay:

Checkpoint: The kernel serializes all vector stores, graph stores, capability tables, and scheduler state to an immutable region. A WitnessRecordKind::Checkpoint record captures the state hash and the witness sequence number at checkpoint time.
Replay: Starting from a checkpoint, the kernel replays all witness records in sequence order, re-applying each mutation. Because mutations are deterministic (same proof token + same state = same result), the final state is identical.
Forensic query: External tools can load the witness log and answer questions like:
- "Which task mutated vector store X between timestamps T1 and T2?"
- "What was the coherence score before and after each graph mutation?"
- "Has the hash chain been broken?" (indicates tampering)

3.5 Witness-Enabled Rollback/Recovery

If a coherence violation is detected (coherence score drops below the configured threshold), the kernel can:

Stop accepting new mutations to the affected partition
Find the most recent checkpoint where coherence was above threshold
Replay witnesses from that checkpoint, skipping the offending mutation
Resume normal operation from the corrected state

This requires the offending mutation to be identified by its witness record (the mutation_hash and attestation_hash pinpoint exactly which operation caused the violation).

4. Isolation Model

4.1 Partition Isolation Guarantees

RVM partitions are the unit of isolation. Each partition consists of:

One or more tasks sharing a capability namespace
A set of regions (memory objects) accessible only through capabilities held by those tasks
Queue endpoints for controlled inter-partition communication

Isolation guarantee: A partition cannot access any memory, device, or kernel object for which it does not hold a valid capability. This is enforced at two levels:

Software: The capability table lookup in every syscall rejects invalid or stale handles
Hardware: MMU page tables enforce that each partition's regions are mapped only in that partition's address space, with no overlapping physical pages between partitions (except explicitly shared immutable regions)

4.2 MMU-Enforced Memory Isolation (Bare Metal)

On bare metal, RVM directly controls the AArch64 MMU. Each partition gets its own translation tables loaded via TTBR0_EL1 on context switch:

/// Per-partition page table management.
/// Kernel mappings use TTBR1_EL1 (shared across all partitions).
/// Partition mappings use TTBR0_EL1 (swapped on context switch).
pub trait PartitionAddressSpace {
    /// Create a new empty address space for a partition.
    fn create() -> Result<Self, KernelError> where Self: Sized;

    /// Map a region into this partition's address space.
    /// Physical pages are allocated from the kernel's physical allocator.
    /// Page table entries enforce the region's policy:
    ///   Immutable  -> PTE_USER | PTE_RO | PTE_CACHEABLE
    ///   AppendOnly -> PTE_KERNEL_RW | PTE_CACHEABLE (user writes via syscall)
    ///   Slab       -> PTE_KERNEL_RW | PTE_CACHEABLE (user writes via syscall)
    ///   DeviceMmio -> PTE_USER | PTE_DEVICE | PTE_nG (non-global, per-partition)
    fn map_region(
        &mut self,
        region: &RegionDescriptor,
        phys_pages: &[PhysFrame],
    ) -> Result<VirtAddr, KernelError>;

    /// Unmap a region, invalidating all TLB entries for those pages.
    fn unmap_region(&mut self, virt_addr: VirtAddr, size: usize) -> Result<(), KernelError>;

    /// Activate this address space (write to TTBR0_EL1 + TLBI).
    unsafe fn activate(&self);
}

Critical invariant: The kernel NEVER maps the same physical page as writable in two different partitions' address spaces simultaneously. Immutable regions may be shared read-only (content-addressable deduplication is safe for immutable data).

4.3 EL1/EL0 Separation

EL1 (kernel mode): All kernel code, syscall handlers, interrupt handlers, scheduler, capability table, proof verifier, witness log
EL0 (user mode): All RVF components, WASM runtimes, AgentDB, all application code

Syscalls transition EL0 -> EL1 via the SVC instruction. The exception handler in EL1 validates the capability before dispatching to the syscall implementation. Return to EL0 uses ERET.

No EL0 code can:

Read or write kernel memory (TTBR1_EL1 mappings are PTE_KERNEL_RW)
Modify page tables (page table pages are not mapped in EL0)
Disable interrupts (only EL1 can mask IRQs via DAIF)
Access device MMIO unless explicitly mapped through a capability

4.4 Side-Channel Mitigation

4.4.1 Spectre v1 (Bounds Check Bypass)

All array accesses in the kernel use bounds-checked indexing (Rust's default)
The CapabilityTable uses get() returning Option<&T>, never unchecked indexing
Critical paths include an lfence / csdb barrier after bounds checks on the syscall dispatch path

/// Spectre-safe capability table lookup.
/// The index is bounds-checked, and a speculation barrier follows.
pub fn lookup(&self, handle: CapHandle) -> Option<&Capability> {
    let idx = handle.raw().id as usize;
    if idx >= self.entries.len() {
        return None;
    }
    // AArch64: CSDB (Consumption of Speculative Data Barrier)
    // Prevents speculative use of the result before bounds check resolves
    #[cfg(target_arch = "aarch64")]
    unsafe { core::arch::asm!("csdb"); }
    self.entries.get(idx).and_then(|e| e.as_ref())
}

4.4.2 Spectre v2 (Branch Target Injection)

AArch64: Enable branch prediction barriers via SCTLR_EL1 configuration
On context switch between partitions: flush branch predictor state (IC IALLU + TLBI VMALLE1IS + DSB ISH + ISB)
Kernel compiled with -Zbranch-protection=bti (Branch Target Identification)

4.4.3 Meltdown (Rogue Data Cache Load)

AArch64 is not vulnerable to Meltdown when Privileged Access Never (PAN) is enabled
RVM enables PAN via SCTLR_EL1.PAN = 1 at boot
Kernel accesses user memory only through explicit copy routines that temporarily disable PAN

4.4.4 Microarchitectural Data Sampling (MDS)

On x86_64 (secondary target): VERW-based buffer clearing on every kernel exit
On AArch64 (primary target): Not vulnerable to known MDS variants
Defense in depth: all sensitive kernel data structures are allocated in dedicated slab regions that are never shared across partitions

4.5 Time Isolation

Timing side channels are mitigated through several mechanisms:

Fixed-time capability lookup: The capability table lookup path executes in constant time regardless of whether the capability is found or not (compare all entries, select result at the end)
Scheduler noise injection: The scheduler adds a small random jitter (0-10 us) to context switch timing to prevent a partition from inferring another partition's behavior from scheduling patterns
Timer virtualization: Each partition sees a virtual timer (CNTVCT_EL0) that advances at the configured rate but does not leak information about other partitions' execution. The kernel programs CNTV_CVAL_EL0 per-partition.
Constant-time proof verification: The ProofVerifier::verify() path is written to avoid early returns that would leak information about which check failed. All six checks execute, and only the final result is returned.

/// Constant-time proof verification to prevent timing side channels.
/// All checks execute regardless of early failures.
pub fn verify_constant_time(
    &mut self,
    proof: &ProofToken,
    expected_hash: &[u8; 32],
    current_time_ns: u64,
    capability: &Capability,
) -> Result<ProofAttestation, KernelError> {
    let mut valid = true;

    // All checks execute -- no early return
    valid &= capability.has_rights(CapRights::PROVE);
    valid &= proof.mutation_hash == *expected_hash;
    valid &= self.policy.tier_satisfies(proof.tier);
    valid &= !proof.is_expired(current_time_ns);
    valid &= (proof.valid_until_ns.saturating_sub(current_time_ns))
        <= self.policy.max_validity_window_ns;
    let nonce_ok = self.nonce_tracker.check_and_mark(proof.nonce);
    valid &= nonce_ok;

    if valid {
        Ok(self.create_attestation(proof, current_time_ns))
    } else {
        // Roll back nonce if overall verification failed
        if nonce_ok {
            self.nonce_tracker.unmark(proof.nonce);
        }
        Err(KernelError::ProofRejected)
    }
}

4.6 Coherence Domain Isolation

Each vector store and graph store belongs to a coherence domain. Coherence domains provide an additional layer of isolation at the semantic level:

Mutations within a coherence domain are evaluated against that domain's coherence config
Cross-domain references require explicit capability-mediated linking
Coherence violations in one domain do not affect other domains
Each domain has its own proof policy, nonce tracker, and witness region

/// Coherence domain configuration.
pub struct CoherenceDomain {
    pub domain_id: u32,
    pub vector_stores: &[VectorStoreHandle],
    pub graph_stores: &[GraphHandle],
    pub proof_policy: ProofPolicy,
    pub min_coherence_score: u16,  // 0-10000 (0.00-1.00)
    pub isolation_level: DomainIsolationLevel,
}

pub enum DomainIsolationLevel {
    /// Stores in this domain share no physical pages with other domains.
    Full,
    /// Read-only immutable data may be shared across domains.
    SharedImmutable,
}

5. Device Security

5.1 Lease-Based Device Access

Devices are not permanently assigned to partitions. Instead, RVM uses time-bounded, revocable leases:

/// A time-bounded, revocable lease on a device.
pub struct DeviceLease {
    /// Capability handle authorizing device access.
    pub cap: CapHandle,
    /// Device identifier (kernel-assigned, not hardware address).
    pub device_id: DeviceId,
    /// Lease start time (nanoseconds since boot).
    pub granted_at_ns: u64,
    /// Lease expiry (0 = no expiry, must be explicitly revoked).
    pub expires_at_ns: u64,
    /// Rights on the device (READ for sensors, WRITE for actuators, both for DMA).
    pub rights: CapRights,
    /// The MMIO region mapped for this lease (None if not yet mapped).
    pub mmio_region: Option<RegionHandle>,
}

/// Trait for the device lease manager.
pub trait DeviceLeaseManager {
    /// Request a lease on a device. Requires a capability with appropriate rights.
    /// The lease is time-bounded; after expiry, the mapping is automatically torn down.
    fn request_lease(
        &mut self,
        device_id: DeviceId,
        cap: CapHandle,
        duration_ns: u64,
    ) -> Result<DeviceLease, KernelError>;

    /// Renew an existing lease. Must be called before expiry.
    fn renew_lease(
        &mut self,
        lease: &mut DeviceLease,
        additional_ns: u64,
    ) -> Result<(), KernelError>;

    /// Revoke a lease immediately. Tears down MMIO mapping and flushes DMA.
    fn revoke_lease(&mut self, lease: DeviceLease) -> Result<(), KernelError>;

    /// Check if a lease is still valid.
    fn is_lease_valid(&self, lease: &DeviceLease, current_time_ns: u64) -> bool;
}

Lease lifecycle:

Partition requests a lease via request_lease() with a capability
Kernel checks the capability has appropriate rights on the device object
Kernel maps the device's MMIO region into the partition's address space as RegionPolicy::DeviceMmio with PTE_DEVICE (uncacheable) flags
Kernel programs an expiry timer; when it fires, the lease is automatically torn down
On teardown: MMIO pages are unmapped, TLB is flushed, DMA channels are reset

5.2 DMA Isolation

DMA is the most dangerous hardware capability because DMA engines can read/write arbitrary physical memory. RVM uses a layered defense:

5.2.1 With IOMMU (Preferred)

On platforms with an IOMMU (ARM SMMU, Intel VT-d), the kernel programs the IOMMU's page tables to restrict each device's DMA to only the physical pages belonging to the leaseholder's regions:

/// IOMMU-based DMA isolation.
pub trait IommuController {
    /// Create a DMA mapping for a device, restricting it to the given physical pages.
    /// The device can only DMA to/from these pages and no others.
    fn map_device_dma(
        &mut self,
        device_id: DeviceId,
        allowed_pages: &[PhysFrame],
        direction: DmaDirection,
    ) -> Result<DmaMapping, KernelError>;

    /// Remove a DMA mapping, preventing the device from accessing those pages.
    fn unmap_device_dma(
        &mut self,
        device_id: DeviceId,
        mapping: DmaMapping,
    ) -> Result<(), KernelError>;

    /// Invalidate all DMA mappings for a device (called on lease revocation).
    fn invalidate_device(&mut self, device_id: DeviceId) -> Result<(), KernelError>;
}

5.2.2 Without IOMMU (Bounce Buffers)

On platforms without an IOMMU (early Raspberry Pi models), DMA isolation uses bounce buffers:

The kernel allocates a dedicated physical region for DMA operations
Before a device-to-memory transfer, the kernel prepares the bounce buffer
After transfer completion, the kernel copies data from the bounce buffer to the partition's region (after validation)
The device never has direct access to partition memory

This is slower (extra copy) but maintains the isolation invariant. The crates/ruvix/crates/dma/ crate provides the abstraction layer.

/// Bounce buffer DMA isolation (fallback when no IOMMU).
pub struct BounceBufferDma {
    /// Kernel-owned physical region for DMA bounce.
    bounce_region: PhysRegion,
    /// Maximum bounce buffer size.
    max_bounce_size: usize,
}

impl BounceBufferDma {
    /// Execute a DMA transfer through the bounce buffer.
    /// The device only ever sees the bounce buffer's physical address.
    pub fn transfer(
        &mut self,
        device: DeviceId,
        partition_region: &RegionHandle,
        offset: usize,
        length: usize,
        direction: DmaDirection,
    ) -> Result<(), KernelError> {
        if length > self.max_bounce_size {
            return Err(KernelError::LimitExceeded);
        }
        match direction {
            DmaDirection::MemToDevice => {
                // Copy from partition region to bounce buffer
                self.copy_to_bounce(partition_region, offset, length)?;
                // Program DMA from bounce buffer to device
                self.start_dma(device, direction)?;
            }
            DmaDirection::DeviceToMem => {
                // Program DMA from device to bounce buffer
                self.start_dma(device, direction)?;
                // Wait for completion
                self.wait_completion()?;
                // Copy from bounce buffer to partition region (validated)
                self.copy_from_bounce(partition_region, offset, length)?;
            }
            DmaDirection::MemToMem => {
                return Err(KernelError::InvalidArgument);
            }
        }
        Ok(())
    }
}

5.3 Interrupt Routing Security

Each interrupt line is a kernel object accessed through capabilities:

Interrupt capability: A partition must hold a capability with READ right on an interrupt object to receive interrupts from that line
Interrupt-to-queue routing: Interrupts are delivered as messages on a queue (via sensor_subscribe), not as direct callbacks. This maintains the queue-based IPC model and prevents a malicious interrupt handler from running in kernel context.
Priority ceiling: Interrupt processing tasks have bounded priority to prevent a flood of interrupts from starving other partitions
Rate limiting: The kernel enforces a maximum interrupt rate per device. Interrupts exceeding the rate are queued and delivered at the rate limit.

/// Interrupt routing configuration.
pub struct InterruptRoute {
    /// Hardware interrupt number (e.g., GIC SPI number).
    pub irq_number: u32,
    /// Capability authorizing access to this interrupt.
    pub cap: CapHandle,
    /// Queue where interrupt messages are delivered.
    pub target_queue: QueueHandle,
    /// Maximum interrupt rate (interrupts per second). 0 = unlimited.
    pub rate_limit_hz: u32,
    /// Priority ceiling for the interrupt processing task.
    pub priority_ceiling: TaskPriority,
}

5.4 Device Capability Model

Every device in the system is represented as a kernel object with its own capability:

pub enum ObjectType {
    Task,
    Region,
    Queue,
    Timer,
    VectorStore,
    GraphStore,
    RvfMount,
    Sensor,
    /// A hardware device (UART, DMA controller, GPU, NIC, etc.)
    Device,
    /// An interrupt line (GIC SPI/PPI/SGI)
    Interrupt,
}

The root task (first task created at boot) receives capabilities to all devices discovered during boot (from DTB parsing). It then distributes device capabilities to appropriate partitions according to the RVF manifest's resource policy.

6. Boot Security

6.1 Secure Boot Chain

RVM implements a four-stage secure boot chain:

Stage 0: Hardware ROM / eFUSE
  |  Root of trust: device-unique key burned in silicon
  |  Measures and verifies Stage 1
  v
Stage 1: RVM Boot Stub (ruvix-aarch64/src/boot.S + boot.rs)
  |  Minimal assembly: set up stack, clear BSS, jump to Rust
  |  Rust entry: initialize MMU, verify Stage 2 signature
  |  Verifies using trusted keys embedded in Stage 1 image
  v
Stage 2: RVM Kernel (ruvix-nucleus)
  |  Full kernel initialization: cap table, proof engine, scheduler
  |  Verifies RVF package signature (ML-DSA-65 or Ed25519)
  |  SEC-001: Signature failure -> PANIC (no fallback)
  v
Stage 3: Boot RVF Package
  |  Contains all initial RVF components
  |  Loaded into immutable regions
  |  Queue wiring and capability distribution per manifest
  v
Stage 4: Application RVF Components
     Runtime-mounted RVF packages, each signature-verified

6.2 Signature Verification

The existing verify_boot_signature_or_panic() in crates/ruvix/crates/cap/src/security.rs implements SEC-001: signature failure panics the system with no fallback path. The security feature flag disable-boot-verify is blocked at compile time for release builds:

// CVE-001 FIX: Prevent disable-boot-verify in release builds
#[cfg(all(feature = "disable-boot-verify", not(debug_assertions)))]
compile_error!(
    "SECURITY ERROR [CVE-001]: The 'disable-boot-verify' feature cannot be used \
     in release builds."
);

Supported algorithms:

Algorithm	Status	Use Case
Ed25519	Implemented	Primary boot signature
ECDSA P-256	Supported	Legacy compatibility
RSA-PSS 2048	Supported	Legacy compatibility
ML-DSA-65	Planned	Post-quantum RVF signatures

6.3 Measured Boot with Witness Log

Every boot stage emits a witness record:

Stage 1 measurement: Hash of the kernel image, stored as WitnessRecordKind::Boot
Stage 2 initialization: Each subsystem (cap manager, proof engine, scheduler) records its initialized state
Stage 3 RVF mount: Each mounted RVF package is recorded as WitnessRecordKind::Mount with the package hash and attestation

The boot witness log forms the root of the system's audit trail. All subsequent witness records chain from it.

6.4 Remote Attestation for Edge Deployment

For edge deployments where RVM nodes must prove their integrity to a remote verifier:

/// Remote attestation protocol.
pub trait RemoteAttestor {
    /// Generate an attestation report that a remote verifier can check.
    /// The report includes:
    ///   - Platform identity (device-unique key signed measurement)
    ///   - Boot chain hashes (all four stages)
    ///   - Current witness log root hash
    ///   - Loaded RVF component inventory
    ///   - Nonce from the challenger (prevents replay)
    fn generate_attestation_report(
        &self,
        challenge_nonce: &[u8; 32],
    ) -> Result<AttestationReport, KernelError>;

    /// Verify an attestation report from another node.
    /// Used in mesh deployments where nodes must mutually attest.
    fn verify_attestation_report(
        &self,
        report: &AttestationReport,
        expected_measurements: &MeasurementPolicy,
    ) -> Result<AttestationVerdict, KernelError>;
}

pub struct AttestationReport {
    /// Platform identifier (public key of device).
    pub platform_id: [u8; 32],
    /// Boot chain measurement (hash of all four stages).
    pub boot_measurement: [u8; 32],
    /// Current witness log chain hash (latest chain_hash).
    pub witness_root: [u8; 32],
    /// List of loaded RVF component hashes.
    pub component_inventory: Vec<[u8; 32]>,
    /// Challenge nonce from the verifier.
    pub nonce: [u8; 32],
    /// Signature over all of the above using the platform key.
    pub signature: [u8; 64],
}

6.5 Code Signing for Partition Images

All RVF packages must be signed before they can be mounted. The signature is verified by the kernel's boot loader (crates/ruvix/crates/boot/src/signature.rs):

The RVF manifest specifies the signing key ID and algorithm
The kernel maintains a TrustedKeyStore (up to 8 keys, expirable)
Keys can be rotated by mounting a key-update RVF signed by an existing trusted key
The signing key hierarchy supports a two-level PKI:
- Root key: Burned in eFUSE or compiled into Stage 1 (immutable)
- Signing keys: Derived from root key, time-bounded, rotatable

7. Agent-Specific Security

7.1 WASM Sandbox Security Within Partitions

RVF components execute as WASM modules within partitions. The WASM sandbox provides a second layer of isolation inside the capability boundary:

                Partition A (capability-isolated)
   +--------------------------------------------------+
   |  +-----------+  +-----------+  +-----------+     |
   |  | WASM      |  | WASM      |  | WASM      |     |
   |  | Module 1  |  | Module 2  |  | Module 3  |     |
   |  | (Agent)   |  | (Agent)   |  | (Service) |     |
   |  +-----------+  +-----------+  +-----------+     |
   |        |              |              |            |
   |        +--- WASM Host Interface (WASI-like) ----+|
   |                       |                           |
   |  +--------------------------------------------+  |
   |  | RVM Syscall Shim                         |  |
   |  | Maps WASM imports -> cap-gated syscalls     |  |
   |  +--------------------------------------------+  |
   +--------------------------------------------------+
   |  Kernel capability boundary (MMU-enforced)        |
   +--------------------------------------------------+

WASM security properties:

Linear memory isolation: Each WASM module has its own linear memory; it cannot access memory of other modules or the host
Import-only system access: WASM modules can only call functions explicitly imported from the host. The host provides a minimal syscall shim that maps WASM calls to capability-gated RVM syscalls
Resource limits: Each WASM module has configured limits on memory size, stack depth, execution fuel (instruction count), and table size
No raw pointer access: WASM's type system prevents arbitrary memory access. Pointers are offsets into the linear memory, bounds-checked by the runtime

/// WASM module resource limits.
pub struct WasmResourceLimits {
    /// Maximum linear memory size in pages (64 KiB per page).
    pub max_memory_pages: u32,
    /// Maximum call stack depth.
    pub max_stack_depth: u32,
    /// Maximum execution fuel (instructions). 0 = unlimited.
    pub max_fuel: u64,
    /// Maximum number of table entries.
    pub max_table_elements: u32,
    /// Maximum number of globals.
    pub max_globals: u32,
}

/// The host interface exposed to WASM modules.
/// Every function here validates capabilities before performing the operation.
pub trait WasmHostInterface {
    fn vector_get(&self, store: u32, key: u64) -> Result<WasmVectorRef, WasmTrap>;
    fn vector_put(&self, store: u32, key: u64, data: &[f32], proof: WasmProofRef)
        -> Result<(), WasmTrap>;
    fn queue_send(&self, queue: u32, msg: &[u8], priority: u8) -> Result<(), WasmTrap>;
    fn queue_recv(&self, queue: u32, buf: &mut [u8], timeout_ms: u64)
        -> Result<usize, WasmTrap>;
    fn log(&self, level: u8, message: &str);
}

7.2 Inter-Agent Communication Security

Agents communicate exclusively through typed queues. Security properties of queue-based IPC:

Capability-gated: Both sender and receiver must hold capabilities on the queue
Typed messages: Queue schema (WIT types) is validated at send time. Malformed messages are rejected before reaching the receiver
Zero-copy safety: Zero-copy messages use descriptors pointing into immutable or append-only regions. The kernel rejects descriptors pointing into slab regions (TOCTOU mitigation -- SEC-004)
No covert channels: Queue capacity is bounded and visible. The kernel does not leak information about queue occupancy to tasks that do not hold the queue's capability
Message ordering: Messages within a priority level are delivered in FIFO order. Cross-priority ordering is by priority (higher first). This is deterministic and does not leak information.

7.3 Agent Identity and Authentication

Agents do not have traditional identities (no UIDs, no usernames). Instead, agent identity is established through the capability chain:

Boot-time identity: An agent's initial capabilities are assigned by the RVF manifest. The manifest is signed, so the identity is rooted in the code signer.
Runtime identity: An agent can prove its identity by demonstrating possession of specific capabilities. A "who are you?" query is answered by "I hold capability X with badge Y", and the verifier checks that badge against its expected value.
Attestation identity: An agent can emit an attest_emit record that binds its capability badge to a witness entry. External verifiers can trace this back through the witness chain to the boot attestation.

/// Agent identity is derived from capability badges, not global names.
pub struct AgentIdentity {
    /// The agent's task handle (ephemeral, changes across reboots).
    pub task: TaskHandle,
    /// Badge on the agent's primary capability (stable across reboots if
    /// assigned by the RVF manifest).
    pub primary_badge: u64,
    /// RVF component ID that spawned this agent.
    pub component_id: RvfComponentId,
    /// Hash of the WASM module binary (code identity).
    pub code_hash: [u8; 32],
}

7.4 Resource Limits and DoS Prevention

Each partition and each WASM module within a partition has enforceable resource limits:

/// Per-partition resource quota.
pub struct PartitionQuota {
    /// Maximum physical memory (bytes).
    pub max_memory_bytes: usize,
    /// Maximum number of tasks.
    pub max_tasks: u32,
    /// Maximum number of capabilities.
    pub max_capabilities: u32,
    /// Maximum number of queue endpoints.
    pub max_queues: u32,
    /// Maximum number of region mappings.
    pub max_regions: u32,
    /// CPU time budget per scheduling epoch (microseconds). 0 = unlimited.
    pub cpu_budget_us: u64,
    /// Maximum interrupt rate across all devices (per second).
    pub max_interrupt_rate_hz: u32,
    /// Maximum witness log entries per epoch (prevents log flooding).
    pub max_witness_entries_per_epoch: u32,
}

/// Enforcement mechanism.
pub trait QuotaEnforcer {
    /// Check if an allocation would exceed the partition's quota.
    fn check_allocation(
        &self,
        partition: PartitionHandle,
        resource: ResourceKind,
        amount: usize,
    ) -> Result<(), KernelError>;

    /// Record a resource allocation against the quota.
    fn record_allocation(
        &mut self,
        partition: PartitionHandle,
        resource: ResourceKind,
        amount: usize,
    ) -> Result<(), KernelError>;

    /// Release a resource allocation.
    fn release_allocation(
        &mut self,
        partition: PartitionHandle,
        resource: ResourceKind,
        amount: usize,
    );
}

pub enum ResourceKind {
    Memory,
    Tasks,
    Capabilities,
    Queues,
    Regions,
    CpuTime,
    WitnessEntries,
}

DoS prevention mechanisms:

Attack Vector	Defense
Memory exhaustion	Per-partition memory quota, `region_map` returns `OutOfMemory`
CPU starvation	Per-partition CPU budget, preemptive scheduler with budget enforcement
Queue flooding	Bounded queue capacity, backpressure on `queue_send`
Interrupt storm	Per-device rate limiting, priority ceiling
Capability table exhaustion	Per-partition cap table limit (1024 max)
Witness log flooding	Per-partition witness entry budget per epoch
Fork bomb	`task_spawn` checks per-partition task count against quota
Proof spam	Proof cache limited to 64 entries, nonce tracker bounded

8. Threat Model

8.1 What RVM Defends Against

Attacks from Partitions Against Other Partitions

Attack	Defense
Read another partition's memory	MMU page tables (TTBR0 per-partition)
Write another partition's memory	MMU + capability-gated region mapping
Forge a capability	Capabilities are kernel-resident, handles are opaque + epoch-checked
Escalate capability rights	`derive()` enforces monotonic attenuation
Replay a proof token	Single-use nonces in ProofVerifier
Use an expired proof	Time-bounded validity check
Tamper with witness log	Append-only region + hash chaining + optional TEE signing
Spoof another agent's identity	Identity is derived from capability badge, not forgeable name
Starve other partitions of CPU	Per-partition CPU budget + preemptive scheduling
Exhaust system memory	Per-partition memory quota
Flood queues	Bounded capacity + backpressure
DMA attack	IOMMU page tables or bounce buffers
Interrupt storm DoS	Rate limiting + priority ceiling

Attacks from Partitions Against the Kernel

Attack	Defense
Corrupt kernel memory	EL1/EL0 separation, PAN enabled
Modify page tables	Page table pages not mapped in EL0
Disable interrupts	DAIF masking only in EL1
Exploit kernel vulnerability	Rust's memory safety, `#![forbid(unsafe_code)]` on most crates
Spectre/Meltdown	CSDB barriers, BTI, PAN, branch predictor flush
Supply crafted syscall args	All syscall args validated, bounds-checked
Time a kernel operation to leak info	Constant-time critical paths, timer virtualization

Boot-Time Attacks

Attack	Defense
Boot unsigned kernel	SEC-001: panic on signature failure
Tamper with kernel image	Boot measurement chain, hash verification
Downgrade attack	Algorithm allowlist in TrustedKeyStore
Replay old signed image	Boot nonce from hardware RNG, version checking
Compromise signing key	Key rotation via signed key-update RVF

Network/Remote Attacks (Multi-Node Mesh)

Attack	Defense
Impersonate a node	Mutual attestation with device-unique keys
Migrate malicious partition	Deep-tier proof with source/destination attestation
Replay migration	Nonce in migration proof
Man-in-the-middle on migration	Encrypted channel + attestation binding

8.2 What Is Out of Scope for v1

The following are explicitly NOT defended against in v1. They are acknowledged risks that will be addressed in future iterations:

Physical access attacks: An attacker with physical access to the hardware (JTAG, bus probing, cold boot attacks) is out of scope. Hardware security modules (HSMs) and tamper-resistant packaging are future work.
Rowhammer / DRAM disturbance: RVM does not implement guard rows or ECC requirements in v1. Edge hardware with ECC RAM is recommended but not enforced.
Supply chain attacks on the compiler: RVM trusts the Rust compiler. Reproducible builds are recommended but not verified in v1.
Formal verification of the kernel: Unlike seL4, RVM is not formally verified in v1. The kernel is written in safe Rust (with minimal unsafe in the HAL layer), but there is no machine-checked proof of correctness.
Covert channels via power consumption: Power analysis side channels are out of scope. RVM does not implement constant-power execution.
GPU/accelerator isolation: v1 targets CPU-only execution. GPU and accelerator DMA isolation is future work.
Encrypted memory (SEV-SNP/TDX): v1 does not implement memory encryption. The hypervisor trusts the physical memory bus.
Multi-tenant adversarial scheduling: The scheduler provides time isolation through budgets and jitter, but does not defend against a sophisticated adversary performing cache-timing analysis across many scheduling quanta.

8.3 Trust Boundaries

+================================================================+
|                    UNTRUSTED                                    |
|  +----------------------------------------------------------+  |
|  |  RVF Components (WASM agents, services, drivers)          |  |
|  |  - May be malicious                                       |  |
|  |  - May exploit any vulnerability                          |  |
|  |  - Constrained by: capabilities, quotas, WASM sandbox     |  |
|  +----------------------------------------------------------+  |
|                          | syscall                              |
|                          v                                      |
|  +----------------------------------------------------------+  |
|  |  TRUSTED: RVM Kernel (ruvix-nucleus)                    |  |
|  |  - Capability manager, proof verifier, scheduler          |  |
|  |  - Witness log, region manager, queue IPC                 |  |
|  |  - Bug here = system compromise                           |  |
|  |  - Minimized: 12 syscalls, ~15K lines Rust                |  |
|  +----------------------------------------------------------+  |
|                          | hardware interface                   |
|                          v                                      |
|  +----------------------------------------------------------+  |
|  |  TRUSTED: Hardware                                        |  |
|  |  - MMU, GIC, IOMMU, timers                               |  |
|  |  - Assumed correct (no hardware bugs modeled in v1)       |  |
|  +----------------------------------------------------------+  |
|                          | optional                              |
|                          v                                      |
|  +----------------------------------------------------------+  |
|  |  TRUSTED: TrustZone Secure World (when available)         |  |
|  |  - Device-unique key storage                              |  |
|  |  - Witness signing                                        |  |
|  |  - Boot measurement anchoring                             |  |
|  +----------------------------------------------------------+  |
+================================================================+

Key trust assumptions:

The kernel is correct (not formally verified, but written in safe Rust)
The hardware functions as documented (MMU enforces page permissions, IOMMU restricts DMA)
The boot signing key has not been compromised
The Rust compiler generates correct code
The WASM runtime (Wasmtime or WAMR) correctly enforces sandboxing

8.4 Comparison to KVM and seL4 Threat Models

Property	KVM	seL4	RVM
TCB size	~2M lines (Linux kernel)	~8.7K lines (C)	~15K lines (Rust)
Formal verification	No	Yes (full functional correctness)	No (safe Rust, not verified)
Memory safety	C (manual)	C (verified)	Rust (compiler-enforced)
Capability model	No (uses DAC/MAC)	Yes (unforgeable tokens)	Yes (seL4-inspired)
Proof-gated mutation	No	No	Yes (unique to RVM)
Witness audit log	No (relies on external logging)	No	Yes (kernel-native)
DMA isolation	VT-d/SMMU	IOMMU-dependent	IOMMU + bounce buffer fallback
Side-channel defense	KPTI, IBRS, MDS mitigations	Limited (depends on platform)	CSDB, BTI, PAN, const-time paths
Agent-native primitives	No	No	Yes (vectors, graphs, coherence)
Hot-code loading	Module loading (large TCB)	No	RVF mount (capability-gated)

Key differentiators:

RVM vs. KVM: RVM has a 100x smaller TCB. KVM inherits the entire Linux kernel as its TCB, including filesystems, networking, drivers, and hundreds of syscalls. RVM has 12 syscalls and no ambient authority. KVM relies on Linux's DAC/MAC; RVM uses capabilities with proof-gated mutation.
RVM vs. seL4: seL4 has formal verification, which RVM does not. However, RVM has proof-gated mutation (no mutation without cryptographic authorization), kernel-native witness logging, and agent-specific primitives (vector stores, graph stores, coherence scoring). seL4 would require these as userspace servers communicating through IPC, reintroducing overhead and expanding the trusted codebase.

9. Security Invariants Summary

The following invariants MUST hold at all times. Violation of any invariant indicates a security breach.

ID	Invariant	Enforcement
SEC-001	Boot signature failure -> PANIC	`verify_boot_signature_or_panic()`, compile-time block on `disable-boot-verify` in release
SEC-002	Proof cache: 64 entries max, 100ms TTL, single-use nonces	`ProofCache` + `NonceTracker`
SEC-003	Capability delegation depth <= 8	`DerivationTree` depth check
SEC-004	Zero-copy IPC descriptors cannot point into Slab regions	Queue descriptor validation
SEC-005	No writable physical page shared between partitions	`PartitionAddressSpace::map_region()` exclusivity check
SEC-006	Capability rights can only decrease through delegation	`Capability::derive()` subset check
SEC-007	Every mutating syscall emits a witness record	Witness log append in syscall path
SEC-008	Device MMIO access requires active lease + capability	`DeviceLeaseManager` check
SEC-009	DMA restricted to leaseholder's physical pages	IOMMU or bounce buffer
SEC-010	Per-partition resource quotas enforced	`QuotaEnforcer` checks before allocation
SEC-011	Witness log is append-only with hash chaining	`ChainedWitnessRecord`, region policy enforcement
SEC-012	No EL0 code can access kernel memory	TTBR1 mappings are PTE_KERNEL_RW, PAN enabled

10. Implementation Roadmap

Phase A (Complete): Linux-Hosted Prototype

Already implemented and tested (760 tests passing):

Capability manager with derivation trees
3-tier proof engine with nonce tracking
Witness log with serialization
12-syscall nucleus with checkpoint/replay

Phase B (In Progress): Bare-Metal AArch64

Security-specific deliverables:

MMU-enforced partition isolation (TTBR0 per-partition)
EL1/EL0 separation for kernel/user code
PAN + BTI + CSDB speculation barriers
Hardware timer virtualization
Device capability model with lease management

Phase C (Planned): SMP + DMA

Security-specific deliverables:

IOMMU programming for DMA isolation
Bounce buffer fallback for platforms without IOMMU
Per-CPU TLB management for partition switches
IPI-based remote TLB invalidation
SpinLock with timing-attack-resistant implementation

Phase D (Planned): Mesh + Attestation

Security-specific deliverables:

Remote attestation protocol
Mutual node authentication
Proof-gated migration
Encrypted partition state transfer
Distributed witness log with cross-node hash chaining

References

ADR-087: RVM Cognition Kernel (accepted, Phase A implemented)
ADR-042: Security RVF -- AIDefence + TEE Hardened Cognitive Container
ADR-047: Proof-gated mutation protocol
ADR-029: RVF canonical binary format
ADR-030: RVF cognitive container / self-booting vector files
seL4 Reference Manual (capability model inspiration)
ARM Architecture Reference Manual (AArch64 exception levels, MMU, PAN, BTI)
NIST SP 800-147B: BIOS Protection Guidelines for Servers (measured boot)
Dennis & Van Horn, "Programming Semantics for Multiprogrammed Computations" (1966) -- original capability concept

56 KiB Raw Permalink Blame History

RVM Security Model

Status

Date

Scope

1. Capability-Based Authority

1.1 Design Philosophy

1.2 Capability Structure

1.3 Capability Delegation and Attenuation

1.4 Capability Revocation

1.5 How This Differs from DAC/MAC

2. Proof-Gated Mutation

2.1 Invariant

2.2 What Constitutes a Valid Proof

2.3 Proof Tiers

2.4 Proof Lifecycle

2.5 What Requires a Proof

2.6 Proof-Gated Device Mapping (Bare-Metal Extension)

2.7 Proof-Gated Migration

2.8 Proof-Gated Partition Merge/Split

3. Witness-Native Audit

3.1 Design Principle

3.2 Witness Record Format

3.3 Tamper Evidence

3.4 Replayability and Forensics

3.5 Witness-Enabled Rollback/Recovery

4. Isolation Model

4.1 Partition Isolation Guarantees

4.2 MMU-Enforced Memory Isolation (Bare Metal)

4.3 EL1/EL0 Separation

4.4 Side-Channel Mitigation

4.4.1 Spectre v1 (Bounds Check Bypass)

4.4.2 Spectre v2 (Branch Target Injection)

4.4.3 Meltdown (Rogue Data Cache Load)

4.4.4 Microarchitectural Data Sampling (MDS)

4.5 Time Isolation

4.6 Coherence Domain Isolation

5. Device Security

5.1 Lease-Based Device Access

5.2 DMA Isolation

5.2.1 With IOMMU (Preferred)

5.2.2 Without IOMMU (Bounce Buffers)

5.3 Interrupt Routing Security

5.4 Device Capability Model

6. Boot Security

6.1 Secure Boot Chain

6.2 Signature Verification

6.3 Measured Boot with Witness Log

6.4 Remote Attestation for Edge Deployment

6.5 Code Signing for Partition Images

7. Agent-Specific Security

7.1 WASM Sandbox Security Within Partitions

7.2 Inter-Agent Communication Security

7.3 Agent Identity and Authentication

7.4 Resource Limits and DoS Prevention

8. Threat Model

8.1 What RVM Defends Against

Attacks from Partitions Against Other Partitions

Attacks from Partitions Against the Kernel

Boot-Time Attacks

Network/Remote Attacks (Multi-Node Mesh)

8.2 What Is Out of Scope for v1

8.3 Trust Boundaries

8.4 Comparison to KVM and seL4 Threat Models

9. Security Invariants Summary

10. Implementation Roadmap

Phase A (Complete): Linux-Hosted Prototype

Phase B (In Progress): Bare-Metal AArch64

Phase C (Planned): SMP + DMA

Phase D (Planned): Mesh + Attestation

References

56 KiB

Raw Permalink Blame History