ADR-062: QEMU ESP32-S3 Swarm Configurator
| Field |
Value |
| Status |
Accepted |
| Date |
2026-03-14 |
| Authors |
RuView Team |
| Relates |
ADR-061 (QEMU testing platform), ADR-060 (channel/MAC filter), ADR-018 (binary frame), ADR-039 (edge intel) |
Glossary
| Term |
Definition |
| Swarm |
A group of N QEMU ESP32-S3 instances running simultaneously |
| Topology |
How nodes are connected: star, mesh, line, ring |
| Role |
Node function: sensor (collects CSI), coordinator (aggregates + forwards), gateway (bridges to host) |
| Scenario matrix |
Cross-product of topology × node count × NVS config × mock scenario |
| Health oracle |
Python process that monitors all node UART logs and declares swarm health |
Context
ADR-061 Layer 3 provides a basic multi-node mesh test: N identical nodes with sequential TDM slots connected via a Linux bridge. This is useful but limited:
- All nodes are identical — real deployments have heterogeneous roles (sensor, coordinator, gateway)
- Single topology — only fully-connected bridge; no star, line, or ring topologies
- No scenario variation per node — all nodes run the same mock CSI scenario
- Manual configuration — each test requires hand-editing env vars and arguments
- No swarm-level health monitoring — validation checks individual nodes, not collective behavior
- No cross-node timing validation — TDM slot ordering and inter-frame gaps aren't verified
Real WiFi-DensePose deployments use 3-8 ESP32-S3 nodes in various topologies. A single coordinator aggregates CSI from multiple sensors. The firmware must handle TDM conflicts, missing nodes, role-based behavior differences, and network partitions — none of which ADR-061 Layer 3 tests.
Decision
Build a QEMU Swarm Configurator — a YAML-driven tool that defines multi-node test scenarios declaratively and orchestrates them under QEMU with swarm-level validation.
Architecture
┌─────────────────────────────────────────────────────┐
│ swarm_config.yaml │
│ nodes: [{role: sensor, scenario: 2, channel: 6}] │
│ topology: star │
│ duration: 60s │
│ assertions: [all_nodes_boot, tdm_no_collision, ...] │
└──────────────────────┬──────────────────────────────┘
│
┌────────────▼────────────┐
│ qemu_swarm.py │
│ (orchestrator) │
└───┬────┬────┬───┬──────┘
│ │ │ │
┌────▼┐ ┌▼──┐ ▼ ┌▼────┐
│Node0│ │N1 │... │N(n-1)│ QEMU instances
│sens │ │sen│ │coord │
└──┬──┘ └─┬─┘ └──┬───┘
│ │ │
┌──▼──────▼─────────▼──┐
│ Virtual Network │ TAP bridge / SLIRP
│ (topology-shaped) │
└──────────┬───────────┘
│
┌──────────▼───────────┐
│ Aggregator (Rust) │ Collects frames
└──────────┬───────────┘
│
┌──────────▼───────────┐
│ Health Oracle │ Swarm-level assertions
│ (swarm_health.py) │
└──────────────────────┘
YAML Configuration Schema
# swarm_config.yaml
swarm:
name: "3-sensor-star"
duration_s: 60
topology: star # star | mesh | line | ring
aggregator_port: 5005
nodes:
- role: coordinator
node_id: 0
scenario: 0 # empty room (baseline)
channel: 6
edge_tier: 2
is_gateway: true # receives aggregated frames
- role: sensor
node_id: 1
scenario: 2 # walking person
channel: 6
tdm_slot: 1 # TDM slot index (auto-assigned from node position if omitted)
- role: sensor
node_id: 2
scenario: 3 # fall event
channel: 6
tdm_slot: 2
assertions:
- all_nodes_boot
- no_crashes
- tdm_no_collision
- all_nodes_produce_frames
- coordinator_receives_from_all
- fall_detected_by_node_2
- frame_rate_above: 15 # Hz minimum per node
- max_boot_time_s: 10
Topologies
| Topology |
Network |
Description |
star |
All sensors connect to coordinator; coordinator has TAP to each sensor |
Hub-and-spoke, most common |
mesh |
All nodes on same bridge (existing Layer 3 behavior) |
Every node sees every other |
line |
Node 0 ↔ Node 1 ↔ Node 2 ↔ ... |
Linear chain, tests multi-hop |
ring |
Like line but last connects to first |
Circular, tests routing |
Node Roles
| Role |
Behavior |
NVS Keys |
sensor |
Runs mock CSI, sends frames to coordinator |
node_id, tdm_slot, target_ip |
coordinator |
Receives frames from sensors, runs edge aggregation |
node_id, tdm_slot=0, edge_tier=2 |
gateway |
Like coordinator but also bridges to host UDP |
node_id, target_ip=host, is_gateway=1 |
Assertions (Swarm-Level)
| Assertion |
What It Checks |
all_nodes_boot |
Every node's UART log shows boot indicators within timeout |
no_crashes |
No Guru Meditation, assert, panic in any log |
tdm_no_collision |
No two nodes transmit in the same TDM slot |
all_nodes_produce_frames |
Every sensor node's log contains CSI frame output |
coordinator_receives_from_all |
Coordinator log shows frames from each sensor's node_id |
fall_detected_by_node_N |
Node N's log reports a fall detection event |
frame_rate_above |
Each node produces at least N frames/second |
max_boot_time_s |
All nodes boot within N seconds |
no_heap_errors |
No OOM or heap corruption in any log |
network_partitioned_recovery |
After deliberate partition, nodes resume communication (future) |
Preset Configurations
| Preset |
Nodes |
Topology |
Purpose |
smoke |
2 |
star |
Quick CI smoke test (15s) |
standard |
3 |
star |
Default 3-node (sensor + sensor + coordinator) |
large-mesh |
6 |
mesh |
Scale test with 6 fully-connected nodes |
line-relay |
4 |
line |
Multi-hop relay chain |
ring-fault |
4 |
ring |
Ring with fault injection mid-test |
heterogeneous |
5 |
star |
Mixed scenarios: walk, fall, static, channel-sweep, empty |
ci-matrix |
3 |
star |
CI-optimized preset (30s, minimal assertions) |
File Layout
scripts/
├── qemu_swarm.py # Main orchestrator (CLI entry point)
├── swarm_health.py # Swarm-level health oracle
└── swarm_presets/
├── smoke.yaml
├── standard.yaml
├── large_mesh.yaml
├── line_relay.yaml
├── ring_fault.yaml
├── heterogeneous.yaml
└── ci_matrix.yaml
.github/workflows/
└── firmware-qemu.yml # MODIFIED: add swarm test job
Consequences
Benefits
- Declarative testing — define swarm topology in YAML, not shell scripts
- Role-based nodes — test coordinator/sensor/gateway interactions
- Topology variety — star/mesh/line/ring match real deployment patterns
- Swarm-level assertions — validate collective behavior, not just individual nodes
- Preset library — quick CI smoke tests and thorough manual validation
- Reproducible — YAML configs are version-controlled and shareable
Limitations
- Still requires root for TAP bridge topologies (star, line, ring); mesh can use SLIRP
- QEMU resource usage — 6+ QEMU instances use ~2GB RAM, may slow CI runners
- No real RF — inter-node communication is IP-based, not WiFi CSI multipath
References
- ADR-061: QEMU ESP32-S3 firmware testing platform (Layers 1-9)
- ADR-060: Channel override and MAC address filter provisioning
- ADR-018: Binary CSI frame format (magic
0xC5110001)
- ADR-039: Edge intelligence pipeline (biquad, vitals, fall detection)