Ruview/docs/adr/ADR-068-per-node-state-pipeline.md
ruv a4bd2308b7 feat: ADR-069 ESP32 CSI → Cognitum Seed RVF pipeline (v0.5.4-esp32)
Hardware-validated pipeline connecting ESP32-S3 CSI sensing to Cognitum
Seed (Pi Zero 2 W) edge intelligence appliance via 8-dim feature vectors.

Firmware:
- New 48-byte feature vector packet (magic 0xC5110003) at 1 Hz with
  normalized presence, motion, breathing, heart rate, phase variance,
  person count, fall detection, and RSSI
- Compressed frame magic reassigned 0xC5110003 → 0xC5110005
- Guard against uninitialized s_top_k read when count=0

Bridge (scripts/seed_csi_bridge.py):
- UDP→HTTPS ingest with bearer token, hash-based vector IDs
- --validate (kNN), --stats, --compact, --allowed-sources modes
- NaN/inf rejection, retry logic, SEED_TOKEN env var support

Validated on live hardware:
- 941 vectors ingested, 100% kNN exact match
- Witness chain SHA-256 verified (1,325 entries)
- 1,463 Rust tests passed, Python proof VERDICT: PASS

Research: 26 docs covering Arena Physica, Maxwell's equations in WiFi
sensing, SOTA survey 2025-2026, GOAP implementation plan

Security: removed hardcoded credentials, added NVS patterns to
.gitignore, source IP filtering, NaN validation

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-02 19:32:18 -04:00

8.4 KiB

ADR-068: Per-Node State Pipeline for Multi-Node Sensing

Field Value
Status Accepted
Date 2026-03-27
Authors rUv, claude-flow
Drivers #249, #237, #276, #282
Supersedes

Context

The sensing server (wifi-densepose-sensing-server) was originally designed for single-node operation. When multiple ESP32 nodes send CSI frames simultaneously, all data is mixed into a single shared pipeline:

  • One frame_history VecDeque for all nodes
  • One smoothed_person_score / smoothed_motion / vital sign buffers
  • One baseline and debounce state

This means the classification, person count, and vital signs reported to the UI are an uncontrolled aggregate of all nodes' data. The result: the detection window shows identical output regardless of how many nodes are deployed, where people stand, or how many people are in the room (#249 — 24 comments, the most reported issue).

Root Cause Verified

Investigation of AppStateInner (main.rs lines 279-367) confirmed:

Shared field Impact
frame_history Temporal analysis mixes all nodes' CSI data
smoothed_person_score Person count aggregates all nodes
smoothed_motion Motion classification undifferentiated
smoothed_hr / br Vital signs are global, not per-node
baseline_motion Adaptive baseline learned from mixed data
debounce_counter All nodes share debounce state

Decision

Introduce per-node state tracking via a HashMap<u8, NodeState> in AppStateInner. Each ESP32 node (identified by its node_id byte) gets an independent sensing pipeline with its own temporal history, smoothing buffers, baseline, and classification state.

Architecture

                     ┌─────────────────────────────────────────┐
   UDP frames        │           AppStateInner                  │
   ───────────►      │                                         │
   node_id=1    ──►  │  node_states: HashMap<u8, NodeState>    │
   node_id=2    ──►  │    ├── 1: NodeState { frame_history,    │
   node_id=3    ──►  │    │      smoothed_motion, vitals, ... }│
                     │    ├── 2: NodeState { ... }              │
                     │    └── 3: NodeState { ... }              │
                     │                                         │
                     │  ┌── Per-Node Pipeline ──┐               │
                     │  │ extract_features()     │               │
                     │  │ smooth_and_classify()  │               │
                     │  │ smooth_vitals()        │               │
                     │  │ score_to_person_count()│               │
                     │  └────────────────────────┘               │
                     │                                         │
                     │  ┌── Multi-Node Fusion ──┐               │
                     │  │ Aggregate person count │               │
                     │  │ Per-node classification│               │
                     │  │ All-nodes WebSocket msg│               │
                     │  └────────────────────────┘               │
                     │                                         │
                     │  ──► WebSocket broadcast (sensing_update) │
                     └─────────────────────────────────────────┘

NodeState Struct

struct NodeState {
    frame_history: VecDeque<Vec<f64>>,
    smoothed_person_score: f64,
    prev_person_count: usize,
    smoothed_motion: f64,
    current_motion_level: String,
    debounce_counter: u32,
    debounce_candidate: String,
    baseline_motion: f64,
    baseline_frames: u64,
    smoothed_hr: f64,
    smoothed_br: f64,
    smoothed_hr_conf: f64,
    smoothed_br_conf: f64,
    hr_buffer: VecDeque<f64>,
    br_buffer: VecDeque<f64>,
    rssi_history: VecDeque<f64>,
    vital_detector: VitalSignDetector,
    latest_vitals: VitalSigns,
    last_frame_time: Option<std::time::Instant>,
    edge_vitals: Option<Esp32VitalsPacket>,
}

Multi-Node Aggregation

  • Person count: Sum of per-node prev_person_count for active nodes (seen within last 10 seconds).
  • Classification: Per-node classification included in SensingUpdate.nodes.
  • Vital signs: Per-node vital signs; UI can render per-node or aggregate.
  • Signal field: Generated from the most-recently-updated node's features.
  • Stale nodes: Nodes with no frame for >10 seconds are excluded from aggregation and marked offline (consistent with PR #300).

Backward Compatibility

  • The simulated data path (simulated_data_task) continues using global state.
  • Single-node deployments behave identically (HashMap has one entry).
  • The WebSocket message format (sensing_update) remains the same but the nodes array now contains all active nodes, and estimated_persons reflects the cross-node aggregate.
  • The edge vitals path (#323 fix) also uses per-node state.

Scaling Characteristics

Nodes Per-Node Memory Total Overhead Notes
1 ~50 KB ~50 KB Identical to current
3 ~50 KB ~150 KB Typical home setup
10 ~50 KB ~500 KB Small office
50 ~50 KB ~2.5 MB Building floor
100 ~50 KB ~5 MB Large deployment
256 ~50 KB ~12.8 MB Max (u8 node_id)

Memory is dominated by frame_history (100 frames x ~500 bytes each = ~50 KB per node). This scales linearly and fits comfortably in server memory even at 256 nodes.

QEMU Validation

The existing QEMU swarm infrastructure (ADR-062, scripts/qemu_swarm.py) supports multi-node simulation with configurable topologies:

  • star: Central coordinator + sensor nodes
  • mesh: Fully connected peer network
  • line: Sequential chain
  • ring: Circular topology

Each QEMU instance runs with a unique node_id via NVS provisioning. The swarm health validator (scripts/swarm_health.py) checks per-node UART output.

Validation plan:

  1. QEMU swarm with 3-5 nodes in mesh topology
  2. Verify server produces distinct per-node classifications
  3. Verify aggregate person count reflects multi-node contributions
  4. Verify stale-node eviction after timeout

Consequences

Positive

  • Each node's CSI data is processed independently — no cross-contamination
  • Person count scales with the number of deployed nodes
  • Vital signs are per-node, enabling room-level health monitoring
  • Foundation for spatial localization (per-node positions + triangulation)
  • Scales to 256 nodes with <13 MB memory overhead

Negative

  • Slightly more memory per node (~50 KB each)
  • smooth_and_classify_node function duplicates some logic from global version
  • Per-node VitalSignDetector instances add CPU cost proportional to node count

Risks

  • Node ID collisions (mitigated by NVS persistence since v0.5.0)
  • HashMap growth without cleanup (mitigated by stale-node eviction)
  • ADR-069 (ESP32 CSI → Cognitum Seed RVF Ingest Pipeline) extends this ADR's per-node state architecture with Cognitum Seed integration. Live hardware validation (2026-04-02) confirmed per-node feature vectors flowing through the bridge into the Seed's RVF store with witness chain attestation.

References

  • Issue #249: Detection window same regardless (24 comments)
  • Issue #237: Same display for 0/1/2 people (12 comments)
  • Issue #276: Only one can be detected (8 comments)
  • Issue #282: Detection fail (5 comments)
  • PR #295: Hysteresis smoothing (partial mitigation)
  • PR #300: ESP32 offline detection after 5s
  • ADR-062: QEMU Swarm Configurator