# ADR-070: Self-Supervised Pretraining from Live ESP32 CSI + Cognitum Seed

| Field | Value |
|-------|-------|
| Status | Accepted |
| Date | 2026-04-02 |
| Authors | rUv, claude-flow |
| Drivers | README limitation "No pre-trained model weights provided" |
| Related | ADR-069 (Cognitum Seed pipeline), ADR-027 (MERIDIAN), ADR-024 (AETHER contrastive), ADR-015 (MM-Fi dataset) |

## Context

The README lists "No pre-trained model weights are provided; training from scratch is required" as a known limitation. Users must collect their own CSI dataset and train from scratch, which is a significant barrier to adoption.

We now have the infrastructure to generate pre-trained weights directly from live hardware:

- 2 ESP32-S3 nodes (COM8 node_id=2 at 192.168.1.104, COM9 node_id=1 at 192.168.1.105) streaming CSI, vitals, and 8-dim feature vectors (features at 1 Hz per node)
- Cognitum Seed (Pi Zero 2 W) with RVF vector store, kNN search, witness chain, and environmental sensors (BME280, PIR, vibration)
- Recording API in sensing-server (POST /api/v1/recording/start) that saves CSI frames to .csi.jsonl
- Self-supervised training via rapid_adapt.rs (contrastive TTT + entropy minimization)
- AETHER contrastive embeddings (ADR-024) for environment-independent representations

## Why Self-Supervised?

No cameras or labels are needed. The system learns from:

  1. Temporal coherence — Frames close in time should have similar embeddings (positive pairs), frames far apart should differ (negative pairs)
  2. Multi-node consistency — The same person seen from 2 nodes should produce correlated features, different people should produce decorrelated features
  3. Cognitum Seed ground truth — PIR sensor, BME280 environment changes, and kNN cluster transitions provide weak supervision without human labeling
  4. Physical constraints — Breathing 6-30 BPM, heart rate 40-150 BPM, person count 0-4, RSSI physics (a filtering sketch follows this list)
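
A minimal sketch of that physical-constraint filtering, assuming each decoded feature record is a dict with `breathing_bpm`, `heart_bpm`, and `person_count` fields (hypothetical names; adjust to the real schema):

```python
# Hypothetical field names; adjust to the actual feature-record schema.
PHYSICAL_BOUNDS = {
    "breathing_bpm": (6.0, 30.0),   # breathing 6-30 BPM
    "heart_bpm": (40.0, 150.0),     # heart rate 40-150 BPM
    "person_count": (0, 4),         # person count 0-4
}

def is_physically_plausible(record: dict) -> bool:
    """Keep a record only if every constrained field is within range."""
    for field, (lo, hi) in PHYSICAL_BOUNDS.items():
        value = record.get(field)
        if value is not None and not (lo <= value <= hi):
            return False
    return True
```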

## Decision

Implement a 4-phase pretraining pipeline that collects CSI from 2 ESP32 nodes, stores feature vectors in the Cognitum Seed, and produces distributable pre-trained weights.

### Phase 1: Data Collection (30 min)

Capture labeled scenarios using the sensing-server recording API and Cognitum Seed:

| Scenario | Duration | Label | Activity |
|----------|----------|-------|----------|
| Empty room | 5 min | empty | No one present, establish baseline |
| 1 person stationary | 5 min | 1p-still | Sit at desk, normal breathing |
| 1 person walking | 5 min | 1p-walk | Walk around room, varied paths |
| 1 person varied | 5 min | 1p-varied | Stand, sit, wave arms, turn |
| 2 people | 5 min | 2p | Both moving in room |
| Transitions | 5 min | transitions | Enter/exit room, appear/disappear |

Data rate per scenario:

- 2 nodes × 100 Hz CSI = 200 frames/sec = 60,000 frames per 5 min
- 2 nodes × 1 Hz features = 2 vectors/sec = 600 vectors per 5 min
- Total: 360,000 CSI frames + 3,600 feature vectors per collection run (a verification sketch follows)
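
A quick way to check those counts against the recorded sessions (a sketch; the `recordings/` path and one-file-per-session layout are assumptions, not documented recorder behavior):

```python
from pathlib import Path

# Assumed layout: one .csi.jsonl file per recording session,
# one CSI frame per line.
for session_file in sorted(Path("recordings").glob("*.csi.jsonl")):
    frames = sum(1 for _ in session_file.open())
    # Expected: 2 nodes x 100 Hz x 300 s = 60,000 frames per 5-min scenario
    print(f"{session_file.name}: {frames} frames "
          f"({frames / 60000:.0%} of expected)")
```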

Cognitum Seed role:

- Stores all feature vectors with witness chain attestation
- PIR sensor provides binary presence ground truth
- BME280 tracks environmental conditions during collection
- kNN graph clusters naturally emerge from the vector distribution

### Phase 2: Contrastive Pretraining

Train a contrastive encoder on the collected CSI data:

```
Input: Raw CSI frame (128 subcarriers × 2 I/Q = 256 features)
       ↓
    TCN temporal encoder (3 layers, kernel=7)
       ↓
    Projection head → 128-dim embedding
       ↓
    Contrastive loss (InfoNCE):
      positive: frames within 0.5s window from same node
      negative: frames >5s apart or from different scenario
      cross-node positive: same timestamp, different node
```
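
A minimal PyTorch sketch of that encoder shape (sizes follow the diagram; the production training path is Rust-side, e.g. rapid_adapt.rs, so this is illustrative only):

```python
import torch
import torch.nn as nn

class TCNEncoder(nn.Module):
    """3-layer temporal conv encoder: 256-dim CSI frames -> 128-dim embedding."""
    def __init__(self, in_features=256, hidden=128, embed_dim=128, kernel=7):
        super().__init__()
        layers, channels = [], [in_features, hidden, hidden, hidden]
        for i in range(3):
            dilation = 2 ** i  # growing temporal receptive field
            layers += [
                nn.Conv1d(channels[i], channels[i + 1], kernel,
                          padding=dilation * (kernel - 1) // 2, dilation=dilation),
                nn.ReLU(),
            ]
        self.tcn = nn.Sequential(*layers)
        self.project = nn.Linear(hidden, embed_dim)  # projection head

    def forward(self, x):                   # x: (batch, time, 256)
        h = self.tcn(x.transpose(1, 2))     # convolve over the time axis
        return self.project(h.mean(dim=2))  # pool over time, project to 128-dim
```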

Self-supervised signals (a loss sketch follows this list):

- Temporal adjacency (frames within 500 ms = positive pair)
- Cross-node agreement (same person seen from 2 viewpoints)
- PIR consistency (embeddings should cluster by PIR state)
- Scenario boundary (embeddings should shift at label transitions)
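
A PyTorch sketch of the InfoNCE objective under the pair rules above (the 0.5 s / 5 s windows come from this ADR; batch construction and the different-scenario negatives are left out for brevity):

```python
import torch
import torch.nn.functional as F

def info_nce(embeddings, timestamps, temperature=0.1):
    """InfoNCE over a batch of frame embeddings.

    Positives: pairs < 0.5 s apart (covers same-node temporal adjacency
    and same-timestamp cross-node pairs). Negatives: pairs > 5 s apart.
    Pairs in between are excluded as ambiguous.
    """
    z = F.normalize(embeddings, dim=1)            # (B, 128) unit vectors
    sim = z @ z.T / temperature                   # pairwise similarity logits
    dt = (timestamps[:, None] - timestamps[None, :]).abs()
    eye = torch.eye(len(z), dtype=torch.bool)
    pos = (dt < 0.5) & ~eye                       # positive-pair mask
    ambiguous = (dt >= 0.5) & (dt <= 5.0)         # neither positive nor negative
    sim = sim.masked_fill(eye | ambiguous, float("-inf"))
    log_p = sim.log_softmax(dim=1)
    # Average log-probability of each row's positive set
    loss = -log_p.masked_fill(~pos, 0.0).sum(1) / pos.sum(1).clamp(min=1)
    return loss.mean()
```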

### Phase 3: Downstream Head Training

Attach lightweight heads for each task (a sketch follows the table):

| Head | Architecture | Output | Supervision |
|------|--------------|--------|-------------|
| Presence | Linear(128→1) + sigmoid | 0.0-1.0 | PIR sensor (free) |
| Person count | Linear(128→4) + softmax | 0-3 people | Scenario labels |
| Activity | Linear(128→4) + softmax | still/walk/varied/empty | Scenario labels |
| Vital signs | Linear(128→2) | BR, HR (BPM) | ESP32 edge vitals |
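
A PyTorch sketch of these heads over the frozen 128-dim encoder output (loss functions and training loop omitted):

```python
import torch.nn as nn

class DownstreamHeads(nn.Module):
    """The four task heads from the table, over a 128-dim embedding."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.presence = nn.Sequential(nn.Linear(embed_dim, 1), nn.Sigmoid())
        self.count = nn.Linear(embed_dim, 4)      # logits for 0-3 people
        self.activity = nn.Linear(embed_dim, 4)   # still/walk/varied/empty
        self.vitals = nn.Linear(embed_dim, 2)     # [breathing BPM, heart BPM]

    def forward(self, z):
        return {
            "presence": self.presence(z),  # 0.0-1.0 probability
            "count": self.count(z),        # softmax applied in the loss
            "activity": self.activity(z),
            "vitals": self.vitals(z),
        }
```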

### Phase 4: Package & Distribute

Produce distributable artifacts (an export sketch follows the table):

| Artifact | Format | Size | Description |
|----------|--------|------|-------------|
| pretrained-encoder.onnx | ONNX | ~2 MB | Contrastive encoder (TCN backbone) |
| pretrained-heads.onnx | ONNX | ~100 KB | Task-specific heads |
| pretrained.rvf | RVF | ~500 KB | RuVector format with metadata |
| room-profiles.json | JSON | ~10 KB | Environment calibration profiles |
| collection-witness.json | JSON | ~5 KB | Seed witness chain attestation proving data provenance |
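
A sketch of producing pretrained-encoder.onnx, assuming the encoder is the PyTorch module sketched in Phase 2 (the project's actual export path may differ):

```python
import torch

# `encoder` is assumed to be the TCNEncoder sketched in Phase 2.
encoder.eval()
dummy = torch.randn(1, 100, 256)  # (batch, time window, CSI features)
torch.onnx.export(
    encoder, dummy, "pretrained-encoder.onnx",
    input_names=["csi_window"], output_names=["embedding"],
    dynamic_axes={"csi_window": {0: "batch", 1: "time"}},
)
```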

Include in GitHub release alongside firmware binaries. Users download and run:

```bash
# Use pre-trained model (no training needed)
cargo run -p wifi-densepose-sensing-server -- --model pretrained.rvf --http-port 3000
```

## Hardware Setup

```
                    192.168.1.20 (Host laptop)
                    ┌──────────────────────────┐
                    │  sensing-server           │
                    │    Recording API          │
                    │    Training pipeline      │
                    │                           │
                    │  seed_csi_bridge.py       │
                    │    Feature → Seed ingest  │
                    └────┬──────────┬───────────┘
                         │          │
          UDP:5006       │          │  HTTPS:8443
     ┌───────────────────┤          ├───────────────┐
     │                   │          │               │
     ▼                   ▼          ▼               │
┌──────────┐    ┌──────────┐    ┌──────────────┐    │
│ ESP32 #1 │    │ ESP32 #2 │    │Cognitum Seed │◄───┘
│ COM9     │    │ COM8     │    │ Pi Zero 2W   │
│ node=1   │    │ node=2   │    │ USB          │
│ .1.105   │    │ .1.104   │    │ .42.1/8443   │
│ v0.5.4   │    │ v0.5.4   │    │ v0.8.1       │
└──────────┘    └──────────┘    │ PIR, BME280  │
                                │ RVF store    │
                                │ Witness chain│
                                └──────────────┘
```

## Data Collection Protocol

### Step 1: Start Seed ingest (background)

```bash
export SEED_TOKEN="your-token"
python scripts/seed_csi_bridge.py \
  --seed-url https://169.254.42.1:8443 --token "$SEED_TOKEN" \
  --udp-port 5006 --batch-size 10 --validate &
```

### Step 2: Start sensing-server with recording

```bash
cargo run -p wifi-densepose-sensing-server -- \
  --source esp32 --udp-port 5006 --http-port 3000
```

### Step 3: Record each scenario

```bash
# Empty room (leave room for 5 min)
curl -X POST http://localhost:3000/api/v1/recording/start \
  -H 'Content-Type: application/json' \
  -d '{"session_name":"pretrain-empty","label":"empty","duration_secs":300}'

# 1 person stationary (sit at desk for 5 min)
curl -X POST http://localhost:3000/api/v1/recording/start \
  -H 'Content-Type: application/json' \
  -d '{"session_name":"pretrain-1p-still","label":"1p-still","duration_secs":300}'

# ... repeat for each scenario
```
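
The same sequence can be scripted end to end; a sketch that drives the recording API through all six scenarios (prompt strings are illustrative):

```python
import json
import time
import urllib.request

SCENARIOS = [
    ("empty", "Leave the room"),
    ("1p-still", "Sit at the desk, breathe normally"),
    ("1p-walk", "Walk around the room on varied paths"),
    ("1p-varied", "Stand, sit, wave arms, turn"),
    ("2p", "Two people, both moving"),
    ("transitions", "Enter and exit the room repeatedly"),
]

for label, prompt in SCENARIOS:
    input(f"[{label}] {prompt}. Press Enter to start the 5 min recording...")
    body = json.dumps({
        "session_name": f"pretrain-{label}",
        "label": label,
        "duration_secs": 300,
    }).encode()
    req = urllib.request.Request(
        "http://localhost:3000/api/v1/recording/start",
        data=body, headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req).read()
    time.sleep(300)  # wait out the recording before prompting the next scenario
```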

### Step 4: Verify with Seed

```bash
python scripts/seed_csi_bridge.py --token "$SEED_TOKEN" --stats
# Should show 3,600+ vectors from the collection run
```

## Risks

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| 2 nodes insufficient for spatial diversity | Medium | Lower pretraining quality | Place nodes 3-5 m apart at different heights |
| PIR sensor has limited range | Low | Weak presence labels | BME280 temp changes + kNN clusters as backup |
| Contrastive pretraining collapses | Low | Useless embeddings | Temperature scheduling, hard negative mining |
| Model too large for ESP32 inference | N/A | N/A | Inference on host/Seed, not on ESP32 |
| Room-specific overfitting | Medium | Poor generalization | MERIDIAN domain randomization (ADR-027), LoRA adaptation |

## Consequences

### Positive

- Users get a working model out of the box — no training needed
- Witness chain proves data provenance (when/where/which hardware)
- Pre-trained encoder transfers to new environments via LoRA fine-tuning
- Removes the #1 adoption barrier from the README

### Negative

- 30 min of manual data collection per pretraining run
- Pre-trained weights are room-specific without adaptation
- ONNX runtime dependency for inference