Ruview/docs/adr/ADR-050-provisioning-tool-enhancements.md
rUv 0943a32248
feat: Real-time dense point cloud from camera + WiFi CSI (#405)
* Add wifi-densepose-pointcloud: real-time dense point cloud from camera + WiFi CSI

New crate with 5 modules:
- depth: monocular depth estimation + 3D backprojection (ONNX-ready, synthetic fallback)
- pointcloud: Point3D/ColorPoint types, PLY export, Gaussian splat conversion
- fusion: WiFi occupancy volume → point cloud + multi-modal voxel fusion
- stream: HTTP + Three.js viewer server (Axum, port 9880)
- main: CLI with serve/capture/demo subcommands

Demo output: 271 WiFi points + 19,200 depth points → 4,886 fused → 1,718 Gaussian splats.
Serves interactive 3D viewer at http://localhost:9880 with Three.js orbit controls.

ADR-SYS-0021 documents the architecture for camera + WiFi CSI dense point cloud pipeline.

Co-Authored-By: claude-flow <ruv@ruv.net>

* Optimize pointcloud: larger splat voxels, smaller responses, faster fusion

- Gaussian splat voxel size: 0.10 → 0.15 (42% fewer splats: 1718 → 994)
- Splat response: 399 KB → 225 KB (44% smaller)
- Pipeline: 22.2ms mean (100 runs, σ=0.3ms)
- Cloud API: 1.11ms avg, 905 req/s
- Splats API: 1.39ms avg, 719 req/s
- Binary: 1.0 MB arm64 (Mac Mini), tested

Co-Authored-By: claude-flow <ruv@ruv.net>

* Complete implementation: camera capture, WiFi CSI receiver, training pipeline

Three new modules added to wifi-densepose-pointcloud:

1. camera.rs — Cross-platform camera capture
   - macOS: AVFoundation via Swift, ffmpeg avfoundation
   - Linux: V4L2, ffmpeg v4l2
   - Camera detection, listing, frame capture to RGB
   - Graceful fallback to synthetic data when no camera

2. csi.rs — WiFi CSI receiver for ESP32 nodes
   - UDP listener for CSI JSON frames from ESP32
   - Per-link attenuation tracking with EMA smoothing
   - Simplified RF tomography (backprojection to occupancy grid)
   - Test frame sender for development without hardware
   - Ready for real ESP32 CSI data from ruvzen

3. training.rs — Calibration and training pipeline
   - Depth calibration: grid search over scale/offset/gamma
   - Occupancy training: threshold optimization for presence detection
   - Ground truth reference points for depth RMSE measurement
   - Preference pair export (JSONL) for DPO training on ruOS brain
   - Brain integration: submit observations as memories
   - Persistent calibration files (JSON)

New CLI commands:
   ruview-pointcloud cameras         # list available cameras
   ruview-pointcloud train           # run calibration + training
   ruview-pointcloud csi-test        # send test CSI frames
   ruview-pointcloud serve --csi     # serve with live CSI input

All tested: demo, training (10 samples, 4 reference points, 3 pairs),
CSI receiver (50 test frames), server API.

Co-Authored-By: claude-flow <ruv@ruv.net>

* Fix viewer: replace WebSocket with fetch polling

Co-Authored-By: claude-flow <ruv@ruv.net>

* Wire live camera into server — real-time updating point cloud

- Server captures from /dev/video0 at 2fps via ffmpeg
- Background tokio task refreshes cloud + splats every 500ms
- Viewer polls /api/splats every 500ms, only updates on new frame
- Shows 🟢 LIVE / 🔴 DEMO indicator
- Camera position set for first-person view (looking forward into scene)
- Downsample 4x for performance (19,200 points per frame)
- Graceful fallback to demo data if camera capture fails

Co-Authored-By: claude-flow <ruv@ruv.net>

* Add MiDaS GPU depth, serial CSI reader, full sensor fusion

- MiDaS depth server: PyTorch on CUDA, real monocular depth estimation
- Rust server calls MiDaS via HTTP for neural depth (falls back to luminance)
- Serial CSI reader for ESP32 with motion detection + presence estimation
- CSI disabled by default (RUVIEW_CSI=1 to enable) — serial reader needs baud config
- Edge-enhanced depth for better object boundaries
- All sensors wired: camera, ESP32 CSI, mmWave (CSI gated until serial fixed)

Co-Authored-By: claude-flow <ruv@ruv.net>

* Complete 7-component sensor fusion pipeline (all working)

1. ADR-018 binary parser — decodes ESP32 CSI UDP frames, extracts I/Q subcarriers
2. WiFlow pose — 17 COCO keypoints from CSI (186K param model loaded)
3. Camera depth — MiDaS on CUDA + luminance fallback
4. Sensor fusion — camera depth + CSI occupancy grid + skeleton overlay
5. RF tomography — ISTA-inspired backprojection from per-node RSSI
6. Vital signs — breathing rate from CSI phase analysis
7. Motion-adaptive — skip expensive depth when CSI shows no motion

Live results: 510 CSI frames/session, 17 keypoints, 26% motion, 40 BPM breathing.
Both ESP32 nodes provisioned to send CSI to 192.168.1.123:3333.
Magic number fix: supports both 0xC5110001 (v1) and 0xC5110006 (v6) frames.

Co-Authored-By: claude-flow <ruv@ruv.net>

* Add brain bridge — sparse spatial observation sync every 60s

Stores room scan summaries, motion events, and vital signs
in the ruOS brain as memories. Only syncs every 120 frames
(~60 seconds) to keep the brain sparse and optimized.

Categories: spatial-observation, spatial-motion, spatial-vitals.

Co-Authored-By: claude-flow <ruv@ruv.net>

* Update README + user guide with dense point cloud features

Added pointcloud section to README (quick start, CLI, performance).
Added comprehensive user guide section: setup, sensors, commands,
pipeline components, API endpoints, training, output formats,
deep room scan, ESP32 provisioning.

Co-Authored-By: claude-flow <ruv@ruv.net>

* Add ruview-geo: geospatial satellite integration (11 modules, 8/8 tests)

New crate with free satellite imagery, terrain, OSM, weather, and brain integration.

Modules: types, coord, locate, cache, tiles, terrain, osm, register, fuse, brain, temporal
Tests: 8 passed (haversine, ENU roundtrip, tiles, HGT parse, registration)
Validation: real data — 43.49N 79.71W, 4 Sentinel-2 tiles, 2°C weather, brain stored

Data sources (all free, no API keys):
- EOX Sentinel-2 cloudless (10m satellite tiles)
- SRTM GL1 (30m elevation)
- Overpass API (OSM buildings/roads)
- ip-api.com (geolocation)
- Open Meteo (weather)

ADR-044 documents architecture decisions.
README.md in crate subdirectory.

Co-Authored-By: claude-flow <ruv@ruv.net>

* Update ADR-044: add Common Crawl WET, NASA FIRMS, OpenAQ, Overture Maps sources

Extended geospatial data sources leveraging ruvector's existing web_ingest
and Common Crawl support for hyperlocal context.

Co-Authored-By: claude-flow <ruv@ruv.net>

* Fix OSM/SRTM queries, add change detection + night mode

- OSM: use inclusive building filter with relation query and 25s timeout
- SRTM: switch to NASA public mirror with viewfinderpanoramas fallback
- Add detect_tile_changes() for pixel-diff satellite change detection
- Add is_night() solar-declination model for CSI-only night mode
- 6 new unit tests (night mode + tile change detection)

Co-Authored-By: claude-flow <ruv@ruv.net>

* Enhance viewer: skeleton overlay, weather, buildings, better camera

Add COCO skeleton rendering with yellow keypoint spheres and white bone
lines, info panel sections for weather/buildings/CSI rate/confidence,
overhead camera at (0,2,-4), and denser point size with sizeAttenuation.

Co-Authored-By: claude-flow <ruv@ruv.net>

* Add CSI fingerprint DB + night mode detection

Co-Authored-By: claude-flow <ruv@ruv.net>

* Fix ADR-044 numbering conflict, update geo README

Renumbered provisioning tool ADR from 044 to 050 to avoid conflict
with geospatial satellite integration ADR-044.

Co-Authored-By: claude-flow <ruv@ruv.net>

* Clean up warnings: suppress dead_code for conditional pipeline modules

Removes unused imports/variables via cargo fix and adds #[allow(dead_code)]
for modules used conditionally at runtime (CSI, depth, fusion, serial).
Pointcloud: 28 → 0 warnings. Geo: 2 → 0 warnings. 8/8 tests pass.

Co-Authored-By: claude-flow <ruv@ruv.net>

* Fix PR #405 blockers: async runtime panic, crate rename, path traversal, brain URL config

- brain_bridge.rs: replace `Handle::current().block_on(...)` inside async fn
  with `.await` (was a guaranteed "runtime within runtime" panic). Brain URL
  now read from RUVIEW_BRAIN_URL env var (default http://127.0.0.1:9876),
  logged once via OnceLock.
- wifi-densepose-geo: rename Cargo package from `ruview-geo` to
  `wifi-densepose-geo` to match directory and workspace conventions. Update
  all use sites (tests/examples/README). Same env-var pattern for brain URL
  in brain.rs + temporal.rs.
- training.rs: add sanitize_data_path() rejecting `..` components and
  safe_join() that canonicalises + enforces base-dir containment on every
  write (calibration.json, samples.json, preference_pairs.jsonl,
  occupancy_calibration.json). Defence-in-depth check also in main.rs
  before TrainingSession::new.
- osm.rs: clamp Overpass radius to MAX_RADIUS_M=5000m; return Err beyond
  that. Add parse_overpass_json() that rejects malformed payloads
  (missing top-level `elements` array).

Co-Authored-By: claude-flow <ruv@ruv.net>

* csi_pipeline: rename WiFlow stub to heuristic_pose_from_amplitude, decouple UDP

Blocker 3 (PR #405 review): The "WiFlow inference" path was a stub that
built a model from empty weight vectors and synthesised keypoints from
amplitude energy. Presenting this as "WiFlow inference" was misleading.

- Rename WiFlowModel to PoseModelMetadata (empty tag struct; we only care
  if the on-disk file exists)
- Rename load_wiflow_model() -> detect_pose_model_metadata() and log
  "amplitude-energy heuristic enabled/disabled" (no "WiFlow" claim)
- Rename estimate_pose() -> heuristic_pose_from_amplitude() with
  prominent `STUB:` doc comment saying this is NOT a trained model

Blocker 4 (PR #405 review): The UDP receiver held the shared Arc<Mutex>
across a synchronous process_frame() call, starving HTTP handlers.

- Introduce a std::sync::mpsc channel between the UDP thread (which only
  parses + pushes) and a dedicated processor thread (which locks only
  briefly around a single process_frame). HTTP snapshots via
  get_pipeline_output no longer contend with the socket read loop.

Also:
- Move ADR-018 parser to parser.rs (see next commit); csi_pipeline re-exports
- send_test_frames now uses parser::build_test_frame for synthetic frames
- Log a one-line node stats summary every 500 frames (reads every public
  CsiFrame field on the runtime path)

Co-Authored-By: claude-flow <ruv@ruv.net>

* Extract ADR-018 parser into parser.rs + wire Fingerprint CLI

File-split (strong concern #9 in PR #405 review): csi_pipeline.rs was 602
LOC; extract the pure-function ADR-018 parser + synthetic frame builder
into src/parser.rs. Inline unit tests in parser.rs cover:

- 0xC5110001 (raw CSI, v1) roundtrip
- 0xC5110006 (feature state, v6) roundtrip
- wrong magic is rejected
- truncated header is rejected
- truncated payload is rejected

main.rs: expose `fingerprint NAME [--seconds N]` subcommand wiring
record_fingerprint() (this was the only caller needed to make the public
API non-dead on the runtime path). Also:

- Replace `--host/--port` + external `--csi` with a single `--bind`
  defaulting to loopback (`127.0.0.1:9880`) — addresses strong concern
  #7 about exposing camera/CSI/vitals by default.
- Update synthetic `csi-test` to target UDP 3333 (matching the ADR-018
  listener) and use the shared parser::build_test_frame.
- Defence-in-depth: call training::sanitize_data_path on the expanded
  --data-dir before TrainingSession::new does the same.

Co-Authored-By: claude-flow <ruv@ruv.net>

* stream: extract viewer HTML to viewer.html, default bind to loopback

Strong concern #7 (PR #405): default HTTP bind leaked camera/CSI/vitals
to the LAN. The `serve` fn now takes a single `bind` arg and prints a
loud WARNING when bound outside loopback.

Strong concern #10 (PR #405): embedded HTML+JS was ~220 LOC of the 418
LOC stream.rs. Moved the markup verbatim into viewer.html and inlined
via `include_str!("viewer.html")`. Also:

- Drop the #![allow(dead_code)] crate-level silencing (reviewer point
  #11). Remove the now-unused AppState.csi_pipeline field.
- capture_camera_cloud_with_luminance returns the mean luminance of the
  captured frame; the background loop feeds that to
  CsiPipelineState::set_light_level so the night-mode flag actually
  toggles at runtime (previously it could only be set from tests).

Net effect on file size: stream.rs 418 → 232 LOC.

Co-Authored-By: claude-flow <ruv@ruv.net>

* Dead-code cleanup + tests for fusion/depth/OSM/training/fingerprinting

Reviewer point #11 (PR #405): remove the `#![allow(dead_code)]`
silencing added in 8eb808d and fix the underlying issues.

- Delete csi.rs: duplicate of csi_pipeline.rs with incompatible wire
  format (JSON vs ADR-018 binary). csi_pipeline is the real path.
- Delete serial_csi.rs: never referenced by any module.
- Drop Frame.timestamp_ms (unread), AppState.csi_pipeline (unread),
  brain_bridge::brain_available (caller-less), fusion::fetch_wifi_occupancy
  (caller-less) — these had no runtime users.
- Drop crate-level #![allow(dead_code)] from camera.rs, depth.rs,
  fusion.rs, pointcloud.rs.

Tests (target: 8-12, actual: 15 unit + 9 geo unit + 8 geo integration
= 32 total, all pass):

- parser.rs: 5 tests (v1/v6 magic roundtrip, wrong magic, truncated
  header, truncated payload).
- fusion.rs: 2 tests (non-overlapping merge, voxel dedup).
- depth.rs: 2 tests (2x2 backproject → 4 points at z=1, NaN rejected).
- training.rs: 4 tests (rejects `..`, accepts relative child, refuses
  TrainingSession::new("../etc/passwd"), accepts a clean tmpdir).
- csi_pipeline.rs: 2 tests (set_light_level toggles is_dark,
  record_fingerprint stores and self-identifies).
- osm.rs: 3 tests (parse_overpass_json minimal fixture, rejects
  malformed payload, fetch_buildings rejects > MAX_RADIUS_M).

Co-Authored-By: claude-flow <ruv@ruv.net>

* Update README + user-guide for PR #405 review-fix additions

- serve now uses --bind 127.0.0.1:9880 (loopback default) instead of --port
- Add fingerprint subcommand to CLI tables
- Document RUVIEW_BRAIN_URL env var + --brain flag
- Flag pose path as amplitude-energy heuristic stub (not trained WiFlow)
- Security note on exposing server outside loopback
- Add wifi-densepose-pointcloud + wifi-densepose-geo rows to crate table

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-20 12:48:54 -04:00

8.3 KiB

ADR-050: Provisioning Tool Enhancements

Status: Proposed Date: 2026-03-03 Deciders: @ruvnet Supersedes: None Related: ADR-029, ADR-032, ADR-039, ADR-040


Context

The ESP32-S3 CSI node provisioning script (firmware/esp32-csi-node/provision.py) is the primary tool for configuring pre-built firmware binaries without recompiling. It writes NVS key-value pairs that the firmware reads at boot.

After #131 added TDM and edge intelligence flags, the script now covers the most-requested NVS keys. However, there remain gaps between what the firmware reads from NVS (nvs_config.c, 20 keys) and what the provisioning script can write (13 keys). Additionally, the script lacks usability features that would help field operators deploying multi-node meshes.

Gap 1: Missing NVS Keys (7 keys)

The firmware reads these NVS keys at boot but the provisioning script has no corresponding CLI flags:

NVS Key Type Firmware Default Purpose
hop_count u8 1 (no hop) Number of channels to hop through
chan_list blob (u8[6]) {1,6,11} Channel numbers for hopping sequence
dwell_ms u32 100 Time to dwell on each channel before hopping (ms)
power_duty u8 100 Power duty cycle percentage (10-100%) for battery life
wasm_max u8 4 Max concurrent WASM modules (ADR-040)
wasm_verify u8 0 Require Ed25519 signature for WASM uploads (0/1)
wasm_pubkey blob (32B) zeros Ed25519 public key for WASM signature verification

Gap 2: No Read-Back

There is no way to read the current NVS configuration from a device. Field operators must remember what was provisioned or reflash everything. This is especially problematic for multi-node meshes where each node has different TDM slots.

Gap 3: No Verification

After flashing, there is no automated check that the device booted successfully with the new configuration. Operators must manually run a serial monitor and inspect logs.

Gap 4: No Config File Support

Provisioning a 6-node mesh requires running the script 6 times with largely overlapping flags (same SSID, password, target IP) and only TDM slot varying. There is no way to define a mesh configuration in a file.

Gap 5: No Presets

Common deployment scenarios (single-node basic, 3-node mesh, 6-node mesh with vitals) require operators to know which flags to combine. Named presets would lower the barrier to entry.

Gap 6: No Auto-Detect

The --port flag is required even though the script could auto-detect connected ESP32-S3 devices via esptool.py.


Decision

Enhance provision.py with the following capabilities, implemented incrementally.

Phase 1: Complete NVS Coverage

Add flags for all remaining firmware NVS keys:

--hop-count N          Channel hop count (1=no hop, default: 1)
--channels 1,6,11      Comma-separated channel list for hopping
--dwell-ms N           Dwell time per channel in ms (default: 100)
--power-duty N         Power duty cycle 10-100% (default: 100)
--wasm-max N           Max concurrent WASM modules 1-8 (default: 4)
--wasm-verify          Require Ed25519 signature for WASM uploads
--wasm-pubkey FILE     Path to Ed25519 public key file (32 bytes raw or PEM)

Validation:

  • --channels length must match --hop-count
  • --power-duty clamped to 10-100
  • --wasm-pubkey implies --wasm-verify

Phase 2: Config File and Mesh Provisioning

Add --config FILE to load settings from a JSON or TOML file:

{
  "common": {
    "ssid": "SensorNet",
    "password": "secret",
    "target_ip": "192.168.1.20",
    "target_port": 5005,
    "edge_tier": 2
  },
  "nodes": [
    { "port": "COM7", "node_id": 0, "tdm_slot": 0 },
    { "port": "COM8", "node_id": 1, "tdm_slot": 1 },
    { "port": "COM9", "node_id": 2, "tdm_slot": 2 }
  ]
}

--config mesh.json provisions all listed nodes in sequence, computing tdm_total automatically from the nodes array length.

Phase 3: Presets

Add --preset NAME for common deployment profiles:

Preset What It Sets
basic Single node, edge_tier=0, no TDM, no hopping
vitals Single node, edge_tier=2, vital_int=1000, subk_count=32
mesh-3 3-node TDM, edge_tier=1, hop_count=3, channels=1,6,11
mesh-6-vitals 6-node TDM, edge_tier=2, hop_count=3, channels=1,6,11, vital_int=500

Presets set defaults that can be overridden by explicit flags.

Phase 4: Read-Back and Verify

Add --read to dump the current NVS configuration from a connected device:

python provision.py --port COM7 --read
# Output:
#   ssid:         SensorNet
#   target_ip:    192.168.1.20
#   tdm_slot:     0
#   tdm_nodes:    3
#   edge_tier:    2
#   ...

Implementation: use esptool.py read_flash to read the NVS partition, then parse the NVS binary format to extract key-value pairs.

Add --verify to provision and then confirm the device booted:

python provision.py --port COM7 --ssid "Net" --password "pass" --target-ip 192.168.1.20 --verify
# After flash, opens serial monitor for 5 seconds
# Checks for "CSI streaming active" log line
# Reports PASS or FAIL

Phase 5: Auto-Detect Port

When --port is omitted, scan for connected ESP32-S3 devices:

python provision.py --ssid "Net" --password "pass" --target-ip 192.168.1.20
# Auto-detected ESP32-S3 on COM7 (Silicon Labs CP210x)
# Proceed? [Y/n]

Implementation: use esptool.py or serial.tools.list_ports to enumerate ports.


Rationale

Why incremental phases?

Phase 1 is a small diff that closes the NVS coverage gap immediately. Phases 2-5 add progressively more UX polish. Each phase is independently useful and can be shipped separately.

Why JSON config over YAML/TOML?

JSON requires no additional Python dependencies (stdlib json module). TOML requires tomllib (Python 3.11+) or tomli. JSON is sufficient for this use case.

Why not a GUI?

The target users are embedded developers and field operators who are already running esptool from the command line. A TUI/GUI would add dependencies and complexity for minimal benefit.


Consequences

Positive

  • Complete NVS coverage: Every firmware-readable key can be set from the provisioning tool
  • Mesh provisioning in one command: --config mesh.json replaces 6 separate invocations
  • Lower barrier to entry: Presets eliminate the need to know which flags to combine
  • Auditability: --read lets operators inspect and verify deployed configurations
  • Fewer mis-provisions: --verify catches flashing failures before the operator walks away

Negative

  • NVS binary parsing (Phase 4) requires understanding the ESP-IDF NVS binary format, which is not officially documented as a stable API
  • Auto-detect (Phase 5) may produce false positives if other ESP32 variants are connected

Risks

Risk Likelihood Impact Mitigation
NVS binary format changes in ESP-IDF v6 Low Medium Pin to known ESP-IDF NVS page format; add format version check
--verify serial parsing is fragile Medium Low Match on stable log tag [CSI_MAIN]; timeout after 10s
Config file credentials in plaintext Medium Medium Document that config files should not be committed; add .gitignore pattern

Implementation Priority

Phase Effort Impact Priority
Phase 1: Complete NVS coverage Small (1 file, ~50 lines) High — closes feature gap P0
Phase 2: Config file + mesh Medium (~100 lines) High — biggest UX win P1
Phase 3: Presets Small (~40 lines) Medium — convenience P2
Phase 4: Read-back + verify Medium (~150 lines) Medium — debugging aid P2
Phase 5: Auto-detect Small (~30 lines) Low — minor convenience P3

References

  • firmware/esp32-csi-node/main/nvs_config.h — NVS config struct (20 fields)
  • firmware/esp32-csi-node/main/nvs_config.c — NVS read logic (20 keys)
  • firmware/esp32-csi-node/provision.py — Current provisioning script (13 of 20 keys)
  • ADR-029: RuvSense multistatic sensing mode (TDM, channel hopping)
  • ADR-032: Multistatic mesh security hardening (mesh keys)
  • ADR-039: ESP32-S3 edge intelligence (edge tiers, vitals)
  • ADR-040: WASM programmable sensing (WASM modules, signature verification)
  • Issue #130: Provisioning script doesn't support TDM