Ruview/docs/adr/ADR-058-ruvector-wasm-browser-pose-example.md
rUv 7c1351fd5d
feat(demo): wire all 6 RuVector WASM attention mechanisms into pose fusion
* feat: dual-modal WASM browser pose estimation demo (ADR-058)

Live webcam video + WiFi CSI fusion for real-time pose estimation.
Two parallel CNN pipelines (ruvector-cnn-wasm) with attention-weighted
fusion and dynamic confidence gating. Three modes: Dual, Video-only,
CSI-only. Includes pre-built WASM package (~52KB) for browser deployment.

- ADR-058: Dual-modal architecture design
- ui/pose-fusion.html: Main demo page with dark theme UI
- 7 JS modules: video-capture, csi-simulator, cnn-embedder, fusion-engine,
  pose-decoder, canvas-renderer, main orchestrator
- Pre-built ruvector-cnn-wasm WASM package for browser
- CSI heatmap, embedding space visualization, latency metrics
- WebSocket support for live ESP32 CSI data
- Navigation link added to main dashboard

Co-Authored-By: claude-flow <ruv@ruv.net>

* fix: motion-responsive skeleton + through-wall CSI tracking

- Pose decoder now uses per-cell motion grid to track actual arm/head
  positions — raising arms moves the skeleton's arms, head follows
  lateral movement
- Motion grid (10x8 cells) tracks intensity per body zone: head,
  left/right arm upper/mid, legs
- Through-wall mode: when person exits frame, CSI maintains presence
  with slow decay (~10s) and skeleton drifts in exit direction
- CSI simulator persists sensing after video loss, ghost pose renders
  with decreasing confidence
- Reduced temporal smoothing (0.45) for faster response to movement

Co-Authored-By: claude-flow <ruv@ruv.net>

* fix: video fills available space + correct WASM path resolution

- Remove fixed aspect-ratio and max-height from video panel so it
  fills the available viewport space without scrolling
- Grid uses 1fr row for content area, overflow:hidden on main grid
- Fix WASM path: resolve relative to JS module file using import.meta.url
  instead of hardcoded ./pkg/ which resolved incorrectly on gh-pages
- Responsive: mobile still gets aspect-ratio constraint

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat: live ESP32 CSI pipeline + auto-connect WebSocket

- Add auto-connect to local sensing server WebSocket (ws://localhost:8765)
- Demo shows "Live ESP32" when connected to real CSI data
- Add build_firmware.ps1 for native Windows ESP-IDF builds (no Docker)
- Add read_serial.ps1 for ESP32 serial monitor

Pipeline: ESP32 → UDP:5005 → sensing-server → WS:8765 → browser demo

Co-Authored-By: claude-flow <ruv@ruv.net>

* docs: add ADR-059 live ESP32 CSI pipeline + update README with demo links

- ADR-059: Documents end-to-end ESP32 → sensing server → browser pipeline
- README: Add dual-modal pose fusion demo link, update ADR count to 49
- References issue #245

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat: RSSI visualization, RuVector attention WASM, cache-bust fixes

- Add animated RSSI Signal Strength panel with sparkline history
- Fix RuVector WasmMultiHeadAttention retptr calling convention
- Wire up RuVector Multi-Head + Flash Attention in CNN embedder
- Add ambient temporal drift to CSI simulator for visible heatmap animation
- Fix embedding space projection (sparse projection replaces cancelling sum)
- Add auto-scaling to embedding space renderer
- Add cache busters (?v=4) to all ES module imports to prevent stale caches
- Add diagnostic logging for module version verification
- Add RSSI tracking with quality labels and color-coded dBm display
- Includes ruvector-attention-wasm v2.0.5 browser ESM wrapper

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat: 26-keypoint dexterous pose + full RuVector attention pipeline

Pose Decoder (17 → 26 keypoints):
- Add finger approximations: thumb, index, pinky per hand (6 new)
- Add toe tips: left/right foot index (2 new)
- Add neck keypoint (1 new)
- Hand openness driven by arm motion intensity
- Finger positions computed from wrist-elbow axis angles

CNN Embedder (full RuVector WASM pipeline):
- Stage 1: Multi-Head Attention (global spatial reasoning)
- Stage 2: Hyperbolic Attention (hierarchical body-part tree)
- Stage 3: MoE Attention (3 experts: upper/lower/extremities, top-2)
- Blended 40/30/30 weighting → final embedding projection

Canvas Renderer:
- Magenta finger joints with distinct glow
- Cyan toe tips
- White neck keypoint
- Thinner limb lines for hand/foot connections
- Joint count shown in overlay label

CSI Simulator:
- Skip synthetic person state when live ESP32 connected
- Only simulate CSI data in demo mode (was already correct)

Embedding Space:
- Fixed projection: sparse 8-dim projection replaces cancelling sum
- Auto-scaling normalizes point spread to fill canvas

Cache busters bumped to v=5 on all imports.

Co-Authored-By: claude-flow <ruv@ruv.net>

* fix: centroid-based pose tracking for responsive limb movement

Rewrites pose decoder from intensity-based to position-based tracking:
- Arms now track toward motion centroid in each body zone
- Elbow/wrist positions computed along shoulder→centroid vector
- Legs track toward lower-body zone centroids
- Smoothing reduced from 0.45 to 0.25 for responsiveness
- Zone centroids blend 30% old / 70% new each frame

6 body zones with overlapping coverage:
- Head (top 20%, center cols)
- Left/Right Arm (rows 10-60%, outer cols)
- Torso (rows 15-55%, center cols)
- Left/Right Leg (rows 50-100%, half cols each)

Hand openness now driven by arm spread distance + raise amount.
Cache busters v=6.

Co-Authored-By: claude-flow <ruv@ruv.net>

* fix: remove duplicate lAnkleX/rAnkleX declarations in pose-decoder

Stale code block from old intensity-based tracking was left behind,
re-declaring variables already defined by centroid-based tracking.

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(demo): wire all 6 RuVector WASM attention mechanisms into pose fusion

- Add WasmLinearAttention and WasmLocalGlobalAttention to browser ESM wrapper
- Add 6 WASM utility functions (batch_normalize, pairwise_distances, etc.)
- Extend CnnEmbedder to 6-stage pipeline: Flash → MHA → Hyperbolic → Linear → MoE → L+G
- Use log-energy softmax blending across all 6 stages
- Wire WASM cosine_similarity and normalize into FusionEngine
- Add RuVector pipeline stats panel to UI (energy, refinement, pose impact)
- Compute embedding-to-joint mapping stats without modifying joint positions
- Center camera prompt with flexbox layout
- Add cache busters v=12

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-03-12 20:59:57 -04:00

# ADR-058: Dual-Modal WASM Browser Pose Estimation — Live Video + WiFi CSI Fusion
- **Status**: Proposed
- **Date**: 2026-03-12
- **Deciders**: ruv
- **Tags**: wasm, browser, cnn, pose-estimation, ruvector, video, multimodal, fusion
## Context
WiFi-DensePose estimates human poses from WiFi CSI (Channel State Information).
The `ruvector-cnn` crate provides a pure Rust CNN (MobileNet-V3) with WASM bindings.
Both modalities exist independently — what's missing is **fusing live webcam video
with WiFi CSI** in a single browser demo to achieve robust pose estimation that
works even when one modality degrades (occlusion, signal noise, poor lighting).
Existing assets:
1. **`wifi-densepose-wasm`** — CSI signal processing compiled to WASM
2. **`wifi-densepose-sensing-server`** — Axum server streaming live CSI via WebSocket
3. **`ruvector-cnn`** — Pure Rust CNN with MobileNet-V3 backbones, SIMD, contrastive learning
4. **`ruvector-cnn-wasm`** — wasm-bindgen bindings: `WasmCnnEmbedder`, `SimdOps`, `LayerOps`, contrastive losses
5. **`vendor/ruvector/examples/wasm-vanilla/`** — Reference vanilla JS WASM example
Research shows multi-modal fusion (camera + WiFi) significantly outperforms either alone:
- Camera fails under occlusion, poor lighting, privacy constraints
- WiFi CSI fails with signal noise, multipath, low spatial resolution
- Fusion compensates: WiFi provides through-wall coverage, camera provides fine-grained detail
## Decision
Build a **dual-modal browser demo** at `examples/wasm-browser-pose/` that:
1. Captures **live webcam video** via `getUserMedia` API
2. Receives **live WiFi CSI** via WebSocket from the sensing server
3. Processes **both streams** through separate CNN pipelines in `ruvector-cnn-wasm`
4. **Fuses embeddings** with learned attention weights for combined pose estimation
5. Renders **video overlay** with skeleton + WiFi confidence heatmap on Canvas
6. Runs entirely in the browser — all inference client-side via WASM
### Architecture
```
┌────────────────────────────────────────────────────────────────────┐
│                              Browser                               │
│                                                                    │
│   Video path                          CSI path                     │
│   ──────────                          ────────                     │
│   getUserMedia (Webcam)               WebSocket Client             │
│        │                                   │                       │
│        ▼                                   ▼                       │
│   Video Frame Capture                 CSI WASM (densepose-wasm)    │
│   224×224 RGB                         frame parsing + features     │
│        │                                   │                       │
│        ▼                                   ▼                       │
│   CNN WASM (Visual Embedder)          CNN WASM (CSI Embedder)      │
│   → 512-dim visual_embedding          → 512-dim csi_embedding      │
│        │                                   │                       │
│        └─────────────────┬─────────────────┘                       │
│                          ▼                                         │
│                Attention Fusion Module                             │
│                  → fused_embedding                                 │
│                          │                                         │
│                          ▼                                         │
│                Pose Decoder → 17 kpts                              │
│                          │                                         │
│                          ▼                                         │
│                Overlay Renderer                                    │
│                          │                                         │
│                          ▼                                         │
│        Video Canvas + Skeleton + CSI Heatmap                       │
└────────────────────────────────────────────────────────────────────┘
         ▲                                   ▲
         │ getUserMedia (camera)             │ WebSocket (ws://host:3030/ws/csi)
         │                                   │
    ┌────┴────┐                      ┌───────┴─────────┐
    │ Webcam  │                      │ Sensing Server  │
    └─────────┘                      └─────────────────┘
```
### Dual Pipeline Design
Two parallel CNN pipelines run on each frame tick (~30 FPS):
| Pipeline | Input | Preprocessing | CNN Config | Output |
|----------|-------|---------------|------------|--------|
| **Visual** | Webcam frame (640×480) | Resize to 224×224 RGB, ImageNet normalize | MobileNet-V3 Small, 512-dim | Visual embedding |
| **CSI** | CSI frame (ADR-018 binary) | Amplitude/phase/delta → 224×224 pseudo-RGB | MobileNet-V3 Small, 512-dim | CSI embedding |
Both use the same `WasmCnnEmbedder` but with separate instances and weight sets.
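A minimal sketch of the per-frame tick, assuming helper names that are illustrative rather than the final `app.js` API (the helpers themselves are sketched in the sections below):
```javascript
// Sketch of the per-tick orchestration; names here are assumptions,
// not the actual app.js API. Helpers are sketched in later sections.
let latestCsi = null; // { amp, phase }, updated by the WebSocket client
let prevAmp = null;   // previous amplitude frame for the Δ channel

function onFrameTick() {
  const rgb = captureVideoFrame(videoElement, 224, 224);     // visual path
  const visualEmb = visualEmbedder.extract(rgb, 224, 224);   // → 512-dim

  let fused = visualEmb;                                     // α = 1.0 fallback
  if (latestCsi) {                                           // dual mode
    const pseudo = encodeCsiAsImage(latestCsi.amp, latestCsi.phase, prevAmp);
    const csiEmb = csiEmbedder.extract(pseudo, 224, 224);    // → 512-dim
    prevAmp = latestCsi.amp;
    fused = fuseEmbeddings(visualEmb, csiEmb, attentionWeights);
  }

  renderOverlay(decodePose(fused, poseWeights));             // 17 keypoints
  requestAnimationFrame(onFrameTick);
}
```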
### Fusion Strategy
**Learned attention-weighted fusion** combines the two 512-dim embeddings:
```javascript
// Attention fusion: learn which modality to trust per-dimension
// α ∈ [0,1]^512 — attention weights (shipped as JSON, trained offline)
// visual_emb, csi_emb ∈ R^512
function fuseEmbeddings(visual_emb, csi_emb, attention_weights) {
  const fused = new Float32Array(512);
  for (let i = 0; i < 512; i++) {
    const α = attention_weights[i];
    fused[i] = α * visual_emb[i] + (1 - α) * csi_emb[i];
  }
  return fused;
}
```
**Dynamic confidence gating** adjusts fusion based on signal quality:
| Condition | Behavior |
|-----------|----------|
| Good video + good CSI | Balanced fusion (α ≈ 0.5) |
| Poor lighting / occlusion | CSI-dominant (α → 0, WiFi takes over) |
| CSI noise / no ESP32 | Video-dominant (α → 1, camera only) |
| Video-only mode (no WiFi) | α = 1.0, pure visual CNN pose estimation |
| CSI-only mode (no camera) | α = 0.0, pure WiFi pose estimation |
Quality detection:
- **Video quality**: Frame brightness variance (dark = low quality), motion blur score
- **CSI quality**: Signal-to-noise ratio from `wifi-densepose-wasm`, coherence gate output
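A minimal sketch of how gating could bias the learned weights, assuming scalar `videoQuality` and `csiQuality` scores in [0, 1]; the additive shift and the 0.5 gain are illustrative choices, not tuned values:
```javascript
// Sketch of dynamic confidence gating (thresholds are assumptions).
// videoQuality and csiQuality are scores in [0, 1] from the checks above.
function gateAttention(attentionWeights, videoQuality, csiQuality) {
  // bias > 0 favors video, bias < 0 favors CSI.
  const bias = videoQuality - csiQuality;
  const gated = new Float32Array(attentionWeights.length);
  for (let i = 0; i < attentionWeights.length; i++) {
    // Shift each learned α toward the healthier modality, clamped to [0, 1].
    gated[i] = Math.min(1, Math.max(0, attentionWeights[i] + 0.5 * bias));
  }
  return gated;
}
```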
### CSI-to-Image Encoding
CSI data is encoded as a 3-channel pseudo-image for the CSI CNN pipeline:
| Channel | Data | Normalization |
|---------|------|---------------|
| R | CSI amplitude (subcarrier × time window) | Min-max to [0, 255] |
| G | CSI phase (unwrapped, subcarrier × time window) | Min-max to [0, 255] |
| B | Temporal difference (frame-to-frame Δ amplitude) | Abs, min-max to [0, 255] |
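A sketch of the encoding, assuming amplitude and phase arrive as `Float32Array`s already resampled to the target resolution (`encodeCsiAsImage` is a hypothetical name for the `csi-processor.js` routine):
```javascript
// Sketch of CSI → 3-channel pseudo-image encoding. amp/phase layout and
// resampling are assumed done upstream; prevAmp is the previous frame's
// amplitude (pass null on the first frame).
function encodeCsiAsImage(amp, phase, prevAmp, width = 224, height = 224) {
  prevAmp = prevAmp || amp; // first frame: Δ channel is all zeros
  const n = width * height;
  const rgb = new Uint8Array(n * 3);
  const minMax = (arr) => {
    let lo = Infinity, hi = -Infinity;
    for (const v of arr) { if (v < lo) lo = v; if (v > hi) hi = v; }
    return [lo, hi - lo || 1]; // guard against zero range
  };
  const [aLo, aRange] = minMax(amp);
  const [pLo, pRange] = minMax(phase);
  const delta = amp.map((v, i) => Math.abs(v - prevAmp[i]));
  const [dLo, dRange] = minMax(delta);
  for (let i = 0; i < n; i++) {
    rgb[i * 3]     = (255 * (amp[i] - aLo)) / aRange;    // R: amplitude
    rgb[i * 3 + 1] = (255 * (phase[i] - pLo)) / pRange;  // G: unwrapped phase
    rgb[i * 3 + 2] = (255 * (delta[i] - dLo)) / dRange;  // B: temporal Δ
  }
  return rgb;
}
```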
### Video Processing
Webcam frames processed through standard ImageNet pipeline:
```javascript
// Capture frame from video element
const frame = captureVideoFrame(videoElement, 224, 224); // Returns Uint8Array RGB
// ImageNet normalization happens inside WasmCnnEmbedder.extract()
const visual_embedding = visual_embedder.extract(frame, 224, 224);
```
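A possible `captureVideoFrame` implementation using an offscreen canvas; the actual `video-capture.js` may differ, but the RGBA-to-RGB repack is the essential step:
```javascript
// Sketch of captureVideoFrame: drawImage resize + RGBA → RGB repack.
const scratch = document.createElement('canvas');

function captureVideoFrame(video, w, h) {
  scratch.width = w;
  scratch.height = h;
  const ctx = scratch.getContext('2d', { willReadFrequently: true });
  ctx.drawImage(video, 0, 0, w, h);              // hardware-accelerated resize
  const { data } = ctx.getImageData(0, 0, w, h); // RGBA bytes
  const rgb = new Uint8Array(w * h * 3);
  for (let i = 0, j = 0; i < data.length; i += 4, j += 3) {
    rgb[j] = data[i];          // R
    rgb[j + 1] = data[i + 1];  // G
    rgb[j + 2] = data[i + 2];  // B (alpha dropped)
  }
  return rgb; // ImageNet normalization happens inside the WASM embedder
}
```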
### Pose Keypoint Mapping
17 COCO-format keypoints decoded from the fused 512-dim embedding:
```
 0: nose            1: left_eye        2: right_eye
 3: left_ear        4: right_ear       5: left_shoulder
 6: right_shoulder  7: left_elbow      8: right_elbow
 9: left_wrist     10: right_wrist    11: left_hip
12: right_hip      13: left_knee      14: right_knee
15: left_ankle     16: right_ankle
```
Each keypoint is decoded as an (x, y, confidence) triple, so 17 × 3 = 51 values
are read from the fused 512-dim embedding via a learned linear projection.
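A sketch of that projection, assuming `pose-weights.json` ships a flat 51×512 matrix plus 51 biases and that confidence passes through a sigmoid (both the layout and the activation are assumptions):
```javascript
// Sketch of the learned linear projection: 512-dim → 51 values → 17 triples.
// poseWeights layout is an assumption: { matrix: Float32Array(51*512), bias: Float32Array(51) }
function decodePose(fused, poseWeights) {
  const { matrix, bias } = poseWeights;
  const out = new Float32Array(51);
  for (let k = 0; k < 51; k++) {
    let v = bias[k];
    for (let d = 0; d < 512; d++) v += matrix[k * 512 + d] * fused[d];
    out[k] = v;
  }
  // Group into {x, y, conf}; sigmoid maps the confidence logit to [0, 1].
  return Array.from({ length: 17 }, (_, i) => ({
    x: out[i * 3],
    y: out[i * 3 + 1],
    conf: 1 / (1 + Math.exp(-out[i * 3 + 2])),
  }));
}
```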
### Operating Modes
The demo supports three modes, selectable in the UI:
| Mode | Video | CSI | Fusion | Use Case |
|------|-------|-----|--------|----------|
| **Dual (default)** | ✅ | ✅ | Attention-weighted | Best accuracy, full demo |
| **Video Only** | ✅ | ❌ | α = 1.0 | No ESP32 available, quick demo |
| **CSI Only** | ❌ | ✅ | α = 0.0 | Privacy mode, through-wall sensing |
**Video Only mode works without any hardware** — just a webcam — making the demo
instantly accessible for anyone wanting to try it.
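The mode switch then reduces to an α override; a minimal sketch (mode strings are illustrative):
```javascript
// Per-mode α override: dual mode uses the learned per-dimension weights.
function effectiveAlpha(mode, learnedAlpha, i) {
  if (mode === 'video-only') return 1.0; // camera only
  if (mode === 'csi-only') return 0.0;   // WiFi only
  return learnedAlpha[i];                // dual: learned weights
}
```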
### File Layout
```
examples/wasm-browser-pose/
├── index.html              # Single-page app (vanilla JS, no bundler)
├── js/
│   ├── app.js              # Main entry, mode selection, orchestration
│   ├── video-capture.js    # getUserMedia, frame extraction, quality detection
│   ├── csi-processor.js    # WebSocket CSI client, frame parsing, pseudo-image encoding
│   ├── fusion.js           # Attention-weighted embedding fusion, confidence gating
│   ├── pose-decoder.js     # Fused embedding → 17 keypoints
│   └── canvas-renderer.js  # Video overlay, skeleton, CSI heatmap, confidence bars
├── data/
│   ├── visual-weights.json # Visual CNN → embedding projection (placeholder until trained)
│   ├── csi-weights.json    # CSI CNN → embedding projection (placeholder until trained)
│   ├── fusion-weights.json # Attention fusion α weights (512 values)
│   └── pose-weights.json   # Fused embedding → keypoint projection
├── css/
│   └── style.css           # Dark theme UI styling
├── pkg/                    # Built WASM packages (gitignored, built by script)
│   ├── wifi_densepose_wasm/
│   └── ruvector_cnn_wasm/
├── build.sh                # wasm-pack build script for both packages
└── README.md               # Setup and usage instructions
```
### Build Pipeline
```bash
#!/bin/bash
# build.sh — builds both WASM packages into pkg/
set -e

# Build wifi-densepose-wasm (CSI processing)
wasm-pack build ../../rust-port/wifi-densepose-rs/crates/wifi-densepose-wasm \
  --target web --out-dir "$(pwd)/pkg/wifi_densepose_wasm" --no-typescript

# Build ruvector-cnn-wasm (CNN inference for both video and CSI)
wasm-pack build ../../vendor/ruvector/crates/ruvector-cnn-wasm \
  --target web --out-dir "$(pwd)/pkg/ruvector_cnn_wasm" --no-typescript

echo "Build complete. Serve with: python3 -m http.server 8080"
```
### UI Layout
```
┌─────────────────────────────────────────────────────────┐
│ WiFi-DensePose — Live Dual-Modal Pose Estimation        │
│ [Dual Mode ▼]  [⚙ Settings]          FPS: 28   ◉ Live   │
├───────────────────────────┬─────────────────────────────┤
│                           │                             │
│   ┌───────────────────┐   │   ┌───────────────────┐     │
│   │                   │   │   │                   │     │
│   │ Video + Skeleton  │   │   │   CSI Heatmap     │     │
│   │ Overlay           │   │   │   (amplitude ×    │     │
│   │ (main canvas)     │   │   │    subcarrier)    │     │
│   │                   │   │   │                   │     │
│   └───────────────────┘   │   └───────────────────┘     │
│                           │                             │
├───────────────────────────┴─────────────────────────────┤
│ Fusion Confidence: ████████░░ 78%                       │
│ Video: ██████████ 95%  │  CSI: ██████░░░░ 61%           │
├─────────────────────────────────────────────────────────┤
│  ┌─────────────────────────────────────────────────┐    │
│  │ Embedding Space (2D projection)                 │    │
│  │        ·   ·  ·                                 │    │
│  │   · ·   · · ·  ·   (color = pose cluster)       │    │
│  │      ·  · ·  ·                                  │    │
│  └─────────────────────────────────────────────────┘    │
├─────────────────────────────────────────────────────────┤
│ Latency: Video 12ms │ CSI 8ms │ Fusion 1ms │ Total 21ms │
│ [▶ Record]  [📷 Snapshot]  [Confidence: ████ 0.6]       │
└─────────────────────────────────────────────────────────┘
```
### WASM Module Structure
| Package | Source Crate | Provides | Size (est.) |
|---------|-------------|----------|-------------|
| `wifi_densepose_wasm` | `wifi-densepose-wasm` | CSI frame parsing, signal processing, feature extraction | ~200KB |
| `ruvector_cnn_wasm` | `ruvector-cnn-wasm` | `WasmCnnEmbedder` (×2 instances), `SimdOps`, `LayerOps`, contrastive losses | ~150KB |
Two `WasmCnnEmbedder` instances are created — one for video frames, one for CSI pseudo-images.
They share the same WASM module but have independent state.
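A sketch of that setup; the default-export `init` pattern is standard for wasm-pack `--target web` builds, but the constructor arguments shown as comments are assumptions to be checked against the generated bindings:
```javascript
// Sketch of the dual-instance setup. Constructor arguments are assumptions;
// consult the generated ruvector_cnn_wasm bindings for the real signature.
import init, { WasmCnnEmbedder } from './pkg/ruvector_cnn_wasm/ruvector_cnn_wasm.js';

await init(); // one WASM instantiation, shared by both embedders

const visualEmbedder = new WasmCnnEmbedder(/* visual-weights.json config */);
const csiEmbedder = new WasmCnnEmbedder(/* csi-weights.json config */);
// Each instance holds its own weights and buffers: independent state.
```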
### Browser API Requirements
| API | Purpose | Required | Fallback |
|-----|---------|----------|----------|
| `getUserMedia` | Webcam capture | For video mode | CSI-only mode |
| WebAssembly | CNN inference | Yes | None (hard requirement) |
| WASM SIMD128 | Accelerated inference | No | Scalar fallback (~2× slower) |
| WebSocket | CSI data stream | For CSI mode | Video-only mode |
| Canvas 2D | Rendering | Yes | None |
| `requestAnimationFrame` | Render loop | Yes | `setTimeout` fallback |
| ES Modules | Code organization | Yes | None |
Target: Chrome 89+, Firefox 89+, Safari 15+, Edge 89+
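A capability probe along these lines can pick the starting mode; the `wasm-feature-detect` import is one option for the SIMD check, not a project dependency:
```javascript
// Sketch of a capability probe (the wasm-feature-detect import is an
// assumption; any SIMD probe works).
import { simd } from 'https://unpkg.com/wasm-feature-detect?module';

async function detectCapabilities() {
  if (typeof WebAssembly !== 'object') {
    throw new Error('WebAssembly is a hard requirement');
  }
  const hasCamera = !!navigator.mediaDevices?.getUserMedia; // else CSI-only
  const hasSimd = await simd(); // false → scalar build, ~2× slower
  const hasWs = 'WebSocket' in window;                      // else video-only
  return { hasCamera, hasSimd, hasWs };
}
```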
### Performance Budget
| Stage | Target Latency | Notes |
|-------|---------------|-------|
| Video frame capture + resize | <3ms | `drawImage` to offscreen canvas |
| Video CNN embedding | <15ms | 224×224 RGB 512-dim |
| CSI receive + parse | <2ms | Binary WebSocket message |
| CSI pseudo-image encoding | <3ms | Amplitude/phase/delta channels |
| CSI CNN embedding | <15ms | 224×224 pseudo-RGB 512-dim |
| Attention fusion | <1ms | Element-wise weighted sum |
| Pose decoding | <1ms | Linear projection |
| Canvas overlay render | <3ms | Video + skeleton + heatmap |
| **Total (dual mode)** | **<33ms** | **30 FPS capable** |
| **Total (video only)** | **<22ms** | **45 FPS capable** |
Note: Video and CSI CNN pipelines can run in parallel using Web Workers,
reducing dual-mode latency to ~max(15, 15) + 5 = ~20ms (50 FPS).
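A sketch of that worker split (worker paths and message shapes are illustrative); transferring the pixel buffers avoids per-frame copies:
```javascript
// Sketch of parallel inference via Web Workers. Each worker owns one
// WasmCnnEmbedder instance; paths and message shapes are assumptions.
const videoWorker = new Worker('./js/workers/visual.js', { type: 'module' });
const csiWorker = new Worker('./js/workers/csi.js', { type: 'module' });

function embedIn(worker, pixels) {
  return new Promise((resolve) => {
    worker.onmessage = (e) => resolve(e.data);       // Float32Array embedding
    worker.postMessage({ pixels }, [pixels.buffer]); // transfer, zero-copy
  });
}

// Embeddings computed concurrently: latency ≈ max(15, 15) ms, not the sum.
const [visualEmb, csiEmb] = await Promise.all([
  embedIn(videoWorker, rgbFrame),
  embedIn(csiWorker, csiPseudoImage),
]);
```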
### Contrastive Learning Integration
The demo optionally shows real-time contrastive learning in the browser:
- **InfoNCE loss** (`WasmInfoNCELoss`): Comparing video and CSI embeddings of the same pose trains cross-modal alignment
- **Triplet loss** (`WasmTripletLoss`): Push apart different poses, pull together same pose across modalities
- **SimdOps**: Accelerated dot products for real-time similarity computation
- **Embedding space panel**: Live 2D projection shows video and CSI embeddings converging when viewing the same person
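For reference, the scalar fallback behind the similarity display is plain cosine similarity over the 512-dim vectors; the WASM `SimdOps` path replaces this hot loop:
```javascript
// Scalar cosine similarity for the embedding-space panel (SimdOps in the
// WASM module would replace this loop with accelerated dot products).
function cosineSimilarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}
// High video↔CSI similarity for the same pose indicates good alignment.
```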
### Relationship to Existing Crates
| Existing Crate | Role in This Demo |
|---------------|-------------------|
| `ruvector-cnn-wasm` | CNN inference for **both** video frames and CSI pseudo-images |
| `wifi-densepose-wasm` | CSI frame parsing and signal processing |
| `wifi-densepose-sensing-server` | WebSocket CSI data source |
| `wifi-densepose-core` | ADR-018 frame format definitions |
| `ruvector-cnn` | Underlying MobileNet-V3, layers, contrastive learning |
No new Rust crates are needed. The example is pure HTML/JS consuming existing WASM packages.
## Consequences
### Positive
- **Instant demo**: Video-only mode works with just a webcam; no ESP32 needed
- **Multi-modal showcase**: Demonstrates camera + WiFi fusion, the core innovation of the project
- **Graceful degradation**: Works with video-only, CSI-only, or both
- **Through-wall capability**: CSI mode shows pose estimation where cameras cannot reach
- **Zero-install**: Anyone with a browser can try it
- **Training data collection**: Can record paired (video, CSI) data for offline model training
- **Reusable**: JS modules embed directly in the Tauri desktop app's webview
### Negative
- **Model weights**: Requires offline-trained weights for visual CNN, CSI CNN, fusion, and pose decoder (~200KB total JSON)
- **WASM size**: Two WASM modules total ~350KB (acceptable)
- **No GPU**: CPU-only WASM inference; adequate at 224×224 but limits resolution scaling
- **Camera privacy**: Video mode requires camera permission (mitigated: CSI-only mode available)
- **Two CNN instances**: Memory footprint doubles vs single-modal (~10MB total, acceptable for desktop browsers)
### Risks
- **Cross-modal alignment**: Video and CSI embeddings must be trained jointly for fusion to work;
without proper training, fusion may be worse than either modality alone
- **Latency on mobile**: Dual CNN on mobile browsers may exceed 33ms; implement automatic quality reduction
- **WebSocket drops**: Network jitter → CSI frame gaps; buffer the last 3 frames and interpolate missing data
## Implementation Plan
1. **Phase 1 — Scaffold**: File layout, build.sh, index.html shell, mode selector UI
2. **Phase 2 — Video pipeline**: getUserMedia → frame capture → CNN embedding → basic pose display
3. **Phase 3 — CSI pipeline**: WebSocket client → CSI parsing → pseudo-image encoding → CNN embedding
4. **Phase 4 — Fusion**: Attention-weighted combination, confidence gating, mode switching
5. **Phase 5 — Pose decoder**: Linear projection with placeholder weights → 17 keypoints
6. **Phase 6 — Overlay renderer**: Video canvas with skeleton overlay, CSI heatmap panel
7. **Phase 7 — Training**: Use `wifi-densepose-train` to generate real weights for both CNNs + fusion + decoder
8. **Phase 8 — Contrastive demo**: Embedding space visualization, cross-modal similarity display
9. **Phase 9 — Web Workers**: Move CNN inference to workers for parallel video + CSI processing
10. **Phase 10 — Polish**: Recording, snapshots, adaptive quality, mobile optimization
## Alternatives Considered
### 1. CSI-Only (No Video)
Rejected: Misses the opportunity to show multi-modal fusion and makes the demo less
accessible (requires ESP32 hardware). Video-only mode as a fallback is strictly better.
### 2. Server-Side Video Inference
Rejected: Adds latency, requires webcam stream upload (privacy concern), and defeats
the WASM-first architecture. All inference must be client-side.
### 3. TensorFlow.js for Video, ruvector-cnn-wasm for CSI
Rejected: Would require two different ML frameworks. Using `ruvector-cnn-wasm` for both
keeps a single WASM module, unified embedding space, and simpler fusion.
### 4. Pre-recorded Video Demo
Rejected: Live webcam input is far more compelling for demonstrations.
Pre-recorded mode can be added as a secondary option.
### 5. React/Vue Framework
Rejected: Adds build tooling. Vanilla JS + ES modules keeps the demo self-contained.
## References
- [ADR-018: Binary CSI Frame Format](ADR-018-binary-csi-frame-format.md)
- [ADR-024: Contrastive CSI Embedding / AETHER](ADR-024-contrastive-csi-embedding.md)
- [ADR-055: Integrated Sensing Server](ADR-055-integrated-sensing-server.md)
- `vendor/ruvector/crates/ruvector-cnn/src/lib.rs` – CNN embedder implementation
- `vendor/ruvector/crates/ruvector-cnn-wasm/src/lib.rs` – WASM bindings
- `vendor/ruvector/examples/wasm-vanilla/index.html` – reference vanilla JS WASM pattern
- Person-in-WiFi: Fine-grained Person Perception using WiFi (ICCV 2019) – camera+WiFi fusion precedent
- WiPose: Multi-Person WiFi Pose Estimation (TMC 2022) – cross-modal embedding approach