Mirror of https://github.com/ruvnet/RuView.git (synced 2026-04-28 05:59:32 +00:00)
docs: ADR-081 add ruvector-cnn spectrogram gesture classification
- Replace DTW with CNN on CSI spectrograms via ruvector-cnn WASM
- Pipeline: CSI → STFT → 64x64 spectrogram → CnnEmbedder → 128-dim → classifier
- Two-phase training: InfoNCE contrastive + supervised classification
- Dual-path fusion: DTW + CNN in parallel for max robustness
- Comparison table: CNN ~95% vs DTW ~85% accuracy (literature)
- Fallback: lightweight 1D CNN for ESP32 edge deployment

Co-Authored-By: claude-flow <ruv@ruv.net>
This commit is contained in:
parent 68d47a25d5
commit 3b4e151507
1 changed file with 99 additions and 1 deletion
@ -439,9 +439,107 @@ node scripts/train-gesture-model.js \
**Training data per gesture:** ~20 examples × 11 gestures = 220 labeled samples.
With augmentation (time warp, amplitude noise): ~1,000 effective samples.
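The time-warp and amplitude-noise augmentations can be sketched as follows (function names and parameter values are illustrative assumptions, not the project's actual training code):

```javascript
// Illustrative augmentation sketch. Warp factors and noise level
// are assumptions, not values from the training script.

// Linearly resample a 1D signal to a new length (simple time warp).
function timeWarp(signal, factor) {
  const outLen = Math.max(2, Math.round(signal.length * factor));
  const out = new Array(outLen);
  for (let i = 0; i < outLen; i++) {
    const pos = (i * (signal.length - 1)) / (outLen - 1);
    const lo = Math.floor(pos);
    const hi = Math.min(lo + 1, signal.length - 1);
    const frac = pos - lo;
    out[i] = signal[lo] * (1 - frac) + signal[hi] * frac;
  }
  return out;
}

// Add zero-mean uniform amplitude noise scaled by `level`.
function amplitudeNoise(signal, level) {
  return signal.map((v) => v + (Math.random() * 2 - 1) * level);
}

// One original window → several augmented variants.
function augment(window, warpFactors = [0.8, 1.0, 1.2], noiseLevel = 0.05) {
  return warpFactors.map((f) => amplitudeNoise(timeWarp(window, f), noiseLevel));
}
```

Applied per CSI subcarrier, three warp factors plus noise turn each recorded window into several effective samples, which is how ~220 recordings stretch toward ~1,000.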
### Optimization: ruvector-cnn Spectrogram Gesture Classification

Replace DTW template matching with a CNN operating on CSI spectrograms via the
`ruvector-cnn` WASM package (ADR-076). This treats each gesture as an image
classification problem on the CSI time-frequency representation.

#### Why CNN Over DTW

| | DTW (current, ADR-029) | CNN Spectrogram (proposed) |
|---|---|---|
| Input | 1D keypoint trajectories | 2D CSI spectrogram image |
| Features | Hand-crafted (wrist velocity, elbow angle) | Learned end-to-end |
| Robustness | Sensitive to speed variation | Warp-invariant (pooling layers) |
| Multi-scale | Single scale | Hierarchical (dilated convolutions) |
| Training | Template recording + DTW distance | Supervised from camera labels |
| New gestures | Record new template | Retrain (or few-shot with embedding) |
| Accuracy | ~85% (DTW literature) | ~95%+ (CNN on spectrograms, literature) |

#### Pipeline

```
CSI [N_subcarriers, T=30] (1-second window)
    ↓
Spectrogram transform: STFT per subcarrier
    → [N_sub, F_bins, T_bins] ≈ [35, 16, 15]
    ↓
Reshape to grayscale image: [35×16, 15] = [560, 15]
    → Resize to [64, 64] (bilinear)
    ↓
ruvector-cnn CnnEmbedder (WASM-accelerated)
    → 128-dim gesture embedding
    ↓
Classifier head: Linear(128 → 18 gestures) + softmax
    → gesture_id + confidence
```

#### ruvector-cnn Integration

The `@ruvector/cnn` WASM package provides:

```javascript
const { init, CnnEmbedder, InfoNCELoss } = require('@ruvector/cnn');
await init();

// Create embedder for 64x64 CSI spectrogram "images"
const embedder = new CnnEmbedder({
  inputSize: 64,
  embeddingDim: 128,
  normalize: true,
});

// Extract embedding from CSI spectrogram
const spectrogram = csiToSpectrogram(csiWindow); // [64, 64] Uint8Array
const embedding = embedder.extract(spectrogram, 64, 64);

// Classify gesture via nearest-neighbor to trained templates
const gesture = classifyGesture(embedding, gestureTemplates);
```

#### Training with Contrastive + Classification

Two-phase training using ruvector-cnn's built-in losses:

**Phase 1: Contrastive embedding (unsupervised)**

```javascript
const loss = new InfoNCELoss(0.07);
// Same gesture performed at different speeds → positive pairs
// Different gestures → negative pairs
// Train CnnEmbedder to cluster same-gesture spectrograms
```

**Phase 2: Gesture classification (supervised)**

```javascript
// Linear classifier on frozen embeddings
// 18 gestures × 20 examples each = 360 labeled samples
// Camera auto-labels: MediaPipe Hands detects gesture type
```

#### Dual-Path Architecture

Run both CNN and DTW in parallel for maximum robustness:

```
CSI input ──┬──→ WiFlow → keypoints → DTW templates → gesture_A (conf_A)
            │
            └──→ Spectrogram → ruvector-cnn → embedding → classifier → gesture_B (conf_B)

Fusion: if gesture_A == gesture_B → conf = max(conf_A, conf_B) + 0.15
        if conflict              → pick higher confidence
        if only one detects      → use it at 0.8× confidence
```

This dual-path approach combines complementary strengths:

- **DTW** catches gestures the CNN might miss (novel variations)
- **CNN** provides higher accuracy for trained gesture types
- **Fusion** reduces false positives (both paths must agree for high confidence)

### Optimization: Temporal Gesture Encoding

Instead of classifying single frames, encode gesture trajectories:

An alternative lightweight path for when the ruvector-cnn WASM overhead
matters (e.g., ESP32 edge deployment):

```
Keypoint sequence [T=30 frames, 1 second]: