Pushes the ADR-084 novelty sensor down into Layer 4 (On-device Feature Extraction) of ADR-081's 5-layer kernel on the ESP32 sensor MCU: a sketch plus a 32-slot ring bank in IRAM, with the UDP send suppressed when novelty < CONFIG_RV_EDGE_NOVELTY_THRESHOLD (default 500 basis points, i.e. 0.05). The wire format bumps to magic 0xC5110007 with two new fields (suppressed_since_last: u16, gate_version: u8), packed into the existing 2-byte reserved field plus one byte freed by narrowing the 16-bit quality_flags to 8 bits (only 8 bits were ever defined). Frame size stays at 60 bytes; v6 receivers fall back gracefully. A stuck-gate self-heal at CONFIG_RV_EDGE_MAX_CONSEC_SUPPRESS (default 50 frames ≈ 10 s) ensures a wedged threshold can't silently make a node disappear. The Kconfig is default-off, so existing deployments are unaffected.
Validation commitments:
- ≤ 200 µs sketch insert+score on Xtensa LX7
- ≥ 30 % UDP TX-energy reduction in steady-state quiet rooms
- ≤ 5 pp drop on cluster-Pi novelty top-K coverage vs unsuppressed
- ≥ 50 % bandwidth reduction in stable-room scenarios
Six-pass implementation plan, default-off Kconfig, QEMU + COM7 hardware-in-loop validation. Honest gaps flagged: Xtensa LX7 POPCNT absence is conjecture (the Pass 2 bench is the falsifier); interaction with ADR-082's Tentative→Active gate is the likeliest weak point (Open Q4).
ADR-087 / ADR-088 reserved as pointer stubs at end:
- ADR-087: Pass-4 mesh-exchange scope (cluster↔cluster vs sensor→Pi)
- ADR-088: Firmware-release coordination policy
Status: Proposed. SOTA review by goal-planner agent.
ADR-086: Edge Novelty Gate — Push the RaBitQ Sensor Down to the Sensor MCU
| Field | Value |
|---|---|
| Status | Proposed |
| Date | 2026-04-26 |
| Authors | ruv |
| Refines | ADR-081 (5-layer adaptive CSI mesh firmware kernel — Layer 4 / On-device feature extraction), ADR-084 (RaBitQ similarity sensor) |
| Touches | ADR-018 (binary CSI frame magic discipline), ADR-028 (capability audit / witness verification), ADR-082 (confirmed-track output filter), ADR-085 (RaBitQ pipeline expansion) |
| Companion | firmware/esp32-csi-node/main/rv_feature_state.h (current 0xC5110006 v6 wire format), docs/research/architecture/three-tier-rust-node.md (BQ24074 power budget context), vendor/ruvector/crates/ruvector-core/src/quantization.rs::BinaryQuantized (std reference implementation that this ADR will not directly reuse on-MCU) |
Context
ADR-081's 5-layer firmware kernel today emits one rv_feature_state_t
packet per node every 100–1000 ms (1–10 Hz, default 5 Hz on COM7),
60 bytes payload, magic 0xC5110006, regardless of how interesting
the underlying CSI window was. At a 5 Hz baseline the per-node
steady-state load is ~300 B/s of UDP plus the radio TX duty that emits it.
Across a 12-node deployment the cluster Pi sees ~3.6 kB/s of
feature-state — not a bandwidth crisis on its own, but every one of
those packets also costs sensor-MCU radio TX energy, every one
contends for ESP-WIFI-MESH airtime per ADR-081 Layer 3, and every one
runs through the cluster-Pi novelty bank ADR-084 Pass 3 only to be
classified as "nothing new" most of the time in a quiet room.
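The load figures above are packet size × emission rate × node count. A minimal C sketch of that arithmetic, with the gate's suppression fraction added as a parameter; the helper name is hypothetical, and UDP/IP and mesh headers are deliberately left out, as they are in the ADR's own figures:

```c
#include <assert.h>

/* Payload size of rv_feature_state_t as stated in this ADR (v6 and v7). */
#define RV_FEATURE_STATE_BYTES 60.0

/* Steady-state feature-state payload load in bytes per second.
 *   rate_hz     - Layer 4 emission rate per node (1..10 Hz, default 5 Hz)
 *   nodes       - sensor nodes reporting to one cluster Pi
 *   suppression - fraction of frames withheld by the edge gate (0 = ungated) */
double feature_load_bytes_per_s(double rate_hz, unsigned nodes,
                                double suppression)
{
    assert(suppression >= 0.0 && suppression <= 1.0);
    return RV_FEATURE_STATE_BYTES * rate_hz * (double)nodes * (1.0 - suppression);
}
```

With these inputs, feature_load_bytes_per_s(5.0, 1, 0.0) is 300 B/s and feature_load_bytes_per_s(5.0, 12, 0.0) is 3600 B/s; suppression fractions of 0.5 and 0.9 give 150 and 30 B/s per node.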
ADR-084 made novelty cheap on the cluster-Pi side. The same novelty
sensor is structurally local: a sketch, a small ring of recent
sketches, and a hamming-distance compare. Pushing that gate down into
the sensor MCU's Layer 4 (On-device feature extraction) lets the node
skip transmitting frames the cluster-Pi would have filed under
"familiar" anyway. Bandwidth, sensor-MCU TX energy, and RF airtime
all win, and the cluster-Pi novelty path stops re-doing work the edge
has already shown to be pointless. This is the natural ADR-085 follow-up:
it was flagged there but deliberately left out of ADR-085's scope because
it requires a no_std sketch port, a Kconfig-gated rollout, a wire-format
bump, and a fresh witness regeneration, none of which are
appropriate inside an in-flight cluster-Pi work loop.
The crux of the decision is whether the cost of (a) hand-porting the
sketch primitive to no_std Xtensa LX7, (b) sizing the in-IRAM ring
without disturbing the existing Layer 4 budget, (c) bumping the
rv_feature_state_t magic and teaching the cluster-Pi a graceful
v6/v7 fallback, and (d) re-cutting the ADR-028 witness bundle is
justified by the suppression rate the gate actually achieves on real
deployments. The answer should be obvious in stable rooms (≥50 %
suppression looks easy) and ambiguous in active rooms (suppression
should drop sharply, which is exactly what we want). This ADR commits
to numbers up front so the decision is falsifiable.
Decision
Adopt an edge novelty gate in the sensor MCU's Layer 4 of
ADR-081's 5-layer kernel. The gate sits between feature extraction
and the existing UDP send path; when novelty is below a configurable
threshold the frame is not transmitted, and the node accumulates
a per-source suppressed_since_last counter that is folded into the
next non-suppressed packet. This keeps the cluster-Pi's books
honest — the edge can suppress bandwidth, but it can never
silently suppress the fact of suppression.
Components
The implementation is two pieces, both new in
firmware/esp32-csi-node/main/:
- rv_sketch.{h,c}: a no_std-equivalent (plain C, ESP-IDF) 1-bit sketch primitive. Sign-quantize a feature vector, pack it into (dim + 7) / 8 bytes, and compute Hamming distance via an 8-bit table-lookup popcount. Xtensa LX7 is assumed to have no hardware POPCNT instruction (no primary source consulted; the conjecture rests on the ESP32-S3 TRM not advertising one and should be confirmed by checking the TRM's bit-manipulation extensions). The table-lookup scalar baseline is therefore the right starting point, and it is already what BinaryQuantized falls back to on architectures without a SIMD POPCNT path (vendor/ruvector/crates/ruvector-core/src/quantization.rs, lines 332–340).
- An IRAM-resident sketch ring, fixed in size at compile time: RV_EDGE_BANK_SIZE slots × RV_EDGE_VECTOR_DIM_BYTES bytes. For the default Layer 4 feature dimension of 56 (matching the subcarrier-selection / interpolation target widely used in this codebase), the default 32-slot ring costs 32 × 7 = 224 bytes. A 64-slot ring at 56 d costs 448 bytes. Both sit comfortably inside the existing static-memory budget on either the 4 MB or 8 MB Waveshare AMOLED ESP32-S3 board, well clear of ADR-081 Layer 4's existing window buffers. Eviction is FIFO: each new sketch overwrites the oldest.
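A minimal sketch of the rv_sketch primitive described in the first item, assuming LSB-first bit packing and a `>= 0.0f` sign convention (both should be checked against BinaryQuantized before Pass 1's host-reference test is written); the function names and `RV_EDGE_MAX_DIM` are hypothetical. The popcount table is filled at init time here for brevity; on target it could equally be a `const` array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RV_EDGE_MAX_DIM      64u                  /* hypothetical upper bound */
#define RV_SKETCH_BYTES(dim) (((dim) + 7u) / 8u)  /* (dim + 7) / 8 packed bytes */

static uint8_t popcount_lut[256];                 /* popcount_lut[b] = set bits in b */

/* Fill the 8-bit popcount table; each entry reuses the entry for b >> 1. */
void rv_sketch_init(void)
{
    for (unsigned b = 1; b < 256u; ++b)
        popcount_lut[b] = (uint8_t)((b & 1u) + popcount_lut[b >> 1]);
}

/* Sign-quantize: bit k is set when feature[k] >= 0 (assumed convention),
 * packed LSB-first into out[0 .. RV_SKETCH_BYTES(dim) - 1]. Padding bits in
 * the last byte stay 0, so they never contribute to a Hamming distance. */
void rv_sketch_sign_quantize(const float *feature, size_t dim, uint8_t *out)
{
    assert(feature != NULL && out != NULL);
    assert(dim > 0 && dim <= RV_EDGE_MAX_DIM);
    memset(out, 0, RV_SKETCH_BYTES(dim));
    for (size_t k = 0; k < dim; ++k)
        if (feature[k] >= 0.0f)
            out[k / 8] |= (uint8_t)(1u << (k % 8));
}

/* Hamming distance between two packed sketches of the same dimension,
 * using one table lookup per byte instead of a hardware POPCNT. */
unsigned rv_sketch_hamming(const uint8_t *a, const uint8_t *b, size_t dim)
{
    unsigned dist = 0;
    for (size_t i = 0; i < RV_SKETCH_BYTES(dim); ++i)
        dist += popcount_lut[a[i] ^ b[i]];
    return dist;
}
```

For the default 56-d window, RV_SKETCH_BYTES(56) is 7, so one ring slot is 7 bytes and a full Hamming compare is seven table lookups.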
Gating policy
For each completed Layer 4 feature window:
```
1. compute feature vector                                  // existing
2. sketch = sign_quantize(feature_vector)                  // new
3. nearest_hamming = ring_min_distance(sketch)             // new
4. novelty_bps = nearest_hamming * 10000 / dim             // 0..10000 basis points, new
5. if novelty_bps >= CONFIG_RV_EDGE_NOVELTY_THRESHOLD
         OR suppressed_since_last >= CONFIG_RV_EDGE_MAX_CONSEC_SUPPRESS
         OR CONFIG_RV_EDGE_FORCE_SEND:
       ring_insert(sketch)
       emit rv_feature_state_t v7 with suppressed_since_last
       suppressed_since_last = 0
   else:
       suppressed_since_last += 1
       // do not insert into ring: only confirmed-emitted sketches anchor the bank
```
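The gating steps above can be realized in C roughly as follows. This is a sketch under stated assumptions, not the Pass 2 / Pass 3 code: the dimension is fixed at 56 (7-byte sketches), the threshold is compared in integer basis points (nearest_hamming × 10000 ≥ threshold_bps × dim), an empty ring is treated as maximal novelty so the first frame after boot is always sent, and `rv_gate_t`, `rv_gate_step`, and the runtime copies of the Kconfig values are hypothetical. Hamming distance uses the GCC/Clang `__builtin_popcount` builtin to keep the block short, in place of the table-lookup popcount the Components section specifies; IRAM placement attributes and the CONFIG_RV_EDGE_NOVELTY_GATE_ENABLE = n bypass are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RV_DIM          56u                   /* assumed Layer 4 feature dimension */
#define RV_SKETCH_BYTES ((RV_DIM + 7u) / 8u)  /* 7 bytes per sketch */
#define RV_BANK_SIZE    32u                   /* default ring size */

_Static_assert(RV_BANK_SIZE * RV_SKETCH_BYTES == 224u, "32 slots x 7 B = 224 B");

typedef struct {
    uint8_t  slots[RV_BANK_SIZE][RV_SKETCH_BYTES];
    unsigned count;                 /* valid slots, saturates at RV_BANK_SIZE */
    unsigned next;                  /* FIFO write index (oldest slot once full) */
    uint16_t suppressed_since_last;
    uint32_t threshold_bps;         /* stand-in for CONFIG_RV_EDGE_NOVELTY_THRESHOLD */
    uint16_t max_consec;            /* stand-in for CONFIG_RV_EDGE_MAX_CONSEC_SUPPRESS */
    bool     force_send;            /* stand-in for CONFIG_RV_EDGE_FORCE_SEND */
} rv_gate_t;

/* Smallest Hamming distance from the sketch to any ring slot;
 * returns RV_DIM (maximal novelty) when the ring is empty. */
static unsigned ring_min_distance(const rv_gate_t *g, const uint8_t *sketch)
{
    unsigned best = RV_DIM;
    for (unsigned s = 0; s < g->count; ++s) {
        unsigned d = 0;
        for (unsigned i = 0; i < RV_SKETCH_BYTES; ++i)
            d += (unsigned)__builtin_popcount((unsigned)(g->slots[s][i] ^ sketch[i]));
        if (d < best)
            best = d;
    }
    return best;
}

/* FIFO insert: overwrite the oldest slot once the ring is full. */
static void ring_insert(rv_gate_t *g, const uint8_t *sketch)
{
    memcpy(g->slots[g->next], sketch, RV_SKETCH_BYTES);
    g->next = (g->next + 1u) % RV_BANK_SIZE;
    if (g->count < RV_BANK_SIZE)
        g->count++;
}

/* One gate decision per completed Layer 4 window. Returns true when the
 * frame should be emitted; *suppressed_out then holds the count to place
 * in the v7 packet, and the internal counter resets to 0. */
bool rv_gate_step(rv_gate_t *g, const uint8_t *sketch, uint16_t *suppressed_out)
{
    assert(g != NULL && sketch != NULL && suppressed_out != NULL);
    unsigned nearest = ring_min_distance(g, sketch);
    /* novelty >= threshold without floats: nearest/dim >= bps/10000 */
    bool novel = (uint32_t)nearest * 10000u >= g->threshold_bps * RV_DIM;
    if (novel || g->suppressed_since_last >= g->max_consec || g->force_send) {
        ring_insert(g, sketch);
        *suppressed_out = g->suppressed_since_last;
        g->suppressed_since_last = 0;
        return true;
    }
    g->suppressed_since_last++;  /* suppressed sketches never anchor the ring */
    return false;
}
```

Fed the same sketch on every frame with threshold_bps = 500 and max_consec = 50, this emits the first frame, suppresses the next 50, and force-sends the 51st carrying suppressed_since_last = 50, which is the stuck-gate behaviour the Validation section asks a unit test to observe.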
Threshold default: CONFIG_RV_EDGE_NOVELTY_THRESHOLD = 500
basis points (5.0 % of the sketch dimension). Kconfig has no float
type, so this codebase expresses thresholds as integer basis points or
scaled fixed-point rather than working around that limitation; this
preserves the Kconfig-as-truth discipline ADR-081 already
follows.
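Because the comparison is integer-only, the basis-point threshold translates into a whole number of differing bits. A small sketch of that mapping, assuming the gate fires when nearest_hamming × 10000 ≥ threshold_bps × dim (the helper name is hypothetical): at the default 56 dimensions, 5.0 % of 56 is 2.8 bits, so a frame must differ from its nearest ring anchor in at least 3 bits to be sent.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest Hamming distance (bits) that the gate treats as novel: the least
 * integer h with h * 10000 >= threshold_bps * dim, i.e. a ceiling division. */
uint32_t rv_min_novel_bits(uint32_t dim, uint32_t threshold_bps)
{
    assert(threshold_bps <= 10000u);  /* 10000 bps = 100 % of the dimension */
    return (threshold_bps * dim + 9999u) / 10000u;
}
```

rv_min_novel_bits(56, 500) is 3 and rv_min_novel_bits(64, 500) is 4; a threshold of 0 makes every frame novel.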
Suppression cap default:
CONFIG_RV_EDGE_MAX_CONSEC_SUPPRESS = 50. At 5 Hz that is 10 s of
forced silence at most before a "stuck gate" self-heals into a
forced send — comparable to ADR-081's slow-loop 30 s recalibration
cadence and well below any user-visible UI staleness threshold.
Default-off gate: CONFIG_RV_EDGE_NOVELTY_GATE_ENABLE = n. Existing
deployments behave identically until they opt in.
Wire format — v7
Bump the rv_feature_state_t magic to 0xC5110007 and free three
bytes for the new fields by reusing the existing 2-byte reserved field
plus one byte borrowed from the 16-bit quality_flags budget (only 8 of
the 16 flag bits are defined today, so quality_flags narrows to uint8_t):
| Offset (v7) | Field | Notes |
|---|---|---|
| 0..3 | magic = 0xC5110007 | new; differentiates from 0xC5110006 |
| 4 | node_id | unchanged |
| 5 | mode | unchanged |
| 6..7 | seq | unchanged |
| 8..15 | ts_us | unchanged |
| 16..51 | nine float features | unchanged |
| 52 | quality_flags (uint8_t) | narrowed from u16; see Open Q3 |
| 53 | gate_version (uint8_t) | new |
| 54..55 | suppressed_since_last | new (uint16_t LE) |
| 56..59 | crc32 | unchanged, computed over [0..56) |
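The table can be pinned down at compile time with offsetof checks, in the spirit of the _Static_assert that Pass 4 keeps. A sketch assuming GCC's `packed` attribute (ESP-IDF builds with GCC) and 4-byte IEEE-754 floats; the struct name and field spellings are illustrative, not copied from rv_feature_state.h. The packing matters: every field already sits at a naturally aligned offset, but on ABIs that align uint64_t to 8 bytes an unpacked struct would be padded from 60 to 64 bytes.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof */
#include <stdint.h>

#define RV_FEATURE_MAGIC_V7 0xC5110007u

typedef struct __attribute__((packed)) {
    uint32_t magic;                 /* 0..3   0xC5110007 */
    uint8_t  node_id;               /* 4 */
    uint8_t  mode;                  /* 5 */
    uint16_t seq;                   /* 6..7 */
    uint64_t ts_us;                 /* 8..15 */
    float    features[9];           /* 16..51 nine float features */
    uint8_t  quality_flags;         /* 52     narrowed from uint16_t */
    uint8_t  gate_version;          /* 53     new */
    uint16_t suppressed_since_last; /* 54..55 new, little-endian */
    uint32_t crc32;                 /* 56..59 computed over bytes [0, 56) */
} rv_feature_state_v7_t;

_Static_assert(sizeof(rv_feature_state_v7_t) == 60, "v7 frame must stay 60 bytes");
_Static_assert(offsetof(rv_feature_state_v7_t, ts_us) == 8, "ts_us offset");
_Static_assert(offsetof(rv_feature_state_v7_t, quality_flags) == 52, "quality_flags offset");
_Static_assert(offsetof(rv_feature_state_v7_t, gate_version) == 53, "gate_version offset");
_Static_assert(offsetof(rv_feature_state_v7_t, suppressed_since_last) == 54, "suppressed offset");
_Static_assert(offsetof(rv_feature_state_v7_t, crc32) == 56, "crc32 offset");
```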
Total size: still 60 bytes, so v7 is wire-compatible at packet length but
not at field semantics; the magic is the discriminator. Cluster-Pi
receivers that recognize 0xC5110007 interpret the new fields and fall
back to v6 parsing, without a suppression count, when they see the v6
magic; receivers that only recognize 0xC5110006 continue to work but do
not see the suppression count. This is the explicit graceful-fallback
contract ADR-081 already established for Layer 5 stream parsing.
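On the receive side, the magic-as-discriminator contract can be sketched as a byte-level decoder that reads little-endian fields rather than casting the buffer to a struct. It is written in C only to keep one language across these examples (the real intake lives in crates/wifi-densepose-sensing-server/); the function and result-type names are hypothetical, the magic is assumed to be stored little-endian like suppressed_since_last, and CRC verification is left out because this ADR does not name the CRC-32 variant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RV_MAGIC_V6  0xC5110006u
#define RV_MAGIC_V7  0xC5110007u
#define RV_FRAME_LEN 60u

typedef struct {
    int      version;          /* 6 or 7; 0 if the frame is not recognized */
    bool     has_suppression;  /* true only for v7 frames */
    uint8_t  gate_version;
    uint16_t suppressed_since_last;
} rv_suppression_t;

static uint32_t rd_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Classify a received frame by magic and extract the v7-only fields.
 * A v6 frame is accepted but carries no suppression information. */
rv_suppression_t rv_parse_suppression(const uint8_t *buf, size_t len)
{
    rv_suppression_t r = {0};
    assert(buf != NULL);
    if (len != RV_FRAME_LEN)
        return r;
    uint32_t magic = rd_le32(buf);
    if (magic == RV_MAGIC_V7) {
        r.version = 7;
        r.has_suppression = true;
        r.gate_version = buf[53];
        r.suppressed_since_last = (uint16_t)(buf[54] | (buf[55] << 8));
    } else if (magic == RV_MAGIC_V6) {
        r.version = 6;  /* graceful fallback: no suppression count */
    }
    return r;
}
```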
The choice to narrow quality_flags from 16 to 8 bits relies on the
fact that rv_feature_state.h defines exactly 8 RV_QFLAG_* bits
today (lines 33–40); future flag growth is a separate ADR slot, and
the alternative (adding a 4th uint8_t and growing the packet to
64 bytes) would require updating every Layer 5 parser and is more
intrusive than the magic bump.
Consequences
Positive
- Sensor-MCU UDP TX duty cycle drops in proportion to the suppression rate. A back-of-envelope at 5 Hz: at 50 % suppression, ~150 B/s and ~2.5 packets/s per node instead of ~300 B/s and 5 packets/s; at 90 % suppression, ~30 B/s and 0.5 packets/s. ESP32-S3 TX energy at +20 dBm is the dominant per-packet cost on the BQ24074-class node (the docs/research/architecture/three-tier-rust-node.md §3.3 power budget shows a ~80 mA active-CSI baseline with TX-burst spikes of ~150 mA peak; the gate primarily cuts burst frequency rather than the baseline). ≥ 30 % TX-energy reduction in steady-state quiet rooms is the validation target.
- Cluster-Pi novelty path runs on a smaller stream. ADR-084 Pass 3 is unchanged in code, but its input rate drops by the suppression rate. The Pi-side bank stops accumulating redundant "stable" anchors and concentrates its slots on frames that are actually different. This is a quality win, not just a cost win.
- Mesh airtime contention drops, which benefits all other ADR-081 Layer 3 traffic. Less feature-state traffic frees airtime for TIME_SYNC, ROLE_ASSIGN, FEATURE_DELTA, HEALTH, and ANOMALY_ALERT: the high-priority mesh-control traffic that today competes with routine feature-state in the same channel.
- suppressed_since_last is observable. The cluster-Pi can detect a node that has been suppressing for too long, a node whose suppression rate suddenly drops (an occupant entered the room, which is the right behaviour), and a node whose suppression cap is triggering frequently (the gate is mistuned). All three are useful signals, and all three live in fields the receiver already parses.
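Because each emitted v7 packet reports how many frames were withheld before it, the cluster Pi can recover a node's suppression rate with no extra traffic: a packet carrying N stands for N + 1 frames produced at the edge. A minimal per-node accumulator, sketched in C for consistency with the other examples; the struct and function names are hypothetical, and treating N ≥ the configured cap as a forced send is an approximation (a genuinely novel frame that arrives exactly at the cap is counted too).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t packets;     /* v7 packets received from this node */
    uint64_t suppressed;  /* sum of suppressed_since_last over those packets */
    uint64_t cap_hits;    /* packets whose count reached the configured cap */
} rv_node_gate_stats_t;

/* Fold one received v7 packet into the per-node statistics. */
void rv_stats_add(rv_node_gate_stats_t *s, uint16_t suppressed_since_last,
                  uint16_t max_consec_suppress)
{
    assert(s != NULL);
    s->packets++;
    s->suppressed += suppressed_since_last;
    if (suppressed_since_last >= max_consec_suppress)
        s->cap_hits++;
}

/* Fraction of edge frames that were withheld: sum(N) / sum(N + 1). */
double rv_stats_suppression_rate(const rv_node_gate_stats_t *s)
{
    uint64_t produced = s->suppressed + s->packets;
    return produced ? (double)s->suppressed / (double)produced : 0.0;
}
```

Four packets each carrying N = 1 give a rate of 4 / 8 = 0.5, the 50 % case; a steadily rising cap_hits count is the mistuned-gate signal, and a sudden fall in the rate is the occupant-entered signal.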
Negative / risks
- The cluster-Pi-side novelty sensor sees fewer data points. This is the load-bearing negative consequence and the most likely source of regression. ADR-084 Pass 3's bank ages out anchors based on insertion time; if the edge gate suppresses 70 % of frames in a quiet room, the Pi bank receives 30 % of its expected anchor rate and may take roughly 3.3× (1 / 0.3) longer to converge to a useful steady state on a freshly rebooted Pi. Mitigation: the validation acceptance test runs the Pi-side novelty top-K coverage against an unsuppressed baseline and budgets ≤ 5 percentage points of regression. If cluster-Pi cold-start convergence becomes a real problem, the simplest patch is to force-send the first CONFIG_RV_EDGE_FORCE_SEND_BURST (default 32) frames per Layer 2 slow-loop recalibration window, but this lives outside the ADR-086 baseline and is called out as a follow-up if needed.
- Witness chain. Per ADR-028, every change to firmware invalidates the witness bundle. The edge novelty gate is a non-trivial firmware change: it touches Layer 4, adds a wire-format magic, and ships a Kconfig surface. The witness bundle must be re-cut, and the SHA-256 of the proof bundle is expected to change (which is the whole point of the witness: the change must be visible). The post-change validation step is to run bash scripts/generate-witness-bundle.sh and confirm 7/7 PASS via dist/witness-bundle-ADR028-*/VERIFY.sh.
- Two wire-format magics in the field at once. During rollout some nodes emit v6 and some v7. The cluster-Pi receiver must handle both, and the WebSocket "latest snapshot" path must not accidentally null out the new fields when re-encoding for v6 consumers. The graceful-fallback contract is small (~30 LOC on the Pi), but it is a contract, and breaking it loses observability for the v7 nodes. Validation includes a mixed-version soak.
- Pose-tracker interaction (Open Q4). ADR-082 added a confirmed-track output filter that already drops single-frame phantom poses before they reach the WebSocket. The edge gate could suppress the very frames that would have promoted a pose track from Tentative to Active: a person walks through a quiet room, the first 1–2 frames look "low novelty" because they still sit close to the quiet-room anchors in the ring, and only the third frame crosses the threshold and is emitted. ADR-082's three-frame minimum could then miss a real pose. Mitigation candidates: (a) lower the threshold while ADR-082 tracks are in the Tentative state; (b) treat motion_score above a fixed floor as a force-send signal regardless of sketch novelty; (c) accept the regression as part of the "novelty is precisely what we wanted to gate on" framing. Decision deferred to Open Q4.
- Operator debuggability. A development-time CONFIG_RV_EDGE_FORCE_SEND Kconfig flag bypasses the gate entirely and is the right tool for diffing with-gate vs without-gate behaviour during a deployment. Required.
Neutral
- ADR-018's binary CSI frame stream is unchanged; the gate operates on Layer 4 feature state, not on the debug raw-CSI path.
- ADR-085's seven cluster-Pi-side sketch sites that consume rv_feature_state_t see fewer inputs but the same shape; Sites 6 (swarm routing) and 7 (event-stream anomaly) will be slightly less sensitive under v7. Re-measurement is recommended but is not a blocker for ADR-086.
Implementation
Six numbered passes, ordered cheapest-first / lowest-risk-first. Each is independently shippable, each has a one-line acceptance criterion that must pass before the next pass starts. Default-off Kconfig means none of these passes can break a deployment that has not opted in.
| # | Pass | Target | Acceptance |
|---|---|---|---|
| 1 | no_std sketch primitive port (firmware/esp32-csi-node/main/rv_sketch.{h,c}) | sensor-MCU C | QEMU unit test: 56-d sign-quantize of a fixed seed produces the bit-pattern matching the host-side reference; hamming distance round-trips. |
| 2 | IRAM ring + insert/min-distance API | sensor-MCU C | On-target benchmark on COM7: insert + ring-min on 32 slots ≤ 200 µs at 240 MHz. |
| 3 | Kconfig flags (CONFIG_RV_EDGE_NOVELTY_GATE_ENABLE, _THRESHOLD, _MAX_CONSEC_SUPPRESS, _FORCE_SEND) | firmware/esp32-csi-node/main/Kconfig.projbuild | Build with each flag toggled produces the expected sdkconfig.defaults merge; unit test asserts threshold of 500 bps maps to 5.0 % decision boundary. |
| 4 | rv_feature_state_t v7 wire format + finalize() update | firmware/esp32-csi-node/main/rv_feature_state.{h,c} | _Static_assert(sizeof == 60) still holds; CRC32 over the new layout round-trips; v6 receiver test reads a v7 packet without panic and ignores the new fields. |
| 5 | Cluster-Pi reconciliation | crates/wifi-densepose-sensing-server/ UDP intake + ADR-084 Pass 3 novelty bank | A v7 packet with suppressed_since_last = N causes the Pi-side bank to interpret the gap as low-novelty stable-baseline contribution rather than as missing data; integration test on a synthetic v7 stream. |
| 6 | QEMU + COM7 hardware-in-loop validation | end-to-end | Stable-room recording: ≥ 50 % suppression rate; cluster-Pi novelty top-K coverage regression ≤ 5 pp vs unsuppressed baseline; stuck-gate self-heal exercised in a unit test. |
Pass 1 deliberately does not depend on
vendor/ruvector/crates/ruvector-core::BinaryQuantized. That crate
is std-bound (Vec<u8>, is_x86_feature_detected!, NEON
intrinsics — quantization.rs lines 289–340) and porting it to
no_std Xtensa LX7 is not a one-line #![no_std] flip. The clean
path is a fresh minimal C primitive that matches the
BinaryQuantized behaviour (sign quantization, byte-table popcount
fallback, (dim+7)/8 packed bytes); the host-side reference becomes
a spec, not a dependency. A future no_std-clean Rust port may
unify both once esp-radio / esp-csi-rs matures (three-tier node
research §7.3) — out of scope here.
Validation
This ADR is Proposed. Acceptance requires every numbered Pass to meet its acceptance criterion and the following system-level numbers to hold on the COM7 hardware-in-loop run:
- Computation budget: sketch insert + ring-min ≤ 200 µs; total per-frame Layer 4 overhead (existing feature extraction + new gate) ≤ 500 µs at 240 MHz Xtensa LX7.
- Energy: ≥ 30 % UDP TX-energy reduction in stable-room scenarios, measured by packets-per-second × per-packet TX duty against an unsuppressed baseline. Direct mA-level measurement is out of scope for this ADR; the proxy metric is sufficient.
- Cluster-Pi accuracy: ≤ 5 percentage-point drop on the ADR-084 Pass 3 novelty top-K coverage metric vs an unsuppressed baseline run on the same recorded CSI.
- Bandwidth: ≥ 50 % reduction in steady-state quiet-room UDP byte rate per node.
- Stuck-gate self-heal: a unit test that pins the sketch primitive output to "always low novelty" must observe a forced send after at most 50 consecutive suppressed frames (10 s at 5 Hz).
- Existing test gates: cargo test --workspace --no-default-features stays green; python v1/data/proof/verify.py stays green (the proof harness sees no firmware-side change, and its SHA-256 should not move because the proof exercises Python pipeline math, not firmware behaviour); the witness bundle (scripts/generate-witness-bundle.sh) runs and the resulting VERIFY.sh reports 7/7 PASS. The bundle's own SHA-256 will differ, which is the witness-chain signal that firmware changed.
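The computation-budget target can be sanity-checked with the back-of-envelope used in the open questions: bytes compared per frame × an assumed cost per byte ÷ clock rate. A sketch; the helper name is hypothetical, and the 100 cycles/byte figure is the pessimistic assumption quoted in this ADR, not a measurement, so the Pass 2 on-target bench supersedes it.

```c
#include <assert.h>

/* Rough ring-min time per frame in microseconds.
 *   slots           ring size (RV_EDGE_BANK_SIZE)
 *   sketch_bytes    bytes per sketch, (dim + 7) / 8
 *   cycles_per_byte assumed cost of XOR + table popcount + accumulate
 *   cpu_mhz         core clock; cycles divided by MHz gives microseconds */
double rv_ring_min_estimate_us(unsigned slots, unsigned sketch_bytes,
                               double cycles_per_byte, double cpu_mhz)
{
    assert(cpu_mhz > 0.0);
    return (double)slots * (double)sketch_bytes * cycles_per_byte / cpu_mhz;
}
```

rv_ring_min_estimate_us(32, 7, 100.0, 240.0) is about 93 µs, under half the 200 µs budget; the 64-slot ring at the same assumed cost comes to about 187 µs, close to the limit.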
If any system-level number other than cluster-Pi accuracy fails, the gate ships behind
CONFIG_RV_EDGE_NOVELTY_GATE_ENABLE = n (default-off) and the ADR
moves to Rejected for that hardware target, while the v7 wire-format
changes are kept (they cost nothing while dormant). If only the
cluster-Pi accuracy number fails, the gate is allowed to ship at a more
conservative CONFIG_RV_EDGE_NOVELTY_THRESHOLD until the cluster-Pi-side
reconciliation logic catches up.
Open questions
- Does Xtensa LX7's lack of POPCNT make the table-lookup scalar baseline fast enough at 5 Hz? The absence of POPCNT is itself conjecture; no primary-source confirmation has been performed (the ESP32-S3 TRM is the primary source). At 7 bytes/sketch × 32 slots = 224 bytes of popcount per frame, even a pessimistic 100-cycles-per-byte estimate (22,400 cycles, ~93 µs) sits well under 200 µs at 240 MHz; the Pass 2 bench resolves it.
- Should the IRAM ring be replaced by PSRAM-backed storage when the board has it? The 8 MB-flash Waveshare AMOLED ESP32-S3 ships with 8 MB PSRAM (CLAUDE.md hardware table; not a primary source — the board datasheet is); the ring at 32 slots × 7 bytes does not need PSRAM. A larger ring (1024 slots × 7 bytes ≈ 7 kB) to keep a longer history would benefit from PSRAM. The default IRAM-only sizing is the correct ship-now choice; PSRAM-backed is an open follow-up if the cluster-Pi reconciliation logic needs more history than 32 slots provides.
- Where does gate_version: u8 come from? Three options: (a) Kconfig-pinned at firmware build time; (b) NVS-stored and bumped at provision time; (c) embedded as a build-id byte derived from the firmware manifest. Default: option (a), Kconfig-pinned. Rationale: the gate version is part of the firmware contract, not the per-deployment configuration. NVS is the wrong namespace; the build-id approach is more robust to provisioning slips but harder to compare across deployments. The decision is reversible: the field width is fixed at 8 bits regardless of source.
- Interaction with ADR-082 (pose-tracker confirmed-track filter). The gate could legitimately suppress the very frames that would have promoted a Tentative track to Active under ADR-082's three-frame minimum. The risk is asymmetric: false-positive ghost poses are filtered by ADR-082 (correct), but false negatives (real poses that never get confirmed) become possible when the edge gate suppresses real-but-quiet first frames. Mitigations are listed in Consequences; the ADR commits to (a) Tentative-state-aware threshold tuning if the validation regression on the pose-recall metric exceeds 2 percentage points, and (b) keeping motion_score >= 0.05 as an unconditional force-send override inside the gate. This stays an open question because the right mitigation depends on the measured regression.
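The motion_score override committed to in the last open question can be sketched as a pre-check in front of the sketch comparison. Assumptions: motion_score is a float already produced by Layer 4, the caller's novelty_says_send result is assumed to fold in the threshold, suppression cap, and CONFIG_RV_EDGE_FORCE_SEND, and the function and macro names are hypothetical. The Tentative-state-aware threshold tuning is not modelled, since the ADR makes it conditional on the measured pose-recall regression.

```c
#include <assert.h>
#include <stdbool.h>

#define RV_MOTION_FORCE_SEND_FLOOR 0.05f  /* floor committed to in this ADR */

/* Final send decision: motion at or above the floor forces a send no matter
 * what the sketch comparison concluded; otherwise the novelty result stands. */
bool rv_gate_should_send(float motion_score, bool novelty_says_send)
{
    assert(motion_score == motion_score);  /* NaN compares unequal to itself */
    return motion_score >= RV_MOTION_FORCE_SEND_FLOOR || novelty_says_send;
}
```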
Related
- ADR-018 (Accepted) — Binary CSI frame magic discipline. The v7 wire format follows the same magic-bump pattern.
- ADR-028 (Accepted) — Capability audit / witness verification. Re-cut the bundle after this ADR ships; the SHA is expected to change.
- ADR-081 (Accepted) — 5-layer adaptive CSI mesh firmware kernel. ADR-086 is a Layer 4 refinement.
- ADR-082 (Accepted) — Pose-tracker confirmed-track filter. Open Q4 above.
- ADR-084 (Proposed) — RaBitQ similarity sensor. The cluster-Pi reference for the same gate this ADR pushes to the edge.
- ADR-085 (Proposed) — RaBitQ pipeline expansion. Seven cluster-Pi-side sites; ADR-086 is the deliberately-out-of-scope edge follow-up flagged at ADR-085 publication time.
Related ADR slots
The user prompt that produced this ADR identified two further follow-ups that should land as their own ADRs if and when the triggering condition occurs. They are recorded here as pointer-stubs rather than full ADRs because each is a one-paragraph commitment, not a structured decision; opening a full ADR for either prematurely would inflate the ledger without buying decision resolution.
ADR-087 (prospective) — Pass-4 mesh-exchange scope clarification
ADR-084 §"Decision" lists "mesh-exchange compression" between sensor
nodes when reporting cross-cluster events as the fourth of its five
sites. The binding intent of that text is cluster-Pi to cluster-Pi
exchange — i.e., the ADR-066 swarm-bridge channel between peer
Cognitum Seeds — not sensor-MCU to cluster-Pi UDP traffic. The two
are different problems: cluster-to-cluster is std Rust on Linux/Mac
and reuses BinaryQuantized directly; sensor-to-Pi is what ADR-086
addresses. If the team later reinterprets Pass 4 as
sensor→cluster-Pi UDP compression, that would be ADR-086's twin and
should land as ADR-087 with its own firmware release, distinct
from ADR-086's release. The clarification is one paragraph because
the only decision is "which interpretation does ADR-084's Pass 4
mean", and the answer is currently the cluster-to-cluster reading.
ADR-087 only opens if that reading is contested.
ADR-088 (prospective) — Firmware-release coordination policy
Issues #386 and #396 (firmware-only fixes: the MGMT-only promiscuous filter and the 50 Hz callback-rate gate) demonstrate that the firmware can need a release independent of any cluster-Pi ADR work. ADR-086 is itself an example: it requires a firmware release that is not driven by ADR-084 or ADR-085, both of which are cluster-Pi-only. Today the implicit policy is "firmware releases when something firmware-only ships." That works but is undocumented. ADR-088 would formalize when a firmware release is required vs deferred, with concrete examples: a Kconfig flag flip (#386 / #396) must release; a Pi-side parser-only addition (ADR-085 Sites 1–7) must not; a wire-format magic bump (ADR-086) must release and must re-cut the witness bundle; a feature-flag-default flip on a shipped v7 firmware should release a config bundle but not a firmware binary. ADR-088 opens when the next firmware-only change after ADR-086 lands and forces the decision; it is recorded here as a slot rather than written speculatively because the actual release-gating questions only become concrete in the presence of a real shipping change.