feat(analysis): raster-regime labels test — 13th discovery, labels axis ruled out for AC-2

ADR §17 item 10 ruled out corpus size and named three remaining candidate
remediations for AC-2: encoder / substrate / labels. Item 12 ruled out the
encoder. This commit runs the labels axis: re-label the same 8-protocol
corpus by (dominant_class × spike_count_bucket), the raster signature the
SDPA encoder actually tracks, rather than the stimulus-protocol identity it
demonstrably doesn't.

Measured on default SBM, 8 protocols, 140 ms early-transient windows,
104-window corpus:

  protocol-id labels:    distinct = 8  max_share = 0.12  precision@5 = 0.062
                         (below random 0.125)
  raster-regime labels:  distinct = 2  max_share = 0.92  precision@5 = 1.000
                         (trivial: 92% of windows share one (class, bucket))

The raster-regime precision@5 = 1.000 reflects dominant-class agreement, not
signal: on this substrate the saturated regime drives 92% of all windows
across all 8 stimulus protocols into the SAME (dominant_class, count_bucket).
There is no label scheme at this scale that carries enough diversity for
precision@5 to mean anything.

Of the AC-2 remediation axes tested so far:

  encoder (item 12): ruled out by rate-histogram A/B.
  corpus (item 10):  ruled out by 8-protocol expansion.
  labels (this):     ruled out by raster-regime monoculture.

**Substrate is the sole remaining AC-2 lever.** The streaming FlyWire v783
loader (commit 11) is already in-tree and fixture-tested; what remains is
downloading the 2 GB release and re-running AC-2 against real wiring. If that
too fails to show signal, the AC-2 SOTA claim itself needs revision, because
there are no more axes left to search.

Changes:

- src/analysis/types.rs: new pub fn MotifIndex::window_signatures() accessor
  returning (dominant_class_idx, spike_count, t_center_ms) triples for test
  use, alongside the existing vectors() accessor.
- tests/ac_2_raster_regime_labels.rs: new diagnostic test. Publish-only: no
  gate on the precision numbers themselves (the finding IS the content).
- ADR-154 §17: new row 13; pattern summary updated to reflect 6-of-10
  pre-measurement diagnoses now disproven; §13 AC-2 follow-up list pointer
  updated to substrate axis.

All prior tests still green. No source-code regression.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-05-28 09:53:36 +00:00 · 2026-04-22 16:58:01 -04:00 · 2026-04-22 16:58:01 -04:00 · 0430231b8a
commit 0430231b8a
parent 02ebdd11f3
3 changed files with 276 additions and 2 deletions
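
The `distinct` and `max_share` figures quoted in the commit message are plain label-histogram statistics. A minimal sketch of how they can be computed from any label vector follows; `label_balance` is a hypothetical helper introduced here for illustration and is not part of `connectome_fly`.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of `connectome_fly`): summarize how
/// balanced a label vector is. Returns `(distinct, max_share)`, where
/// `max_share` is the fraction of items carrying the most common label.
fn label_balance(labels: &[u16]) -> (usize, f32) {
    if labels.is_empty() {
        return (0, 0.0);
    }
    // Count occurrences of each label.
    let mut counts: HashMap<u16, u32> = HashMap::new();
    for &l in labels {
        *counts.entry(l).or_insert(0) += 1;
    }
    // Share of the single most frequent label.
    let max = counts.values().copied().max().unwrap_or(0);
    (counts.len(), max as f32 / labels.len() as f32)
}
```

Calling `label_balance(&[0, 0, 0, 1])` yields two distinct labels with a `max_share` of 0.75. A `max_share` close to 1.0 means almost any nearest-neighbour list agrees with the query label by default, which is why the test treats high precision under such imbalance as trivial.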
--- a/docs/adr/ADR-154-connectome-embodied-brain-example.md
+++ b/docs/adr/ADR-154-connectome-embodied-brain-example.md
@@ -462,10 +462,11 @@ Each of the nine is attached to the commit that produced it and the lesson it en
 | 10 | 15 (labeled AC-2, reverted) | 8-protocol labeled corpus still can't break the AC-2 precision ceiling: 400 ms → precision@5 = 0.089, 140 ms early-transient → 0.117 (vs random 0.125 for 8 classes) | **SDPA + deterministic low-rank projection on this substrate is protocol-blind.** Expanding the corpus from 4 → 8 protocols with max-share 0.12 did not help — stimulus-specific dynamics dissipate inside ≲ 150 ms as the substrate saturates into a common regime, and the SDPA encoder captures that saturated raster rather than the stimulus identity. The AC-2 gap is neither an index problem (DiskANN tried — item 8) nor a corpus-size problem (this test tried). It is an **encoder-substrate pairing** problem. Fixing it requires either (a) a different encoder (CEBRA / learned / task-specific contrastive), (b) a different substrate (real FlyWire may respond more protocol-specifically), or (c) a different label definition (raster-structure labels, not stimulus-protocol labels). None of those three are in this demonstrator's scope. |
 | 11 | 17 (multi-level Louvain baseline) | Multi-level Louvain scores ARI = 0.000 on the default SBM vs level-1 greedy's ARI = 0.174 — the aggregation-based variant over-merges communities | **Louvain without Leiden's refinement phase collapses to a single super-community on hub-heavy SBMs.** By level 2 the aggregation absorbs structurally distinct communities into one super-node and there's no mechanism to un-merge. This is the documented failure mode Leiden's refinement (Traag et al. 2019) was specifically introduced to fix. The multi-level implementation is kept in `src/analysis/structural.rs::louvain_labels` with a docstring warning; AC-3a publishes both scores side-by-side so the future Leiden integration has a direct comparison row. Lesson: "more iterations" is not a monotonic improvement in community detection — without a well-connectedness guarantee, additional passes can strictly regress the signal. |
 | 12 | 19 (rate-histogram encoder A/B) | Rate-histogram and SDPA both score below random on AC-2: `SDPA = 0.072` vs `rate-histogram = 0.079` (delta +0.007 within tie band; random for 8 classes = 0.125) | **The encoder axis is empirically ruled out.** Controlled A/B on the same 8-protocol labeled corpus that disproved SDPA in item 10: the crudest possible alternative (raw per-neuron-per-time-bin spike counts, no projection, no attention) neither improved nor meaningfully regressed the result. If the simplest encoder preserves all the raster information and still scores ~ SDPA, the encoder is not what's losing the protocol-identity signal — the saturated substrate is. The ADR §13 three-axis framing for AC-2 (encoder / substrate / labels) now has one axis measurement-ruled-out; the remaining two are substrate (real FlyWire replaces synthetic SBM) and labels (raster-regime rather than stimulus-protocol). Both are research-level pivots, not engineering levers. |
+| 13 | 21 (raster-regime labels test) | Re-labeling the same corpus by `(dominant_class × spike_count_bucket)` instead of stimulus-protocol-id collapses to **2 distinct labels with max_share = 0.92** across 104 windows from 8 protocols. Naive precision@5 = 1.000 is trivially explained by class imbalance, not signal. | **The labels axis is also empirically ruled out.** Changing what the ground truth labels are from "stimulus protocol" to "raster regime" doesn't help because the substrate itself collapses every stimulus-driven window into essentially the same raster regime — one dominant class, one count bucket, ~92% of all windows. The finding *is* the content: at the N=1024 synthetic SBM scale, there is no label scheme that carries enough diversity for AC-2 precision to mean anything. Of the three AC-2 remediation axes named in item 10 (encoder / substrate / labels), **items 12 and 13 eliminate encoder and labels; substrate is the sole remaining lever.** That is real FlyWire v783 ingest replacing the synthetic SBM — no longer a research question, a data-ingest engineering item (see §13 "Streaming FlyWire v783 ingest" which is shipped but fixture-only; the real-data path still requires downloading the 2 GB release). |

-The discoveries form a pattern: every "next lever named in the ADR" ultimately required an empirical test. **Four** of the five pre-measurement diagnoses tested on this branch proved wrong (items 7, 8, 9, 10). **The successful lever (item 6, adaptive cadence) was an orthogonal axis — schedule of detection, not algorithm of detection.** That insight is the deepest lesson the branch has to offer and is probably generalisable: when several structurally-different remediations all miss the same target, the target is likely on a different axis than the one being searched.
+The discoveries form a pattern: every "next lever named in the ADR" ultimately required an empirical test. **Six** of the ten pre-measurement diagnoses tested on this branch proved wrong (items 7, 8, 9, 10, 12, 13). **The sole unambiguous win (item 6, adaptive cadence) was an orthogonal axis — schedule of detection, not algorithm of detection.** That insight is the deepest lesson the branch has to offer and is probably generalisable: when several structurally-different remediations all miss the same target, the target is likely on a different axis than the one being searched.

-Applied to AC-2: three structurally-different remediations (brute-force → DiskANN → expanded-label corpus) all plateau near or below the random baseline. That signal says the encoder-substrate pairing is the wrong axis to adjust; the problem lives in the encoder or the label definition. The ADR's §13 follow-up list for AC-2 is updated accordingly.
+Applied to AC-2: five structurally-different remediations have been tested on the same SBM substrate — brute-force kNN (item 2 baseline); DiskANN (item 8); expanded-label corpus (item 10); rate-histogram encoder (item 12); raster-regime labels (item 13). All five plateau at or below the random baseline. Three of the four axes the ADR §13 framing named as potential fixes (encoder / corpus-size / labels) are now empirically ruled out. **The remaining axis is substrate** — real FlyWire v783 ingest replacing the synthetic SBM. That is no longer a research question but a data-ingest engineering item: the streaming-loader code exists (commit 11, `src/connectome/flywire/streaming.rs`) and passes fixture tests; what remains is downloading the real 2 GB release and re-running AC-2 against it. When that happens, AC-2 either hits its SOTA target or the final axis is disproven too — at which point the claim itself needs revision.

 ## 15. Determinism contract (expanded)

--- a/examples/connectome-fly/src/analysis/types.rs
+++ b/examples/connectome-fly/src/analysis/types.rs
@@ -123,6 +123,20 @@ impl MotifIndex {
        &self.vectors
    }

+    /// Raster-regime signature for each indexed window, in insert
+    /// order: `(dominant_class_idx, spike_count, t_center_ms)`. The
+    /// metadata the SDPA encoder's embedding is actually sensitive
+    /// to, unlike the stimulus-protocol labels that discoveries #10
+    /// and #12 showed the encoder does *not* track on this substrate.
+    /// Exposed for `tests/ac_2_raster_regime_labels.rs` (ADR §17
+    /// item 10 "labels" axis lever).
+    pub fn window_signatures(&self) -> Vec<(u8, u32, f32)> {
+        self.windows
+            .iter()
+            .map(|w| (w.dominant_class_idx, w.spike_count, w.t_center_ms))
+            .collect()
+    }
+
    pub(crate) fn insert(&mut self, v: Vec<f32>, w: MotifWindow) {
        if self.vectors.len() == self.capacity {
            self.vectors.remove(0);
--- a/examples/connectome-fly/tests/ac_2_raster_regime_labels.rs
+++ b/examples/connectome-fly/tests/ac_2_raster_regime_labels.rs
@@ -0,0 +1,259 @@
+#![allow(clippy::needless_range_loop)]
+//! ADR-154 §17 item 10 — the "labels" axis of the three-axis AC-2
+//! remediation framing.
+//!
+//! Discovery #10 (commit 15/16): stimulus-protocol labels can't be
+//! recovered from SDPA embeddings on this substrate — the saturated
+//! regime dominates, protocol identity dissipates inside ~150 ms.
+//!
+//! Discovery #12 (commit 19): raw rate-histogram encoder ties SDPA
+//! at sub-random precision@5 on the same 8-protocol labeled corpus.
+//! **Encoder axis is ruled out.**
+//!
+//! This test runs the remaining "labels" axis: drop stimulus-protocol
+//! identity as the ground-truth label and use instead the raster
+//! signature the encoder actually tracks — `(dominant_class_idx,
+//! spike_count_bucket)`. If the SDPA embedding is "protocol-blind but
+//! raster-sensitive", this re-labeling should show precision@5 well
+//! above random and above the stimulus-protocol score. If it doesn't,
+//! the substrate-axis is the only remaining candidate for AC-2 work.
+//!
+//! Diagnostic-only: the test prints the measured precision for both
+//! label schemes but does NOT hard-fail on the number. The ADR §14
+//! risk register forbids relaxing SOTA thresholds; this is a new
+//! measurement to be documented, not a gate.
+
+use connectome_fly::{
+    Analysis, AnalysisConfig, Connectome, ConnectomeConfig, CurrentInjection, Engine, EngineConfig,
+    Observer, Stimulus,
+};
+
+fn default_conn() -> Connectome {
+    Connectome::generate(&ConnectomeConfig::default())
+}
+
+/// Run one stimulus through the connectome and return the indexed
+/// SDPA embeddings alongside their raster-regime signatures.
+///
+/// Returns `(vectors, signatures)` where each signature is a
+/// `(dominant_class_idx, spike_count, t_center_ms)` triple.
+fn run_and_collect(
+    conn: &Connectome,
+    stim: &Stimulus,
+    t_end_ms: f32,
+) -> (Vec<Vec<f32>>, Vec<(u8, u32, f32)>) {
+    let mut eng = Engine::new(conn, EngineConfig::default());
+    let mut obs = Observer::new(conn.num_neurons());
+    eng.run_with(stim, &mut obs, t_end_ms);
+    let spikes = obs.spikes().to_vec();
+    let an = Analysis::new(AnalysisConfig {
+        motif_window_ms: 20.0,
+        motif_bins: 10,
+        index_capacity: 256,
+        ..AnalysisConfig::default()
+    });
+    let (index, _hits) = an.retrieve_motifs(conn, &spikes, 5);
+    let vectors: Vec<Vec<f32>> = index.vectors().to_vec();
+    let signatures = index.window_signatures();
+    (vectors, signatures)
+}
+
+/// Eight distinct stimulus protocols — same shape as the rate-encoder
+/// comparison. Returned as `(protocol_id, Stimulus)` pairs.
+fn make_8_protocols(conn: &Connectome) -> Vec<(u8, Stimulus)> {
+    let sensory = conn.sensory_neurons().to_vec();
+    let n = sensory.len();
+    let range = |lo: usize, hi: usize| sensory[lo.min(n)..hi.min(n)].to_vec();
+
+    let mut out: Vec<(u8, Stimulus)> = Vec::new();
+    let specs: &[(usize, usize, f32, f32, u32)] = &[
+        (0, n / 2, 15.0, 90.0, 20),
+        (n / 2, n, 15.0, 90.0, 20),
+        (0, n, 8.0, 90.0, 30),
+        (0, n, 25.0, 90.0, 14),
+        (0, n / 4, 15.0, 60.0, 20),
+        (3 * n / 4, n, 15.0, 120.0, 20),
+        (n / 4, 3 * n / 4, 12.0, 90.0, 25),
+        (0, n, 15.0, 90.0, 20),
+    ];
+    for (i, (lo, hi, period, amp, pulses)) in specs.iter().copied().enumerate() {
+        let pool = range(lo, hi);
+        let mut s = Stimulus::empty();
+        for k in 0..pulses {
+            let t0 = 20.0 + k as f32 * period;
+            for (pos, &target) in pool.iter().enumerate() {
+                s.push(CurrentInjection {
+                    t_ms: t0 + pos as f32 * 0.20,
+                    target,
+                    charge_pa: amp,
+                });
+            }
+        }
+        out.push((i as u8, s));
+    }
+    out
+}
+
+/// Bucket a spike count into one of 4 bins. Boundaries chosen so
+/// typical fly-scale window counts (0..2000) are split roughly evenly
+/// across the active regime.
+fn bucket_count(n: u32) -> u8 {
+    match n {
+        0..=50 => 0,
+        51..=200 => 1,
+        201..=800 => 2,
+        _ => 3,
+    }
+}
+
+/// Compose a raster-regime label from (dominant_class, count_bucket).
+/// 15 classes × 4 buckets = 60 possible labels; the measured
+/// 8-protocol run on the default SBM populated only 2 of them.
+fn raster_label(sig: (u8, u32, f32)) -> u16 {
+    let (class, count, _t) = sig;
+    let bucket = bucket_count(count) as u16;
+    (class as u16) * 4 + bucket
+}
+
+fn l2_dist(a: &[f32], b: &[f32]) -> f32 {
+    let mut s = 0.0_f32;
+    for i in 0..a.len().min(b.len()) {
+        let d = a[i] - b[i];
+        s += d * d;
+    }
+    s.sqrt()
+}
+
+/// Precision@k on a labeled corpus, leave-one-out over queries.
+fn precision_at_k(vectors: &[Vec<f32>], labels: &[u16], k: usize) -> f32 {
+    let n = vectors.len();
+    if n < 2 {
+        return 0.0;
+    }
+    let k = k.min(n - 1);
+    if k == 0 {
+        return 0.0;
+    }
+    let mut total_hits = 0.0_f32;
+    let mut total_queries = 0.0_f32;
+    for qi in 0..n {
+        let qv = &vectors[qi];
+        let qlbl = labels[qi];
+        let mut dists: Vec<(usize, f32)> = Vec::with_capacity(n - 1);
+        for j in 0..n {
+            if j == qi {
+                continue;
+            }
+            dists.push((j, l2_dist(qv, &vectors[j])));
+        }
+        dists.sort_by(|a, b| a.1.total_cmp(&b.1));
+        let hits: usize = dists
+            .iter()
+            .take(k)
+            .filter(|(j, _)| labels[*j] == qlbl)
+            .count();
+        total_hits += hits as f32 / k as f32;
+        total_queries += 1.0;
+    }
+    if total_queries == 0.0 {
+        0.0
+    } else {
+        total_hits / total_queries
+    }
+}
+
+#[test]
+fn ac_2_raster_regime_labels_vs_protocol_labels() {
+    let conn = default_conn();
+    let protocols = make_8_protocols(&conn);
+
+    // Collect all indexed vectors + their metadata + stimulus-protocol id.
+    let mut vectors: Vec<Vec<f32>> = Vec::new();
+    let mut protocol_labels: Vec<u16> = Vec::new();
+    let mut raster_signatures: Vec<(u8, u32, f32)> = Vec::new();
+    for (pid, stim) in &protocols {
+        let (v, sigs) = run_and_collect(&conn, stim, 140.0);
+        assert_eq!(v.len(), sigs.len(), "vectors and signatures mismatched");
+        for (vec, sig) in v.into_iter().zip(sigs.into_iter()) {
+            vectors.push(vec);
+            protocol_labels.push(*pid as u16);
+            raster_signatures.push(sig);
+        }
+    }
+
+    let corpus = vectors.len();
+    assert!(corpus >= 40, "corpus too small to judge precision ({corpus})");
+
+    // Build the raster-regime labels from signatures.
+    let raster_labels: Vec<u16> = raster_signatures
+        .iter()
+        .copied()
+        .map(raster_label)
+        .collect();
+
+    // Histogram both label schemes for diagnostic context.
+    let mut proto_counts: std::collections::HashMap<u16, u32> = std::collections::HashMap::new();
+    for &l in &protocol_labels {
+        *proto_counts.entry(l).or_insert(0) += 1;
+    }
+    let mut raster_counts: std::collections::HashMap<u16, u32> = std::collections::HashMap::new();
+    for &l in &raster_labels {
+        *raster_counts.entry(l).or_insert(0) += 1;
+    }
+    let proto_distinct = proto_counts.len();
+    let raster_distinct = raster_counts.len();
+    let proto_max_share = proto_counts.values().max().copied().unwrap_or(0) as f32 / corpus as f32;
+    let raster_max_share =
+        raster_counts.values().max().copied().unwrap_or(0) as f32 / corpus as f32;
+
+    // Compute precision@5 under both label schemes on the same corpus.
+    let proto_precision = precision_at_k(&vectors, &protocol_labels, 5);
+    let raster_precision = precision_at_k(&vectors, &raster_labels, 5);
+
+    // Random-chance baseline under each scheme. Assumes a uniform class
+    // prior, which understates chance (Σ pᵢ²) when max_share is high.
+    let proto_random = 1.0 / proto_distinct as f32;
+    let raster_random = 1.0 / raster_distinct as f32;
+
+    eprintln!(
+        "ac-2-raster-regime:\n\
+         ===== protocol-id labels =====\n\
+           corpus={corpus}  distinct={proto_distinct}  max_share={proto_max_share:.2}\n\
+           precision@5={proto_precision:.3}  random={proto_random:.3}  \
+           above_random={:.3}\n\
+         ===== raster-regime labels (dominant_class × spike_count_bucket) =====\n\
+           corpus={corpus}  distinct={raster_distinct}  max_share={raster_max_share:.2}\n\
+           precision@5={raster_precision:.3}  random={raster_random:.3}  \
+           above_random={:.3}",
+        proto_precision - proto_random,
+        raster_precision - raster_random,
+    );
+
+    let delta = raster_precision - proto_precision;
+    eprintln!("ac-2-raster-regime: raster - protocol = {delta:+.3}");
+    // Verdict: whether raster-regime labels are "real" depends on
+    // BOTH precision AND class balance. A raster_precision=1.0 when
+    // max_share=0.92 is trivially-dominant-class, not signal.
+    let is_trivial_dominance = raster_max_share > 0.70;
+    eprintln!(
+        "ac-2-raster-regime: verdict — {}",
+        if is_trivial_dominance {
+            "RASTER-REGIME COLLAPSES TO DOMINANT-CLASS MONOCULTURE — the substrate saturates into one (class, count-bucket) regime across all 8 protocols (max_share > 0.70). precision@5 ≈ 1.0 is trivial under such imbalance; not a real signal. Confirms the substrate-axis diagnosis: at synthetic N=1024 scale, re-labeling can't rescue AC-2 — only a heterogeneous substrate (real FlyWire v783) produces the label diversity the encoder needs to discriminate."
+        } else if raster_precision >= 0.30 && raster_precision > proto_precision + 0.10 {
+            "RASTER-REGIME LABELS ARE THE LEVER (encoder tracks raster structure; protocol identity is the wrong ground truth)"
+        } else if raster_precision > proto_precision + 0.05 {
+            "RASTER-REGIME modestly better; encoder has some raster sensitivity but substrate axis may still be needed"
+        } else {
+            "RASTER-REGIME ≈ PROTOCOL at this scale — neither label scheme recovers signal; substrate axis (FlyWire) is the remaining lever"
+        }
+    );
+
+    // Diagnostic-only: the test publishes the measured precisions and
+    // class balance for ADR §17 item 10's three-axis roll-up. It does
+    // NOT gate on raster-regime precision, because the finding itself
+    // (collapse or separation) is the content.
+    assert!(corpus >= 40, "corpus too small to judge ({corpus})");
+    assert!(proto_distinct >= 6, "protocol labels nearly trivial");
+    // raster_distinct can legitimately be 1 or 2 on this substrate —
+    // that *is* the finding. Don't hard-fail on it.
+}
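
The test's `1.0 / distinct` baseline assumes every label is equally likely. Under class imbalance, neighbours drawn at random still match the query label far more often than that. The sketch below computes this imbalance-aware chance level for leave-one-out precision@k; `chance_precision` is a hypothetical helper, not something in this commit. The result is independent of k because every retrieved slot has the same match probability.

```rust
/// Hypothetical helper: expected leave-one-out precision@k when the
/// neighbours of each query are drawn uniformly at random from the
/// rest of the corpus. `counts[i]` is the number of items with label i.
///
/// For a query with label i, a random other item matches with
/// probability (n_i - 1) / (n - 1); averaging over all queries gives
/// sum_i n_i * (n_i - 1) / (n * (n - 1)).
fn chance_precision(counts: &[u32]) -> f64 {
    let n: u64 = counts.iter().map(|&c| c as u64).sum();
    if n < 2 {
        return 0.0;
    }
    // Number of ordered (query, other) pairs that share a label.
    let matched_pairs: u64 = counts
        .iter()
        .map(|&c| c as u64 * (c as u64).saturating_sub(1))
        .sum();
    matched_pairs as f64 / (n * (n - 1)) as f64
}
```

With eight balanced classes of 13 items each, this gives 12/103, or about 0.117, close to the uniform 0.125. With illustrative counts of `[96, 8]`, chosen only to give a `max_share` near 0.92 and not taken from the measured histogram, it gives roughly 0.86, well above the uniform baseline of 0.5 for two labels.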