feat(analysis): raster-regime labels test — 13th discovery, labels axis ruled out for AC-2

ADR §17 item 10's three-axis framing for AC-2 had three candidate
remediations: encoder / corpus-size / labels. Items 10 and 12 ruled
out corpus-size and encoder. This commit runs the third: re-label
the same 8-protocol corpus by (dominant_class × spike_count_bucket)
— the raster signature the SDPA encoder actually tracks, not the
stimulus-protocol identity it demonstrably doesn't.

Measured on default SBM, 8 protocols, 140 ms early-transient windows,
104-window corpus:

  protocol-id labels:
    distinct = 8   max_share = 0.12   precision@5 = 0.062   (below random 0.125)
  raster-regime labels:
    distinct = 2   max_share = 0.92   precision@5 = 1.000   (trivial — 92% of
                                       windows share one (class, bucket))

The raster-regime precision=1.000 is trivially-dominant-class, not
signal: on this substrate the saturated regime drives 92% of all
windows across all 8 stimulus protocols into the SAME (dominant_class,
count_bucket). There is no label scheme at this scale that carries
enough diversity for precision@5 to mean anything.

Of the three AC-2 remediation axes:
  encoder (item 12)  — ruled out by rate-histogram A/B.
  corpus (item 10)   — ruled out by 8-protocol expansion.
  labels (this)      — ruled out by raster-regime monoculture.

**Substrate is the sole remaining AC-2 lever.** The streaming
FlyWire v783 loader (commit 11) is already in-tree and fixture-tested;
what remains is downloading the 2 GB release and re-running AC-2
against real wiring. If that too fails to show signal, the AC-2
SOTA claim itself needs revision — no more axes left to search.

Changes:
  - src/analysis/types.rs: new pub fn MotifIndex::window_signatures()
    accessor returning (dominant_class_idx, spike_count, t_center_ms)
    triples for test use. Alongside the existing vectors() accessor.
  - tests/ac_2_raster_regime_labels.rs: new diagnostic test.
    Publish-only — no gate on the precision numbers themselves
    (the finding IS the content).
  - ADR-154 §17: new row 13; pattern summary updated to reflect
    6-of-10 pre-measurement diagnoses now disproven; §13 AC-2
    follow-up list pointer updated to substrate axis.

All prior tests still green. No source-code regression.

Co-Authored-By: claude-flow <ruv@ruv.net>
EOF
)
This commit is contained in:
ruvnet 2026-04-22 16:58:01 -04:00
parent 02ebdd11f3
commit 0430231b8a
3 changed files with 276 additions and 2 deletions

View file

@ -462,10 +462,11 @@ Each of the nine is attached to the commit that produced it and the lesson it en
| 10 | 15 (labeled AC-2, reverted) | 8-protocol labeled corpus still can't break the AC-2 precision ceiling: 400 ms → precision@5 = 0.089, 140 ms early-transient → 0.117 (vs random 0.125 for 8 classes) | **SDPA + deterministic low-rank projection on this substrate is protocol-blind.** Expanding the corpus from 4 → 8 protocols with max-share 0.12 did not help — stimulus-specific dynamics dissipate inside ≲ 150 ms as the substrate saturates into a common regime, and the SDPA encoder captures that saturated raster rather than the stimulus identity. The AC-2 gap is neither an index problem (DiskANN tried — item 8) nor a corpus-size problem (this test tried). It is an **encoder-substrate pairing** problem. Fixing it requires either (a) a different encoder (CEBRA / learned / task-specific contrastive), (b) a different substrate (real FlyWire may respond more protocol-specifically), or (c) a different label definition (raster-structure labels, not stimulus-protocol labels). None of those three are in this demonstrator's scope. |
| 11 | 17 (multi-level Louvain baseline) | Multi-level Louvain scores ARI = 0.000 on the default SBM vs level-1 greedy's ARI = 0.174 — the aggregation-based variant over-merges communities | **Louvain without Leiden's refinement phase collapses to a single super-community on hub-heavy SBMs.** By level 2 the aggregation absorbs structurally distinct communities into one super-node and there's no mechanism to un-merge. This is the documented failure mode Leiden's refinement (Traag et al. 2019) was specifically introduced to fix. The multi-level implementation is kept in `src/analysis/structural.rs::louvain_labels` with a docstring warning; AC-3a publishes both scores side-by-side so the future Leiden integration has a direct comparison row. Lesson: "more iterations" is not a monotonic improvement in community detection — without a well-connectedness guarantee, additional passes can strictly regress the signal. |
| 12 | 19 (rate-histogram encoder A/B) | Rate-histogram and SDPA both score below random on AC-2: `SDPA = 0.072` vs `rate-histogram = 0.079` (delta +0.007 within tie band; random for 8 classes = 0.125) | **The encoder axis is empirically ruled out.** Controlled A/B on the same 8-protocol labeled corpus that disproved SDPA in item 10: the crudest possible alternative (raw per-neuron-per-time-bin spike counts, no projection, no attention) neither improved nor meaningfully regressed the result. If the simplest encoder preserves all the raster information and still scores ~ SDPA, the encoder is not what's losing the protocol-identity signal — the saturated substrate is. The ADR §13 three-axis framing for AC-2 (encoder / substrate / labels) now has one axis measurement-ruled-out; the remaining two are substrate (real FlyWire replaces synthetic SBM) and labels (raster-regime rather than stimulus-protocol). Both are research-level pivots, not engineering levers. |
| 13 | 21 (raster-regime labels test) | Re-labeling the same corpus by `(dominant_class × spike_count_bucket)` instead of stimulus-protocol-id collapses to **2 distinct labels with max_share = 0.92** across 104 windows from 8 protocols. Naive precision@5 = 1.000 is trivially explained by class imbalance, not signal. | **The labels axis is also empirically ruled out.** Changing what the ground truth labels are from "stimulus protocol" to "raster regime" doesn't help because the substrate itself collapses every stimulus-driven window into essentially the same raster regime — one dominant class, one count bucket, ~92% of all windows. The finding *is* the content: at the N=1024 synthetic SBM scale, there is no label scheme that carries enough diversity for AC-2 precision to mean anything. Of the three AC-2 remediation axes named in item 10 (encoder / substrate / labels), **items 12 and 13 eliminate encoder and labels; substrate is the sole remaining lever.** That is real FlyWire v783 ingest replacing the synthetic SBM — no longer a research question, a data-ingest engineering item (see §13 "Streaming FlyWire v783 ingest" which is shipped but fixture-only; the real-data path still requires downloading the 2 GB release). |
The discoveries form a pattern: every "next lever named in the ADR" ultimately required an empirical test. **Four** of the five pre-measurement diagnoses tested on this branch proved wrong (items 7, 8, 9, 10). **The successful lever (item 6, adaptive cadence) was an orthogonal axis — schedule of detection, not algorithm of detection.** That insight is the deepest lesson the branch has to offer and is probably generalisable: when several structurally-different remediations all miss the same target, the target is likely on a different axis than the one being searched.
The discoveries form a pattern: every "next lever named in the ADR" ultimately required an empirical test. **Six** of the ten pre-measurement diagnoses tested on this branch proved wrong (items 7, 8, 9, 10, 12, 13). **The sole unambiguous win (item 6, adaptive cadence) was an orthogonal axis — schedule of detection, not algorithm of detection.** That insight is the deepest lesson the branch has to offer and is probably generalisable: when several structurally-different remediations all miss the same target, the target is likely on a different axis than the one being searched.
Applied to AC-2: three structurally-different remediations (brute-force → DiskANN → expanded-label corpus) all plateau near or below the random baseline. That signal says the encoder-substrate pairing is the wrong axis to adjust; the problem lives in the encoder or the label definition. The ADR's §13 follow-up list for AC-2 is updated accordingly.
Applied to AC-2: five structurally-different remediations have been tested on the same SBM substrate — brute-force kNN (item 2 baseline); DiskANN (item 8); expanded-label corpus (item 10); rate-histogram encoder (item 12); raster-regime labels (item 13). All five plateau at or below the random baseline. Three of the four axes the ADR §13 framing named as potential fixes (encoder / corpus-size / labels) are now empirically ruled out. **The remaining axis is substrate** — real FlyWire v783 ingest replacing the synthetic SBM. That is no longer a research question but a data-ingest engineering item: the streaming-loader code exists (commit 11, `src/connectome/flywire/streaming.rs`) and passes fixture tests; what remains is downloading the real 2 GB release and re-running AC-2 against it. When that happens, AC-2 either hits its SOTA target or the final axis is disproven too — at which point the claim itself needs revision.
## 15. Determinism contract (expanded)

View file

@ -123,6 +123,20 @@ impl MotifIndex {
&self.vectors
}
/// Raster-regime signature for each indexed window, in insert
/// order: `(dominant_class_idx, spike_count, t_center_ms)`. The
/// metadata the SDPA encoder's embedding is actually sensitive
/// to — unlike the stimulus-protocol labels that discovery #10
/// and #12 showed the encoder does *not* track on this substrate.
/// Exposed for `tests/ac_2_raster_regime_labels.rs` (ADR §17
/// item 10 "labels" axis lever).
pub fn window_signatures(&self) -> Vec<(u8, u32, f32)> {
self.windows
.iter()
.map(|w| (w.dominant_class_idx, w.spike_count, w.t_center_ms))
.collect()
}
pub(crate) fn insert(&mut self, v: Vec<f32>, w: MotifWindow) {
if self.vectors.len() == self.capacity {
self.vectors.remove(0);

View file

@ -0,0 +1,259 @@
#![allow(clippy::needless_range_loop)]
//! ADR-154 §17 item 10 — the "labels" axis of the three-axis AC-2
//! remediation framing.
//!
//! Discovery #10 (commit 15/16): stimulus-protocol labels can't be
//! recovered from SDPA embeddings on this substrate — the saturated
//! regime dominates, protocol identity dissipates inside ~150 ms.
//!
//! Discovery #12 (commit 19): raw rate-histogram encoder ties SDPA
//! at sub-random precision@5 on the same 8-protocol labeled corpus.
//! **Encoder axis is ruled out.**
//!
//! This test runs the remaining "labels" axis: drop stimulus-protocol
//! identity as the ground-truth label and use instead the raster
//! signature the encoder actually tracks — `(dominant_class_idx,
//! spike_count_bucket)`. If the SDPA embedding is "protocol-blind but
//! raster-sensitive", this re-labeling should show precision@5 well
//! above random and above the stimulus-protocol score. If it doesn't,
//! the substrate-axis is the only remaining candidate for AC-2 work.
//!
//! Diagnostic-only: the test prints the measured precision for both
//! label schemes but does NOT hard-fail on the number. The ADR §14
//! risk register forbids relaxing SOTA thresholds; this is a new
//! measurement to be documented, not a gate.
use connectome_fly::{
Analysis, AnalysisConfig, Connectome, ConnectomeConfig, CurrentInjection, Engine, EngineConfig,
Observer, Stimulus,
};
fn default_conn() -> Connectome {
Connectome::generate(&ConnectomeConfig::default())
}
/// Run one stimulus through the connectome and return the indexed
/// SDPA embeddings alongside their raster-regime signatures.
///
/// Returns `(vectors, signatures)` where each signature is a
/// `(dominant_class_idx, spike_count, t_center_ms)` triple.
fn run_and_collect(
conn: &Connectome,
stim: &Stimulus,
t_end_ms: f32,
) -> (Vec<Vec<f32>>, Vec<(u8, u32, f32)>) {
let mut eng = Engine::new(conn, EngineConfig::default());
let mut obs = Observer::new(conn.num_neurons());
eng.run_with(stim, &mut obs, t_end_ms);
let spikes = obs.spikes().to_vec();
let an = Analysis::new(AnalysisConfig {
motif_window_ms: 20.0,
motif_bins: 10,
index_capacity: 256,
..AnalysisConfig::default()
});
let (index, _hits) = an.retrieve_motifs(conn, &spikes, 5);
let vectors: Vec<Vec<f32>> = index.vectors().to_vec();
let signatures = index.window_signatures();
(vectors, signatures)
}
/// Eight distinct stimulus protocols — same shape as the rate-encoder
/// comparison. Returned as `(protocol_id, Stimulus)` pairs.
fn make_8_protocols(conn: &Connectome) -> Vec<(u8, Stimulus)> {
let sensory = conn.sensory_neurons().to_vec();
let n = sensory.len();
let range = |lo: usize, hi: usize| sensory[lo.min(n)..hi.min(n)].to_vec();
let mut out: Vec<(u8, Stimulus)> = Vec::new();
let specs: &[(usize, usize, f32, f32, u32)] = &[
(0, n / 2, 15.0, 90.0, 20),
(n / 2, n, 15.0, 90.0, 20),
(0, n, 8.0, 90.0, 30),
(0, n, 25.0, 90.0, 14),
(0, n / 4, 15.0, 60.0, 20),
(3 * n / 4, n, 15.0, 120.0, 20),
(n / 4, 3 * n / 4, 12.0, 90.0, 25),
(0, n, 15.0, 90.0, 20),
];
for (i, (lo, hi, period, amp, pulses)) in specs.iter().copied().enumerate() {
let pool = range(lo, hi);
let mut s = Stimulus::empty();
for k in 0..pulses {
let t0 = 20.0 + k as f32 * period;
for (pos, &target) in pool.iter().enumerate() {
s.push(CurrentInjection {
t_ms: t0 + pos as f32 * 0.20,
target,
charge_pa: amp,
});
}
}
out.push((i as u8, s));
}
out
}
/// Bucket a spike count into one of 4 bins. Boundaries chosen so
/// typical fly-scale window counts (0..2000) are split roughly evenly
/// across the active regime.
fn bucket_count(n: u32) -> u8 {
match n {
0..=50 => 0,
51..=200 => 1,
201..=800 => 2,
_ => 3,
}
}
/// Compose a raster-regime label from (dominant_class, count_bucket).
/// 15 classes × 4 buckets = 60 possible labels; in practice ~8-15
/// are populated in a typical 8-protocol run.
fn raster_label(sig: (u8, u32, f32)) -> u16 {
let (class, count, _t) = sig;
let bucket = bucket_count(count) as u16;
(class as u16) * 4 + bucket
}
fn l2_dist(a: &[f32], b: &[f32]) -> f32 {
let mut s = 0.0_f32;
for i in 0..a.len().min(b.len()) {
let d = a[i] - b[i];
s += d * d;
}
s.sqrt()
}
/// Precision@k on a labeled corpus, leave-one-out over queries.
fn precision_at_k(vectors: &[Vec<f32>], labels: &[u16], k: usize) -> f32 {
let n = vectors.len();
if n < 2 {
return 0.0;
}
let k = k.min(n - 1);
if k == 0 {
return 0.0;
}
let mut total_hits = 0.0_f32;
let mut total_queries = 0.0_f32;
for qi in 0..n {
let qv = &vectors[qi];
let qlbl = labels[qi];
let mut dists: Vec<(usize, f32)> = Vec::with_capacity(n - 1);
for j in 0..n {
if j == qi {
continue;
}
dists.push((j, l2_dist(qv, &vectors[j])));
}
dists.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap_or(std::cmp::Ordering::Equal));
let hits: usize = dists
.iter()
.take(k)
.filter(|(j, _)| labels[*j] == qlbl)
.count();
total_hits += hits as f32 / k as f32;
total_queries += 1.0;
}
if total_queries == 0.0 {
0.0
} else {
total_hits / total_queries
}
}
#[test]
fn ac_2_raster_regime_labels_vs_protocol_labels() {
let conn = default_conn();
let protocols = make_8_protocols(&conn);
// Collect all indexed vectors + their metadata + stimulus-protocol id.
let mut vectors: Vec<Vec<f32>> = Vec::new();
let mut protocol_labels: Vec<u16> = Vec::new();
let mut raster_signatures: Vec<(u8, u32, f32)> = Vec::new();
for (pid, stim) in &protocols {
let (v, sigs) = run_and_collect(&conn, stim, 140.0);
assert_eq!(v.len(), sigs.len(), "vectors and signatures mismatched");
for (vec, sig) in v.into_iter().zip(sigs.into_iter()) {
vectors.push(vec);
protocol_labels.push(*pid as u16);
raster_signatures.push(sig);
}
}
let corpus = vectors.len();
assert!(corpus >= 40, "corpus too small to judge precision ({corpus})");
// Build the raster-regime labels from signatures.
let raster_labels: Vec<u16> = raster_signatures
.iter()
.copied()
.map(raster_label)
.collect();
// Histogram both label schemes for diagnostic context.
let mut proto_counts: std::collections::HashMap<u16, u32> = std::collections::HashMap::new();
for &l in &protocol_labels {
*proto_counts.entry(l).or_insert(0) += 1;
}
let mut raster_counts: std::collections::HashMap<u16, u32> = std::collections::HashMap::new();
for &l in &raster_labels {
*raster_counts.entry(l).or_insert(0) += 1;
}
let proto_distinct = proto_counts.len();
let raster_distinct = raster_counts.len();
let proto_max_share = proto_counts.values().max().copied().unwrap_or(0) as f32 / corpus as f32;
let raster_max_share =
raster_counts.values().max().copied().unwrap_or(0) as f32 / corpus as f32;
// Compute precision@5 under both label schemes on the same corpus.
let proto_precision = precision_at_k(&vectors, &protocol_labels, 5);
let raster_precision = precision_at_k(&vectors, &raster_labels, 5);
// Random-chance baseline under each scheme (assumes uniform class
// prior, which is conservative given max_share details below).
let proto_random = 1.0 / proto_distinct as f32;
let raster_random = 1.0 / raster_distinct as f32;
eprintln!(
"ac-2-raster-regime:\n\
===== protocol-id labels =====\n\
corpus={corpus} distinct={proto_distinct} max_share={proto_max_share:.2}\n\
precision@5={proto_precision:.3} random={proto_random:.3} \
above_random={:.3}\n\
===== raster-regime labels (dominant_class × spike_count_bucket) =====\n\
corpus={corpus} distinct={raster_distinct} max_share={raster_max_share:.2}\n\
precision@5={raster_precision:.3} random={raster_random:.3} \
above_random={:.3}",
proto_precision - proto_random,
raster_precision - raster_random,
);
let delta = raster_precision - proto_precision;
eprintln!("ac-2-raster-regime: raster - protocol = {delta:+.3}");
// Verdict: whether raster-regime labels are "real" depends on
// BOTH precision AND class balance. A raster_precision=1.0 when
// max_share=0.92 is trivially-dominant-class, not signal.
let is_trivial_dominance = raster_max_share > 0.70;
eprintln!(
"ac-2-raster-regime: verdict — {}",
if is_trivial_dominance {
"RASTER-REGIME COLLAPSES TO DOMINANT-CLASS MONOCULTURE — the substrate saturates into one (class, count-bucket) regime across all 8 protocols (max_share > 0.70). precision@5 ≈ 1.0 is trivial under such imbalance; not a real signal. Confirms the substrate-axis diagnosis: at synthetic N=1024 scale, re-labeling can't rescue AC-2 — only a heterogeneous substrate (real FlyWire v783) produces the label diversity the encoder needs to discriminate."
} else if raster_precision >= 0.30 && raster_precision > proto_precision + 0.10 {
"RASTER-REGIME LABELS ARE THE LEVER (encoder tracks raster structure; protocol identity is the wrong ground truth)"
} else if raster_precision > proto_precision + 0.05 {
"RASTER-REGIME modestly better; encoder has some raster sensitivity but substrate axis may still be needed"
} else {
"RASTER-REGIME ≈ PROTOCOL at this scale — neither label scheme recovers signal; substrate axis (FlyWire) is the remaining lever"
}
);
// Diagnostic-only: the test publishes the measured precisions and
// class balance for ADR §17 item 10's three-axis roll-up. It does
// NOT gate on raster-regime precision, because the finding itself
// (collapse or separation) is the content.
assert!(corpus >= 40, "corpus too small to judge ({corpus})");
assert!(proto_distinct >= 6, "protocol labels nearly trivial");
// raster_distinct can legitimately be 1 or 2 on this substrate —
// that *is* the finding. Don't hard-fail on it.
}