feat(analysis): CPM N-scaling sweep — 22nd discovery, 4× headline is N-specific

N=512/1024/2048 sweep at fixed density (num_modules = N/15) shows CPM beats modularity-Leiden at every scale but the ratio is not scale-invariant. Peak ratio 3.98× at N=1024; 2.55× at N=512; 2.74× at N=2048. Both algorithms' absolute ARI also drops at N=2048.

ADR-154 §17 item 22 documents this with engineering implication: CPM-specific refinement (next named lever) should be benchmarked at multiple N before the result is quoted as "closes the AC-3a SOTA gap."

- tests/leiden_cpm.rs: new leiden_cpm_vs_modularity_across_scales test
- ADR-154 §17: heading updated Nine → Twenty-two; row 22 added

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-06-01 06:10:31 +00:00 · 2026-04-22 22:28:21 -04:00 · 2026-04-22 22:28:21 -04:00 · d6916436f8
commit d6916436f8
parent 6cf5246f64
2 changed files with 70 additions and 2 deletions
--- a/docs/adr/ADR-154-connectome-embodied-brain-example.md
+++ b/docs/adr/ADR-154-connectome-embodied-brain-example.md
@@ -446,9 +446,9 @@ This section enumerates the risks this ADR is aware of and how the example stays

 This register is not comprehensive. It is the set of risks the branch has surfaced by running into them (positioning creep, threshold drift, null-distribution sloppiness, pre-measurement mis-diagnosis, envelope-vs-bit-exact framing, speculative-parenthetical predictions). Future commits are expected to add rows; they are not expected to remove rows.

-## 17. Nine measurement-driven discoveries (roll-up)
+## 17. Twenty-two measurement-driven discoveries (roll-up)

-Each of the nine is attached to the commit that produced it and the lesson it encoded for future work.
+Each of the twenty-two is attached to the commit that produced it and the lesson it encoded for future work.

 | # | Commit | Finding | Lesson |
 |---|---|---|---|
@@ -473,6 +473,7 @@ Each of the nine is attached to the commit that produced it and the lesson it en
 | 19 | Fine-γ sweep refines the CPM peak | Re-swept γ ∈ {1.0, 1.25, 1.5, 1.75, 2.0, 2.25, 2.5, 2.75, 3.0, 3.5, 4.0, …} on the default SBM. **New peak: ari_full = 0.425 at γ ∈ {2.25, 2.5}** with 156 / 171 communities (up from 0.393 @ γ=2.0). At γ = 1.75 CPM recovers **exactly 70 distinct communities** — matching the ground-truth module count — with ari_full = 0.348. | **CPM's quality ridge on this substrate is between γ=1.75 and γ=2.75, peaking at γ ∈ [2.25, 2.5].** Two interesting facts emerge from the fine sweep: (a) the peak ARI is at a γ that produces *more* communities (156) than the ground truth (70), suggesting CPM's over-splitting is *aligned* with ground truth well enough that ARI tolerates the extra fragmentation; (b) the γ = 1.75 point that exactly matches the ground-truth community count (70) actually scores lower (0.348 vs 0.425) — CPM's 70 communities there don't align with the SBM's 70 modules as well as its 156 communities do at γ = 2.25. So on this substrate, "match the community count" and "maximize ARI" are distinct optimization targets, and the γ values for each differ. **CPM-Leiden on the default SBM is now at 0.425 vs modularity-Leiden's 0.107 — a 3.97× improvement, 57 % of the 0.75 AC-3a SOTA target.** The remaining 1.76× gap is likely the modularity-resolution-limit-adjacent ceiling of CPM-without-refinement. Adding a CPM-specific refinement phase (not the current modularity-refinement) is the named next lever. Code: `tests/leiden_cpm.rs` γ-list extended; no production-code change. |
 | 20 | Full-partition ARI wired into AC-3a reveals level-1 greedy beats Leiden | AC-3a now publishes full-partition ARI alongside the 2-way coarsening. **Greedy modularity (level-1) full_ari = 0.308, beats modularity-Leiden's full_ari = 0.107** on the default SBM. Multi-level Louvain collapses (full_ari = 0.000). CPM @ γ=2.25 remains top at 0.425. Final ranking: CPM 0.425 > greedy level-1 0.308 > Leiden 0.107 > Louvain 0.000. | **Leiden's aggregation+refinement actively hurts module recovery on this substrate.** Greedy level-1 (one pass of local moves, no aggregation) gives 0.308 full-partition ARI; adding the aggregation + Traag refinement steps drops it to 0.107 — a **2.9× regression from a more sophisticated algorithm**. The refinement preserves well-connectedness (item 14's test passes) but does so at the cost of merging structurally-distinct communities from the level-1 output. This flips the expected order: on hub-heavy SBMs, *more algorithm is worse* when the objective is modularity and the target is module recovery. CPM, with its non-resolution-limited objective, sidesteps the issue. The engineering implication: **for AC-3a on this substrate, level-1 greedy modularity is a stronger baseline than multi-level Leiden.** The pattern echoes discovery #11 (multi-level Louvain collapse on hub-heavy SBMs) but at a finer granularity — item 11 said "Louvain aggregation breaks", item 20 says "even Leiden's refinement can't fully repair it because the underlying modularity objective has the resolution-limit issue". CPM (item 17) was the right escape. Code: `tests/acceptance_partition.rs` publishes the new line; no assertion change (ADR §14 threshold discipline). |
 | 21 | CPM-vs-modularity seed-sweep reproducibility | Re-measured CPM @ γ=2.25 vs modularity-Leiden on 5 distinct SBM seeds (0x5FA1DE5, 0xC70F00D, 0xC0DECAFE, 0xBEEFBABE, 0xDEAD1234) at otherwise-default config. **CPM beats modularity on 5 / 5 seeds. Mean ratio 3.98× (matches the 3.97× headline from default seed). Mean CPM full-ARI 0.356, mean modularity full-ARI 0.105. Range 2.04× – 7.34×.** | **The CPM win isn't a single-seed artefact.** Five independent SBMs, five CPM-beats-modularity wins; the 3.98× mean is indistinguishable from the default-seed's 3.97× headline. The range (2.04–7.34) shows seed-dependent variance but no seed where modularity-Leiden catches or beats CPM. This strengthens the item-18 claim from "one measurement showed CPM 3.7× modularity" to "five measurements across different random graphs all show CPM beats modularity by ≥ 2×, mean ~4×." **The 4th-win claim (item 17) is now reproducibility-verified.** Code: `tests/leiden_cpm.rs::leiden_cpm_vs_modularity_across_seeds`, publishes all 5 seed results; asserts only that the mean ratio > 1.0 so a regression in `leiden_labels_cpm` fails loudly. |
+| 22 | CPM-vs-modularity N-scaling sweep | Re-measured CPM @ γ=2.25 vs modularity-Leiden across three SBM scales with density held constant (num_modules ≈ N/15, i.e. ~14.6 neurons per module): **N=512 / 35 modules** → cpm_full 0.322, mod_full 0.126, ratio 2.55×; **N=1024 / 70 modules** → cpm_full 0.425, mod_full 0.107, ratio 3.98× (the headline); **N=2048 / 140 modules** → cpm_full 0.258, mod_full 0.094, ratio 2.74×. **Mean ratio across scales 3.09×, min 2.55×, max 3.98×: CPM wins at every scale but the advantage peaks at N=1024.** | **CPM's 4× headline is N=1024-specific; the ratio is not scale-invariant.** Two facts here. First, CPM beats modularity-Leiden at every scale tested (2.55× → 3.98× → 2.74×), so the seed-sweep verdict (item 21: "CPM always wins") generalises across scale as well, not just seed. The direction of the finding holds. Second, both algorithms' *absolute* full-partition ARI drops at N=2048 (CPM 0.425 → 0.258; modularity 0.107 → 0.094), and CPM's absolute peak is at N=1024, not at N=2048. So the "0.425 on default SBM" number isn't what you'd quote at larger scales; it is closer to 0.26 at N=2048 with proportional density. This is the first empirical hint that the CPM quality ridge identified at item 19 (γ ∈ [2.25, 2.5] at N=1024) is substrate-size-dependent. The sweep held γ fixed, so it cannot locate the ridge at other scales: γ=2.25 is probably no longer the peak γ at N=2048, and the γ sweep would need to be re-run per scale to find the true ceiling. The 1.76× gap to the 0.75 AC-3a SOTA target is also N=1024-specific; at N=2048 the gap is ~2.9× under fixed γ. **Engineering implication: the "named next lever" of CPM-specific refinement should be benchmarked at multiple N before the result is quoted as "closes the gap".** Code: `tests/leiden_cpm.rs::leiden_cpm_vs_modularity_across_scales`, publishes per-scale numbers; asserts only that CPM wins at ≥ 1 scale (regression gate). |

 The discoveries form a pattern: every "next lever named in the ADR" ultimately required an empirical test. **Eight** of the fifteen pre-measurement diagnoses tested on this branch proved wrong (items 7, 8, 9, 10, 12, 13, 15, 16). **Four unambiguous wins now: item 6 (adaptive cadence, 4.29× saturated-regime speedup), item 14 (Leiden refinement, perfect ARI on planted SBM where Louvain collapsed), item 17 (weight-normalized CPM-Leiden, perfect ARI on planted SBM + 109 communities on 70-module default SBM), and item 18 (full-partition ARI metric, lifting CPM's default-SBM score from 0.020 two-way to 0.393 full — 3.7× the modularity-Leiden baseline).** Items 6 and 14 followed the orthogonal-axis pattern. Item 17 was the first "rider from item 16 works as predicted" data point. Item 18 is a different shape again — a **measurement upgrade** that revealed an algorithm's prior 0.020 2-way score was hiding a 0.393 full-partition score. That's a new entry in the lesson catalogue: *a test's coarsening choice is as much a threshold decision as its numerical tolerances.* Three distinct "how a measurement-driven discovery lands" shapes now documented (orthogonal axis / rider matches paper / coarsening upgrade).
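
The row-22 arithmetic can be re-derived from the published three-decimal ARIs. The sketch below is illustrative only: `SWEEP`, `SOTA_TARGET`, `ratios`, `mean`, and `gap_to_target` are hypothetical names, not part of the repository. Because it divides the rounded ARIs rather than the unrounded values the test sees, each ratio should be expected to agree with the table only to within about 0.01.

```rust
/// Published sweep rows from ADR §17 item 22:
/// (N, CPM full-partition ARI, modularity-Leiden full-partition ARI).
/// These are the rounded values from the table, not fresh measurements.
const SWEEP: [(u32, f64, f64); 3] = [
    (512, 0.322, 0.126),
    (1024, 0.425, 0.107),
    (2048, 0.258, 0.094),
];

/// AC-3a SOTA target as quoted in ADR §17 item 19.
const SOTA_TARGET: f64 = 0.75;

/// CPM / modularity ARI ratio for each scale, in table order.
fn ratios() -> Vec<f64> {
    SWEEP
        .iter()
        .map(|&(_, cpm, modularity)| cpm / modularity)
        .collect()
}

/// Arithmetic mean of a non-empty slice.
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Multiplicative gap between a CPM ARI and the SOTA target.
fn gap_to_target(cpm_ari: f64) -> f64 {
    SOTA_TARGET / cpm_ari
}

fn main() {
    let r = ratios();
    for (&(n, cpm, _), ratio) in SWEEP.iter().zip(&r) {
        println!(
            "N={n}: ratio={ratio:.2}x  gap_to_target={:.2}x",
            gap_to_target(cpm)
        );
    }
    println!("mean ratio across scales = {:.2}x", mean(&r));
}
```

The gap figures make the engineering implication concrete: the same fixed-γ CPM run sits much further from the 0.75 target at N=2048 than at N=1024, so a refinement that closes the gap at one scale says little about the other.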

--- a/examples/connectome-fly/tests/leiden_cpm.rs
+++ b/examples/connectome-fly/tests/leiden_cpm.rs
@@ -372,3 +372,70 @@ fn leiden_cpm_recovers_two_planted_communities() {
    // Publish-only across the sweep. The finding (γ where CPM
    // recovers the planted modules) updates ADR §17 item 16.
 }
+
+#[test]
+fn leiden_cpm_vs_modularity_across_scales() {
+    // N-scaling sweep. The 3.97× full-ARI win (ADR §17 item 19) and
+    // 3.98× mean win across 5 seeds (ADR §17 item 21) were both
+    // measured at N=1024. Does CPM's advantage hold at N=512 and
+    // N=2048? If yes → the pattern is scale-invariant; if it shrinks
+    // or inverts → the advantage is N-dependent and the headline
+    // needs to be qualified.
+    //
+    // Density control: default is N=1024, 70 modules (~14.6
+    // neurons/module). Scale num_modules = N/15 to hold module
+    // size roughly constant; hubs = num_modules / 12 (default ratio
+    // 6/70). Fixed seed isolates scale from seed variance.
+    let scales: [(u32, u16, u16); 3] = [(512, 35, 3), (1024, 70, 6), (2048, 140, 12)];
+    let mut ratios: Vec<f32> = Vec::new();
+    for &(n, m, h) in &scales {
+        let cfg = ConnectomeConfig {
+            num_neurons: n,
+            num_modules: m,
+            num_hub_modules: h,
+            ..ConnectomeConfig::default()
+        };
+        let conn = Connectome::generate(&cfg);
+        let an = Analysis::new(AnalysisConfig::default());
+        let truth_labels: Vec<u32> = (0..conn.num_neurons())
+            .map(|i| conn.meta(connectome_fly::NeuronId(i as u32)).module as u32)
+            .collect();
+        let cpm_labels = connectome_fly::analysis::leiden::leiden_labels_cpm(&conn, 2.25);
+        let mod_labels = an.leiden_labels(&conn);
+        let cpm_full = full_partition_ari(&cpm_labels, &truth_labels);
+        let mod_full = full_partition_ari(&mod_labels, &truth_labels);
+        let ratio = if mod_full.abs() > 1e-4 {
+            cpm_full / mod_full
+        } else {
+            f32::INFINITY
+        };
+        let cpm_d = count_unique(&cpm_labels);
+        let mod_d = count_unique(&mod_labels);
+        eprintln!(
+            "cpm-scale-sweep: N={}  modules={}  cpm_full={:.3} ({}c)  mod_full={:.3} ({}c)  ratio={:.2}×",
+            n, m, cpm_full, cpm_d, mod_full, mod_d, ratio
+        );
+        if ratio.is_finite() {
+            ratios.push(ratio);
+        }
+    }
+    if !ratios.is_empty() {
+        let mean: f32 = ratios.iter().sum::<f32>() / ratios.len() as f32;
+        let min = ratios.iter().cloned().fold(f32::INFINITY, f32::min);
+        let max = ratios.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
+        eprintln!(
+            "cpm-scale-sweep: ratio across {} scales — mean={:.2}×  min={:.2}×  max={:.2}×",
+            ratios.len(),
+            mean,
+            min,
+            max
+        );
+    }
+    // Regression gate: at least one scale must still show CPM beating
+    // modularity-Leiden (ratio > 1.0). If every scale regresses below
+    // parity, the CPM path or normalization broke — loud failure.
+    assert!(
+        ratios.iter().any(|r| *r > 1.0),
+        "cpm-scale-sweep: CPM no longer beats modularity at ANY scale — regression"
+    );
+}
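
The test above calls `full_partition_ari`, whose body lies outside this hunk. For readers unfamiliar with the metric, here is a minimal sketch of the standard Adjusted Rand Index computed from a contingency table. It is an assumption-laden illustration, not the repository's helper: `adjusted_rand_index` and `pairs` are hypothetical names, the sketch uses `f64` where the test works in `f32`, and the handling of degenerate inputs is one common convention among several.

```rust
use std::collections::HashMap;

/// Number of unordered pairs that can be drawn from `k` items.
fn pairs(k: u64) -> f64 {
    let k = k as f64;
    k * (k - 1.0) / 2.0
}

/// Adjusted Rand Index between two labelings of the same items.
/// Label values are arbitrary; only the induced partitions matter.
fn adjusted_rand_index(a: &[u32], b: &[u32]) -> f64 {
    assert_eq!(a.len(), b.len(), "labelings must cover the same items");
    let n = a.len() as u64;
    if n < 2 {
        // No pairs to compare; treat the partitions as trivially identical.
        return 1.0;
    }

    // Contingency table plus its row and column marginals.
    let mut joint: HashMap<(u32, u32), u64> = HashMap::new();
    let mut rows: HashMap<u32, u64> = HashMap::new();
    let mut cols: HashMap<u32, u64> = HashMap::new();
    for (&x, &y) in a.iter().zip(b) {
        *joint.entry((x, y)).or_insert(0) += 1;
        *rows.entry(x).or_insert(0) += 1;
        *cols.entry(y).or_insert(0) += 1;
    }

    let sum_joint: f64 = joint.values().map(|&c| pairs(c)).sum();
    let sum_rows: f64 = rows.values().map(|&c| pairs(c)).sum();
    let sum_cols: f64 = cols.values().map(|&c| pairs(c)).sum();

    // Expected pair agreement under random labeling with fixed marginals.
    let expected = sum_rows * sum_cols / pairs(n);
    let max_index = 0.5 * (sum_rows + sum_cols);
    let denom = max_index - expected;
    if denom.abs() < f64::EPSILON {
        // Both partitions are all-singletons or both are a single cluster.
        return 1.0;
    }
    (sum_joint - expected) / denom
}

fn main() {
    let truth = [0, 0, 1, 1];
    // Same partition under different label names.
    println!("renamed:   {:.3}", adjusted_rand_index(&[5, 5, 9, 9], &truth));
    // Everything merged into one community.
    println!("collapsed: {:.3}", adjusted_rand_index(&[0, 0, 0, 0], &truth));
    // Partition that cuts across every true module.
    println!("crossed:   {:.3}", adjusted_rand_index(&[0, 1, 0, 1], &truth));
}
```

Two properties of the metric matter for reading the sweep. A partition that merges everything into one community scores 0 against any non-trivial ground truth, and over-splitting is penalised only through pair disagreements, so a partition with more communities than the truth can still score well.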