test(ac-3a): wire full-partition ARI — greedy beats Leiden, discovery #20

AC-3a now publishes full-partition ARI alongside the 2-way coarsening.
Measured on the default N=1024 SBM:

  2-way coarsened ARI (inherited, backward-compat):
    mincut  : -0.001
    greedy  :  0.174
    louvain :  0.000
    leiden  :  0.089

  **Full-partition ARI (new, correct metric):**
    greedy full_ari  : **0.308**  ← surprising
    louvain full_ari :   0.000    (collapses)
    leiden full_ari  :   0.107
    cpm@γ=2.25       : **0.425**  ← still best

**20th discovery: Leiden's aggregation+refinement actively HURTS
full-partition ARI vs greedy level-1 on this substrate.**

Greedy modularity (one pass of local moves, no aggregation) scores
0.308; adding the aggregation + Traag refinement steps drops that to
0.107, a 2.9× regression from the more sophisticated algorithm. The
refinement preserves well-connectedness (leiden_refinement.rs tests
still pass) but does so at the cost of merging structurally-distinct
communities from the level-1 output.

This flips the expected order: on hub-heavy SBMs, *more algorithm is
worse* when the objective is modularity and the target is module
recovery. CPM (item 17) was the right escape: its non-resolution-limited
objective sidesteps the issue.

Final ranking on default SBM, full-partition ARI:

  CPM @ γ=2.25 : 0.425  (non-modularity objective)
  greedy L1    : 0.308  (minimal-algorithm modularity)
  Leiden       : 0.107  (maximal-algorithm modularity)
  Louvain      : 0.000  (aggregation collapses)

The pattern echoes item 11 (multi-level Louvain collapse on hub-heavy
SBMs) but at a finer granularity: item 11 said 'aggregation breaks',
item 20 says 'even Leiden's refinement can't fully repair it because
the underlying modularity objective has the resolution-limit issue'.
The fix (item 17) was a different objective, not a better algorithm.
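The resolution-limit argument can be checked on the textbook ring-of-cliques graph. The sketch below is hypothetical and self-contained (none of these functions come from `connectome_fly`). It scores two partitions of a ring of `m` five-node cliques: partition A keeps every clique separate, and partition B merges adjacent cliques pairwise. It uses both Newman-Girvan modularity and CPM. The CPM function divides edge weights by their mean before scoring, which mirrors the item 17 description of `leiden_labels_cpm`. Whether the crate computes CPM exactly this way is an assumption.

```rust
use std::collections::HashMap;

/// Weighted undirected edge (u, v, w).
type Edge = (usize, usize, f64);

/// Ring of `m` cliques of size `k` with unit weights: clique c owns nodes
/// c*k..(c+1)*k, and a single bridge joins clique c to clique (c + 1) % m.
fn ring_of_cliques(m: usize, k: usize) -> (usize, Vec<Edge>) {
    let mut edges = Vec::new();
    for c in 0..m {
        let base = c * k;
        for i in 0..k {
            for j in (i + 1)..k {
                edges.push((base + i, base + j, 1.0));
            }
        }
        // Bridge: last node of clique c -> first node of the next clique.
        edges.push((base + k - 1, ((c + 1) % m) * k, 1.0));
    }
    (m * k, edges)
}

/// Newman-Girvan modularity: Q = Σ_c [ w_in(c) / W - (d(c) / 2W)^2 ].
fn modularity(edges: &[Edge], labels: &[usize]) -> f64 {
    let total_w: f64 = edges.iter().map(|e| e.2).sum();
    let mut internal: HashMap<usize, f64> = HashMap::new();
    let mut degree: HashMap<usize, f64> = HashMap::new();
    for &(u, v, w) in edges {
        *degree.entry(labels[u]).or_insert(0.0) += w;
        *degree.entry(labels[v]).or_insert(0.0) += w;
        if labels[u] == labels[v] {
            *internal.entry(labels[u]).or_insert(0.0) += w;
        }
    }
    degree
        .iter()
        .map(|(c, &d)| {
            internal.get(c).copied().unwrap_or(0.0) / total_w - (d / (2.0 * total_w)).powi(2)
        })
        .sum()
}

/// CPM quality: H = Σ_c [ w_in(c) - γ · n_c (n_c - 1) / 2 ], with edge
/// weights divided by their mean first so that γ is dimensionless.
fn cpm(edges: &[Edge], labels: &[usize], gamma: f64) -> f64 {
    let mean_w = edges.iter().map(|e| e.2).sum::<f64>() / edges.len() as f64;
    let mut internal: HashMap<usize, f64> = HashMap::new();
    let mut size: HashMap<usize, f64> = HashMap::new();
    for &l in labels {
        *size.entry(l).or_insert(0.0) += 1.0;
    }
    for &(u, v, w) in edges {
        if labels[u] == labels[v] {
            *internal.entry(labels[u]).or_insert(0.0) += w / mean_w;
        }
    }
    size.iter()
        .map(|(c, &n)| internal.get(c).copied().unwrap_or(0.0) - gamma * n * (n - 1.0) / 2.0)
        .sum()
}

fn main() {
    let k = 5;
    for m in [10usize, 30] {
        let (n, edges) = ring_of_cliques(m, k);
        // Partition A: one community per clique. Partition B: adjacent
        // cliques merged pairwise (m is even, so every clique has a partner).
        let per_clique: Vec<usize> = (0..n).map(|i| i / k).collect();
        let pairs: Vec<usize> = (0..n).map(|i| i / (2 * k)).collect();
        println!(
            "m={m}: Q per-clique={:.4} pairs={:.4} | CPM@γ=0.5 per-clique={:.1} pairs={:.1}",
            modularity(&edges, &per_clique),
            modularity(&edges, &pairs),
            cpm(&edges, &per_clique, 0.5),
            cpm(&edges, &pairs, 0.5),
        );
    }
}
```

With unit weights, merging two adjacent cliques raises modularity exactly when m > k(k-1)+2, which is m > 22 for k = 5. So at m = 10 modularity keeps the cliques apart, but at m = 30 it prefers the merged pairs even though every clique is a perfect module. CPM keeps the cliques apart for any γ > 1/k², regardless of m, because its penalty term does not depend on total graph size. The test also checks that multiplying every weight by the same factor leaves the CPM score unchanged, which is the point of the mean-weight normalization.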
Engineering implication: **for AC-3a on this substrate, level-1 greedy
modularity is a stronger baseline than multi-level Leiden.** The
default Louvain / Leiden trajectory assumes that increasingly
sophisticated algorithms monotonically improve module recovery; on
hub-heavy SBMs that assumption is false, and simpler is better up to
the CPM break.

Files:
- tests/acceptance_partition.rs: full_partition_ari helper, plus a new
  eprintln publishing four full-ARI values against ground-truth module
  labels. No assertion change (ADR §14 threshold discipline: coarsening
  choices are decisions, not knobs).
- docs/adr/ADR-154: §17 item 20 added with the surprising level-1 vs
  Leiden inversion and the 'more algorithm is worse' framing on this
  substrate.

All 95 prior tests unchanged.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-05-27 08:45:07 +00:00 · 2026-04-22 21:59:31 -04:00 · 2026-04-22 21:59:31 -04:00 · cfdcb8bb12
commit cfdcb8bb12
parent 1f085dc35c
2 changed files with 69 additions and 0 deletions
--- a/docs/adr/ADR-154-connectome-embodied-brain-example.md
+++ b/docs/adr/ADR-154-connectome-embodied-brain-example.md
@@ -471,6 +471,7 @@ Each of the nine is attached to the commit that produced it and the lesson it en
 | 17 | Weight-normalized CPM + γ-sweep at scale | `leiden_labels_cpm` rewritten to pre-normalize all edge weights by their mean (so mean edge weight = 1.0 and γ is dimensionless). Re-swept across γ ∈ {0.1, 0.5, 1, 2, 4, 8, 16, 32, 64}. **Planted 2-community SBM: ARI = 1.000 at γ ∈ {2, 4}** (perfect recovery, matches modularity-Leiden's planted result — item 14). Default N=1024 hub-heavy SBM: best 2-way-coarsened ARI = 0.020 at γ=2 with **109 distinct communities** (close to the ground-truth 70 modules). | **The weight-normalization rider works.** CPM recovers planted community structure perfectly once γ is in the right scale range (γ ~ super-edge magnitude), confirming the paper's claim and validating the rider from item 16. On multi-module graphs, however, the *2-way coarsening inherited from AC-3a* undersells CPM's output: 109 communities mapped to a hub-vs-non-hub binary label loses nearly all the signal. **The measurement is now the limit, not the algorithm.** The natural next step is a full-partition ARI or a module-recovery fraction metric that respects CPM's native community count. Weight-normalized CPM is a successful item on this branch (second community-detection algorithm that matches its paper's planted-graph performance) — but its win on the 70-module substrate won't be visible until the measurement catches up. Code: unchanged API; `leiden_labels_cpm(conn, gamma)` now takes dimensionless γ. |
 | 18 | Full-partition ARI lifts the measurement | Added `full_partition_ari(predicted, truth)` to `tests/leiden_cpm.rs` — standard Hubert-Arabie ARI against the 70-module SBM ground-truth label vector, not the 2-way hub-vs-non-hub coarsening. Re-measured the γ sweep. **Result on the default N=1024 SBM: modularity-Leiden full_ari = 0.107; CPM @ γ=2 full_ari = 0.393** — a **3.7× improvement** over modularity-Leiden on the correct metric. | **The measurement fix was the lever — not another algorithm.** Item 17 predicted this exactly: CPM's 109 communities were recovering ~57 % of the 70-module structure, but the 2-way coarsening was throwing all of that away. With the correct metric, CPM @ γ=2 becomes the new state-of-the-art community detector on this substrate — **4th unambiguous win on the branch** (after adaptive cadence, modularity-Leiden refinement, weight-normalized CPM at planted scale). Still below the 0.75 AC-3a SOTA target, but the gap is now a tractable 2× rather than a 38× mystery. This also closes out a recurring branch-wide failure mode: AC-3a's 2-way coarsening was inherited uncritically from the first AC-3 test; two community-detection algorithms (Leiden modularity, Leiden CPM) underperformed their paper's claims on it before the metric was finally upgraded. **Lesson for §14 risk register: a test's coarsening choice is as much a threshold decision as its numerical tolerances, and deserves the same review discipline.** Code: `tests/leiden_cpm.rs` helper; no production-code change (this is a measurement-correctness commit, not an algorithm commit). |
 | 19 | Fine-γ sweep refines the CPM peak | Re-swept γ ∈ {1.0, 1.25, 1.5, 1.75, 2.0, 2.25, 2.5, 2.75, 3.0, 3.5, 4.0, …} on the default SBM. **New peak: ari_full = 0.425 at γ ∈ {2.25, 2.5}** with 156 / 171 communities (up from 0.393 @ γ=2.0). At γ = 1.75 CPM recovers **exactly 70 distinct communities** — matching the ground-truth module count — with ari_full = 0.348. | **CPM's quality ridge on this substrate is between γ=1.75 and γ=2.75, peaking at γ ∈ [2.25, 2.5].** Two interesting facts emerge from the fine sweep: (a) the peak ARI is at a γ that produces *more* communities (156) than the ground truth (70), suggesting CPM's over-splitting is *aligned* with ground truth well enough that ARI tolerates the extra fragmentation; (b) the γ = 1.75 point that exactly matches the ground-truth community count (70) actually scores lower (0.348 vs 0.425) — CPM's 70 communities there don't align with the SBM's 70 modules as well as its 156 communities do at γ = 2.25. So on this substrate, "match the community count" and "maximize ARI" are distinct optimization targets, and the γ values for each differ. **CPM-Leiden on the default SBM is now at 0.425 vs modularity-Leiden's 0.107 — a 3.97× improvement, 57 % of the 0.75 AC-3a SOTA target.** The remaining 1.76× gap is likely the modularity-resolution-limit-adjacent ceiling of CPM-without-refinement. Adding a CPM-specific refinement phase (not the current modularity-refinement) is the named next lever. Code: `tests/leiden_cpm.rs` γ-list extended; no production-code change. |
+| 20 | Full-partition ARI wired into AC-3a reveals level-1 greedy beats Leiden | AC-3a now publishes full-partition ARI alongside the 2-way coarsening. **Greedy modularity (level-1) full_ari = 0.308, beats modularity-Leiden's full_ari = 0.107** on the default SBM. Multi-level Louvain collapses (full_ari = 0.000). CPM @ γ=2.25 remains top at 0.425. Final ranking: CPM 0.425 > greedy level-1 0.308 > Leiden 0.107 > Louvain 0.000. | **Leiden's aggregation+refinement actively hurts module recovery on this substrate.** Greedy level-1 (one pass of local moves, no aggregation) gives 0.308 full-partition ARI; adding the aggregation + Traag refinement steps drops it to 0.107 — a **2.9× regression from a more sophisticated algorithm**. The refinement preserves well-connectedness (item 14's test passes) but does so at the cost of merging structurally-distinct communities from the level-1 output. This flips the expected order: on hub-heavy SBMs, *more algorithm is worse* when the objective is modularity and the target is module recovery. CPM, with its non-resolution-limited objective, sidesteps the issue. The engineering implication: **for AC-3a on this substrate, level-1 greedy modularity is a stronger baseline than multi-level Leiden.** The pattern echoes discovery #11 (multi-level Louvain collapse on hub-heavy SBMs) but at a finer granularity — item 11 said "Louvain aggregation breaks", item 20 says "even Leiden's refinement can't fully repair it because the underlying modularity objective has the resolution-limit issue". CPM (item 17) was the right escape. Code: `tests/acceptance_partition.rs` publishes the new line; no assertion change (ADR §14 threshold discipline). |

 The discoveries form a pattern: every "next lever named in the ADR" ultimately required an empirical test. **Eight** of the fifteen pre-measurement diagnoses tested on this branch proved wrong (items 7, 8, 9, 10, 12, 13, 15, 16). **Four unambiguous wins now: item 6 (adaptive cadence, 4.29× saturated-regime speedup), item 14 (Leiden refinement, perfect ARI on planted SBM where Louvain collapsed), item 17 (weight-normalized CPM-Leiden, perfect ARI on planted SBM + 109 communities on 70-module default SBM), and item 18 (full-partition ARI metric, lifting CPM's default-SBM score from 0.020 two-way to 0.393 full — 3.7× the modularity-Leiden baseline).** Items 6 and 14 followed the orthogonal-axis pattern. Item 17 was the first "rider from item 16 works as predicted" data point. Item 18 is a different shape again — a **measurement upgrade** that revealed an algorithm's prior 0.020 2-way score was hiding a 0.393 full-partition score. That's a new entry in the lesson catalogue: *a test's coarsening choice is as much a threshold decision as its numerical tolerances.* Three distinct "how a measurement-driven discovery lands" shapes now documented (orthogonal axis / rider matches paper / coarsening upgrade).
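Item 20 describes "greedy level-1" as one pass of local moves with no aggregation. The hypothetical sketch below illustrates that idea on an unweighted graph. It runs Louvain's local-moving phase until a full sweep moves no node, and it deliberately skips the aggregation step. It is not the crate's implementation behind `labels_gm`, which this diff does not show, and the name `level1_local_moves` is invented for illustration.

```rust
use std::collections::HashMap;

/// Level-1 modularity local moving on an unweighted, undirected graph.
/// Starts from singletons and repeatedly moves each node to the neighbouring
/// community with the largest modularity gain until a full sweep makes no
/// move. There is deliberately no aggregation step.
fn level1_local_moves(n: usize, edges: &[(usize, usize)]) -> Vec<usize> {
    let mut adj = vec![Vec::new(); n];
    for &(u, v) in edges {
        adj[u].push(v);
        adj[v].push(u);
    }
    let two_m = 2.0 * edges.len() as f64;
    let degree: Vec<f64> = adj.iter().map(|a| a.len() as f64).collect();
    let mut label: Vec<usize> = (0..n).collect();
    // Σ_tot: total degree of each community (community ids are node ids).
    let mut tot = degree.clone();
    let mut moved = true;
    while moved {
        moved = false;
        for i in 0..n {
            let old = label[i];
            // k_in[c]: number of edges from node i into community c.
            let mut k_in: HashMap<usize, f64> = HashMap::new();
            for &j in &adj[i] {
                if j != i {
                    *k_in.entry(label[j]).or_insert(0.0) += 1.0;
                }
            }
            // Take i out of its community, then score each candidate with the
            // Louvain insertion gain up to a constant factor: k_in - Σ_tot·k_i / 2m.
            tot[old] -= degree[i];
            let gain = |c: usize, k: f64| k - tot[c] * degree[i] / two_m;
            let mut best = old;
            let mut best_gain = gain(old, k_in.get(&old).copied().unwrap_or(0.0));
            for (&c, &k) in &k_in {
                let g = gain(c, k);
                if g > best_gain + 1e-12 {
                    best = c;
                    best_gain = g;
                }
            }
            tot[best] += degree[i];
            if best != old {
                label[i] = best;
                moved = true;
            }
        }
    }
    label
}

fn main() {
    // Two 4-cliques (nodes 0..=3 and 4..=7) joined by the single edge (3, 4).
    let mut edges = Vec::new();
    for base in [0usize, 4] {
        for i in 0..4 {
            for j in (i + 1)..4 {
                edges.push((base + i, base + j));
            }
        }
    }
    edges.push((3, 4));
    println!("{:?}", level1_local_moves(8, &edges));
}
```

Every accepted move strictly increases modularity, so the loop terminates. On the two-clique example, the nodes settle into one community per clique. The exact label ids depend on `HashMap` iteration order. Because the output is simply the converged level-1 labelling, nothing is merged afterwards. That is the property item 20 credits for greedy's 0.308 vs Leiden's 0.107.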

--- a/examples/connectome-fly/tests/acceptance_partition.rs
+++ b/examples/connectome-fly/tests/acceptance_partition.rs
@@ -87,6 +87,35 @@ fn ac_3a_structural_partition_alignment() {
        part.side_b.len()
    );

+    // Full-partition ARI — the correct metric for multi-community
+    // outputs (ADR §17 item 18: the 2-way hub-vs-non-hub coarsening
+    // above undersells algorithms that produce many communities).
+    // Published here alongside the 2-way numbers so reviewers see
+    // both the backward-compatible metric and the one that does
+    // justice to Leiden/CPM. Mincut only gets a 2-way ARI since it
+    // natively outputs a binary partition; greedy, Louvain, Leiden and
+    // a single-γ CPM run get full-partition ARI vs the module labels.
+    let truth_labels: Vec<u32> = (0..conn.num_neurons())
+        .map(|i| conn.meta(NeuronId(i as u32)).module as u32)
+        .collect();
+    let ari_greedy_full = full_partition_ari(&labels_gm, &truth_labels);
+    let ari_louvain_full = full_partition_ari(&labels_lv, &truth_labels);
+    let ari_leiden_full = full_partition_ari(&labels_le, &truth_labels);
+    let ari_cpm_best = {
+        // Re-use the same default-SBM sweet spot measured in
+        // `tests/leiden_cpm.rs` (ADR §17 item 19: γ ∈ [2.25, 2.5] → ARI =
+        // 0.425 on this substrate). Single γ here, not a sweep —
+        // the tests/leiden_cpm.rs suite is the place for that.
+        let labels_cpm = connectome_fly::analysis::leiden::leiden_labels_cpm(&conn, 2.25);
+        full_partition_ari(&labels_cpm, &truth_labels)
+    };
+    eprintln!(
+        "ac-3a (full-partition ARI vs 70-module truth): \
+         greedy_full={ari_greedy_full:.3}  louvain_full={ari_louvain_full:.3}  \
+         leiden_full={ari_leiden_full:.3}  cpm@γ=2.25={ari_cpm_best:.3}  \
+         SOTA_target=0.75"
+    );
+
    // The SOTA target is ARI ≥ 0.75. If the mincut partition under the
    // exact-mincut-on-weighted-edges path does not recover the hub
    // boundary at the demo's N=1024 SBM, we record the number and fail
@@ -240,6 +269,45 @@ fn adjusted_rand_index<F: Fn(u32) -> bool>(side_a: &[u32], side_b: &[u32], gt_is
    (ij - expected) / denom
 }

+/// Full-partition Adjusted Rand Index between two equal-length label
+/// vectors. Unlike `adjusted_rand_index` above (2-way predicate), this
+/// credits community-detection algorithms for recovering the full
+/// ground-truth partition even when the predicted label count
+/// differs from the truth count. See ADR §17 item 18 for the
+/// discovery of why this metric is the correct one for AC-3a.
+fn full_partition_ari(predicted: &[u32], truth: &[u32]) -> f32 {
+    assert_eq!(predicted.len(), truth.len());
+    let n_total = predicted.len();
+    if n_total < 2 {
+        return 0.0;
+    }
+    fn c2(k: u64) -> f64 {
+        (k as f64) * ((k as f64) - 1.0) / 2.0
+    }
+    let mut contingency: std::collections::HashMap<(u32, u32), u64> =
+        std::collections::HashMap::new();
+    let mut row_sum: std::collections::HashMap<u32, u64> = std::collections::HashMap::new();
+    let mut col_sum: std::collections::HashMap<u32, u64> = std::collections::HashMap::new();
+    for i in 0..n_total {
+        *contingency.entry((predicted[i], truth[i])).or_insert(0) += 1;
+        *row_sum.entry(predicted[i]).or_insert(0) += 1;
+        *col_sum.entry(truth[i]).or_insert(0) += 1;
+    }
+    let index_sum: f64 = contingency.values().map(|n| c2(*n)).sum();
+    let row_c2: f64 = row_sum.values().map(|a| c2(*a)).sum();
+    let col_c2: f64 = col_sum.values().map(|b| c2(*b)).sum();
+    let total = c2(n_total as u64);
+    if total < 1.0 {
+        return 0.0;
+    }
+    let expected = (row_c2 * col_c2) / total;
+    let max_val = 0.5 * (row_c2 + col_c2);
+    if (max_val - expected).abs() < 1e-12 {
+        return 0.0;
+    }
+    ((index_sum - expected) / (max_val - expected)) as f32
+}
+
 // Unused-but-keep-compiling reference for Spike.
 #[allow(dead_code)]
 fn _keep_spike_linked(_s: &Spike) {}
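`full_partition_ari` computes the standard Hubert-Arabie ARI from a contingency table. In the formula below, $n_{ij}$ is `contingency[(i, j)]`, $a_i$ is `row_sum` (predicted labels), and $b_j$ is `col_sum` (truth labels). The three binomial sums are `index_sum`, `row_c2` and `col_c2`. $\binom{n}{2}$ is `total`, $E$ is `expected`, and the first denominator term is `max_val`. The worked case underneath is an illustrative toy, not a measurement from this branch. Eight nodes in two true modules are predicted as four pure half-modules, which is the "aligned over-splitting" pattern item 19 describes.

```latex
\mathrm{ARI}
  = \frac{\sum_{ij}\binom{n_{ij}}{2} - E}
         {\tfrac{1}{2}\Big[\sum_i\binom{a_i}{2} + \sum_j\binom{b_j}{2}\Big] - E},
\qquad
E = \frac{\sum_i\binom{a_i}{2}\,\sum_j\binom{b_j}{2}}{\binom{n}{2}}

% truth = [0,0,0,0,1,1,1,1], predicted = [0,0,1,1,2,2,3,3]
\sum_{ij}\binom{n_{ij}}{2} = 4,\quad
\sum_i\binom{a_i}{2} = 4,\quad
\sum_j\binom{b_j}{2} = 12,\quad
\binom{8}{2} = 28,\quad
E = \frac{4 \cdot 12}{28} = \frac{12}{7}

\mathrm{ARI} = \frac{4 - \tfrac{12}{7}}{\tfrac{1}{2}(4 + 12) - \tfrac{12}{7}}
             = \frac{16/7}{44/7} = \frac{4}{11} \approx 0.364
```

A prediction whose communities are all pure but split each module in two therefore scores well above 0 but well below 1. The 2-way predicate metric has no way to express that distinction.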