diff --git a/docs/adr/ADR-154-connectome-embodied-brain-example.md b/docs/adr/ADR-154-connectome-embodied-brain-example.md
index e8252f09..6ba3496d 100644
--- a/docs/adr/ADR-154-connectome-embodied-brain-example.md
+++ b/docs/adr/ADR-154-connectome-embodied-brain-example.md
@@ -471,6 +471,7 @@ Each of the nine is attached to the commit that produced it and the lesson it en
 | 17 | Weight-normalized CPM + γ-sweep at scale | `leiden_labels_cpm` rewritten to pre-normalize all edge weights by their mean (so mean edge weight = 1.0 and γ is dimensionless). Re-swept across γ ∈ {0.1, 0.5, 1, 2, 4, 8, 16, 32, 64}. **Planted 2-community SBM: ARI = 1.000 at γ ∈ {2, 4}** (perfect recovery, matches modularity-Leiden's planted result — item 14). Default N=1024 hub-heavy SBM: best 2-way-coarsened ARI = 0.020 at γ=2 with **109 distinct communities** (close to the ground-truth 70 modules). | **The weight-normalization rider works.** CPM recovers planted community structure perfectly once γ is in the right scale range (γ ~ super-edge magnitude), confirming the paper's claim and validating the rider from item 16. On multi-module graphs, however, the *2-way coarsening inherited from AC-3a* undersells CPM's output: 109 communities mapped to a hub-vs-non-hub binary label loses nearly all the signal. **The measurement is now the limit, not the algorithm.** The natural next step is a full-partition ARI or a module-recovery fraction metric that respects CPM's native community count. Weight-normalized CPM is a successful item on this branch (second community-detection algorithm that matches its paper's planted-graph performance) — but its win on the 70-module substrate won't be visible until the measurement catches up. Code: unchanged API; `leiden_labels_cpm(conn, gamma)` now takes dimensionless γ. |
 | 18 | Full-partition ARI lifts the measurement | Added `full_partition_ari(predicted, truth)` to `tests/leiden_cpm.rs` — standard Hubert-Arabie ARI against the 70-module SBM ground-truth label vector, not the 2-way hub-vs-non-hub coarsening. Re-measured the γ sweep. **Result on the default N=1024 SBM: modularity-Leiden full_ari = 0.107; CPM @ γ=2 full_ari = 0.393** — a **3.7× improvement** over modularity-Leiden on the correct metric. | **The measurement fix was the lever — not another algorithm.** Item 17 predicted this exactly: CPM's 109 communities were recovering ~57 % of the 70-module structure, but the 2-way coarsening was throwing all of that away. With the correct metric, CPM @ γ=2 becomes the new state-of-the-art community detector on this substrate — **4th unambiguous win on the branch** (after adaptive cadence, modularity-Leiden refinement, weight-normalized CPM at planted scale). Still below the 0.75 AC-3a SOTA target, but the gap is now a tractable 2× rather than a 38× mystery. This also closes out a recurring branch-wide failure mode: AC-3a's 2-way coarsening was inherited uncritically from the first AC-3 test; two community-detection algorithms (Leiden modularity, Leiden CPM) underperformed their paper's claims on it before the metric was finally upgraded. **Lesson for §14 risk register: a test's coarsening choice is as much a threshold decision as its numerical tolerances, and deserves the same review discipline.** Code: `tests/leiden_cpm.rs` helper; no production-code change (this is a measurement-correctness commit, not an algorithm commit). |
 | 19 | Fine-γ sweep refines the CPM peak | Re-swept γ ∈ {1.0, 1.25, 1.5, 1.75, 2.0, 2.25, 2.5, 2.75, 3.0, 3.5, 4.0, …} on the default SBM. **New peak: ari_full = 0.425 at γ ∈ {2.25, 2.5}** with 156 / 171 communities (up from 0.393 @ γ=2.0). At γ = 1.75 CPM recovers **exactly 70 distinct communities** — matching the ground-truth module count — with ari_full = 0.348. | **CPM's quality ridge on this substrate is between γ=1.75 and γ=2.75, peaking at γ ∈ [2.25, 2.5].** Two interesting facts emerge from the fine sweep: (a) the peak ARI is at a γ that produces *more* communities (156) than the ground truth (70), suggesting CPM's over-splitting is *aligned* with ground truth well enough that ARI tolerates the extra fragmentation; (b) the γ = 1.75 point that exactly matches the ground-truth community count (70) actually scores lower (0.348 vs 0.425) — CPM's 70 communities there don't align with the SBM's 70 modules as well as its 156 communities do at γ = 2.25. So on this substrate, "match the community count" and "maximize ARI" are distinct optimization targets, and the γ values for each differ. **CPM-Leiden on the default SBM is now at 0.425 vs modularity-Leiden's 0.107 — a 3.97× improvement, 57 % of the 0.75 AC-3a SOTA target.** The remaining 1.76× gap is likely the modularity-resolution-limit-adjacent ceiling of CPM-without-refinement. Adding a CPM-specific refinement phase (not the current modularity-refinement) is the named next lever. Code: `tests/leiden_cpm.rs` γ-list extended; no production-code change. |
+| 20 | Full-partition ARI wired into AC-3a reveals level-1 greedy beats Leiden | AC-3a now publishes full-partition ARI alongside the 2-way coarsening. **Greedy modularity (level-1) full_ari = 0.308, beats modularity-Leiden's full_ari = 0.107** on the default SBM. Multi-level Louvain collapses (full_ari = 0.000). CPM @ γ=2.25 remains top at 0.425. Final ranking: CPM 0.425 > greedy level-1 0.308 > Leiden 0.107 > Louvain 0.000. | **Leiden's aggregation+refinement actively hurts module recovery on this substrate.** Greedy level-1 (one pass of local moves, no aggregation) gives 0.308 full-partition ARI; adding the aggregation + Traag refinement steps drops it to 0.107 — a **2.9× regression from a more sophisticated algorithm**. The refinement preserves well-connectedness (item 14's test passes) but does so at the cost of merging structurally-distinct communities from the level-1 output. This flips the expected order: on hub-heavy SBMs, *more algorithm is worse* when the objective is modularity and the target is module recovery. CPM, with its non-resolution-limited objective, sidesteps the issue. The engineering implication: **for AC-3a on this substrate, level-1 greedy modularity is a stronger baseline than multi-level Leiden.** The pattern echoes discovery #11 (multi-level Louvain collapse on hub-heavy SBMs) but at a finer granularity — item 11 said "Louvain aggregation breaks", item 20 says "even Leiden's refinement can't fully repair it because the underlying modularity objective has the resolution-limit issue". CPM (item 17) was the right escape. Code: `tests/acceptance_partition.rs` publishes the new line; no assertion change (ADR §14 threshold discipline). |

 The discoveries form a pattern: every "next lever named in the ADR" ultimately required an empirical test. **Eight** of the fifteen pre-measurement diagnoses tested on this branch proved wrong (items 7, 8, 9, 10, 12, 13, 15, 16). **Four unambiguous wins now: item 6 (adaptive cadence, 4.29× saturated-regime speedup), item 14 (Leiden refinement, perfect ARI on planted SBM where Louvain collapsed), item 17 (weight-normalized CPM-Leiden, perfect ARI on planted SBM + 109 communities on 70-module default SBM), and item 18 (full-partition ARI metric, lifting CPM's default-SBM score from 0.020 two-way to 0.393 full — 3.7× the modularity-Leiden baseline).** Items 6 and 14 followed the orthogonal-axis pattern. Item 17 was the first "rider from item 16 works as predicted" data point. Item 18 is a different shape again — a **measurement upgrade** that revealed an algorithm's prior 0.020 2-way score was hiding a 0.393 full-partition score.
That's a new entry in the lesson catalogue: *a test's coarsening choice is as much a threshold decision as its numerical tolerances.* Three distinct "how a measurement-driven discovery lands" shapes now documented (orthogonal axis / rider matches paper / coarsening upgrade).

diff --git a/examples/connectome-fly/tests/acceptance_partition.rs b/examples/connectome-fly/tests/acceptance_partition.rs
index a462651b..5dea0d2a 100644
--- a/examples/connectome-fly/tests/acceptance_partition.rs
+++ b/examples/connectome-fly/tests/acceptance_partition.rs
@@ -87,6 +87,35 @@ fn ac_3a_structural_partition_alignment() {
         part.side_b.len()
     );

+    // Full-partition ARI — the correct metric for multi-community
+    // outputs (ADR §17 item 18: the 2-way hub-vs-non-hub coarsening
+    // above undersells algorithms that produce many communities).
+    // Published here alongside the 2-way numbers so reviewers see
+    // both the backward-compatible metric and the one that does
+    // justice to Leiden/CPM. Mincut only gets a 2-way ARI since it
+    // natively outputs a binary partition; the other three get
+    // full-partition ARI against the ground-truth module labels.
+    let truth_labels: Vec<u32> = (0..conn.num_neurons())
+        .map(|i| conn.meta(NeuronId(i as u32)).module as u32)
+        .collect();
+    let ari_greedy_full = full_partition_ari(&labels_gm, &truth_labels);
+    let ari_louvain_full = full_partition_ari(&labels_lv, &truth_labels);
+    let ari_leiden_full = full_partition_ari(&labels_le, &truth_labels);
+    let ari_cpm_best = {
+        // Re-use the same default-SBM sweet spot measured in
+        // `tests/leiden_cpm.rs` item 19 (γ ∈ [2.25, 2.5] → ARI =
+        // 0.425 on this substrate). Single γ here, not a sweep —
+        // the tests/leiden_cpm.rs suite is the place for that.
+        let labels_cpm = connectome_fly::analysis::leiden::leiden_labels_cpm(&conn, 2.25);
+        full_partition_ari(&labels_cpm, &truth_labels)
+    };
+    eprintln!(
+        "ac-3a (full-partition ARI vs 70-module truth): \
+         greedy_full={ari_greedy_full:.3} louvain_full={ari_louvain_full:.3} \
+         leiden_full={ari_leiden_full:.3} cpm@γ=2.25={ari_cpm_best:.3} \
+         SOTA_target=0.75"
+    );
+
     // The SOTA target is ARI ≥ 0.75. If the mincut partition under the
     // exact-mincut-on-weighted-edges path does not recover the hub
     // boundary at the demo's N=1024 SBM, we record the number and fail
@@ -240,6 +269,45 @@ fn adjusted_rand_index bool>(side_a: &[u32], side_b: &[u32], gt_is
     (ij - expected) / denom
 }

+/// Full-partition Adjusted Rand Index between two equal-length label
+/// vectors. Unlike `adjusted_rand_index` above (2-way predicate), this
+/// credits community-detection algorithms for recovering the full
+/// ground-truth partition even when the predicted label count
+/// differs from the truth count. See ADR §17 item 18 for the
+/// discovery of why this metric is the correct one for AC-3a.
+fn full_partition_ari(predicted: &[u32], truth: &[u32]) -> f32 {
+    assert_eq!(predicted.len(), truth.len());
+    let n_total = predicted.len();
+    if n_total < 2 {
+        return 0.0;
+    }
+    fn c2(k: u64) -> f64 {
+        (k as f64) * ((k as f64) - 1.0) / 2.0
+    }
+    let mut contingency: std::collections::HashMap<(u32, u32), u64> =
+        std::collections::HashMap::new();
+    let mut row_sum: std::collections::HashMap<u32, u64> = std::collections::HashMap::new();
+    let mut col_sum: std::collections::HashMap<u32, u64> = std::collections::HashMap::new();
+    for i in 0..n_total {
+        *contingency.entry((predicted[i], truth[i])).or_insert(0) += 1;
+        *row_sum.entry(predicted[i]).or_insert(0) += 1;
+        *col_sum.entry(truth[i]).or_insert(0) += 1;
+    }
+    let index_sum: f64 = contingency.values().map(|n| c2(*n)).sum();
+    let row_c2: f64 = row_sum.values().map(|a| c2(*a)).sum();
+    let col_c2: f64 = col_sum.values().map(|b| c2(*b)).sum();
+    let total = c2(n_total as u64);
+    if total < 1.0 {
+        return 0.0;
+    }
+    let expected = (row_c2 * col_c2) / total;
+    let max_val = 0.5 * (row_c2 + col_c2);
+    if (max_val - expected).abs() < 1e-12 {
+        return 0.0;
+    }
+    ((index_sum - expected) / (max_val - expected)) as f32
+}
+
 // Unused-but-keep-compiling reference for Spike.
 #[allow(dead_code)]
 fn _keep_spike_linked(_s: &Spike) {}
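
Reviewer note: the arithmetic in `full_partition_ari` can be sanity-checked on a toy input. The sketch below is a standalone re-implementation, not the code in this diff: the names `ari_sketch` and `choose2` are hypothetical, and it returns `f64` where the diff returns `f32`. It is meant only to illustrate two properties the ADR relies on: the score ignores label renaming, and an over-split prediction that stays inside the true modules still earns partial credit.

```rust
use std::collections::HashMap;

/// Number of unordered pairs among `k` items (k choose 2).
fn choose2(k: u64) -> f64 {
    (k as f64) * ((k as f64) - 1.0) / 2.0
}

/// Hypothetical stand-in for the diff's helper: Hubert-Arabie ARI
/// between two equal-length label vectors.
fn ari_sketch(predicted: &[u32], truth: &[u32]) -> f64 {
    assert_eq!(predicted.len(), truth.len());
    if predicted.len() < 2 {
        return 0.0;
    }
    // Contingency table cells plus its row and column marginals.
    let mut cells: HashMap<(u32, u32), u64> = HashMap::new();
    let mut rows: HashMap<u32, u64> = HashMap::new();
    let mut cols: HashMap<u32, u64> = HashMap::new();
    for (&p, &t) in predicted.iter().zip(truth) {
        *cells.entry((p, t)).or_insert(0) += 1;
        *rows.entry(p).or_insert(0) += 1;
        *cols.entry(t).or_insert(0) += 1;
    }
    let index: f64 = cells.values().map(|&n| choose2(n)).sum();
    let row_pairs: f64 = rows.values().map(|&n| choose2(n)).sum();
    let col_pairs: f64 = cols.values().map(|&n| choose2(n)).sum();
    // Expected index under random labelling with the same marginals.
    let expected = row_pairs * col_pairs / choose2(predicted.len() as u64);
    let max_index = 0.5 * (row_pairs + col_pairs);
    if (max_index - expected).abs() < 1e-12 {
        return 0.0; // degenerate case, e.g. both sides are one cluster
    }
    (index - expected) / (max_index - expected)
}

fn main() {
    let truth = [0u32, 0, 0, 0, 1, 1, 1, 1];
    let renamed = [5u32, 5, 5, 5, 9, 9, 9, 9]; // same partition, new ids
    let oversplit = [0u32, 0, 1, 1, 2, 2, 3, 3]; // each module cut in half
    println!("identical: {:.3}", ari_sketch(&truth, &truth));
    println!("renamed:   {:.3}", ari_sketch(&renamed, &truth));
    println!("oversplit: {:.3}", ari_sketch(&oversplit, &truth));
}
```

For the over-split case the table gives an index of 4, row pairs 4 and column pairs 12 out of 28 total pairs, so the score is (4 − 48/28) / (8 − 48/28) = 4/11 ≈ 0.364. A hub-vs-non-hub coarsening of the same prediction would not see that partial agreement, which is the effect item 18 describes.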