test(ac-3a): wire full-partition ARI — greedy beats Leiden, discovery #20

AC-3a now publishes full-partition ARI alongside the 2-way
coarsening. Measured on the default N=1024 SBM:

  2-way coarsened ARI (inherited, backward-compat):
    mincut  : -0.001    greedy  :  0.174
    louvain :  0.000    leiden  :  0.089

  **Full-partition ARI (new, correct metric):**
    greedy  full_ari :  **0.308**   ← surprising
    louvain full_ari :  0.000  (collapses)
    leiden  full_ari :  0.107
    cpm@γ=2.25       :  **0.425**   ← still best
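
For intuition on how full-partition ARI responds to the failure modes
listed above, here is a minimal standalone sketch. It is not the
`full_partition_ari` helper from this commit: the name `ari_sketch`
and the 8-node toy labels are assumptions made for illustration only.
It shows that a relabelled but identical partition scores 1, a
collapse to one community scores 0, and over-splitting lands in
between.

```rust
use std::collections::HashMap;

/// Number of unordered pairs in a group of size k (k choose 2).
fn choose2(k: u64) -> f64 {
    let k = k as f64;
    k * (k - 1.0) / 2.0
}

/// Hubert-Arabie adjusted Rand index between two equal-length label
/// vectors. `ari_sketch` is a hypothetical name for this illustration.
fn ari_sketch(predicted: &[u32], truth: &[u32]) -> f64 {
    assert_eq!(predicted.len(), truth.len());
    let mut cells: HashMap<(u32, u32), u64> = HashMap::new();
    let mut rows: HashMap<u32, u64> = HashMap::new();
    let mut cols: HashMap<u32, u64> = HashMap::new();
    for (&p, &t) in predicted.iter().zip(truth) {
        *cells.entry((p, t)).or_insert(0) += 1;
        *rows.entry(p).or_insert(0) += 1;
        *cols.entry(t).or_insert(0) += 1;
    }
    // Pairs co-clustered in both partitions, and in each one separately.
    let index: f64 = cells.values().map(|&n| choose2(n)).sum();
    let row_pairs: f64 = rows.values().map(|&n| choose2(n)).sum();
    let col_pairs: f64 = cols.values().map(|&n| choose2(n)).sum();
    let total = choose2(predicted.len() as u64);
    if total < 1.0 {
        return 0.0;
    }
    let expected = row_pairs * col_pairs / total;
    let max_index = 0.5 * (row_pairs + col_pairs);
    if (max_index - expected).abs() < 1e-12 {
        return 0.0; // degenerate case: same convention as the commit's helper
    }
    (index - expected) / (max_index - expected)
}

#[test]
fn ari_on_toy_labels() {
    let truth = [0u32, 0, 0, 0, 1, 1, 1, 1];
    // Same partition under different label names: perfect score.
    assert!((ari_sketch(&[5, 5, 5, 5, 9, 9, 9, 9], &truth) - 1.0).abs() < 1e-9);
    // Everything merged into one community: chance-level score.
    assert!(ari_sketch(&[0; 8], &truth).abs() < 1e-9);
    // Each true module split in half: partial credit (4/11).
    let split = ari_sketch(&[0, 0, 1, 1, 2, 2, 3, 3], &truth);
    assert!((split - 4.0 / 11.0).abs() < 1e-9);
}
```

The collapsed case is the shape of the Louvain row (0.000); the
over-split case illustrates why a detector that returns more
communities than the truth can still earn a positive score.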

**20th discovery: Leiden's aggregation+refinement actively HURTS
full-partition ARI vs greedy level-1 on this substrate.** Greedy
modularity (one pass of local moves, no aggregation) scores 0.308;
adding the aggregation + Traag refinement steps drops that to
0.107, a 2.9× regression (0.308 / 0.107) from the more sophisticated
algorithm. The refinement preserves well-connectedness
(leiden_refinement.rs tests still pass) but does so at the cost of
merging structurally distinct communities from the level-1 output.
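
To make "local moves, no aggregation" concrete, here is a minimal
sketch of a Louvain-style level-1 local-moving phase. It is not the
repository's greedy implementation: the names `adjacency`,
`move_score`, and `local_moves`, the adjacency-list input, and the
choice to sweep until no node moves are all assumptions for
illustration.

```rust
use std::collections::BTreeMap;

/// Build a unit-weight undirected adjacency list (hypothetical helper).
fn adjacency(n: usize, edges: &[(usize, usize)]) -> Vec<Vec<(usize, f64)>> {
    let mut adj = vec![Vec::new(); n];
    for &(u, v) in edges {
        adj[u].push((v, 1.0));
        adj[v].push((u, 1.0));
    }
    adj
}

/// Modularity score (up to a constant factor) of placing a node with
/// degree `k_i` into a community it is linked to by weight `w_in`.
fn move_score(w_in: f64, community_degree: f64, k_i: f64, two_m: f64) -> f64 {
    w_in - community_degree * k_i / two_m
}

/// Level-1 local moving: every node starts alone and repeatedly joins
/// the neighbouring community with the best score. No aggregation step.
fn local_moves(adj: &[Vec<(usize, f64)>]) -> Vec<usize> {
    let n = adj.len();
    let degree: Vec<f64> = adj
        .iter()
        .map(|nb| nb.iter().map(|&(_, w)| w).sum::<f64>())
        .collect();
    let two_m: f64 = degree.iter().sum();
    let mut label: Vec<usize> = (0..n).collect();
    if two_m == 0.0 {
        return label;
    }
    let mut tot = degree.clone(); // summed degree per community
    let mut moved = true;
    while moved {
        moved = false;
        for i in 0..n {
            let old = label[i];
            // Edge weight from node i to each neighbouring community.
            let mut w_to: BTreeMap<usize, f64> = BTreeMap::new();
            for &(j, w) in &adj[i] {
                if j != i {
                    *w_to.entry(label[j]).or_insert(0.0) += w;
                }
            }
            tot[old] -= degree[i]; // take i out of its community
            let stay = w_to.get(&old).copied().unwrap_or(0.0);
            let mut best = old;
            let mut best_score = move_score(stay, tot[old], degree[i], two_m);
            for (&c, &w_in) in &w_to {
                let s = move_score(w_in, tot[c], degree[i], two_m);
                if s > best_score + 1e-12 {
                    best = c;
                    best_score = s;
                }
            }
            tot[best] += degree[i];
            if best != old {
                label[i] = best;
                moved = true;
            }
        }
    }
    label
}

#[test]
fn two_triangles_stay_separate() {
    // Two triangles joined by the single bridge edge 2-3.
    let adj = adjacency(6, &[(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]);
    let l = local_moves(&adj);
    assert!(l[0] == l[1] && l[1] == l[2]);
    assert!(l[3] == l[4] && l[4] == l[5]);
    assert!(l[0] != l[3]);
}
```

A multi-level method would next contract each community into a
super-node and repeat; the level-1 baseline discussed here stops after
this phase.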

This flips the expected order: on hub-heavy SBMs, *more algorithm
is worse* when the objective is modularity and the target is
module recovery. CPM (item 17) was the right escape: its objective
is not resolution-limited, so it sidesteps the issue.
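
As a rough illustration of why γ acts as a resolution dial in CPM, the
sketch below evaluates the CPM quality, the sum over communities of
internal edge weight minus γ times the number of node pairs, after
dividing edge weights by their mean (the normalization described in
item 17). The names `normalize_by_mean` and `cpm_quality` and the toy
graph are assumptions for illustration; this is not the code behind
`leiden_labels_cpm`.

```rust
use std::collections::BTreeMap;

/// Undirected weighted edge (u, v, weight).
type Edge = (usize, usize, f64);

/// Divide every weight by the mean weight so the mean becomes 1.0.
fn normalize_by_mean(edges: &[Edge]) -> Vec<Edge> {
    if edges.is_empty() {
        return Vec::new();
    }
    let mean = edges.iter().map(|e| e.2).sum::<f64>() / edges.len() as f64;
    if mean == 0.0 {
        return edges.to_vec();
    }
    edges.iter().map(|&(u, v, w)| (u, v, w / mean)).collect()
}

/// CPM quality: sum over communities of (internal weight - gamma * C(n_c, 2)).
fn cpm_quality(edges: &[Edge], labels: &[usize], gamma: f64) -> f64 {
    let mut size: BTreeMap<usize, u64> = BTreeMap::new();
    let mut internal: BTreeMap<usize, f64> = BTreeMap::new();
    for &l in labels {
        *size.entry(l).or_insert(0) += 1;
    }
    for &(u, v, w) in edges {
        if labels[u] == labels[v] {
            *internal.entry(labels[u]).or_insert(0.0) += w;
        }
    }
    size.iter()
        .map(|(c, &n)| {
            let pairs = (n * (n - 1) / 2) as f64;
            internal.get(c).copied().unwrap_or(0.0) - gamma * pairs
        })
        .sum()
}

#[test]
fn gamma_sets_the_resolution() {
    // Two triangles joined by one bridge edge; raw weights are all 2.0.
    let raw: Vec<Edge> = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
        .iter()
        .map(|&(u, v)| (u, v, 2.0))
        .collect();
    let edges = normalize_by_mean(&raw);
    let split = [0, 0, 0, 1, 1, 1];
    let merged = [0, 0, 0, 0, 0, 0];
    // Low gamma prefers the single merged community ...
    assert!(cpm_quality(&edges, &merged, 0.1) > cpm_quality(&edges, &split, 0.1));
    // ... higher gamma prefers the two triangles.
    assert!(cpm_quality(&edges, &split, 0.5) > cpm_quality(&edges, &merged, 0.5));
}
```

Because the penalty term depends only on each community's own size and
not on the total edge weight of the graph, the preferred split does
not change as unrelated parts of the graph grow.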

Final ranking on default SBM, full-partition ARI:
  CPM @ γ=2.25 : 0.425  (non-modularity objective)
  greedy L1    : 0.308  (minimal-algorithm modularity)
  Leiden       : 0.107  (maximal-algorithm modularity)
  Louvain      : 0.000  (aggregation collapses)

The pattern echoes item 11 (multi-level Louvain collapse on
hub-heavy SBMs) but at a finer granularity: item 11 said
'aggregation breaks', item 20 says 'even Leiden's refinement
can't fully repair it because the underlying modularity
objective has the resolution-limit issue'. The fix (item 17)
was a different objective, not a better algorithm.

Engineering implication: **for AC-3a on this substrate, level-1
greedy modularity is a stronger baseline than multi-level
Leiden.** The default Louvain / Leiden trajectory assumes that
increasingly sophisticated algorithms monotonically improve
module recovery; on hub-heavy SBMs that assumption is false, and
among modularity methods simpler is better. Only the switch of
objective to CPM does better.

Files:
  - tests/acceptance_partition.rs: full_partition_ari helper,
    new eprintln publishing four full-ARI values against ground-
    truth module labels. No assertion change (ADR §14 threshold
    discipline: coarsening choices are decisions, not knobs).
  - docs/adr/ADR-154: §17 item 20 added with the surprising
    level-1 vs Leiden inversion and the 'more algorithm is
    worse' framing on this substrate.

All 95 prior tests unchanged.

Co-Authored-By: claude-flow <ruv@ruv.net>
ruvnet 2026-04-22 21:59:31 -04:00
parent 1f085dc35c
commit cfdcb8bb12
2 changed files with 69 additions and 0 deletions


@@ -471,6 +471,7 @@ Each of the nine is attached to the commit that produced it and the lesson it en
| 17 | Weight-normalized CPM + γ-sweep at scale | `leiden_labels_cpm` rewritten to pre-normalize all edge weights by their mean (so mean edge weight = 1.0 and γ is dimensionless). Re-swept across γ ∈ {0.1, 0.5, 1, 2, 4, 8, 16, 32, 64}. **Planted 2-community SBM: ARI = 1.000 at γ ∈ {2, 4}** (perfect recovery, matches modularity-Leiden's planted result — item 14). Default N=1024 hub-heavy SBM: best 2-way-coarsened ARI = 0.020 at γ=2 with **109 distinct communities** (close to the ground-truth 70 modules). | **The weight-normalization rider works.** CPM recovers planted community structure perfectly once γ is in the right scale range (γ ~ super-edge magnitude), confirming the paper's claim and validating the rider from item 16. On multi-module graphs, however, the *2-way coarsening inherited from AC-3a* undersells CPM's output: 109 communities mapped to a hub-vs-non-hub binary label loses nearly all the signal. **The measurement is now the limit, not the algorithm.** The natural next step is a full-partition ARI or a module-recovery fraction metric that respects CPM's native community count. Weight-normalized CPM is a successful item on this branch (second community-detection algorithm that matches its paper's planted-graph performance) — but its win on the 70-module substrate won't be visible until the measurement catches up. Code: unchanged API; `leiden_labels_cpm(conn, gamma)` now takes dimensionless γ. |
| 18 | Full-partition ARI lifts the measurement | Added `full_partition_ari(predicted, truth)` to `tests/leiden_cpm.rs` — standard Hubert-Arabie ARI against the 70-module SBM ground-truth label vector, not the 2-way hub-vs-non-hub coarsening. Re-measured the γ sweep. **Result on the default N=1024 SBM: modularity-Leiden full_ari = 0.107; CPM @ γ=2 full_ari = 0.393** — a **3.7× improvement** over modularity-Leiden on the correct metric. | **The measurement fix was the lever — not another algorithm.** Item 17 predicted this exactly: CPM's 109 communities were recovering ~57 % of the 70-module structure, but the 2-way coarsening was throwing all of that away. With the correct metric, CPM @ γ=2 becomes the new state-of-the-art community detector on this substrate — **4th unambiguous win on the branch** (after adaptive cadence, modularity-Leiden refinement, weight-normalized CPM at planted scale). Still below the 0.75 AC-3a SOTA target, but the gap is now a tractable 2× rather than a 38× mystery. This also closes out a recurring branch-wide failure mode: AC-3a's 2-way coarsening was inherited uncritically from the first AC-3 test; two community-detection algorithms (Leiden modularity, Leiden CPM) underperformed their paper's claims on it before the metric was finally upgraded. **Lesson for §14 risk register: a test's coarsening choice is as much a threshold decision as its numerical tolerances, and deserves the same review discipline.** Code: `tests/leiden_cpm.rs` helper; no production-code change (this is a measurement-correctness commit, not an algorithm commit). |
| 19 | Fine-γ sweep refines the CPM peak | Re-swept γ ∈ {1.0, 1.25, 1.5, 1.75, 2.0, 2.25, 2.5, 2.75, 3.0, 3.5, 4.0, …} on the default SBM. **New peak: ari_full = 0.425 at γ ∈ {2.25, 2.5}** with 156 / 171 communities (up from 0.393 @ γ=2.0). At γ = 1.75 CPM recovers **exactly 70 distinct communities** — matching the ground-truth module count — with ari_full = 0.348. | **CPM's quality ridge on this substrate is between γ=1.75 and γ=2.75, peaking at γ ∈ [2.25, 2.5].** Two interesting facts emerge from the fine sweep: (a) the peak ARI is at a γ that produces *more* communities (156) than the ground truth (70), suggesting CPM's over-splitting is *aligned* with ground truth well enough that ARI tolerates the extra fragmentation; (b) the γ = 1.75 point that exactly matches the ground-truth community count (70) actually scores lower (0.348 vs 0.425) — CPM's 70 communities there don't align with the SBM's 70 modules as well as its 156 communities do at γ = 2.25. So on this substrate, "match the community count" and "maximize ARI" are distinct optimization targets, and the γ values for each differ. **CPM-Leiden on the default SBM is now at 0.425 vs modularity-Leiden's 0.107 — a 3.97× improvement, 57 % of the 0.75 AC-3a SOTA target.** The remaining 1.76× gap is likely the modularity-resolution-limit-adjacent ceiling of CPM-without-refinement. Adding a CPM-specific refinement phase (not the current modularity-refinement) is the named next lever. Code: `tests/leiden_cpm.rs` γ-list extended; no production-code change. |
| 20 | Full-partition ARI wired into AC-3a reveals level-1 greedy beats Leiden | AC-3a now publishes full-partition ARI alongside the 2-way coarsening. **Greedy modularity (level-1) full_ari = 0.308, beats modularity-Leiden's full_ari = 0.107** on the default SBM. Multi-level Louvain collapses (full_ari = 0.000). CPM @ γ=2.25 remains top at 0.425. Final ranking: CPM 0.425 > greedy level-1 0.308 > Leiden 0.107 > Louvain 0.000. | **Leiden's aggregation+refinement actively hurts module recovery on this substrate.** Greedy level-1 (one pass of local moves, no aggregation) gives 0.308 full-partition ARI; adding the aggregation + Traag refinement steps drops it to 0.107 — a **2.9× regression from a more sophisticated algorithm**. The refinement preserves well-connectedness (item 14's test passes) but does so at the cost of merging structurally-distinct communities from the level-1 output. This flips the expected order: on hub-heavy SBMs, *more algorithm is worse* when the objective is modularity and the target is module recovery. CPM, with its non-resolution-limited objective, sidesteps the issue. The engineering implication: **for AC-3a on this substrate, level-1 greedy modularity is a stronger baseline than multi-level Leiden.** The pattern echoes discovery #11 (multi-level Louvain collapse on hub-heavy SBMs) but at a finer granularity — item 11 said "Louvain aggregation breaks", item 20 says "even Leiden's refinement can't fully repair it because the underlying modularity objective has the resolution-limit issue". CPM (item 17) was the right escape. Code: `tests/acceptance_partition.rs` publishes the new line; no assertion change (ADR §14 threshold discipline). |
The discoveries form a pattern: every "next lever named in the ADR" ultimately required an empirical test. **Eight** of the fifteen pre-measurement diagnoses tested on this branch proved wrong (items 7, 8, 9, 10, 12, 13, 15, 16). **Four unambiguous wins now: item 6 (adaptive cadence, 4.29× saturated-regime speedup), item 14 (Leiden refinement, perfect ARI on planted SBM where Louvain collapsed), item 17 (weight-normalized CPM-Leiden, perfect ARI on planted SBM + 109 communities on 70-module default SBM), and item 18 (full-partition ARI metric, lifting CPM's default-SBM score from 0.020 two-way to 0.393 full — 3.7× the modularity-Leiden baseline).** Items 6 and 14 followed the orthogonal-axis pattern. Item 17 was the first "rider from item 16 works as predicted" data point. Item 18 is a different shape again — a **measurement upgrade** that revealed an algorithm's prior 0.020 2-way score was hiding a 0.393 full-partition score. That's a new entry in the lesson catalogue: *a test's coarsening choice is as much a threshold decision as its numerical tolerances.* Three distinct "how a measurement-driven discovery lands" shapes now documented (orthogonal axis / rider matches paper / coarsening upgrade).


@@ -87,6 +87,35 @@ fn ac_3a_structural_partition_alignment() {
        part.side_b.len()
    );

    // Full-partition ARI: the correct metric for multi-community
    // outputs (ADR §17 item 18: the 2-way hub-vs-non-hub coarsening
    // above undersells algorithms that produce many communities).
    // Published here alongside the 2-way numbers so reviewers see
    // both the backward-compatible metric and the one that does
    // justice to Leiden/CPM. Mincut only gets a 2-way ARI since it
    // natively outputs a binary partition; greedy, Louvain, Leiden,
    // and CPM get full-partition ARI against the ground-truth
    // module labels.
    let truth_labels: Vec<u32> = (0..conn.num_neurons())
        .map(|i| conn.meta(NeuronId(i as u32)).module as u32)
        .collect();
    let ari_greedy_full = full_partition_ari(&labels_gm, &truth_labels);
    let ari_louvain_full = full_partition_ari(&labels_lv, &truth_labels);
    let ari_leiden_full = full_partition_ari(&labels_le, &truth_labels);
    let ari_cpm_best = {
        // Re-use the default-SBM sweet spot measured in
        // `tests/leiden_cpm.rs` (ADR §17 item 19: γ ∈ [2.25, 2.5]
        // gives ARI = 0.425 on this substrate). Single γ here, not
        // a sweep; the tests/leiden_cpm.rs suite is the place for that.
        let labels_cpm = connectome_fly::analysis::leiden::leiden_labels_cpm(&conn, 2.25);
        full_partition_ari(&labels_cpm, &truth_labels)
    };
    eprintln!(
        "ac-3a (full-partition ARI vs 70-module truth): \
         greedy_full={ari_greedy_full:.3} louvain_full={ari_louvain_full:.3} \
         leiden_full={ari_leiden_full:.3} cpm@γ=2.25={ari_cpm_best:.3} \
         SOTA_target=0.75"
    );
    // The SOTA target is ARI ≥ 0.75. If the mincut partition under the
    // exact-mincut-on-weighted-edges path does not recover the hub
    // boundary at the demo's N=1024 SBM, we record the number and fail
@@ -240,6 +269,45 @@ fn adjusted_rand_index<F: Fn(u32) -> bool>(side_a: &[u32], side_b: &[u32], gt_is
    (ij - expected) / denom
}

/// Full-partition Adjusted Rand Index between two equal-length label
/// vectors. Unlike `adjusted_rand_index` above (2-way predicate), this
/// credits community-detection algorithms for recovering the full
/// ground-truth partition even when the predicted label count
/// differs from the truth count. See ADR §17 item 18 for the
/// discovery of why this metric is the correct one for AC-3a.
fn full_partition_ari(predicted: &[u32], truth: &[u32]) -> f32 {
    assert_eq!(predicted.len(), truth.len());
    let n_total = predicted.len();
    if n_total < 2 {
        return 0.0;
    }
    // Number of unordered pairs among k items (k choose 2).
    fn c2(k: u64) -> f64 {
        (k as f64) * ((k as f64) - 1.0) / 2.0
    }
    // Contingency table of (predicted, truth) label pairs, plus its
    // row and column marginals.
    let mut contingency: std::collections::HashMap<(u32, u32), u64> =
        std::collections::HashMap::new();
    let mut row_sum: std::collections::HashMap<u32, u64> = std::collections::HashMap::new();
    let mut col_sum: std::collections::HashMap<u32, u64> = std::collections::HashMap::new();
    for i in 0..n_total {
        *contingency.entry((predicted[i], truth[i])).or_insert(0) += 1;
        *row_sum.entry(predicted[i]).or_insert(0) += 1;
        *col_sum.entry(truth[i]).or_insert(0) += 1;
    }
    let index_sum: f64 = contingency.values().map(|n| c2(*n)).sum();
    let row_c2: f64 = row_sum.values().map(|a| c2(*a)).sum();
    let col_c2: f64 = col_sum.values().map(|b| c2(*b)).sum();
    let total = c2(n_total as u64);
    if total < 1.0 {
        return 0.0;
    }
    // Hubert-Arabie: (index - expected) / (max - expected).
    let expected = (row_c2 * col_c2) / total;
    let max_val = 0.5 * (row_c2 + col_c2);
    // Degenerate case (e.g. both partitions are a single community):
    // the ratio is undefined, so report 0.0.
    if (max_val - expected).abs() < 1e-12 {
        return 0.0;
    }
    ((index_sum - expected) / (max_val - expected)) as f32
}

// Unused-but-keep-compiling reference for Spike.
#[allow(dead_code)]
fn _keep_spike_linked(_s: &Spike) {}