# Community Benchmark Reruns

## Rerun packs, comparisons, and route-aware benchmark evidence

This folder is for community-contributed reruns that test atlas routing, first repair moves, or troubleshooting improvements on repeatable examples.

Typical contributions here include:

- small rerun packs
- before and after comparisons
- benchmark slices tied to one failure family
- structured rerun notes for one troubleshooting setting
- route-aware comparison packs

---

## What belongs here

Good rerun contributions include:

- one small benchmark slice
- one clear rerun protocol
- one route-aware before and after comparison
- one compact result table with method note
- one reproducible troubleshooting benchmark example

A good rerun contribution should be:

- scoped
- method-aware
- explicit about data source
- explicit about limits
- tied to atlas routing

---

## What does not belong here

Please do not use this folder for:

- unsupported score claims
- screenshots with no method note
- giant benchmark reports with no case framing
- unclear comparisons with moving variables
- claims that a rerun proves the whole atlas by itself

---

## Suggested rerun pattern

A useful rerun contribution usually includes:

1. target task or failure family
2. rerun setup
3. baseline behavior
4. routed or repaired behavior
5. compact result summary
6. limitations

That is enough to make the rerun informative.

---

## Suggested naming style

Examples:

- `f1-grounding-rerun-v1.md`
- `f5-trace-uplift-rerun-v1.md`
- `f7-structured-output-rerun-v1.md`

If code or notebooks are included, place them in a clearly named subfolder.

---

## Before contributing

Please read:

- [Community Fix Lab](../README.md)
- [Contribution Checklist](../../templates/contribution-checklist.md)
- [Flagship Fix Demos v1](../../official/flagship-fix-demos-v1.md)

---

## One-line status

**This folder holds community reruns that test atlas-guided troubleshooting in compact, repeatable benchmark-style settings.**
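
---

## Checking a rerun file name (sketch)

The three example names under the suggested naming style appear to share one shape: a failure-family prefix, a short slug, the word `rerun`, and a version suffix. The sketch below shows one way a contributor might check a file name against that shape before opening a pull request. The pattern is an assumption inferred from those examples only, and `RERUN_NAME` and `parse_rerun_name` are hypothetical names, not part of any tooling in this repository.

```python
import re

# Assumed shape, inferred from the example names only:
#   f<family number>-<lowercase-slug>-rerun-v<version number>.md
RERUN_NAME = re.compile(
    r"^f(?P<family>\d+)"
    r"-(?P<slug>[a-z0-9]+(?:-[a-z0-9]+)*)"
    r"-rerun-v(?P<version>\d+)\.md$"
)


def parse_rerun_name(filename):
    """Return the parts of a rerun file name, or None if it does not match.

    Hypothetical helper for illustration; the folder does not mandate it.
    """
    match = RERUN_NAME.match(filename)
    if match is None:
        return None
    return {
        "family": int(match["family"]),
        "slug": match["slug"],
        "version": int(match["version"]),
    }


if __name__ == "__main__":
    for name in ("f5-trace-uplift-rerun-v1.md", "my-results.md"):
        print(name, parse_rerun_name(name))
```

A name that returns `None` is not necessarily wrong, since the naming style is a suggestion. It just differs from the examples listed in this README.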