WFGY/ProblemMap/Atlas/Fixes/community/benchmark-reruns
PSBigBig + MiniPS f4cf0c65e8
Create README.md
2026-03-12 23:47:18 +08:00

Community Benchmark Reruns

Rerun packs, comparisons, and route-aware benchmark evidence

This folder is for community-contributed reruns that test atlas routing, first repair moves, or troubleshooting improvements on repeatable examples.

Typical contributions here include:

  • small rerun packs
  • before-and-after comparisons
  • benchmark slices tied to one failure family
  • structured rerun notes for one troubleshooting setting
  • route-aware comparison packs

What belongs here

Good rerun contributions include:

  • one small benchmark slice
  • one clear rerun protocol
  • one route-aware before-and-after comparison
  • one compact result table with a method note
  • one reproducible troubleshooting benchmark example

A good rerun contribution should be:

  • scoped
  • method-aware
  • explicit about data source
  • explicit about limits
  • tied to atlas routing

What does not belong here

Please do not use this folder for:

  • unsupported score claims
  • screenshots with no method note
  • giant benchmark reports with no case framing
  • unclear comparisons with moving variables
  • claims that a rerun proves the whole atlas by itself
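
One way to catch a comparison with moving variables before submitting is to diff the baseline settings against the rerun settings and confirm that only the intended one changed. The sketch below is illustrative only: the function name and the setting keys (`model`, `temperature`, `prompt_set`, `routing`) are hypothetical and not part of any atlas tooling.

```python
def changed_keys(baseline: dict, rerun: dict) -> list[str]:
    """Return the sorted setting names whose values differ between two runs."""
    missing = object()  # sentinel so a key absent on one side counts as a change
    keys = set(baseline) | set(rerun)
    return sorted(
        key for key in keys
        if baseline.get(key, missing) != rerun.get(key, missing)
    )


# Hypothetical settings for a baseline run and a routed rerun.
baseline = {
    "model": "model-a",
    "temperature": 0.0,
    "prompt_set": "slice-01",
    "routing": "off",
}
routed = dict(baseline, routing="on")

# A clean comparison changes exactly one setting.
print(changed_keys(baseline, routed))
```

If the returned list has more than one entry, the comparison has more than one moving variable and the method note should either fix that or say so explicitly.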

Suggested rerun pattern

A useful rerun contribution usually includes:

  1. target task or failure family
  2. rerun setup
  3. baseline behavior
  4. routed or repaired behavior
  5. compact result summary
  6. limitations

That is enough to make the rerun informative.
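
The six items above can be turned into a note skeleton. The following is a minimal sketch, assuming a Markdown note with one heading per item; the helper names and the heading layout are illustrative assumptions, not a required format for this folder.

```python
# Section titles mirror the six suggested items; everything else is hypothetical.
RERUN_SECTIONS = [
    "Target task or failure family",
    "Rerun setup",
    "Baseline behavior",
    "Routed or repaired behavior",
    "Compact result summary",
    "Limitations",
]


def rerun_note_skeleton(title: str) -> str:
    """Return a Markdown skeleton with one placeholder section per item."""
    lines = [f"# {title}", ""]
    for number, section in enumerate(RERUN_SECTIONS, start=1):
        lines.extend([f"## {number}. {section}", "", "TODO", ""])
    return "\n".join(lines)


def missing_sections(note_text: str) -> list[str]:
    """List the suggested sections whose titles do not appear in the note."""
    lowered = note_text.lower()
    return [s for s in RERUN_SECTIONS if s.lower() not in lowered]


if __name__ == "__main__":
    print(rerun_note_skeleton("F1 grounding rerun v1"))
```

`missing_sections` only checks that each title appears somewhere in the text, so it is a quick completeness check, not a review of the content.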


Suggested naming style

Examples:

  • f1-grounding-rerun-v1.md
  • f5-trace-uplift-rerun-v1.md
  • f7-structured-output-rerun-v1.md

If code or notebooks are included, place them in a clearly named subfolder.
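
A filename can be checked against the style of the examples above with a short script. The pattern below is inferred from those three examples only (family number, hyphenated lowercase topic, `rerun`, version); it is an assumption, not a rule stated by this folder.

```python
import re

# Inferred shape: f<family>-<topic-words>-rerun-v<version>.md
RERUN_NAME = re.compile(r"f\d+-[a-z0-9]+(?:-[a-z0-9]+)*-rerun-v\d+\.md")


def looks_like_rerun_name(filename: str) -> bool:
    """Return True if the filename follows the inferred naming style."""
    return RERUN_NAME.fullmatch(filename) is not None


if __name__ == "__main__":
    for name in ["f1-grounding-rerun-v1.md", "F1_Grounding.md"]:
        print(name, looks_like_rerun_name(name))
```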


Before contributing

Please read:


One-line status

This folder holds community reruns that test atlas-guided troubleshooting in compact, repeatable benchmark-style settings.