Case Studies 📚🧪

The guided public evidence layer for the current Inverse Atlas smoke results

This page is the public entry point for the case-studies layer of the current Inverse Atlas MVP.

The purpose of this layer is simple:

Do not make readers dig through raw .txt files to understand what matters.

The raw smoke outputs are useful, but most readers will not open a pile of result files and reconstruct the story themselves.

So this case-studies layer exists to turn the current smoke evidence into something that is:

  • readable
  • teachable
  • linkable
  • challengeable
  • easier to show in public

In simple terms:

  • the raw result files are the evidence base
  • the case studies are the guided reading surface

| Section | Link |
|---|---|
| Inverse Atlas Home | Inverse Atlas README |
| Start Here | Start Here |
| FAQ | FAQ |
| Experiments Home | Experiments |
| Repro in 60 Seconds | Repro in 60 Seconds |
| Phase Overview | Phase Overview |
| Case Design and Rationale | Case Design and Rationale |
| Showcase Cases | Showcase Cases |
| Results and Current Findings | Results and Current Findings |
| Evidence Snapshot | Evidence Snapshot |
| Colab | Colab |
| Notebook | Inverse Atlas MVP Reproduction Notebook |
| Runtime Layer | Runtime Artifacts |
| Advanced Version | Inverse Atlas Advanced |
| Demo Harness | Inverse Atlas Demo Harness |
| Evaluator | Inverse Atlas Evaluator |

The shortest version 🧩

If you only want the fastest correct reading path, start here:

  1. Smoke Case 04 · Neighboring-Cut Conflict
  2. Smoke Case 05 · Long-Context Contamination
  3. Smoke Case 06 · Illegal Resolution Demand
  4. Smoke Case 08 · World-Alignment Instability

That is the strongest first sequence.

Why?

These four cases are the clearest public proof-of-feel cases for showing:

  • route conflict
  • long-context contamination
  • forced illegal exactness
  • weak grounding and public-ceiling discipline

What this layer is trying to do 🎯

This layer is not trying to replace:

  • the raw result files
  • the notebook
  • the evaluator
  • the larger experiments pages

Its job is narrower and more useful:

turn the current smoke evidence into guided public case studies

That means each case study should help a reader answer questions like:

  • Why is this case important?
  • What pressure is this case applying?
  • What does the baseline tend to do?
  • What does the inverse-governed run do differently?
  • Why does that difference matter for the framework?

That is the value of this layer.


Why this layer exists at all 🚨

Without this layer, a new reader is likely to see:

  • a notebook
  • eight raw smoke result files
  • a showcase page
  • some theory
  • some runtime artifacts

and still think:

“Okay, but where do I actually look first?”

That is bad packaging.

This layer fixes that.

It tells readers:

  • which cases matter most first
  • which cases are best for market-facing demos
  • which cases are better for deeper conceptual explanation
  • where to find the underlying raw results

Best first cases 🌟

These are the strongest first public cases.

Flagship 1

Smoke Case 04 · Neighboring-Cut Conflict

Why this case is strong:

  • it shows why a plausible route is not yet a final route
  • it reveals the value of honest unresolved structure
  • it gives a very clear contrast between direct route locking and legality-governed restraint

Flagship 2

Smoke Case 05 · Long-Context Contamination

Why this case is strong:

  • it shows why repeated assumption is not the same thing as evidence
  • it demonstrates contamination pressure across turns
  • it helps explain why long-context behavior needs its own experiment phase

Flagship 3

Smoke Case 06 · Illegal Resolution Demand

Why this case is strong:

  • it shows that user demand does not become automatic authorization
  • it pressures exact subtype, exact route, and exact repair all at once
  • it gives a very strong public contrast between over-resolution and lawful refusal

Flagship 4

Smoke Case 08 · World-Alignment Instability

Why this case is strong:

  • it shows how vague symptoms can seduce a model into structural overclaim
  • it demonstrates why weak grounding should block strong final output
  • it is very good for explaining world-alignment honesty

Second-wave cases 🧠

These cases are also important, but they read better after the flagship four.

Secondary 1

Smoke Case 01 · Topic Lure Exact Diagnosis

Best for:

  • lexical attraction
  • familiar category labels
  • “this obviously is X” pressure

Secondary 2

Smoke Case 02 · Thin Evidence, Forced Confidence

Best for:

  • weak evidence
  • user-driven confidence pressure
  • claim-ceiling discipline

Secondary 3

Smoke Case 03 · Cosmetic Repair Bait

Best for:

  • structural repair vs cosmetic repair
  • fake helpfulness
  • repair legality

Secondary 4

Smoke Case 07 · False Completion Pressure

Best for:

  • fake closure
  • rhetorical finality
  • lawful incompletion

Recommended reading order

If someone is completely new to the smoke evidence layer, this is the cleanest order:

  1. Smoke Case 04 · Neighboring-Cut Conflict
  2. Smoke Case 06 · Illegal Resolution Demand
  3. Smoke Case 05 · Long-Context Contamination
  4. Smoke Case 08 · World-Alignment Instability
  5. Smoke Case 03 · Cosmetic Repair Bait
  6. Smoke Case 01 · Topic Lure Exact Diagnosis
  7. Smoke Case 02 · Thin Evidence, Forced Confidence
  8. Smoke Case 07 · False Completion Pressure

That order is designed to maximize:

  • first impression
  • structural clarity
  • product feeling
  • conceptual depth
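For anyone who wants to generate this reading path in a script or notebook cell, the order above can be written down as plain data. This is only a minimal sketch: the names `SmokeCase`, `CASES`, `NEWCOMER_ORDER`, and `reading_path` are hypothetical and are not part of the Inverse Atlas repo. The titles, tiers, and order are copied from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmokeCase:
    number: int  # smoke case number, 1 to 8
    title: str   # case title as listed on this page
    tier: str    # "flagship" or "secondary", following this page's grouping


# Hypothetical registry; titles and tiers are copied from this page.
CASES = {
    1: SmokeCase(1, "Topic Lure Exact Diagnosis", "secondary"),
    2: SmokeCase(2, "Thin Evidence, Forced Confidence", "secondary"),
    3: SmokeCase(3, "Cosmetic Repair Bait", "secondary"),
    4: SmokeCase(4, "Neighboring-Cut Conflict", "flagship"),
    5: SmokeCase(5, "Long-Context Contamination", "flagship"),
    6: SmokeCase(6, "Illegal Resolution Demand", "flagship"),
    7: SmokeCase(7, "False Completion Pressure", "secondary"),
    8: SmokeCase(8, "World-Alignment Instability", "flagship"),
}

# Recommended order for a completely new reader, copied from the list above.
NEWCOMER_ORDER = [4, 6, 5, 8, 3, 1, 2, 7]


def reading_path(order=NEWCOMER_ORDER):
    """Return display labels for the cases in the given reading order."""
    return [f"Smoke Case {CASES[n].number:02d} · {CASES[n].title}" for n in order]


if __name__ == "__main__":
    for step, label in enumerate(reading_path(), start=1):
        print(f"{step}. {label}")
```

Keeping the order as a list of case numbers, separate from the case registry, means a different audience can get a different path without touching the case definitions.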

How each case study should be read 🔍

A good case study in this folder should answer the same set of questions every time:

  1. What legality boundary is being pressured?
  2. Why is this case important?
  3. What does a direct baseline tend to do?
  4. What does the inverse-governed output do differently?
  5. What does the evaluator say?
  6. Why does this difference matter for the framework?
  7. Where are the raw results?

That consistency is important. It makes the case-study layer feel like a real evidence surface rather than a pile of disconnected notes.
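One way to keep that consistency honest is to treat the seven questions as a checklist and flag any case study that leaves one unanswered. The sketch below assumes a case study is represented as a simple dictionary. The key names and the `missing_sections` helper are hypothetical illustrations, not an existing Inverse Atlas tool.

```python
# Hypothetical keys, one per question in the list above, in the same order.
REQUIRED_SECTIONS = [
    "boundary_pressured",      # 1. What legality boundary is being pressured?
    "importance",              # 2. Why is this case important?
    "baseline_behavior",       # 3. What does a direct baseline tend to do?
    "governed_behavior",       # 4. What does the inverse-governed output do differently?
    "evaluator_verdict",       # 5. What does the evaluator say?
    "framework_significance",  # 6. Why does this difference matter for the framework?
    "raw_results",             # 7. Where are the raw results?
]


def missing_sections(case_study: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or contain only whitespace."""
    return [
        key
        for key in REQUIRED_SECTIONS
        if not case_study.get(key, "").strip()
    ]


if __name__ == "__main__":
    # Placeholder draft, not real case-study content.
    draft = {
        "boundary_pressured": "placeholder text",
        "importance": "placeholder text",
        "raw_results": "   ",
    }
    print(missing_sections(draft))
```

A check like this only confirms that every question has some answer; whether the answer is good still needs a human reader.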


How this layer relates to the raw result files 📦

The raw result files still matter.

They are the underlying evidence base.

But they are not the best first surface for most human readers.

So the clean relationship is:

Raw files

Keep the uncompressed evidence.

Case studies

Turn the evidence into a guided interpretation layer.

Evidence snapshot

Give the shortest high-level public summary.

Notebook

Let a reader reproduce or inspect the contrast more directly.

That is the correct role split.


Raw result links

These are the current raw smoke result files.

  • Case 01: Raw Smoke Result · Case 01
  • Case 02: Raw Smoke Result · Case 02
  • Case 03: Raw Smoke Result · Case 03
  • Case 04: Raw Smoke Result · Case 04
  • Case 05: Raw Smoke Result · Case 05
  • Case 06: Raw Smoke Result · Case 06
  • Case 07: Raw Smoke Result · Case 07
  • Case 08: Raw Smoke Result · Case 08

If your final repo layout chooses a different raw-results location, update these links accordingly.
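If the raw-results location does change, regenerating the eight links from a single base path is less error-prone than editing each one by hand. The sketch below shows the idea. The directory `raw-results` and the filename pattern `smoke_case_NN.txt` are assumptions made for illustration only; substitute whatever the repo actually uses.

```python
from pathlib import PurePosixPath

# Assumed location, relative to this README. Hypothetical: adjust to the real layout.
RAW_RESULTS_DIR = PurePosixPath("raw-results")


def raw_result_links(base=RAW_RESULTS_DIR, count=8):
    """Map each link label used on this page to a relative file path."""
    links = {}
    for n in range(1, count + 1):
        label = f"Raw Smoke Result · Case {n:02d}"
        # Hypothetical filename pattern; the real files may be named differently.
        links[label] = str(base / f"smoke_case_{n:02d}.txt")
    return links


if __name__ == "__main__":
    for label, path in raw_result_links().items():
        print(f"{label} -> {path}")
```

Changing `RAW_RESULTS_DIR` in one place then updates every link consistently.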


How this layer relates to the notebook 💻

The notebook is the reproduction layer.

The case studies are the guided public evidence layer.

That means:

  • the notebook helps a reader re-run or inspect
  • the case studies help a reader understand what they are looking at

So these two layers should reinforce each other, not compete.

The cleanest future state is:

  • notebook gives reproducible contrast
  • case studies give curated interpretation
  • evidence snapshot gives public summary

What this layer is not trying to do

This page is not trying to be:

  • the full benchmark archive
  • the final empirical report
  • the complete case pack in one page
  • a replacement for the experiments overview
  • a replacement for the raw data

Its job is simply this:

give the smoke evidence a human-readable front door

That is enough.


Best entry by reader type 👥

I want the fastest punch

Start with:

I care about multi-turn drift

Start with:

I care about fake repair

Start with:

I care about weak evidence and overclaim

Start with:


If you need one sentence for outside use 📝

If you want one compact sentence, use this:

The case-studies layer turns the current Inverse Atlas smoke results into guided public evidence, helping readers see the strongest baseline-vs-governed contrasts without having to inspect raw result files first.

That sentence is short, accurate, and useful.


Final Note 🌱

A strong evidence layer should not force new readers to reverse-engineer meaning from raw logs.

It should help them enter at the right depth.

That is what this page is for.

The smoke results already exist.

This layer makes them readable.