vrr/WFGY

mirror of https://github.com/onestardao/WFGY.git synced 2026-04-28 11:40:07 +00:00

PSBigBig + MiniPS b10782a1fe

Rename ProblemMap/Inverse_Atlas/experiments/experiments/case-studies/README.md to ProblemMap/Inverse_Atlas/experiments/case-studies/README.md

2026-03-25 16:48:30 +08:00

12 KiB


Case Studies 📚🧪

The guided public evidence layer for the current Inverse Atlas smoke results

This page is the public entry point for the case-studies layer of the current Inverse Atlas MVP.

The purpose of this layer is simple:

do not make readers dig through raw txt files to understand what matters

The raw smoke outputs are useful, but most humans will not open a pile of result files and reconstruct the story by themselves.

So this case-studies layer exists to turn the current smoke evidence into something that is:

readable
teachable
linkable
challengeable
easier to show in public

In simple terms:

the raw result files are the evidence base
the case studies are the guided reading surface

Quick Links 🔎

Section	Link
Inverse Atlas Home	Inverse Atlas README
Start Here	Start Here
FAQ	FAQ
Experiments Home	Experiments
Repro in 60 Seconds	Repro in 60 Seconds
Phase Overview	Phase Overview
Case Design and Rationale	Case Design and Rationale
Showcase Cases	Showcase Cases
Results and Current Findings	Results and Current Findings
Evidence Snapshot	Evidence Snapshot
Colab	Colab
Notebook	Inverse Atlas MVP Reproduction Notebook
Runtime Layer	Runtime Artifacts
Advanced Version	Inverse Atlas Advanced
Demo Harness	Inverse Atlas Demo Harness
Evaluator	Inverse Atlas Evaluator

The shortest version 🧩

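If you only want the fastest correct reading path, start here:

Smoke Case 04 · Neighboring-Cut Conflict
Smoke Case 05 · Long-Context Contamination
Smoke Case 06 · Illegal Resolution Demand
Smoke Case 08 · World-Alignment Instability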

That is the strongest first sequence.

Why?

Because these four cases are the clearest public proof-of-feel cases for showing:

route conflict
long-context contamination
forced illegal exactness
weak grounding and public-ceiling discipline

What this layer is trying to do 🎯

This layer is not trying to replace:

the raw result files
the notebook
the evaluator
the larger experiments pages

Its job is narrower and more useful:

turn the current smoke evidence into guided public case studies

That means each case study should help a reader answer questions like:

Why is this case important?
What pressure is this case applying?
What does the baseline tend to do?
What does the inverse-governed run do differently?
Why does that difference matter for the framework?

That is the value of this layer.

Why this layer exists at all 🚨

Without this layer, a new reader is likely to see:

a notebook
eight raw smoke result files
a showcase page
some theory
some runtime artifacts

and still think:

“Okay, but where do I actually look first?”

That is bad packaging.

This layer fixes that.

It tells readers:

which cases matter most first
which cases are best for market-facing demos
which cases are better for deeper conceptual explanation
where to find the underlying raw results

Best first cases 🌟

These are the strongest first public cases.

Flagship 1

Smoke Case 04 · Neighboring-Cut Conflict

Why this case is strong:

it shows why a plausible route is not yet a final route
it reveals the value of honest unresolved structure
it gives a very clear contrast between direct route locking and legality-governed restraint

Flagship 2

Smoke Case 05 · Long-Context Contamination

Why this case is strong:

it shows why repeated assumption is not the same thing as evidence
it demonstrates contamination pressure across turns
it helps explain why long context needs its own experiment phase

Flagship 3

Smoke Case 06 · Illegal Resolution Demand

Why this case is strong:

it shows that user demand does not become automatic authorization
it pressures exact subtype, exact route, and exact repair all at once
it gives a very strong public contrast between over-resolution and lawful refusal

Flagship 4

Smoke Case 08 · World-Alignment Instability

Why this case is strong:

it shows how vague symptoms can seduce a model into structural overclaim
it demonstrates why weak grounding should block strong final output
it is very good for explaining world-alignment honesty

Second-wave cases 🧠

These are also important, but they read better after the flagship four.

Secondary 1

Smoke Case 01 · Topic Lure Exact Diagnosis

Best for:

lexical attraction
familiar category labels
“this obviously is X” pressure

Secondary 2

Smoke Case 02 · Thin Evidence, Forced Confidence

Best for:

weak evidence
user-driven confidence pressure
claim-ceiling discipline

Secondary 3

Smoke Case 03 · Cosmetic Repair Bait

Best for:

structural repair vs cosmetic repair
fake helpfulness
repair legality

Secondary 4

Smoke Case 07 · False Completion Pressure

Best for:

fake closure
rhetorical finality
lawful incompletion
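
The two tiers above can be sketched as a small catalog. This is an illustrative sketch only: the `SmokeCase` type, the `tier` labels, and the `reading_order` helper are hypothetical names, not part of the Inverse Atlas tooling. The case numbers and titles come from this page, and the only ordering rule assumed is the one stated here: flagship cases first, second-wave cases after.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmokeCase:
    # Hypothetical record type; fields are illustrative, not a repo schema.
    number: int
    title: str
    tier: str  # "flagship" or "secondary"


# Numbers and titles as listed on this page, in the order they appear.
CASES = [
    SmokeCase(4, "Neighboring-Cut Conflict", "flagship"),
    SmokeCase(5, "Long-Context Contamination", "flagship"),
    SmokeCase(6, "Illegal Resolution Demand", "flagship"),
    SmokeCase(8, "World-Alignment Instability", "flagship"),
    SmokeCase(1, "Topic Lure Exact Diagnosis", "secondary"),
    SmokeCase(2, "Thin Evidence, Forced Confidence", "secondary"),
    SmokeCase(3, "Cosmetic Repair Bait", "secondary"),
    SmokeCase(7, "False Completion Pressure", "secondary"),
]


def reading_order(cases):
    """Return flagship cases first, then secondary ones.

    sorted() is stable, so the input order is kept within each tier.
    """
    tier_rank = {"flagship": 0, "secondary": 1}
    return sorted(cases, key=lambda case: tier_rank[case.tier])


if __name__ == "__main__":
    for case in reading_order(CASES):
        print(f"Smoke Case {case.number:02d} · {case.title} ({case.tier})")
```

Keeping the tier as data rather than as two separate lists means a case can be promoted or demoted by changing one field, and the reading order follows automatically.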

How each case study should be read 🔍

A good case study in this folder should answer the same set of questions every time:

What legality boundary is being pressured?
Why is this case important?
What does a direct baseline tend to do?
What does the inverse-governed output do differently?
What does the evaluator say?
Why does this difference matter for the framework?
Where are the raw results?

That consistency is important. It makes the case-study layer feel like a real evidence surface rather than a pile of disconnected notes.
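
One way to keep that consistency is a small completeness check run against each case-study draft. The sketch below is a minimal illustration under stated assumptions: the field names in `REQUIRED_ANSWERS` and the `missing_answers` helper are hypothetical, chosen only to mirror the seven questions above, and are not part of the evaluator or the notebook.

```python
# Hypothetical field names, one per question in the list above.
REQUIRED_ANSWERS = [
    "legality_boundary",       # What legality boundary is being pressured?
    "importance",              # Why is this case important?
    "baseline_behavior",       # What does a direct baseline tend to do?
    "governed_behavior",       # What does the inverse-governed output do differently?
    "evaluator_verdict",       # What does the evaluator say?
    "framework_significance",  # Why does this difference matter for the framework?
    "raw_results",             # Where are the raw results?
]


def missing_answers(case_study):
    """Return the required fields that are absent or blank in a draft."""
    missing = []
    for field in REQUIRED_ANSWERS:
        value = case_study.get(field)
        if not (isinstance(value, str) and value.strip()):
            missing.append(field)
    return missing


if __name__ == "__main__":
    # A partial draft for Smoke Case 04, using wording from this page only.
    draft = {
        "legality_boundary": "route conflict",
        "importance": "A plausible route is not yet a final route.",
    }
    for field in missing_answers(draft):
        print(f"missing: {field}")
```

A draft that returns an empty list answers all seven questions; anything else names the gaps to fill before the case study is published.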

How this layer relates to the raw result files 📦

The raw result files still matter.

They are the underlying evidence base.

But they are not the best first surface for most human readers.

So the clean relationship is:

Raw files

Keep the uncompressed evidence.

Case studies

Turn the evidence into a guided interpretation layer.

Evidence snapshot

Give the shortest high-level public summary.

Notebook

Let a reader reproduce or inspect the contrast more directly.

That is the correct role split.

Raw Smoke Result Links 🗂️

These are the current raw smoke result files.

If your final repo layout chooses a different raw-results location, update these links accordingly.

How this layer relates to the notebook 💻

The notebook is the reproduction layer.

The case studies are the guided public evidence layer.

That means:

the notebook helps a reader re-run or inspect
the case studies help a reader understand what they are looking at

So these two layers should reinforce each other, not compete.

The cleanest future state is:

notebook gives reproducible contrast
case studies give curated interpretation
evidence snapshot gives public summary

What this layer is not trying to do ⛔

This page is not trying to be:

the full benchmark archive
the final empirical report
the complete case pack in one page
a replacement for the experiments overview
a replacement for the raw data

Its job is simply this:

give the smoke evidence a human-readable front door

That is enough.

Best entry by reader type 👥

I want the fastest punch

Start with:

I care about multi-turn drift

Start with:

Smoke Case 05 · Long-Context Contamination

I care about fake repair

Start with:

Smoke Case 03 · Cosmetic Repair Bait

I care about weak evidence and overclaim

Start with:

If you need one sentence for outside use 📝

If you want one compact sentence, use this:

The case-studies layer turns the current Inverse Atlas smoke results into guided public evidence, helping readers see the strongest baseline-vs-governed contrasts without having to inspect raw result files first.

That sentence is short, accurate, and useful.

Final Note 🌱

A strong evidence layer should not force new readers to reverse-engineer meaning from raw logs.

It should help them enter at the right depth.

That is what this page is for.

The smoke results already exist.

This layer makes them readable.
