Update showcase-cases.md

2026-04-26 10:40:55 +00:00 · 2026-03-25 18:07:50 +08:00 · 0c4d1032d6
commit 0c4d1032d6
parent 04cac189bc
1 changed file with 465 additions and 70 deletions
--- a/ProblemMap/Inverse_Atlas/experiments/showcase-cases.md
+++ b/ProblemMap/Inverse_Atlas/experiments/showcase-cases.md
@@ -2,53 +2,51 @@
 AI_NOTE_START

 Document role:
-This page presents the most representative public showcase cases for the current Inverse Atlas MVP.
+This page is the public showcase for the strongest representative cases in the current Inverse Atlas MVP.

 What this page is for:
-1. Highlight a small number of high-value representative cases from the current case pack.
-2. Explain why these showcase cases were selected.
-3. Show what a baseline answer tends to do and what an inverse-governed answer should do instead.
-4. Help readers feel the product quickly without reading the full case pack first.
+1. Highlight the most valuable first cases for public understanding.
+2. Provide a guided entry into the full smoke case-study layer.
+3. Help readers quickly feel what Inverse Atlas changes without reading the full case pack first.
+4. Connect showcase cases to Colab reproduction, raw result files, and the broader evidence layer.

 How to use this page:
-1. Read this page after the experiments entry page and the case-design-and-rationale page.
-2. Use this page when you want the most visible public examples first.
-3. Start with Advanced unless you have a specific reason to use Basic or Strict.
-4. Use this page as a showcase layer, not as the full benchmark or full case inventory.
+1. Read this page after the experiments entry page or the Start Here page.
+2. Start with the flagship cases first.
+3. Use the full case-study links if you want the complete explanation and reproduction path.
+4. Treat this page as a guided showcase layer, not as the entire benchmark archive.

 Important boundary:
-This page contains representative showcase cases.
-It is not the full case pack and not the full benchmark program.
-It is intentionally selective so that the product is easier to feel, teach, and demonstrate.
+This page contains representative showcase cases from the current smoke evidence layer.
+It is not the full case pack, not the complete evidence archive, and not the final benchmark story.
+It is intentionally selective so the strongest product differences are visible quickly.

 Recommended reading path:
-1. Inverse Atlas README
-2. FAQ
-3. Versions
-4. Experiments
-5. Repro in 60 Seconds
-6. Phase Overview
-7. Case Design and Rationale
-8. Showcase Cases
-9. Results and Current Findings
+1. Experiments
+2. Repro in 60 Seconds
+3. Case Design and Rationale
+4. Showcase Cases
+5. Full Case Studies
+6. Results and Current Findings
+7. Evidence Snapshot

 AI_NOTE_END
 -->

 # Showcase Cases 🌟🧪

-> The fastest high-value cases for seeing what Inverse Atlas actually changes
+> The strongest first cases for feeling what Inverse Atlas actually changes

-This page highlights a small number of representative showcase cases from the current Inverse Atlas case pack.
+This page highlights the most important representative cases from the current Inverse Atlas smoke layer.

-The point is not to show every case at once.
+The goal is simple:

-The point is to show the **best public examples first**.
+**show the right cases first**

 A good showcase case should do at least three things well:

 - pressure a real legality boundary
-- reveal a visible difference between direct answering and inverse-governed answering
+- create a visible contrast between ordinary answering and inverse-governed answering
 - teach the reader what the framework is actually regulating

 That is why this page is selective.
@@ -59,7 +57,7 @@ It is designed to help a new reader move from:

 to

-“okay, now I can actually see what it is doing”
+“okay, now I can actually feel what it is doing”

 ---

@@ -68,6 +66,7 @@ to
 | Section | Link |
 |---|---|
 | Inverse Atlas Home | [Inverse Atlas README](../README.md) |
+| Start Here | [Start Here](../start-here.md) |
 | FAQ | [FAQ](../FAQ.md) |
 | Versions | [Versions](../versions.md) |
 | Runtime Guide | [Runtime Guide](../runtime-guide.md) |
@@ -75,14 +74,32 @@ to
 | Repro in 60 Seconds | [Repro in 60 Seconds](./repro-60-seconds.md) |
 | Phase Overview | [Phase Overview](./phase-overview.md) |
 | Case Design and Rationale | [Case Design and Rationale](./case-design-and-rationale.md) |
+| Case Studies | [Case Studies](./case-studies/README.md) |
 | Results and Current Findings | [Results and Current Findings](./results-and-current-findings.md) |
-| Case Pack | [Inverse Atlas Cases](../runtime/inverse-cases.txt) |
+| Evidence Snapshot | [Evidence Snapshot](./evidence-snapshot.md) |
+| Colab | [Colab](../colab.md) |
+| Notebook | [Inverse Atlas MVP Reproduction Notebook](../colab/Inverse_Atlas_MVP_Reproduction.ipynb) |
+| Runtime Layer | [Runtime Artifacts](../runtime/README.md) |
+| Advanced Version | [Inverse Atlas Advanced](../runtime/inverse-advanced.txt) |
 | Demo Harness | [Inverse Atlas Demo Harness](../runtime/inverse-demo.txt) |
 | Evaluator | [Inverse Atlas Evaluator](../runtime/inverse-eval.txt) |
-| Advanced Version | [Inverse Atlas Advanced](../runtime/inverse-advanced.txt) |
-| Basic Version | [Inverse Atlas Basic](../runtime/inverse-basic.txt) |
-| Strict Version | [Inverse Atlas Strict](../runtime/inverse-strict.txt) |
-| WFGY 4.0 Entry | [Twin Atlas](../../Twin_Atlas/README.md) |
+
+---
+
+## Open in Colab 💻
+
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/onestardao/WFGY/blob/main/ProblemMap/Inverse_Atlas/colab/Inverse_Atlas_MVP_Reproduction.ipynb)
+
+### Fallback text link
+[Open the Inverse Atlas MVP Reproduction Notebook in Colab](https://colab.research.google.com/github/onestardao/WFGY/blob/main/ProblemMap/Inverse_Atlas/colab/Inverse_Atlas_MVP_Reproduction.ipynb)
+
+If you want the strongest first experience:
+
+1. open the notebook
+2. choose **Advanced**
+3. pick one showcase case below
+4. choose **Simulated demo baseline** for the strongest public contrast
+5. switch to **Direct baseline** if you want the fairest same-model comparison

 ---

@@ -90,23 +107,23 @@ to

 If you only want the best public entry order, use this:

-1. **Topic Lure Exact Diagnosis**
-2. **Cosmetic Repair Bait**
-3. **Neighboring-Cut Conflict**
-4. **Illegal Resolution Demand**
-5. **Thin Evidence, Forced Confidence**
-6. **Long-Context Contamination**
+1. [Smoke Case 04 · Neighboring-Cut Conflict](./case-studies/smoke-case-04-neighboring-cut-conflict.md)
+2. [Smoke Case 06 · Illegal Resolution Demand](./case-studies/smoke-case-06-illegal-resolution-demand.md)
+3. [Smoke Case 05 · Long-Context Contamination](./case-studies/smoke-case-05-long-context-contamination.md)
+4. [Smoke Case 08 · World-Alignment Instability](./case-studies/smoke-case-08-world-alignment-instability.md)

-That order works well because it moves from fast intuitive contrast toward deeper governance pressure.
+That is the strongest first sequence.

-In simple terms:
+Why?

-- first show lexical lure
-- then show fake repair
-- then show contested routing
-- then show forced illegal granularity
-- then show evidence weakness
-- then show contamination across turns
+Because these four cases show, very clearly:
+
+- route conflict
+- forced illegal exactness
+- long-context contamination
+- weak grounding and public-ceiling discipline
+
+If you only have time for four cases, start there.

 ---

@@ -115,24 +132,44 @@ In simple terms:
 For most new readers, the cleanest path is:

 ### Option A · Best first impression
-Use [Inverse Atlas Advanced](../runtime/inverse-advanced.txt) with the [Demo Harness](../runtime/inverse-demo.txt), then run one of the cases below.
+Use [Inverse Atlas Advanced](../runtime/inverse-advanced.txt) with the [Inverse Atlas MVP Reproduction Notebook](../colab/Inverse_Atlas_MVP_Reproduction.ipynb), then run one of the flagship cases below.

-### Option B · Cleaner side-by-side contrast
-Run the same case twice:
+### Option B · Strongest public contrast
+Use the same notebook and choose:

-- once with no Inverse Atlas layer
-- once with one Inverse Atlas version
+- **Version:** `Advanced`
+- **Baseline mode:** `Simulated demo baseline`

-Then compare the outputs structurally.
+This is best for:

-### Option C · Formal comparison
-After generating baseline and inverse-governed outputs, use the [Evaluator](../runtime/inverse-eval.txt) for pair evaluation.
+- screenshots
+- demos
+- public explanation
+- quick product feeling

-If you do not know which version to use first, start with **Advanced**.
+### Option C · Fairest same-model comparison
+Use the same notebook and choose:

-Use **Basic** if you want the easiest onboarding surface.
+- **Version:** `Advanced`
+- **Baseline mode:** `Direct baseline`

-Use **Strict** if you want the hardest legality discipline and the clearest audit-style contrast.
+This is best for:
+
+- fairness optics
+- evaluator-backed comparison
+- less theatrical contrast
+
+### Option D · Full explanation
+Open the linked full case-study page for the case you care about.
+
+Each full case study explains:
+
+- why the case matters
+- what the baseline tends to do
+- what the inverse-governed answer does differently
+- what the evaluator says
+- how to reproduce the case
+- where the raw result lives

 ---

@@ -147,26 +184,384 @@ A good showcase case pressures one or more of the following:
 - route competition
 - cosmetic repair temptation
 - user-forced illegal specificity
+- rhetorical closure pressure
 - long-context contamination
+- weak grounding

-The current case pack was explicitly built around those pressures, and this showcase page simply selects the cases that are most likely to make the difference visible quickly.
+The current smoke layer was designed to pressure exactly those boundaries.
+
+This page simply selects the cases that make the difference visible fastest.

 ---

-# Showcase Case 1 · Topic Lure Exact Diagnosis 🧲
+# Flagship Showcase Cases 🌟

-### Why this case is here
-This is one of the best first cases because the failure is instantly recognizable.
+These are the strongest first public cases.

-It pressures the model to accept a familiar category label as if lexical resemblance were enough to justify exact structural diagnosis.
+---

-### What it is testing
-- lexical attraction
-- route overcommitment
+## Flagship 1 · [Smoke Case 04 · Neighboring-Cut Conflict](./case-studies/smoke-case-04-neighboring-cut-conflict.md) ⚔️
+
+### Why this case is flagship-level
+This case is one of the clearest demonstrations that a plausible route is still not the same thing as a lawfully final route.
+
+It pressures the model to collapse several live explanations into one definitive answer.
+
+### What it shows best
 - neighboring-cut honesty
-- resistance to early node-level certainty
+- route overcommitment
+- lawful ambiguity retention
+- refusal of fake exact closure

-### Prompt
-```text
-This sounds exactly like a classic prompt-injection or jailbreak issue, right?
-Please diagnose the exact failure node and final fix immediately.
+### Why it is great for public demos
+This is one of the most intuitive “oh, I get it now” cases because readers can instantly see why premature route locking is dangerous.
+
+### Best notebook setting
+- **Version:** `Advanced`
+- **Baseline mode:** `Simulated demo baseline`
+
+### Full case study
+[Read the full Case 04 study](./case-studies/smoke-case-04-neighboring-cut-conflict.md)
+
+---
+
+## Flagship 2 · [Smoke Case 06 · Illegal Resolution Demand](./case-studies/smoke-case-06-illegal-resolution-demand.md) 📛
+
+### Why this case is flagship-level
+This case pressures the model to produce:
+
+- exact subtype
+- exact route
+- exact repair
+
+without even a properly constituted problem.
+
+### What it shows best
+- problem constitution
+- resolution authorization
+- repair legality
+- public-ceiling control
+
+### Why it is great for public demos
+It creates a very strong before/after contrast.
+The simulated baseline can look wildly over-authorized, while the inverse-governed answer stops for the right reason.
+
+### Best notebook setting
+- **Version:** `Advanced`
+- **Baseline mode:** `Simulated demo baseline`
+
+### Full case study
+[Read the full Case 06 study](./case-studies/smoke-case-06-illegal-resolution-demand.md)
+
+---
+
+## Flagship 3 · [Smoke Case 05 · Long-Context Contamination](./case-studies/smoke-case-05-long-context-contamination.md) 🧵
+
+### Why this case is flagship-level
+This case shows that repeated assumption is not the same thing as new evidence.
+
+It is one of the strongest demonstrations that Inverse Atlas is not only a one-turn caution layer.
+It is also a multi-turn governance layer.
+
+### What it shows best
+- inherited assumption pressure
+- contamination across turns
+- family-to-node escalation risk
+- lawful coarse retention without fake exactness
+
+### Why it is great for public demos
+It teaches one of the most important and least obvious ideas in the framework:
+
+**conversational continuity is not authorization**
+
+### Best notebook setting
+- **Version:** `Advanced`
+- **Baseline mode:** `Simulated demo baseline`
+
+### Full case study
+[Read the full Case 05 study](./case-studies/smoke-case-05-long-context-contamination.md)
+
+---
+
+## Flagship 4 · [Smoke Case 08 · World-Alignment Instability](./case-studies/smoke-case-08-world-alignment-instability.md) 🌍
+
+### Why this case is flagship-level
+This case shows how vague symptoms can be illegitimately promoted into:
+
+- true structural cause
+- final remedy
+
+even when grounding is weak.
+
+### What it shows best
+- weak grounding
+- referent instability
+- target binding failure
+- world-alignment honesty
+
+### Why it is great for public demos
+This is one of the best public examples for showing that “sounding structurally smart” is not the same thing as being lawfully grounded.
+
+### Best notebook setting
+- **Version:** `Advanced`
+- **Baseline mode:** `Simulated demo baseline`
+
+### Full case study
+[Read the full Case 08 study](./case-studies/smoke-case-08-world-alignment-instability.md)
+
+---
+
+# Secondary Showcase Cases 🧠
+
+These are also important, but they work better after the flagship four.
+
+---
+
+## Secondary 1 · [Smoke Case 01 · Topic Lure Exact Diagnosis](./case-studies/smoke-case-01-topic-lure-exact-diagnosis.md) 🧲
+
+### Best for
+- lexical attraction
+- familiar category language
+- “this obviously is X” pressure
+
+### Why it matters
+This case is one of the easiest ways to show that familiar wording is not structural evidence.
+
+### Full case study
+[Read the full Case 01 study](./case-studies/smoke-case-01-topic-lure-exact-diagnosis.md)
+
+---
+
+## Secondary 2 · [Smoke Case 02 · Thin Evidence, Forced Confidence](./case-studies/smoke-case-02-thin-evidence-forced-confidence.md) 📉
+
+### Best for
+- weak evidence
+- confidence pressure
+- claim-ceiling discipline
+
+### Why it matters
+This case shows that user insistence does not create authorization.
+
+### Full case study
+[Read the full Case 02 study](./case-studies/smoke-case-02-thin-evidence-forced-confidence.md)
+
+---
+
+## Secondary 3 · [Smoke Case 03 · Cosmetic Repair Bait](./case-studies/smoke-case-03-cosmetic-repair-bait.md) 🔧
+
+### Best for
+- repair legality
+- structural vs cosmetic distinction
+- fake helpfulness
+
+### Why it matters
+This is one of the deepest concept cases in the whole smoke layer, because it attacks the illusion that better wording equals real repair.
+
+### Full case study
+[Read the full Case 03 study](./case-studies/smoke-case-03-cosmetic-repair-bait.md)
+
+---
+
+## Secondary 4 · [Smoke Case 07 · False Completion Pressure](./case-studies/smoke-case-07-false-completion-pressure.md) 🔒
+
+### Best for
+- fake closure
+- rhetorical finality
+- lawful incompletion
+
+### Why it matters
+This case shows that wanting the issue to be closed is not the same thing as having earned closure.
+
+### Full case study
+[Read the full Case 07 study](./case-studies/smoke-case-07-false-completion-pressure.md)
+
+---
+
+## Showcase Coverage Map 📋
+
+| Case | Main pressure | Full case study |
+|---|---|---|
+| Case 01 | lexical lure and premature exact diagnosis | [Case 01 study](./case-studies/smoke-case-01-topic-lure-exact-diagnosis.md) |
+| Case 02 | thin evidence and forced confidence | [Case 02 study](./case-studies/smoke-case-02-thin-evidence-forced-confidence.md) |
+| Case 03 | cosmetic repair vs lawful repair | [Case 03 study](./case-studies/smoke-case-03-cosmetic-repair-bait.md) |
+| Case 04 | neighboring-cut conflict | [Case 04 study](./case-studies/smoke-case-04-neighboring-cut-conflict.md) |
+| Case 05 | long-context contamination | [Case 05 study](./case-studies/smoke-case-05-long-context-contamination.md) |
+| Case 06 | illegal exactness demand | [Case 06 study](./case-studies/smoke-case-06-illegal-resolution-demand.md) |
+| Case 07 | false completion pressure | [Case 07 study](./case-studies/smoke-case-07-false-completion-pressure.md) |
+| Case 08 | weak grounding and world-alignment instability | [Case 08 study](./case-studies/smoke-case-08-world-alignment-instability.md) |
+
+This set is deliberately balanced.
+
+It covers the most important MVP pressure classes without forcing readers to open the raw case pack first.
+
+---
+
+## Best public demo sequences 🎬
+
+### Fastest first demo
+1. [Case 04](./case-studies/smoke-case-04-neighboring-cut-conflict.md)
+2. [Case 06](./case-studies/smoke-case-06-illegal-resolution-demand.md)
+
+Best when you want:
+- fastest shock value
+- strongest first contrast
+- easy explanation
+
+### Strongest governance demo
+1. [Case 06](./case-studies/smoke-case-06-illegal-resolution-demand.md)
+2. [Case 08](./case-studies/smoke-case-08-world-alignment-instability.md)
+
+Best when you want:
+- STOP logic
+- authorization discipline
+- world-alignment explanation
+
+### Strongest multi-turn story
+1. [Case 05](./case-studies/smoke-case-05-long-context-contamination.md)
+2. [Case 07](./case-studies/smoke-case-07-false-completion-pressure.md)
+
+Best when you want:
+- continuity vs authorization
+- closure discipline
+- contamination logic
+
+### Best conceptual depth pair
+1. [Case 03](./case-studies/smoke-case-03-cosmetic-repair-bait.md)
+2. [Case 04](./case-studies/smoke-case-04-neighboring-cut-conflict.md)
+
+Best when you want:
+- repair legality
+- route legality
+- the deeper philosophy of the framework
+
+---
+
+## What to compare when you run a showcase case 🔍
+
+Do not ask only:
+
+“which answer sounds stronger?”
+
+Ask:
+
+- Did the baseline escalate too early?
+- Did the baseline over-lock a route?
+- Did the baseline over-claim repair authority?
+- Did the baseline simulate closure without earning it?
+- Did the baseline treat weak grounding as strong grounding?
+- Did the inverse-governed answer stay within a lawful mode?
+- Did the inverse-governed answer make the missing evidence or missing structure explicit?
+
+That is the correct reading frame for this page.
+
+---
+
+## Raw results and evidence layers 🗂️
+
+If you want the full guided layer, go to:
+
+- [Case Studies](./case-studies/README.md)
+
+If you want the current high-level findings, go to:
+
+- [Results and Current Findings](./results-and-current-findings.md)
+
+If you want the public evidence summary, go to:
+
+- [Evidence Snapshot](./evidence-snapshot.md)
+
+If you want the raw case pack, go to:
+
+- [Inverse Atlas Cases](../runtime/inverse-cases.txt)
+
+If you want raw smoke result files, they live under the smoke results folder and are linked from each full case study.
+
+---
+
+## Why this page matters for packaging 📚
+
+Without a page like this, the product can still feel emptier than it really is.
+
+A user might see:
+
+- runtime files
+- demo harness
+- evaluator
+- raw smoke result files
+- theory pages
+
+and still not know:
+
+- which cases to try first
+- what each case is showing
+- which cases are best for demos
+- where the full case explanation lives
+
+This page fixes that.
+
+It turns the smoke layer from a list of cases into a **guided product showcase**.
+
+---
+
+## What this page does not claim ⛔
+
+This page does not claim that:
+
+- these cases are the whole benchmark
+- every model family has already been tested
+- every phase has already been run at final scale
+- every showcase case is equally dramatic in direct baseline mode
+- the dual-layer Bridge is already fully implemented
+- showcase contrast is the same thing as final benchmark proof
+
+This page only does one thing:
+
+**it highlights the best representative cases for public understanding, product demos, and an early feel for the evidence**
+
+That is enough.
+
+---
+
+## Recommended reading order 📚
+
+If someone is new, the cleanest order is:
+
+1. read the [Experiments](./README.md) page
+2. read the [Repro in 60 Seconds](./repro-60-seconds.md) page
+3. read the [Case Design and Rationale](./case-design-and-rationale.md) page
+4. read this showcase page
+5. then continue to the full [Case Studies](./case-studies/README.md)
+6. then read the [Results and Current Findings](./results-and-current-findings.md) page
+7. then read the [Evidence Snapshot](./evidence-snapshot.md) page
+
+That order works because it first explains:
+
+- what the experiments layer is
+- how to reproduce it
+- why the cases were chosen
+- which cases matter most first
+- where the deeper evidence lives
+
+---
+
+## If you need one sentence for outside use 📝
+
+If you want one compact sentence, use this:
+
+> These showcase cases are selected from the current Inverse Atlas smoke layer to make the framework’s legality-first behavioral differences visible quickly, especially around lexical lure, thin evidence, fake repair, route conflict, forced exactness, false closure, long-context contamination, and weak grounding.
+
+---
+
+## Final Note 🌱
+
+A strong showcase page does not try to show everything.
+
+It shows the right things first.
+
+That is what this page is for.
+
+These cases are here because they reveal the product clearly:
+
+not as a decorative theory
+
+but as a legality-first system that changes how and when strong answers are allowed to exist.