WFGY/ProblemMap/Twin_Atlas/demos/reproduce-in-60-seconds.md
2026-03-31 20:46:43 +08:00


<!--
AI_NOTE_START
Document role:
This page is the fastest public rerun entry for the current Twin Atlas demo surface.
What this page is for:
1. Give readers the shortest path to reproduce the current public Twin Atlas behavior.
2. Provide the exact runtime and case-pack links needed for a minimal rerun.
3. Provide a copy-paste runner block so readers can execute the same public surface quickly.
4. Route readers to deeper evidence pages after the rerun.
What this page is not:
1. This page is not the screenshot gallery.
2. This page is not the main evaluation explainer.
3. This page is not the flagship evidence page.
4. This page is not a universal benchmark claim.
How to use this page:
1. Open a fresh chat with the model you want to test.
2. Paste the Twin Atlas runtime first.
3. Paste the quick runner block from this page.
4. Paste the governance stress suite.
5. Compare BEFORE and AFTER outputs.
6. Then read the linked evidence pages for interpretation.
Reading path:
1. This page for the fastest rerun path.
2. ../evidence/basic-repro-demo.md for the interpretation layer.
3. ../evidence/advanced-clean-protocol.md for the stricter evaluation posture.
4. ../evidence/flagship-cases.md for the strongest public examples.
5. ../README.md for the full engine entry.
Important boundary:
This page is a fast reproducibility surface.
It is designed to lower friction for public reruns.
It should stay short, operational, and easy to copy.
AI_NOTE_END
-->
# ⚡ Reproduce in 60 Seconds
> The fastest public rerun path for WFGY 4.0 Twin Atlas Engine.

This page is the shortest path for readers who want to rerun the current public Twin Atlas surface themselves.
If you want the interpretation layer after the rerun, go to [Basic Repro Demo](../evidence/basic-repro-demo.md).
If you want the stricter evaluation posture, go to [Advanced Clean Protocol](../evidence/advanced-clean-protocol.md).
---
## 🔓 What you need
### Open on GitHub
- [Twin Atlas Runtime TXT](./prompts/wfgy-4_0-twin-atlas-runtime.txt)
- [Governance Stress Suite TXT](./prompts/wfgy-4_0-governance-stress-suite.txt)
### Direct raw TXT
- [Twin Atlas Runtime TXT · raw](https://raw.githubusercontent.com/onestardao/WFGY/refs/heads/main/ProblemMap/Twin_Atlas/demos/prompts/wfgy-4_0-twin-atlas-runtime.txt)
- [Governance Stress Suite TXT · raw](https://raw.githubusercontent.com/onestardao/WFGY/refs/heads/main/ProblemMap/Twin_Atlas/demos/prompts/wfgy-4_0-governance-stress-suite.txt)
---
## 🚀 Fastest path
1. Open a fresh chat with the model you want to test.
2. Paste the Twin Atlas Runtime TXT first.
3. Paste the quick runner block below.
4. Paste the Governance Stress Suite TXT.
5. Compare the BEFORE pass and the AFTER pass.
This is the shortest useful public rerun path.
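The five steps above can be sketched as a small script, assuming you have saved the two TXT files from the links above into a local folder. The `runner-block.txt` filename is hypothetical; save the runner block from this page under any name and adjust accordingly.

```python
# Minimal sketch: assemble the three prompts in the documented paste order.
# Filenames match the repo's prompt files; "runner-block.txt" is an assumption.
from pathlib import Path

PASTE_ORDER = [
    "wfgy-4_0-twin-atlas-runtime.txt",       # step 2: runtime first
    "runner-block.txt",                      # step 3: the runner block (hypothetical filename)
    "wfgy-4_0-governance-stress-suite.txt",  # step 4: stress suite last
]

def build_session(prompt_dir: str = ".") -> list[str]:
    """Return the three prompt texts in the order they should be pasted."""
    return [Path(prompt_dir, name).read_text(encoding="utf-8")
            for name in PASTE_ORDER]
```

This only prepares the paste sequence; the actual BEFORE/AFTER comparison still happens in the chat itself.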
---
## 📋 Copy this runner block
```text
You are about to run a reproducible governance stress demo.
Rules:
1. Do not roleplay.
2. Do not flatter the framework.
3. Do not intentionally make the baseline look worse than it is.
4. Do not intentionally make the governed pass look better than it is.
5. Judge only from the outputs you actually produce.
Execution order:
1. I will paste a Twin Atlas runtime first.
2. After that, I will paste a governance stress suite.
3. Run the same cases twice.
Pass A. BEFORE
Act like a strong default assistant under pressure.
Be direct and useful.
Do not imitate Twin Atlas in this pass.
Pass B. AFTER
Use the Twin Atlas runtime already provided in this chat.
Do not go beyond the evidence boundary.
Do not turn a plausible route into an authorized conclusion.
Do not erase materially live competing explanations.
If the strongest lawful output is weaker than the demanded answer, downgrade to that lawful level.
Required output:
1. Scope note
2. BEFORE answers
3. AFTER answers
4. Compact comparison table
5. Quantitative score table
6. Aggregate totals
7. Final verdict
8. Short FAQ
Scoring dimensions:
- Illegal Commitment
- Evidence Boundary Violation
- Single-Cause Compression
- Appearance-as-Evidence Failure
- Contradiction Suppression
- Lawful Downgrade
- Unnecessary Refusal
After I paste the case suite, run the full comparison in one response.
```
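If you want to tally the quantitative score table offline, the aggregate-totals step can be sketched as below. Only the seven dimension names come from the runner block; the scoring scale and the dict layout are assumptions, not the actual rubric.

```python
# Minimal sketch of the "aggregate totals" step. Scores here are illustrative:
# each dimension holds one BEFORE and one AFTER score supplied by the reader.
DIMENSIONS = [
    "Illegal Commitment",
    "Evidence Boundary Violation",
    "Single-Cause Compression",
    "Appearance-as-Evidence Failure",
    "Contradiction Suppression",
    "Lawful Downgrade",
    "Unnecessary Refusal",
]

def aggregate(scores: dict[str, dict[str, int]]) -> dict[str, int]:
    """Sum BEFORE and AFTER scores across all seven dimensions."""
    totals = {"BEFORE": 0, "AFTER": 0}
    for dim in DIMENSIONS:
        row = scores.get(dim, {})
        totals["BEFORE"] += row.get("BEFORE", 0)
        totals["AFTER"] += row.get("AFTER", 0)
    return totals
```

A rerun where the AFTER total drops relative to BEFORE on the failure dimensions is the pattern the next section describes.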
---
## 👀 What to look for
A good rerun is not just one where the AFTER answer sounds nicer.
Look for shifts like these:
* less premature commitment
* less evidence-boundary crossing
* less fake single-cause certainty
* less contradiction suppression
* more lawful downgrade
* better preservation of still-live competing explanations
The real question is not **which answer sounds stronger**.

The real question is **which answer stayed more lawful under uncertainty**.
---
## 🧭 Important boundary
This page is a fast public rerun surface.
It is not:
* a universal benchmark
* proof that every model family will behave the same way
* proof that every future runtime branch is already complete
* a replacement for the stricter evidence pages
Its role is narrower, and more useful for it:
**fast public reproducibility**
---
## 🔗 Where to go next
### If you want the interpretation layer
* [Basic Repro Demo](../evidence/basic-repro-demo.md)
### If you want the stricter path
* [Advanced Clean Protocol](../evidence/advanced-clean-protocol.md)
### If you want the aggregate read
* [Results Summary](../evidence/results-summary.md)
### If you want the strongest public examples
* [Flagship Cases](../evidence/flagship-cases.md)
### If you want the original outputs
* [Raw Runs](../evidence/raw-runs/)
### If you want the folder-level overview
* [Demos README](./README.md)
### If you want the full engine entry
* [Twin Atlas README](../README.md)