
# 🧾 Raw Runs

The original model-specific outputs behind the current WFGY 4.0 public proof surface.

Screenshots are useful.
Aggregate summaries are useful.
But raw runs matter for a different reason:

they preserve the actual output shape.

That means readers can inspect what each model really said, how it scored the cases, and whether the visible screenshot story matches the original output.

This page exists to make that raw layer readable.

## Why this page matters

If you only read screenshots, you see the visual contrast.

If you only read Results Summary, you see the aggregate headline.

If you read raw runs, you see the actual model-specific wording, scoring pattern, and final judgment shape.

That is why raw runs are a critical part of the WFGY 4.0 public evidence surface.

## Current public raw-run index

| Model | Raw run | Best reason to open it |
| --- | --- | --- |
| ChatGPT | `chatgpt.txt` | Strong public example of lawful downgrade without full collapse into blanket refusal |
| Claude | `claude.txt` | Strong example of ambiguity preservation and conflict-sensitive restraint |
| Gemini | `gemini.txt` | Useful example of thin-evidence downgrade discipline |
| Grok | `grok.txt` | Good for attribution and authenticity pressure comparison |
| DeepSeek | `deepseek.txt` | Useful for evidence-boundary tightening and attribution restraint |
| Kimi | `kimi.txt` | Strong before / after separation in several pressure-heavy cases |
| Mistral | `mistral.txt` | Useful model-family comparison point for a visible governance shift |
| Perplexity | `perplexity.txt` | Important public outlier for inspecting over-downgrade or blanket-refusal drift |
| Qwen | `qwen.txt` | Currently available as a raw-run asset even if not always foregrounded in the main public screenshot layer |

## How to use this page

- **If you want the screenshot layer first**, use:
- **If you want the aggregate interpretation first**, use:
- **If you want the shortest rerun path first**, use:
- **If you want the original wording and scoring shape**, stay here and open the raw runs directly.

## What this raw layer is good for

This layer is especially useful if you want to inspect:

- whether the AFTER pass preserved ambiguity instead of just hiding it
- whether a model downgraded lawfully or merely refused everything
- whether the screenshot impression matches the original output
- whether a model-specific run looks representative or idiosyncratic
- whether the public evidence surface is preserving outliers honestly

That last point matters.

A serious governance release should not only preserve its strongest examples.
It should also preserve the runs that expose boundary behavior.

That is part of why the raw-run layer matters.
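A first-pass sanity check on the inspection points above can be automated. A minimal sketch, assuming the raw runs mark passes with literal `BEFORE` / `AFTER` labels at the start of a line (the real files may label passes differently, and `split_passes` is a hypothetical helper name):

```python
import re

# Hypothetical pass markers; the real raw runs may label passes differently.
PASS_MARKER = re.compile(r"^(BEFORE|AFTER)\b", re.MULTILINE)


def split_passes(raw: str) -> dict[str, int]:
    """Count how many BEFORE / AFTER sections a raw run contains.

    A quick check that both passes are actually present before
    reading the wording and scoring shape by hand.
    """
    counts = {"BEFORE": 0, "AFTER": 0}
    for match in PASS_MARKER.finditer(raw):
        counts[match.group(1)] += 1
    return counts
```

A run with zero `AFTER` sections, or far more of one marker than the other, is exactly the kind of boundary behavior worth opening by hand.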

## Important boundary

Raw runs are part of the public proof surface.

They are useful because they preserve the original output shape of the current public runs.

They do not by themselves prove universal superiority, universal completion, or benchmark finality.

- If you want the aggregate read, use:
- If you want screenshot-first proof, use:
- If you want to rerun the same public surface yourself, use:

## Where to go next

- If you want visible proof first
- If you want the aggregate signal
- If you want the shortest public rerun path
- If you want the flagship example cases