# 🧾 Raw Runs
The original model-specific outputs behind the current WFGY 4.0 public proof surface.
Screenshots are useful.
Aggregate summaries are useful.
But raw runs matter for a different reason:
they preserve the actual output shape.
That means readers can inspect what each model really said, how it scored the cases, and whether the visible screenshot story matches the original output.
This page exists to make that raw layer readable.
## Why this page matters
If you only read screenshots, you see the visual contrast.
If you only read Results Summary, you see the aggregate headline.
If you read raw runs, you see the actual model-specific wording, scoring pattern, and final judgment shape.
That is why raw runs are a critical part of the WFGY 4.0 public evidence surface.
## Current public raw-run index
| Model | Raw run | Best reason to open it |
|---|---|---|
| ChatGPT | chatgpt.txt | strong public example of lawful downgrade without full collapse into blanket refusal |
| Claude | claude.txt | strong example of ambiguity preservation and conflict-sensitive restraint |
| Gemini | gemini.txt | useful example of thin-evidence downgrade discipline |
| Grok | grok.txt | good for attribution and authenticity pressure comparison |
| DeepSeek | deepseek.txt | useful for evidence-boundary tightening and attribution restraint |
| Kimi | kimi.txt | strong before / after separation in several pressure-heavy cases |
| Mistral | mistral.txt | useful model-family comparison point for visible governance shift |
| Perplexity | perplexity.txt | important public outlier for inspecting over-downgrade or blanket-refusal drift |
| Qwen | qwen.txt | available as a raw-run asset even when it is not foregrounded in the main public screenshot layer |
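If you have the repository checked out and want a quick inventory of these files before reading them, a minimal sketch like the one below can help. The `raw_runs` directory path and the `BEFORE` / `AFTER` marker strings are assumptions about how the raw runs are laid out locally, not a documented format; adjust them to match your checkout.

```python
from pathlib import Path

# Assumed local directory holding the raw-run text files listed above.
RAW_DIR = Path("raw_runs")

# File names taken from the index table.
MODELS = [
    "chatgpt", "claude", "gemini", "grok", "deepseek",
    "kimi", "mistral", "perplexity", "qwen",
]

for model in MODELS:
    path = RAW_DIR / f"{model}.txt"
    if not path.exists():
        print(f"{model:<12} missing ({path})")
        continue
    text = path.read_text(encoding="utf-8", errors="replace")
    lines = text.count("\n") + 1
    # "BEFORE" / "AFTER" markers are an assumption about the run format;
    # swap in whatever separators the actual files use.
    has_before = "BEFORE" in text.upper()
    has_after = "AFTER" in text.upper()
    print(f"{model:<12} {lines:>5} lines  before={has_before}  after={has_after}")
```

This is only an inventory pass; it tells you which runs are present and roughly how large they are, not whether their content holds up.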
## How to use this page
### If you want the screenshot layer first
Use:
### If you want the aggregate interpretation first
Use:
### If you want the shortest rerun path first
Use:
### If you want the original wording and scoring shape
Stay here and open the raw runs directly.
## What this raw layer is good for
This layer is especially useful if you want to inspect:
- whether the AFTER pass preserved ambiguity instead of just hiding it
- whether a model downgraded lawfully or merely refused everything
- whether the screenshot impression matches the original output
- whether a model-specific run looks representative or idiosyncratic
- whether the public evidence surface is preserving outliers honestly
That last point matters.
A serious governance release should not only preserve its strongest examples.
It should also preserve the runs that expose boundary behavior.
That is part of why the raw-run layer matters.
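If you want to make some of the checks above repeatable, one rough option is a phrase scan across the raw runs, for example to flag runs that lean toward blanket refusal rather than lawful downgrade. The refusal phrases below are illustrative guesses, not an official rubric, and a manual read of each run is still the real check.

```python
from pathlib import Path

# Illustrative refusal phrases; these are assumptions, not an official rubric.
REFUSAL_PHRASES = [
    "i can't help with",
    "i cannot help with",
    "i won't provide",
    "unable to assist",
]

def refusal_hits(path: Path) -> int:
    """Count rough refusal-phrase occurrences in one raw run."""
    text = path.read_text(encoding="utf-8", errors="replace").lower()
    return sum(text.count(phrase) for phrase in REFUSAL_PHRASES)

# Assumed local directory; see the inventory sketch above.
for path in sorted(Path("raw_runs").glob("*.txt")):
    print(f"{path.stem:<12} refusal-phrase hits: {refusal_hits(path)}")
```

A high count is not proof of blanket refusal by itself; it is a pointer to the runs worth reading closely, such as the Perplexity outlier noted in the index table.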
## Important boundary
Raw runs are part of the public proof surface.
They are useful because they preserve the original output shape of the current public runs.
They do not by themselves prove universal superiority, universal completion, or benchmark finality.
If you want the aggregate read, use:
If you want screenshot-first proof, use:
If you want to rerun the same public surface yourself, use: