
# 🧾 Raw Runs

The original model-specific outputs behind the current WFGY 4.0 public proof surface.

Screenshots are useful.
Aggregate summaries are useful.
But raw runs matter for a different reason:

they preserve the actual output shape.

That means readers can inspect what each model really said, how it scored the cases, and whether the visible screenshot story matches the original output.

This page exists to make that raw layer readable.

## Why this page matters

If you only read screenshots, you see the visual contrast.

If you only read Results Summary, you see the aggregate headline.

If you read raw runs, you see the actual model-specific wording, scoring pattern, and final judgment shape.

That is why raw runs are a critical part of the WFGY 4.0 public evidence surface.

## Current public raw-run index

| Model | Raw run | Best reason to open it |
| --- | --- | --- |
| ChatGPT | `chatgpt.txt` | Strong public example of lawful downgrade without full collapse into blanket refusal |
| Claude | `claude.txt` | Strong example of ambiguity preservation and conflict-sensitive restraint |
| Gemini | `gemini.txt` | Useful example of thin-evidence downgrade discipline |
| Grok | `grok.txt` | Good for attribution and authenticity pressure comparison |
| DeepSeek | `deepseek.txt` | Useful for evidence-boundary tightening and attribution restraint |
| Kimi | `kimi.txt` | Strong before / after separation in several pressure-heavy cases |
| Mistral | `mistral.txt` | Useful model-family comparison point for a visible governance shift |
| Perplexity | `perplexity.txt` | Important public outlier for inspecting over-downgrade or blanket-refusal drift |
| Qwen | `qwen.txt` | Currently available as a raw-run asset even if not always foregrounded in the main public screenshot layer |

## How to use this page

- **If you want the screenshot layer first**, use:
- **If you want the aggregate interpretation first**, use:
- **If you want the shortest rerun path first**, use:
- **If you want the original wording and scoring shape**, stay here and open the raw runs directly.

## What this raw layer is good for

This layer is especially useful if you want to inspect:

- whether the AFTER pass preserved ambiguity instead of just hiding it
- whether a model downgraded lawfully or merely refused everything
- whether the screenshot impression matches the original output
- whether a model-specific run looks representative or idiosyncratic
- whether the public evidence surface is preserving outliers honestly

That last point matters.

A serious governance release should not only preserve its strongest examples.
It should also preserve the runs that expose boundary behavior.

That is part of why the raw-run layer matters.
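A first-pass sanity check on the inspection points above can be automated. A minimal sketch, assuming the raw runs mark passes with literal `BEFORE` / `AFTER` labels at the start of a line (the real files may label passes differently, and `split_passes` is a hypothetical helper name):

```python
import re

# Hypothetical pass markers; the real raw runs may label passes differently.
PASS_MARKER = re.compile(r"^(BEFORE|AFTER)\b", re.MULTILINE)


def split_passes(raw: str) -> dict[str, int]:
    """Count how many BEFORE / AFTER sections a raw run contains.

    A quick check that both passes are actually present before
    reading the wording and scoring shape by hand.
    """
    counts = {"BEFORE": 0, "AFTER": 0}
    for match in PASS_MARKER.finditer(raw):
        counts[match.group(1)] += 1
    return counts
```

A run with zero `AFTER` sections, or far more of one marker than the other, is exactly the kind of boundary behavior worth opening by hand.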

## Important boundary

Raw runs are part of the public proof surface.

They are useful because they preserve the original output shape of the current public runs.

They do not by themselves prove universal superiority, universal completion, or benchmark finality.

- If you want the aggregate read, use:
- If you want screenshot-first proof, use:
- If you want to rerun the same public surface yourself, use:

## Where to go next

- If you want visible proof first
- If you want the aggregate signal
- If you want the shortest public rerun path
- If you want the flagship example cases