vrr/WFGY

mirror of https://github.com/onestardao/WFGY.git synced 2026-04-28 03:29:51 +00:00

PSBigBig + MiniPS deb2e7d8f6

Update README.md

2026-03-15 17:49:31 +08:00

Flagship Runnable Demo Pack

Problem Map 3.0 Troubleshooting Atlas

Official MVP demo entry for route-first repair

This folder contains the first flagship runnable demos for the Atlas fix system.

These demos are not meant to prove every possible case.
They are meant to prove something more important:

the atlas does not only name failures
it changes the first repair move

That is the entire reason this demo pack exists.

If the atlas is only a naming system, people may find it interesting.

If the atlas can show that different routing decisions lead to different repair decisions, people begin to feel that it is actually useful.

This demo pack is the smallest runnable surface that makes that claim visible.

Each demo is built around the same high-signal pattern:

baseline failure
atlas routing
first repair move
visible outcome shift
optional deeper WFGY 3.0 exploration

The goal is not scale.
The goal is clarity.

This folder should be read as the official public demo surface of the first Atlas fix release.
It is not a giant benchmark zoo.
It is a compact, high-signal proof layer.

What this demo pack proves

This demo pack is designed to make four claims visible.

1. Different failures should not be repaired the same way

Many failures look alike on the surface.

A fluent wrong answer, a broken workflow, a symbolic collapse, and a black-box debugging problem can all feel like “the system is bad.”

The atlas says that is too coarse.

This demo pack shows that once a case is routed into the right family, the first repair move changes.

2. The atlas is not just a checklist

A checklist can name symptoms.

A troubleshooting atlas should help you decide:

what kind of failure this is
why this family is primary
why a neighboring family is secondary
what should be repaired first
what should not be repaired first

These demos are built to make that difference visible.
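One way to picture those five questions is as a small record that gets filled in before any repair starts. The sketch below is illustrative only: the `RouteDecision` class and its field names are assumptions, not the schema described in shared/routing_schema.md. The family codes and names are the four used by this pack.

```python
from dataclasses import dataclass

# The four families covered by the flagship pack.
FAMILIES = {
    "F1": "Grounding & Evidence Integrity",
    "F4": "Execution & Contract Integrity",
    "F5": "Observability & Diagnosability Integrity",
    "F7": "Representation & Localization Integrity",
}


@dataclass(frozen=True)
class RouteDecision:
    """Hypothetical record answering the five routing questions."""
    primary: str              # what kind of failure this is
    why_primary: str          # why this family is primary
    secondary: str            # the neighboring family
    why_secondary: str        # why the neighbor is only secondary
    repair_first: str         # what should be repaired first
    do_not_repair_first: str  # what should not be repaired first

    def __post_init__(self) -> None:
        for code in (self.primary, self.secondary):
            if code not in FAMILIES:
                raise ValueError(f"unknown family code: {code}")
        if self.primary == self.secondary:
            raise ValueError("primary and secondary family must differ")
```

A checklist would stop at `primary`. The remaining fields are what make the record a routing decision rather than a label.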

3. Route-first repair is practical

The purpose of these notebooks is not to simulate a giant production stack.

The purpose is to show a minimal but convincing pattern:

baseline failure
atlas routing
first repair move
result shift
optional deeper WFGY 3.0 exploration

That is enough to make the system feel real.
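As a rough sketch, the five-step pattern can be held in one small structure and rendered in order. Everything named here (`DemoRecord`, `render_steps`, the field names) is hypothetical and chosen for illustration; the official notebooks may organize this differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DemoRecord:
    """One demo expressed as the five-step pattern (field names are hypothetical)."""
    baseline_failure: str
    atlas_route: str
    first_repair_move: str
    result_shift: str
    wfgy_exploration: Optional[str] = None  # the optional fifth step


def render_steps(record: DemoRecord) -> list[str]:
    """Return the pattern as ordered, human-readable lines."""
    steps = [
        ("baseline failure", record.baseline_failure),
        ("atlas routing", record.atlas_route),
        ("first repair move", record.first_repair_move),
        ("result shift", record.result_shift),
    ]
    # The deeper exploration step only appears when it was actually provided.
    if record.wfgy_exploration is not None:
        steps.append(("optional deeper WFGY 3.0 exploration", record.wfgy_exploration))
    return [f"{i}. {name}: {value}" for i, (name, value) in enumerate(steps, start=1)]
```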

4. Community growth becomes much easier once the flagship set exists

These four demos are also templates.

They are not only proofs.
They are seed assets for future contributed demos in:

Colab
JSON fixtures
prompt packs
workflow reproductions
benchmark reruns

This is why the first official set matters so much.

Demo overview

| Demo | Family | Core proof | Recommended entry |
| --- | --- | --- | --- |
| Demo 1 | F1 Grounding & Evidence Integrity | re-grounding changes the first repair path | `demo_01_f1_grounding_anchor_recovery_live.ipynb` |
| Demo 2 | F5 Observability & Diagnosability Integrity | visibility uplift comes before answer repair | `demo_02_f5_observability_first_replay_v2.ipynb` |
| Demo 3 | F4 Execution & Contract Integrity | execution closure repair comes before reasoning-level repair | `demo_03_f4_execution_closure_replay_v2.ipynb` |
| Demo 4 | F7 Representation & Localization Integrity | container repair changes what can stabilize next | `demo_04_f7_container_fidelity_replay_v2.ipynb` |

Current MVP status

The first flagship demo pack is now in a usable MVP state.

At the current stage:

Demo 1 includes a live notebook and replay support
Demo 2 is replay-first, with v2 now serving as the recommended replay notebook
Demo 3 is replay-first, with v2 now serving as the recommended replay notebook
Demo 4 is replay-first, with v2 now serving as the recommended replay notebook

This is intentional.

The current pack is designed to prove the strongest teaching pattern in the clearest possible way.

That means:

use live mode where live comparison adds real proof value
use replay mode where replay is clearer, safer, and more honest

This is not a shortcut.
It is a deliberate MVP teaching decision.

At the current release stage, this pack should be treated as the official recommended first demo set for route-first repair.

Current official notebook choices

At the current MVP stage, the recommended notebook entry points are:

Demo 1
- official live notebook:
  - demo_01_f1_grounding_anchor_recovery_live.ipynb
Demo 2
- official recommended replay notebook:
  - demo_02_f5_observability_first_replay_v2.ipynb
- original replay notebook retained as:
  - demo_02_f5_observability_first_replay.ipynb
Demo 3
- official recommended replay notebook:
  - demo_03_f4_execution_closure_replay_v2.ipynb
- original replay notebook retained as:
  - demo_03_f4_execution_closure_replay.ipynb
Demo 4
- official recommended replay notebook:
  - demo_04_f7_container_fidelity_replay_v2.ipynb
- original replay notebook retained as:
  - demo_04_f7_container_fidelity_replay.ipynb

The rule is simple:

original notebooks are preserved as first-pass MVP assets
v2 notebooks are the cleaner recommended replay versions for Demo 2, Demo 3, and Demo 4

If multiple notebooks exist for the same demo, the README and this page should always make the recommended entry point explicit.
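For tooling that needs the same rule in machine-readable form, a lookup table is probably enough. The file names below are copied from the list above; the `NOTEBOOKS` layout and the `recommended_notebook` helper are assumptions made for this sketch and are not part of the shared layer.

```python
from typing import Optional, TypedDict


class NotebookChoice(TypedDict):
    recommended: str
    retained: Optional[str]  # the original first-pass notebook, if one is kept


# Hypothetical layout; only the file names come from this README.
NOTEBOOKS: dict[str, NotebookChoice] = {
    "Demo 1": {
        "recommended": "demo_01_f1_grounding_anchor_recovery_live.ipynb",
        "retained": None,
    },
    "Demo 2": {
        "recommended": "demo_02_f5_observability_first_replay_v2.ipynb",
        "retained": "demo_02_f5_observability_first_replay.ipynb",
    },
    "Demo 3": {
        "recommended": "demo_03_f4_execution_closure_replay_v2.ipynb",
        "retained": "demo_03_f4_execution_closure_replay.ipynb",
    },
    "Demo 4": {
        "recommended": "demo_04_f7_container_fidelity_replay_v2.ipynb",
        "retained": "demo_04_f7_container_fidelity_replay.ipynb",
    },
}


def recommended_notebook(demo: str) -> str:
    """Return the recommended entry point for a demo such as 'Demo 2'."""
    return NOTEBOOKS[demo]["recommended"]
```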

Current shared support layer

The demo pack also includes a small official shared support layer under:

shared/

At the current MVP stage, that folder already includes:

README.md
demo_utils.py
display_helpers.py
routing_schema.md

These files exist to keep the official demos more aligned, more readable, and easier to audit.

They are not meant to turn the demo pack into a hidden mini-framework.

In short:

the shared layer already exists
but it remains intentionally small

Why these four demos were chosen

The first flagship set uses four families:

F1 Grounding & Evidence Integrity
F5 Observability & Diagnosability Integrity
F4 Execution & Contract Integrity
F7 Representation & Localization Integrity

This combination was chosen on purpose.

F1 is the best entry point

It is easy to understand and immediately useful.

People instantly understand what it means when an answer looks fluent but is attached to the wrong evidence.

F5 makes engineers pay attention

This demo proves the atlas is not limited to answer quality.

Sometimes the first repair move is not “fix the answer.”

Sometimes the first repair move is “make the failure visible.”

That is a mature debugging idea.

F4 proves the atlas can touch workflow skeletons

This demo shows that the atlas is not only about content generation.

It can also classify and repair problems involving:

readiness
ordering
bridge integrity
liveness
closure

That gives the system real architectural weight.

F7 gives the atlas its sharpest identity

This is one of the most distinctive cuts in the whole map.

It shows that some failures are not reasoning-first or grounding-first.

Sometimes the container that carries structure fails first.

That is a powerful and memorable cut.

The four official demos

This folder is organized around four flagship demos.

Demo 1 · F1 Grounding Anchor Recovery

Theme

A fluent answer fails because it is attached to the wrong evidence anchor.

What this demo proves

the failure is grounding-first
the problem is not mainly “the model is dumb”
the first repair move should be re-grounding, not style rewriting
evidence verification changes the repair path

Who this demo will hit hardest

RAG builders
retrieval engineers
enterprise QA builders
doc QA users
people tired of shallow hallucination discourse

Main lesson

Not all wrong answers are “hallucination” in the same way.

Some are evidence-anchor failures first.
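To make "evidence-anchor failure" concrete, here is a deliberately crude sketch. It assumes the answer carries the ID of the passage it cites and that a few key terms of the claim are known; `check_anchor`, its arguments, and the term-overlap heuristic are all illustrative assumptions, not the method used in the official notebook.

```python
def check_anchor(cited_id: str, passages: dict[str, str], key_terms: list[str]) -> str:
    """Suggest a first move from a simple term-overlap check on the cited passage."""
    passage = passages.get(cited_id)
    if passage is None:
        return "re-ground: cited anchor is not in the retrieved set"
    lowered = passage.lower()
    # Terms of the claim that the cited passage never mentions.
    unsupported = [term for term in key_terms if term.lower() not in lowered]
    if unsupported:
        return f"re-ground: cited anchor does not mention {unsupported}"
    return "anchor looks consistent: inspect other families next"
```

A fluent answer that cites a shipping passage for a refund claim would be flagged for re-grounding here, regardless of how well the answer is written.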

Folder

Demo 1 · F1 Grounding Anchor Recovery

Official notebook

demo_01_f1_grounding_anchor_recovery_live.ipynb

Demo 2 · F5 Observability First

Theme

A failing workflow cannot be repaired correctly because its failure path is still hidden.

What this demo proves

the first failure is diagnosability
the correct first repair move is observability insertion
fixing the answer too early is the wrong move
visibility changes the repair landscape

Who this demo will hit hardest

agent builders
workflow orchestrators
evaluation engineers
anyone who has said “I know it is broken, but I cannot see why”

Main lesson

Sometimes the first repair is not “repair the system.”

Sometimes the first repair is “make the system visible.”
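A minimal sketch of what observability insertion can mean in practice: run the same steps, but record what each one did before attempting any fix. The `run_with_trace` helper and the trace format are hypothetical and only illustrate the idea.

```python
from typing import Any, Callable

Step = tuple[str, Callable[[Any], Any]]


def run_with_trace(steps: list[Step], value: Any) -> tuple[Any, list[dict[str, str]]]:
    """Run steps in order and record a per-step trace; stop at the first error."""
    trace: list[dict[str, str]] = []
    for name, fn in steps:
        try:
            value = fn(value)
            trace.append({"step": name, "status": "ok", "output": repr(value)})
        except Exception as exc:
            # Record where and how the workflow failed instead of hiding it.
            trace.append({"step": name, "status": "error", "error": f"{type(exc).__name__}: {exc}"})
            break
    # On failure, value is the last successfully computed value.
    return value, trace
```

With the trace in hand, the failing step is named explicitly, which is the precondition for choosing the next repair.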

Folder

Demo 2 · F5 Observability First

Official notebook

demo_02_f5_observability_first_replay_v2.ipynb

Original notebook retained

demo_02_f5_observability_first_replay.ipynb

Demo 3 · F4 Execution Closure

Theme

A system fails because execution skeleton closure breaks before reasoning quality even matters.

What this demo proves

the problem is not primarily memory or reasoning
the problem is readiness, ordering, bridge, or liveness
the correct first repair move is execution closure repair
system structure can fail before model reasoning becomes the limiting factor

Who this demo will hit hardest

AI workflow engineers
multi-step system builders
pipeline designers
tool-calling framework users
anyone who has seen “it failed because the sequence itself was wrong”

Main lesson

Some failures are caused by the workflow skeleton, not by intelligence quality.
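The ordering part of that lesson can be sketched without any model at all. The example assumes each step declares its prerequisites; `ordering_violations` and the step names used with it are hypothetical.

```python
def ordering_violations(order: list[str], requires: dict[str, set[str]]) -> list[str]:
    """List every step that is scheduled before one of its prerequisites has run."""
    done: set[str] = set()
    problems: list[str] = []
    for step in order:
        for dep in sorted(requires.get(step, set())):
            if dep not in done:
                problems.append(f"{step}: prerequisite {dep} has not run yet")
        done.add(step)
    return problems
```

For example, `ordering_violations(["call_tool", "load_config"], {"call_tool": {"load_config"}})` reports that `call_tool` is scheduled too early. No amount of prompt or reasoning repair would fix that sequence.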

Folder

Demo 3 · F4 Execution Closure

Official notebook

demo_03_f4_execution_closure_replay_v2.ipynb

Original notebook retained

demo_03_f4_execution_closure_replay.ipynb

Demo 4 · F7 Container Fidelity

Theme

A task looks like reasoning failure, but the structure carrier fails first.

What this demo proves

the problem is not purely progression-first
symbolic or formal containers can fail before reasoning becomes the main issue
the first repair move should target descriptor fidelity or formal adequacy
container-first repair changes what the system can stably do next

Who this demo will hit hardest

structured output builders
JSON and schema users
code and symbolic reasoning users
OCR or layout-sensitive pipeline users
anyone interested in the atlas’s most distinctive knife-cut

Main lesson

Sometimes the system does not fail because it “cannot think.”

Sometimes it fails because the box carrying the thinking is already broken.
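For structured output, "the box" is often just JSON. The sketch below checks the container before looking at content at all. It is an illustration under assumptions: `first_repair_target` and its return strings are invented here, and JSON is only one of the container types this demo is concerned with.

```python
import json


def first_repair_target(raw_output: str, required_keys: set[str]) -> str:
    """Decide whether the container or the content should be inspected first."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        return "container: output is not valid JSON"
    if not isinstance(parsed, dict):
        return "container: top-level value is not an object"
    missing = required_keys - set(parsed)
    if missing:
        return f"container: missing keys {sorted(missing)}"
    return "content: container is intact, inspect reasoning next"
```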

Folder

Demo 4 · F7 Container Fidelity

Official notebook

demo_04_f7_container_fidelity_replay_v2.ipynb

Original notebook retained

demo_04_f7_container_fidelity_replay.ipynb

Demo modes

The flagship pack currently uses two practical modes plus one growth mode.

Mode A · Replay mode

This is the default and most important public mode.

It works without any API key.

The user can inspect:

the case
the baseline
the atlas route
the first repair move
the replayed before / after outputs
the explanation of what changed

A person should be able to understand the demo even without running anything.
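A replay viewer needs no network access, only the stored outputs. The sketch below assumes a replay file with top-level `before` and `after` text fields; that layout is a guess for illustration, and the real `replay_outputs.json` files may be shaped differently.

```python
import difflib
import json


def before_after_diff(replay_json: str) -> list[str]:
    """Return a unified diff between the replayed 'before' and 'after' outputs."""
    data = json.loads(replay_json)
    before = data["before"].splitlines()  # assumed key
    after = data["after"].splitlines()    # assumed key
    return list(
        difflib.unified_diff(before, after, fromfile="before", tofile="after", lineterm="")
    )
```

Printing the returned lines gives a reader the before / after shift without running any model.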

Mode B · Live reproduction mode

This is optional and only used when live execution adds real value.

If it exists, it should be clearly treated as:

optional
for reproduction
not required to understand the demo
not required to evaluate the atlas concept

Mode C · Community extension mode

This is the growth mode.

Once the official demo exists, contributors should be able to:

swap the input case
swap the model
swap the prompt
swap the fixture
extend the repair path
compare their result to the official version

This is how the long tail grows.

Why only Demo 1 has live mode in the first MVP

This point matters and should be explicit.

In the first MVP release:

Demo 1 includes a live notebook
Demo 2, Demo 3, and Demo 4 are intentionally replay-first, and their current recommended notebooks are the v2 replay notebooks

This is not because the other demos are weaker.

It is because the first thing they need to prove is different.

Why Demo 1 gets live mode first

Demo 1 is the cleanest place to show a real before / after answer shift.

Its teaching value becomes stronger when a reader can see:

baseline answer
repaired answer
anchor correction
result movement

That makes live reproduction especially worthwhile.

Why Demo 2 stays replay-first in the first MVP

Demo 2 is about failure-path visibility.

Its first teaching job is not to show a model being more impressive.
Its first teaching job is to show that:

the system was too opaque to diagnose safely
and the first repair move was visibility uplift

Replay mode is already enough to teach that clearly.

Why Demo 3 stays replay-first in the first MVP

Demo 3 is about execution skeleton closure.

Its teaching center is:

readiness
ordering
bridge integrity
closure

These are structural logic shifts, not model-performance showpieces.

Replay mode is the cleanest and most honest way to teach that in the first release.

Why Demo 4 stays replay-first in the first MVP

Demo 4 is about container fidelity.

Its first teaching job is to make one thing visible:

the form was already failing before deeper reasoning could stabilize

This is mostly a structure-comparison demo, not a live-performance demo.

Replay mode is enough for the first public proof.

The honest design rule

The first MVP should choose the mode that best teaches the pattern.

That means:

use live mode where live comparison adds real proof value
use replay mode where replay is clearer, safer, and more honest

This is not a shortcut.
It is a deliberate teaching design.

API key policy

Some live notebooks may require an API key.

If so, the policy is simple:

no hard-coded keys
no saved secrets in the repository
key entry should happen only at run time
replay mode should still remain readable without a key

Recommended pattern for notebooks:

ask for the key at execution time
keep replay mode readable without key access
clearly state that the notebook is for reproduction, not mandatory usage
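One way a notebook cell could follow this policy is to prompt with the standard library's `getpass`, which does not echo the input, and fall back to replay mode when nothing is entered. The helper names `ask_for_key` and `choose_mode` are hypothetical; only `getpass.getpass` is a real library call.

```python
from getpass import getpass
from typing import Callable, Optional


def ask_for_key(prompt: Callable[[str], str] = getpass) -> Optional[str]:
    """Ask for the key at execution time; never read it from the repository."""
    key = prompt("API key for live mode (leave blank for replay only): ").strip()
    return key or None


def choose_mode(key: Optional[str]) -> str:
    """Replay stays available whether or not a key was provided."""
    return "live" if key else "replay"
```

The key lives only in the running kernel's memory. Clearing cell outputs before committing remains a sensible extra precaution.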

This matters because the demos are designed to be understandable even when not executed.

They are proofs of use, not mandatory benchmark rituals.

Minimal asset structure for each demo

Each flagship demo folder should contain the following.

Required

README.md
input_case.json
replay_outputs.json
expected_output.json
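A small check like the following could confirm that a demo folder carries the required files. The `missing_assets` helper is a sketch and is not claimed to exist in `shared/demo_utils.py`; the file names are the four listed above.

```python
from pathlib import Path

REQUIRED_FILES = (
    "README.md",
    "input_case.json",
    "replay_outputs.json",
    "expected_output.json",
)


def missing_assets(demo_dir: str) -> list[str]:
    """Return the required files that are absent from a demo folder."""
    root = Path(demo_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```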

Shared support

The shared/ folder contains the small official support layer for:

formatting
simple output display
schema handling
compact route presentation
optional run-time utilities

This keeps each notebook smaller and easier to audit.

If multiple notebooks exist in one demo folder, the README should clearly identify which one is the recommended official entry point.

What each demo README should explain

Each demo folder README should follow a stable structure.

Required sections

what this demo proves
family route
why not neighbor
baseline failure
first repair move
optional WFGY 3.0 escalation
replay mode
files in this folder
expected outcome
limits of this demo
community extension ideas

This is important because many readers will understand the system from the README alone, without opening the notebook.
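Because the README carries so much of the teaching load, a lightweight check for the required sections may be worth having. This sketch assumes each section title appears on its own line, optionally prefixed with `#` heading markers; `missing_sections` is a hypothetical helper.

```python
REQUIRED_SECTIONS = (
    "what this demo proves",
    "family route",
    "why not neighbor",
    "baseline failure",
    "first repair move",
    "optional WFGY 3.0 escalation",
    "replay mode",
    "files in this folder",
    "expected outcome",
    "limits of this demo",
    "community extension ideas",
)


def missing_sections(readme_text: str) -> list[str]:
    """Return required section titles that do not appear as a line of the README."""
    # Normalize each line: drop leading '#' markers, surrounding spaces, and case.
    titles = {line.strip().lstrip("#").strip().lower() for line in readme_text.splitlines()}
    return [section for section in REQUIRED_SECTIONS if section.lower() not in titles]
```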

Official vs community scope

This folder is the official flagship pack.

That means it should stay:

small
sharp
readable
high-signal
reviewable

The official goal is not to cover everything.

The official goal is to provide the strongest first proofs.

Long-tail expansion belongs to the community structure under:

Community Fix Lab

That is intentional.

Official demos prove the core.
Community demos scale the edge.

Relationship to WFGY 3.0

These demos sit in the middle of a larger repair flow.

Atlas layer

The atlas routes the failure.

Fix surface layer

The official fix surface suggests the first repair move.

WFGY 3.0 layer

WFGY 3.0 supports deeper structural and experimental exploration.

That means these demos should not pretend to be the final end of repair logic.

Instead, they should clearly show:

what the first move is
what changes after that move
when deeper WFGY exploration becomes appropriate

This is why each demo may include an optional section called:

Optional WFGY 3.0 escalation

That section should remain compact and honest.

Why these demos matter

These demos matter because they turn the atlas from:

a strong classification system

into:

a visible troubleshooting system

They help a reader feel, not just believe, that:

different routes lead to different repairs
different repairs produce different outcomes
the atlas changes what happens next

That is the real threshold.

Once that becomes visible, the project stops feeling like a theory-only system.

It starts feeling like a real operating layer.

What this pack does not claim

This pack does not claim that:

four demos are enough to cover the whole atlas
every family already has a runnable asset
every demo must be live-run to be meaningful
replay mode is inferior
deeper repair is already fully solved
community growth is no longer needed

This pack claims only that:

a first official set of flagship demos now exists to prove that route-first repair can be made visible, teachable, and reproducible

What to explore next

After reading this demo pack, most readers continue with:

If this demo pack helped you understand the Atlas, consider:

starring the WFGY repo
opening an issue
testing the demo folders
contributing a clean community extension later

One-line version

This demo pack is the first official proof that Atlas routing changes the first repair move in visible, teachable, and reproducible ways.

Closing note

These four demos are small on purpose.

They are not trying to be a giant benchmark.

They are trying to be the strongest first signal.

If they work, people will immediately understand three things:

the atlas can classify failures more cleanly
the classification changes what should be repaired first
the system can grow far beyond these four examples

That is enough for a flagship MVP.
