Demo 3 · F4 Execution Closure
Problem Map 3.0 Troubleshooting Atlas
Official flagship demo for execution-first repair
Quick links:
- Back to demo pack index
- Back to AI Eval Evidence
- Back to Atlas landing page
- Open Atlas Hub
- Get the Atlas Router TXT
Open in Colab to inspect the recommended replay notebook presentation.
The README and replay artifacts are already enough to understand the teaching pattern.
Replay-only MVP
No API key required
This is the third flagship demo in the official runnable demo pack.
Where the AI eval snapshots suggest that better routing can reduce hidden debugging waste, this page goes one step further and shows the mechanism.
This demo is meant to make one mechanism-level claim visible:
when the first broken layer is execution closure,
the first repair move becomes readiness, bridge, ordering, and closure repair
It was chosen because many multi-step systems fail in a way that gets misread immediately.
People often say:
- the model forgot context
- memory is unstable
- the agent lost the thread
- reasoning quality is weak
Sometimes those explanations are tempting.
But in many real cases, the first failure is simpler and more structural:
- the workflow skeleton failed to close
- a readiness gate was skipped
- a dependency order broke
- a bridge did not connect
- liveness collapsed before the task could complete
This demo is designed to make that difference visible.
It shows that once a case is routed as F4 Execution & Contract Integrity, the first repair move changes from “help the model think better” to “repair the execution skeleton.”
Quick start
Use this order for the shortest path through the demo:
- read Section 1 · What this demo proves
- read Section 3 · Why not F3 first
- read Section 5 · First repair move
- open replay_outputs.json
- open the recommended replay notebook only if you want the notebook presentation in Colab
The shortest takeaway is:
if the workflow moved forward before closure existed, repair the skeleton first
1. What this demo proves
This demo proves four things.
A. Some failures are caused by sequence, not intelligence
A system can fail even if the model is fully capable of the task in principle.
The reason may be:
- a precondition was not met
- a readiness check was skipped
- a bridge between steps was broken
- the system advanced in the wrong order
- a liveness path stalled before completion
That is not a continuity-first failure.
That is an execution-first failure.
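As a toy illustration of "the system advanced in the wrong order", here is a minimal sketch of an ordering check run over a stage sequence. The stage names and the helper are hypothetical, not part of any Atlas API:

```python
def check_order(stages, deps):
    """Ordering validation sketch: verify each stage runs only after
    every stage it depends on has already run.

    stages: list of stage names in intended execution order.
    deps:   mapping of stage name -> set of stages it depends on.
    """
    seen = set()
    for stage in stages:
        missing = deps.get(stage, set()) - seen
        if missing:
            # The skeleton broke: a stage fired before its dependency.
            return f"ordering failure: '{stage}' ran before {sorted(missing)}"
        seen.add(stage)
    return "order ok"

# A later stage fires before its dependency completed:
print(check_order(["parse", "summarize", "retrieve"],
                  {"summarize": {"retrieve"}}))
# → ordering failure: 'summarize' ran before ['retrieve']
```

The point of the sketch is only that the failure is detectable from the sequence alone, with no reference to model quality.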
B. Correct routing changes the first repair move
If the case is routed correctly into F4, the first repair move becomes:
- readiness validation
- ordering repair
- bridge integrity check
- closure-path tracing
- liveness restoration
This is different from trying to repair memory, role persistence, or generic reasoning first.
C. Replay mode is enough to teach the pattern
This demo is intentionally replay-first.
A user should be able to understand the logic of the failure without running anything.
The replay artifacts should make visible that the task failed because the operational skeleton broke, not because the model lacked raw capability.
D. This MVP does not require live mode
For this first release, replay mode is enough.
The point of this demo is not notebook realism.
The point is to make the before / after execution-state shift easy to inspect and easy to teach.
2. Family route
Primary family
F4 · Execution & Contract Integrity
Secondary family
F3 · State & Continuity Integrity
Why F4 is primary
The first broken invariant is not that the system forgot who it was, lost continuity, or failed to preserve state.
The first broken invariant is that the workflow skeleton itself failed to close correctly.
Typical forms include:
- readiness failure
- ordering failure
- bridge failure
- liveness failure
- closure failure
Short routing statement
- the task failed
- the sequence or skeleton broke first
- execution closure fails before continuity becomes the main issue
Best current fit
For the cleanest MVP version, the recommended center is:
F4_N03 Pre-Readiness Execution Failure
This makes the structural failure easy to teach and visually obvious.
3. Why not F3 first
The most tempting neighboring cut is F3.
That temptation is understandable because many execution failures look like continuity failures from the outside.
A user may observe:
- the agent seems inconsistent
- state seems lost
- the thread looks broken
- role persistence looks weak
But this demo is not mainly about persistence failure.
It is mainly about this:
the system moved forward before the task was actually ready to move forward
That means the first failure lies in execution closure, not in continuity thread preservation.
Wrong cut
“This is mainly a memory or continuity problem.”
Better cut
“This is mainly an execution-closure problem with continuity pressure at the edge.”
That distinction matters because the first repair move changes immediately.
If you cut too early to F3, you may waste time strengthening persistence when the real problem is a broken gate or sequencing rule.
4. Baseline failure
The baseline case is intentionally small, readable, and mechanically clear.
Core pattern
A workflow has multiple stages.
A later stage depends on an earlier stage being complete or valid.
But the system proceeds anyway.
Possible versions include:
- a retrieval stage fails, but answer generation still fires
- a tool output is missing, but summarization still runs
- a structured parse is invalid, but downstream transformation still continues
- a bridge between two stages is thin or missing, but the system acts as if closure exists
What the baseline should make visible
The user should be able to see:
- the intended multi-step flow
- the missing or broken precondition
- the fact that the system still advanced
- the bad result that followed
- why this is a closure failure rather than a generic “bad output”
Important design note
Do not make the workflow too complicated.
The goal is not to simulate a giant orchestration framework.
The goal is to make one execution-first failure pattern easy to teach.
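The baseline pattern can be sketched in a few lines. The stage names and the hard-coded failure are hypothetical, chosen only to mirror the "a retrieval stage fails, but answer generation still fires" version above:

```python
def retrieve(query):
    # Simulate a failed retrieval stage: no documents come back.
    return []

def generate_answer(docs, query):
    # The model is "capable", but here it is handed an empty context.
    if not docs:
        return f"(hallucinated answer for: {query})"
    return f"answer grounded in {len(docs)} documents"

def broken_pipeline(query):
    docs = retrieve(query)
    # No readiness gate: the pipeline advances even though docs is empty.
    return generate_answer(docs, query)

result = broken_pipeline("why did the deploy fail?")
# The workflow "completes", but closure never existed upstream.
```

The bad output is downstream; the first failure is the missing gate between the two stages.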
5. First repair move
Once the case is routed to F4, the first repair move should be structural and explicit.
Recommended first repair stack
- readiness validation: check whether the upstream stage actually produced what the downstream stage requires.
- ordering validation: confirm that steps are running in the correct dependency order.
- bridge integrity check: verify that the output of one stage is actually usable by the next stage.
- closure-path tracing: make the dependency path visible enough to know where closure broke.
- liveness repair, if needed: if the system stalls after partial progress, restore a viable completion path.
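A minimal sketch of what the first move, readiness validation, can look like in code. The gate helper and exception name are illustrative assumptions, not part of any Atlas or WFGY API:

```python
class ReadinessError(Exception):
    """Raised when a downstream stage would run without its preconditions."""

def require_ready(stage_name, value, predicate):
    # Readiness validation: refuse to advance if the upstream output
    # does not satisfy what the downstream stage requires.
    if not predicate(value):
        raise ReadinessError(f"stage '{stage_name}' is not ready to run")
    return value

def gated_pipeline(query, retrieve, generate_answer):
    docs = retrieve(query)
    # The gate surfaces the broken bridge where it happened, instead of
    # letting the workflow advance past a missing precondition.
    require_ready("generate_answer", docs, lambda d: len(d) > 0)
    return generate_answer(docs, query)
```

With a retriever that returns nothing, this version raises ReadinessError at the gate; the closure failure becomes visible at the broken step rather than as a bad answer downstream.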
What should not happen first
Do not start with:
- memory strengthening
- longer prompts
- generic chain-of-thought expansion
- more retries without gate repair
- “just use a stronger model” reactions
Those may change surface behavior, but they do not repair the broken skeleton.
First repair principle
if the workflow advanced before it was ready, repair readiness and closure first
That is the teaching core of this demo.
6. Optional WFGY 3.0 escalation
This demo can teach its core lesson without deeper escalation.
But if the user wants to explore a stronger execution diagnosis path, this is where WFGY 3.0 becomes useful.
When escalation makes sense
Escalate beyond the simple first repair move if:
- closure still fails after obvious gate repair
- multiple bridge layers are involved
- local fixes improve one stage but break another
- the workflow contains fragile cross-layer dependencies
- the team wants a deeper experimental map of liveness and closure failure
What WFGY 3.0 can add
A WFGY 3.0 handoff can help explore:
- closure stress variants
- bridge fragility surfaces
- sequencing experiments
- liveness degradation paths
- retry / fallback interaction patterns
- reusable execution-harness design
Correct relationship
The correct order remains:
- atlas route
- first execution repair
- reassess the case
- escalate into deeper WFGY exploration if needed
The atlas is still the router.
WFGY 3.0 is the deeper experimental engine.
7. Replay mode
Replay mode is the default public reading mode.
It requires no API key and no notebook execution.
Replay mode should show
- the intended workflow structure
- the baseline flow
- the missing readiness or bridge condition
- the family route
- the why-primary-not-secondary statement
- the broken invariant
- the first repair move
- the replayed improved workflow behavior
- a short explanation of what changed
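As an illustration only, a hypothetical shape for those replay fields; the authoritative schema is whatever replay_outputs.json in this folder actually contains:

```json
{
  "route": {"primary": "F4", "secondary": "F3", "center": "F4_N03"},
  "broken_invariant": "workflow advanced before readiness existed",
  "baseline": {"stages": ["retrieve", "generate"], "failed_gate": "retrieve -> generate"},
  "first_repair_move": ["readiness validation", "ordering validation"],
  "replayed": {"stages": ["retrieve", "gate", "generate"], "closed": true}
}
```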
What replay mode proves
Replay mode proves that:
- execution-first routing is understandable
- the first repair move is visibly different
- a structural fix can be shown without giant infrastructure
- the demo teaches through workflow logic, not just through output text
Why replay mode matters
A strong F4 demo should let a reader immediately feel:
the system moved too early
If that feeling is not visible in replay mode, the demo is weak.
This is enough for the first public MVP.
Core replay artifacts: input_case.json, replay_outputs.json, and expected_output.json.
8. Current notebook versions
This folder currently preserves two replay notebook versions.
Recommended version
demo_03_f4_execution_closure_replay_v2.ipynb
This is the current recommended notebook for public reading, Colab use, and MVP presentation.
It keeps the replay-first design, while adding:
- clearer route summary
- clearer workflow-structure framing
- stronger before / after teaching contrast
- better aligned shared-layer presentation
Original version
demo_03_f4_execution_closure_replay.ipynb
This version is intentionally retained.
It remains part of the demo history and preserves the earlier first-pass MVP presentation.
Version rule
The rule for this folder is simple:
- v1 is preserved
- v2 is the recommended replay notebook
- replay-first remains the official design center for Demo 3
9. Files in this folder
Core files
- README.md
- input_case.json
- replay_outputs.json
- expected_output.json
- demo_03_f4_execution_closure_replay.ipynb
- demo_03_f4_execution_closure_replay_v2.ipynb
Optional future additions
- workflow notes
- helper references from the shared folder
- patch notes for stronger closure contrast
File roles
input_case.json
Contains the workflow case, dependency structure, and execution context.
replay_outputs.json
Contains the baseline broken path, route explanation, first repair move, and replayed improved path.
expected_output.json
Contains the clean target structure for what the demo is trying to make visible.
demo_03_f4_execution_closure_replay.ipynb
The original first-pass replay notebook retained for continuity.
demo_03_f4_execution_closure_replay_v2.ipynb
The current recommended replay notebook for public reading and Colab use.
10. Expected outcome
If the demo works, the user should walk away with the following understanding:
- the system failed because execution closure broke before the task was actually ready
- the atlas routed the case to F4, not F3
- the first repair move was structural, not memory-oriented
- after readiness or closure repair, the workflow moved in the expected direction
The demo does not need to solve every downstream problem.
It only needs to show that correct routing changes what gets repaired first.
That is enough.
11. Limits of this demo
This demo has real limits, and those limits should be stated clearly.
It does not prove
- that every multi-step failure belongs to F4
- that continuity never matters in workflow systems
- that one gate repair solves all closure failures
- that this one workflow pattern covers all orchestration systems
- that F4 is more important than neighboring families in all cases
It does prove
- that some failures are execution-first
- that route-first diagnosis changes the first repair move
- that structural closure can be demonstrated cleanly and teachably
These are already strong claims.
There is no need to overclaim.
12. Community extension ideas
This demo is also a seed template for future community work.
Possible extensions include:
- adding different workflow types
- adding tool-to-tool bridge cases
- comparing retry-first vs gate-first repair
- adding richer closure traces
- testing multi-stage liveness degradation
- extending the case toward stronger F4 to F5 or F4 to F6 pressure
Community boundary rule
Contributors should preserve the official logic:
- route first
- explain why primary not secondary
- show the first repair move
- keep replay outputs understandable
- do not turn the demo into an unreadable workflow jungle
For contribution structure, see:
13. Why this demo matters
This demo teaches one clean lesson:
if the workflow moved forward before closure existed, repair the skeleton first
That is why this is the third flagship demo.
It turns F4 from an abstract family label into a practical debugging move.
One-line version
Demo 3 shows that some failures are execution-first, and that correct routing changes the first repair move from continuity-oriented guessing to explicit readiness, bridge, and closure repair.
Next steps
After this page, most readers continue with:
- Back to the demo pack index
- Back to AI Eval Evidence
- Back to the Atlas landing page
- Open the Atlas Router TXT
If this demo helped you understand the Atlas, consider:
- starring the WFGY repo
- opening an issue
- testing the replay artifacts
- comparing v1 and v2 notebook presentation
- using the demo as a pattern for your own execution-first cases
Back to the main page
Read the full product page here:
Problem Map 3.0 Troubleshooting Atlas