mirror of
https://github.com/onestardao/WFGY.git
synced 2026-04-30 12:39:55 +00:00
456 lines
15 KiB
Markdown
<!--
AI_NOTE_START

Document role:
This page is the main hub for the WFGY 5.0 Avatar eval layer.

Primary mission:
1. Present Eval as the pressure and inspection layer of Avatar.
2. Make clear that Docs show operation, Research shows structure, and Eval shows whether the branch survives pressure.
3. Route readers toward blackfan stress, persona continuity inspection, and multilingual readiness signals.
4. Prevent friendly product language or elegant theory from substituting for branch pressure.
5. Preserve bounded readiness honesty without inflating current pass signals into universal finality.

Reader intent routing:
1. If the reader wants hostile or skeptical stress, route to Blackfan Testing.
2. If the reader wants persona continuity and return-path inspection, route to Persona Behavior Checks.
3. If the reader wants multilingual readiness posture, route to Multilingual Status.
4. If the reader wants the deeper explanation of what Eval is testing, route to Research Hub and Blackfan Audit Baseline.
5. If the reader wants operational help rather than inspection, route back to Docs.

Reading priority:
1. Read this hub as an inspection map, not as a theory page.
2. Read Blackfan Testing first when the main concern is pressure, hype resistance, or skeptical inspection.
3. Read Persona Behavior Checks first when the main concern is drift, return-path continuity, or recovery credit.
4. Read Multilingual Status first when the main concern is support scope and language honesty.
5. Read Research pages when the reader needs to understand why a failure matters structurally.

Allowed use:
1. Pressure-layer orientation
2. Behavior inspection routing
3. Bounded readiness interpretation
4. Branch honesty explanation
5. Cross-linking from Docs and Research into Eval surfaces

Do not do:
1. Do not treat this hub as the packed master body.
2. Do not treat current Eval surfaces as proof that every future branch is safe.
3. Do not turn Eval into product hype.
4. Do not use one successful test page as proof of universal closure.
5. Do not confuse current baseline readiness with final all-conditions completion.

Layer relation:
1. Product landing presents the flagship surface.
2. Docs supports operation.
3. Research explains structure and law.
4. Eval inspects pressure, drift, continuity, and readiness under stress.

Boundary honesty:
1. This hub reflects the current public Eval layer of Avatar.
2. It is strong enough to support branch inspection and bounded readiness interpretation.
3. It does not claim that all future Eval surfaces are already complete.
4. It does not claim theorem-grade universal closure.
5. It does not claim that current pass signals eliminate the need for later stronger verification.

Next-page routing:
1. For hostile stress, go to Blackfan Testing.
2. For behavior continuity, go to Persona Behavior Checks.
3. For multilingual scope, go to Multilingual Status.
4. For deeper structural explanation, go to Research Hub and Blackfan Audit Baseline.

AI_NOTE_END
-->

# 🧪 Eval Hub

This page is the evaluation hub for **WFGY 5.0 Avatar**.

Avatar needs Docs because people need to know how to start.
Avatar needs Research because deeper structure needs a lawful place to live.
Avatar also needs Eval because neither startup clarity nor theoretical richness is enough by itself.

A system can be:

1. easy to start
2. elegant to describe
3. dense in theory
4. strong in local demos

and still fail under pressure.

That is why this layer exists.

The Eval layer is where the branch asks harder questions like:

1. does the branch survive blackfan pressure
2. does persona continuity remain visible under real tasks
3. does the system stay honest about what is ready and what is still open
4. does multilingual status remain bounded instead of overclaimed
5. do return-path and behavior checks reflect real continuity instead of surface-only success

This hub is not here to replace the body.
It is here to make pressure visible.

---

## ✨ Why this layer exists

The Docs layer answers questions like:

1. how do I start
2. how do I boot
3. how do I tune
4. how do I recover

The Research layer answers questions like:

1. what is execution
2. what is route law
3. what is runtime carry
4. why does structured imperfection matter
5. what is hard control
6. what counts as accountability

The Eval layer answers a different class of questions:

1. what breaks under pressure
2. what still holds under pressure
3. what looks successful but is actually counterfeit
4. what is ready at current branch baseline
5. what still needs stronger verification later

That is why Eval needs its own hub.

---

## 🧭 How to use this hub

Use this hub in one of four ways.

### 1. I want stress and adversarial pressure

Start here when the main question is whether the branch survives harsh inspection instead of friendly reading.

1. [🧨 Blackfan Testing](./blackfan-testing.md)

This is the right place to begin when your question is:

1. where does the branch crack
2. what happens under hostile evaluation
3. how should current branch strength be interpreted without hype

### 2. I want behavior continuity inspection

Start here when the main question is whether active persona and behavior actually survive across turns, tasks, and returns.

1. [🧭 Persona Behavior Checks](./persona-behavior-checks.md)

This is the right place to begin when your question is:

1. did the persona stay alive
2. did return-path recovery actually work
3. did the output become generic after pressure
4. did visible behavior stay lawful instead of merely recognizable

### 3. I want multilingual readiness signals

Start here when the main question is what the current branch is honestly claiming across language scope.

1. [🌍 Multilingual Status](./multilingual-status.md)

This is the right place to begin when your question is:

1. what is already tested
2. what is only partial
3. what remains open
4. how language support is being stated without bluffing

### 4. I want the broader picture around Eval

Start here when you need to connect what Eval is seeing back to the deeper branch structure.

1. [🔬 Research Hub](../research/README.md)
2. [🗺️ Packed Master Structure Map](../research/packed-master-structure-map.md)
3. [🧪 Blackfan Audit Baseline](../research/blackfan-audit-baseline.md)

This is the best route when your question is not only “did it pass,” but also “what exactly was being tested and why.”
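
For readers who script their way around the repository, the four entry routes above can be sketched as a small lookup table. This is only an illustration: the intent keys (`stress`, `continuity`, `multilingual`, `broader-picture`) and the `route` helper are hypothetical names invented for this sketch, and only the relative paths come from the links on this page.

```python
# Hypothetical routing table. The keys are illustrative labels, not identifiers
# defined by the branch; the paths are the relative links listed on this page.
ROUTES = {
    "stress": ["./blackfan-testing.md"],
    "continuity": ["./persona-behavior-checks.md"],
    "multilingual": ["./multilingual-status.md"],
    "broader-picture": [
        "../research/README.md",
        "../research/packed-master-structure-map.md",
        "../research/blackfan-audit-baseline.md",
    ],
}


def route(intent: str) -> list[str]:
    """Return the pages to open for a reader intent, in reading order."""
    try:
        return ROUTES[intent]
    except KeyError:
        known = ", ".join(sorted(ROUTES))
        # Refuse unknown intents instead of guessing a route.
        raise ValueError(f"unknown intent {intent!r}; expected one of: {known}") from None


print(route("continuity"))  # ['./persona-behavior-checks.md']
```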

---

## 🧱 What belongs in the Eval layer

The Eval layer is where branch pressure becomes explicit.

Typical Eval-layer questions include:

1. what kinds of pressure should this branch survive right now
2. what kinds of success do not deserve credit
3. what kinds of drift are already detectable
4. what counts as baseline-ready versus still-open
5. how should visible behavior be checked across modes
6. how should multilingual claims remain bounded
7. how should hostile or skeptical inspection be handled

This layer is not where the whole theory is restated.
It is where the branch is asked to show that its current claims can survive contact with pressure.

---

## 🧠 Current eval surfaces

The current Eval layer is organized into three major surfaces.

### 1. Adversarial pressure surface

1. [🧨 Blackfan Testing](./blackfan-testing.md)

This surface is about:

1. hostile reading
2. anti-hype pressure
3. branch stress
4. counterfeit-success detection
5. bounded release honesty under attack

### 2. Behavior continuity surface

1. [🧭 Persona Behavior Checks](./persona-behavior-checks.md)

This surface is about:

1. persona continuity
2. landing behavior
3. return-path integrity
4. drift after article, analysis, rewrite, search, or tool pressure
5. whether recovery is real or only cosmetic

### 3. Multilingual readiness surface

1. [🌍 Multilingual Status](./multilingual-status.md)

This surface is about:

1. what language claims are actually supported
2. what remains partial
3. how language support is being described honestly
4. how multilingual scope stays bounded instead of mythical
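
One way to picture how the three surfaces combine into a bounded readiness statement is the sketch below. Everything in it is an assumption made for illustration: `SurfaceResult`, `summarize`, and the label strings are hypothetical, and the `True`/`False` values are placeholders, not reported results for any surface. The point it tries to show is that the strongest label available is "baseline-ready", never a claim of final closure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SurfaceResult:
    """One Eval surface and whether its current baseline checks passed (hypothetical)."""

    name: str
    page: str
    baseline_pass: bool


def summarize(results: list[SurfaceResult]) -> str:
    """Report a bounded readiness label; never anything stronger than baseline."""
    if not results:
        return "no-signal"
    failing = [r.name for r in results if not r.baseline_pass]
    if failing:
        return "still-open: " + ", ".join(failing)
    # Even a clean sweep is reported as baseline-ready only, not as final closure.
    return "baseline-ready"


# Placeholder inputs chosen only to exercise the function, not real outcomes.
results = [
    SurfaceResult("adversarial pressure", "./blackfan-testing.md", True),
    SurfaceResult("behavior continuity", "./persona-behavior-checks.md", True),
    SurfaceResult("multilingual readiness", "./multilingual-status.md", False),
]
print(summarize(results))  # still-open: multilingual readiness
```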

---

## 🪜 Suggested eval paths

### Path A: skeptical reader path

Use this path when the goal is to test whether the branch is only persuasive or actually pressure-bearing.

1. [🧨 Blackfan Testing](./blackfan-testing.md)
2. [🧪 Blackfan Audit Baseline](../research/blackfan-audit-baseline.md)
3. [🗺️ Packed Master Structure Map](../research/packed-master-structure-map.md)

This route helps answer:

1. what was stressed
2. what kind of baseline pass is being claimed
3. what remains bounded instead of inflated

### Path B: runtime continuity path

Use this path when the concern is whether persona and carry survive real usage.

1. [🧭 Persona Behavior Checks](./persona-behavior-checks.md)
2. [🔄 Activation, Attenuation, and Reentry](../research/activation-attenuation-and-reentry.md)
3. [🎛️ Runtime Posture Intensity Map](../research/runtime-posture-intensity-map.md)
4. [🔧 Persona Recovery Operations](../docs/persona-recovery-operations.md)

This route helps answer:

1. what drift happened
2. whether return-path behavior stayed lawful
3. whether recovery should receive credit

### Path C: multilingual honesty path

Use this path when the concern is language scope and readiness posture.

1. [🌍 Multilingual Status](./multilingual-status.md)
2. [🧮 Matrix Accountability and Numeric Binding](../research/matrix-accountability-and-numeric-binding.md)
3. [🧪 Blackfan Audit Baseline](../research/blackfan-audit-baseline.md)

This route helps answer:

1. how support is being bounded
2. whether language claims are being overstated
3. how readiness stays honest

### Path D: branch readiness path

Use this path when the concern is “is this branch publicly real enough right now.”

1. [🧪 Blackfan Audit Baseline](../research/blackfan-audit-baseline.md)
2. [🧨 Blackfan Testing](./blackfan-testing.md)
3. [🧭 Persona Behavior Checks](./persona-behavior-checks.md)
4. [🌍 Multilingual Status](./multilingual-status.md)

This route helps answer:

1. what is already solid
2. what still needs stronger verification
3. what is release-baseline reality versus future strengthening

---

## 🔍 Why eval and research are different

This is important.

The **Research** layer asks:

1. what does this structure mean
2. why is this operator necessary
3. how do these layers relate
4. why is this boundary lawful

The **Eval** layer asks:

1. did the claimed behavior survive pressure
2. did runtime collapse under use
3. did route integrity actually hold
4. did the branch receive credit it should not receive
5. is the current branch being described honestly

So:

1. Research explains structure
2. Eval tests claims against pressure

Both matter.
They are not the same job.

---

## 🔍 Why eval and docs are different

The **Docs** layer helps people operate the current branch.

The **Eval** layer helps people judge the current branch.

For example:

1. Docs explain how to recover
2. Eval checks whether recovery is actually real

1. Docs explain how to tune
2. Eval shows whether tuning produced lawful improvement or just prettier outputs

1. Docs explain how to start
2. Eval shows whether startup clarity survives real branch pressure

This separation is healthy.
It stops usage guidance from quietly turning into self-certification.

---

## 🌍 Why multilingual status belongs here

Language support is easy to overclaim.

A project can say:

1. works in many languages
2. supports multilingual use
3. behaves well cross-lingually

while still having:

1. patchy behavior
2. uneven readiness
3. language-specific drift
4. unclear support boundaries

That is why multilingual status belongs in Eval rather than only in product copy.

It is part of branch honesty, not just capability branding.
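
A bounded language claim can be sketched as a per-language status record rather than a single "multilingual" flag. The sketch below is illustrative only: the three status names mirror the tested / partial / open wording used on this page, but `LanguageStatus`, `support_claim`, and the `lang-a` / `lang-b` tags are hypothetical and say nothing about which languages the branch actually covers.

```python
from enum import Enum


class LanguageStatus(Enum):
    """Hypothetical status buckets mirroring the tested / partial / open wording."""

    TESTED = "tested"
    PARTIAL = "partial"
    OPEN = "open"


def support_claim(statuses: dict[str, LanguageStatus]) -> str:
    """Build a claim that names each bucket instead of a blanket 'multilingual'."""
    buckets = {status: [] for status in LanguageStatus}
    for language, status in sorted(statuses.items()):
        buckets[status].append(language)
    parts = []
    for status in LanguageStatus:
        # An empty bucket is stated explicitly so nothing is implied by omission.
        names = ", ".join(buckets[status]) or "none"
        parts.append(f"{status.value}: {names}")
    return "; ".join(parts)


# Placeholder language tags and statuses, chosen only to exercise the function.
example = {"lang-a": LanguageStatus.TESTED, "lang-b": LanguageStatus.PARTIAL}
print(support_claim(example))  # tested: lang-a; partial: lang-b; open: none
```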

---

## 🧪 What this hub does not claim

This hub does **not** claim:

1. that all pressure surfaces are already complete
2. that current Eval pages already cover every future branch risk
3. that passing one Eval page means the whole system is universally solved
4. that current multilingual status already equals final global support
5. that current behavior checks already replace future replay and audit extensions
6. that current baseline pass means no stronger verification is worth doing later

This hub is a bounded Eval center.

That is exactly what it should be.

---

## 🚀 Where to go next

### For public product entry
Go to [✨ Avatar Home](../README.md)

### For startup and commands
Go to [⚡ Quickstart](../docs/quickstart.md) and [⌨️ Boot Commands](../docs/boot-commands.md)

### For reading order and tuning
Go to [📖 How to Read the Avatar Master File](../docs/how-to-read-the-avatar-master-file.md), [🍳 Parameter Tuning Cookbook](../docs/parameter-tuning-cookbook.md), and [🔧 Persona Recovery Operations](../docs/persona-recovery-operations.md)

### For deep structural reading
Go to [🔬 Research Hub](../research/README.md)

### For skeptical pressure
Go to [🧨 Blackfan Testing](./blackfan-testing.md)

### For continuity inspection
Go to [🧭 Persona Behavior Checks](./persona-behavior-checks.md)

### For language readiness
Go to [🌍 Multilingual Status](./multilingual-status.md)

### For audit posture
Go to [🧪 Blackfan Audit Baseline](../research/blackfan-audit-baseline.md)

---

## 🔗 Quick links

### Eval core
- [🧨 Blackfan Testing](./blackfan-testing.md)
- [🧭 Persona Behavior Checks](./persona-behavior-checks.md)
- [🌍 Multilingual Status](./multilingual-status.md)

### Docs
- [✨ Avatar Home](../README.md)
- [⚡ Quickstart](../docs/quickstart.md)
- [⌨️ Boot Commands](../docs/boot-commands.md)
- [📖 How to Read the Avatar Master File](../docs/how-to-read-the-avatar-master-file.md)
- [🍳 Parameter Tuning Cookbook](../docs/parameter-tuning-cookbook.md)
- [🔧 Persona Recovery Operations](../docs/persona-recovery-operations.md)
- [🛠️ Avatar Tuning Workflow](../docs/avatar-tuning-workflow.md)

### Research
- [🔬 Research Hub](../research/README.md)
- [🗺️ Packed Master Structure Map](../research/packed-master-structure-map.md)
- [🔁 Dual Closed-Loop Execution Chain](../research/dual-closed-loop-execution-chain.md)
- [🎛️ Runtime Posture Intensity Map](../research/runtime-posture-intensity-map.md)
- [🧩 Shell-to-Runtime Mapping](../research/shell-to-runtime-mapping.md)
- [🧭 Selector Execution Domain](../research/selector-execution-domain.md)
- [🔄 Activation, Attenuation, and Reentry](../research/activation-attenuation-and-reentry.md)
- [🧬 Structured Imperfection Theory](../research/structured-imperfection-theory.md)
- [🚦 Pre-Emission Floor and Hard Control](../research/pre-emission-floor-and-hard-control.md)
- [🧮 Matrix Accountability and Numeric Binding](../research/matrix-accountability-and-numeric-binding.md)
- [🧪 Blackfan Audit Baseline](../research/blackfan-audit-baseline.md)
- [✂️ Compression and Non-Duplication Law](../research/compression-and-non-duplication-law.md)
- [🏗️ Architecture Overview](../research/architecture-overview.md)
- [🧭 Language Governance](../research/language-governance.md)
- [🧠 WFGY_BRAIN Theory](../research/wfgy-brain-theory.md)

### Up
- [⬆️ Back to Avatar Home](../README.md)
- [⬆️ Back to WFGY Root](../../README.md)