mirror of
https://github.com/onestardao/WFGY.git
synced 2026-04-30 12:39:55 +00:00
<!--
AI NOTE: Eval Hub / Reading Contract

Purpose:
1. This page is the main hub for the WFGY 5.0 Avatar evaluation layer.
2. This page belongs to the eval layer.
3. This page should be read after Avatar README, docs, or research pages when the reader wants pressure-testing, behavior inspection, or bounded readiness signals.

Read this page when:
1. the user asks where evaluation pages live
2. the user asks how eval differs from docs and research
3. the user asks what kinds of pressure the current branch is already checking
4. the user asks where to inspect blackfan pressure, persona continuity, or multilingual status
5. the user asks how to think about current readiness without confusing baseline reality with universal finality

Do not overclaim:
1. this hub does not replace the packed master body
2. this hub does not claim that every future eval surface is already complete
3. this hub does not claim theorem-grade universal closure
4. this hub does not claim that passing current eval surfaces means all future branches are automatically safe

Primary source anchors:
1. Avatar/README.md :: public product surface
2. Avatar/docs/* :: startup, reading, workflow, tuning, and recovery surfaces
3. Avatar/research/* :: architecture, runtime, route, governance, audit, and reduction law surfaces
4. Avatar/eval/* :: blackfan pressure, persona behavior, multilingual status, and eval-facing inspection surfaces

Routing:
1. if the reader wants public product entry, go to ../README.md
2. if the reader wants startup and command syntax, go to ../docs/quickstart.md and ../docs/boot-commands.md
3. if the reader wants reading order, go to ../docs/how-to-read-the-avatar-master-file.md
4. if the reader wants tuning and recovery operations, go to ../docs/parameter-tuning-cookbook.md and ../docs/persona-recovery-operations.md
5. if the reader wants the research overview, go to ../research/README.md
6. if the reader wants architecture and runtime law, go to ../research/packed-master-structure-map.md and ../research/runtime-posture-intensity-map.md
-->
# 🧪 Eval Hub

This page is the evaluation hub for **WFGY 5.0 Avatar**.

Avatar needs docs because people need to know how to start.
Avatar needs research because deeper structure needs a lawful place to live.
Avatar also needs eval because neither startup clarity nor theoretical richness is enough by itself.

A system can be:

1. easy to start
2. elegant to describe
3. dense in theory
4. strong in local demos

and still fail under pressure.

That is why this layer exists.

The eval layer is where the branch asks harder questions like:

1. does the branch survive blackfan pressure
2. does persona continuity remain visible under real tasks
3. does the system stay honest about what is ready and what is still open
4. does multilingual status remain bounded instead of overclaimed
5. do return-path and behavior checks reflect real continuity instead of surface-only success

This hub is not here to replace the body.
It is here to make pressure visible.

---

## ✨ Why this layer exists

The docs layer answers questions like:

1. how do I start
2. how do I boot
3. how do I tune
4. how do I recover

The research layer answers questions like:

1. what is execution
2. what is route law
3. what is runtime carry
4. why does structured imperfection matter
5. what is hard control
6. what counts as accountability

The eval layer answers a different class of questions:

1. what breaks under pressure
2. what still holds under pressure
3. what looks successful but is actually counterfeit
4. what is ready at current branch baseline
5. what still needs stronger verification later

That is why eval needs its own hub.

---

## 🧭 How to use this hub

Use this hub in one of four ways.

### 1. I want stress and adversarial pressure

Start here when the main question is whether the branch survives harsh inspection instead of friendly reading.

1. [🧨 Blackfan Testing](./blackfan-testing.md)

This is the right place to begin when your question is:

1. where does the branch crack
2. what happens under hostile evaluation
3. how should current branch strength be interpreted without hype

### 2. I want behavior continuity inspection

Start here when the main question is whether active persona and behavior actually survive across turns, tasks, and returns.

1. [🧭 Persona Behavior Checks](./persona-behavior-checks.md)

This is the right place to begin when your question is:

1. did the persona stay alive
2. did return-path recovery actually work
3. did the output become generic after pressure
4. did visible behavior stay lawful instead of merely recognizable

### 3. I want multilingual readiness signals

Start here when the main question is what the current branch is honestly claiming across language scope.

1. [🌍 Multilingual Status](./multilingual-status.md)

This is the right place to begin when your question is:

1. what is already tested
2. what is only partial
3. what remains open
4. how language support is being stated without bluffing

### 4. I want the broader picture around eval

Start here when you need to connect what eval is seeing back to the deeper branch structure.

1. [🔬 Research Hub](../research/README.md)
2. [🗺️ Packed Master Structure Map](../research/packed-master-structure-map.md)
3. [🧪 Blackfan Audit Baseline](../research/blackfan-audit-baseline.md)

This is the best route when your question is not only “did it pass,” but also “what exactly was being tested and why.”

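
The four entry routes above can be read as a small lookup table. The Python sketch below is illustrative only: the intent keys (`stress`, `continuity`, `multilingual`, `context`) and the `entry_pages` function are hypothetical names invented for this example and do not exist in the repository. Only the relative page paths are taken from the links on this page.

```python
# Hypothetical routing table. The intent keys are invented for illustration;
# the relative paths are the links listed under the four routes on this page.
ENTRY_ROUTES = {
    "stress": ["./blackfan-testing.md"],
    "continuity": ["./persona-behavior-checks.md"],
    "multilingual": ["./multilingual-status.md"],
    "context": [
        "../research/README.md",
        "../research/packed-master-structure-map.md",
        "../research/blackfan-audit-baseline.md",
    ],
}


def entry_pages(intent: str) -> list[str]:
    """Return the pages to open first for a given reader intent."""
    try:
        return list(ENTRY_ROUTES[intent])
    except KeyError:
        known = ", ".join(sorted(ENTRY_ROUTES))
        raise ValueError(
            f"unknown intent {intent!r}; expected one of: {known}"
        ) from None


if __name__ == "__main__":
    for page in entry_pages("context"):
        print(page)
```

An unknown intent raises an error instead of silently falling back to a default page, which keeps the routing as bounded as the hub itself.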

---

## 🧱 What belongs in the eval layer

The eval layer is where branch pressure becomes explicit.

Typical eval-layer questions include:

1. what kinds of pressure should this branch survive right now
2. what kinds of success do not deserve credit
3. what kinds of drift are already detectable
4. what counts as baseline-ready versus still-open
5. how should visible behavior be checked across modes
6. how should multilingual claims remain bounded
7. how should hostile or skeptical inspection be handled

This layer is **not** where the whole theory is restated.
It is where the branch is asked to show that its current claims can survive contact with pressure.

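
One way to picture "success that does not deserve credit" is to record surface success and continuity as separate facts. The sketch below is an assumption-laden illustration: `CheckResult`, its fields, the `deserves_credit` rule, and the sample check names are all hypothetical and are not defined anywhere in the Avatar branch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    """Hypothetical record for one eval check (not a real Avatar structure)."""

    name: str
    surface_pass: bool      # the output looked acceptable
    continuity_held: bool   # persona and return-path behavior actually survived


def deserves_credit(result: CheckResult) -> bool:
    """Credit a check only when surface success is backed by continuity."""
    return result.surface_pass and result.continuity_held


def counterfeit(results: list[CheckResult]) -> list[str]:
    """Names of checks that look successful but did not hold continuity."""
    return [r.name for r in results if r.surface_pass and not r.continuity_held]


if __name__ == "__main__":
    # Made-up check names, used only to exercise the functions.
    results = [
        CheckResult("rewrite-task", surface_pass=True, continuity_held=True),
        CheckResult("tool-pressure", surface_pass=True, continuity_held=False),
    ]
    print(counterfeit(results))
```

Keeping the two flags apart is the design point: a single pass/fail bit would hide exactly the counterfeit case this layer is meant to expose.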

---

## 🧠 Current eval surfaces

The current eval layer is organized into three major surfaces.

### 1. Adversarial pressure surface

1. [🧨 Blackfan Testing](./blackfan-testing.md)

This surface is about:

1. hostile reading
2. anti-hype pressure
3. branch stress
4. counterfeit-success detection
5. bounded release honesty under attack

### 2. Behavior continuity surface

1. [🧭 Persona Behavior Checks](./persona-behavior-checks.md)

This surface is about:

1. persona continuity
2. landing behavior
3. return-path integrity
4. drift after article, analysis, rewrite, search, or tool pressure
5. whether recovery is real or only cosmetic

### 3. Multilingual readiness surface

1. [🌍 Multilingual Status](./multilingual-status.md)

This surface is about:

1. what language claims are actually supported
2. what remains partial
3. how language support is being described honestly
4. how multilingual scope stays bounded instead of mythical

---

## 🪜 Suggested eval paths

### Path A: skeptical reader path

Use this path when the goal is to test whether the branch is only persuasive or actually pressure-bearing.

1. [🧨 Blackfan Testing](./blackfan-testing.md)
2. [🧪 Blackfan Audit Baseline](../research/blackfan-audit-baseline.md)
3. [🗺️ Packed Master Structure Map](../research/packed-master-structure-map.md)

This route helps answer:

1. what was stressed
2. what kind of baseline pass is being claimed
3. what remains bounded instead of inflated

### Path B: runtime continuity path

Use this path when the concern is whether persona and carry survive real usage.

1. [🧭 Persona Behavior Checks](./persona-behavior-checks.md)
2. [🔄 Activation, Attenuation, and Reentry](../research/activation-attenuation-and-reentry.md)
3. [🎛️ Runtime Posture Intensity Map](../research/runtime-posture-intensity-map.md)
4. [🔧 Persona Recovery Operations](../docs/persona-recovery-operations.md)

This route helps answer:

1. what drift happened
2. whether return-path behavior stayed lawful
3. whether recovery should receive credit

### Path C: multilingual honesty path

Use this path when the concern is language scope and readiness posture.

1. [🌍 Multilingual Status](./multilingual-status.md)
2. [🧮 Matrix Accountability and Numeric Binding](../research/matrix-accountability-and-numeric-binding.md)
3. [🧪 Blackfan Audit Baseline](../research/blackfan-audit-baseline.md)

This route helps answer:

1. how support is being bounded
2. whether language claims are being overstated
3. how readiness stays honest

### Path D: branch readiness path

Use this path when the concern is “is this branch publicly real enough right now.”

1. [🧪 Blackfan Audit Baseline](../research/blackfan-audit-baseline.md)
2. [🧨 Blackfan Testing](./blackfan-testing.md)
3. [🧭 Persona Behavior Checks](./persona-behavior-checks.md)
4. [🌍 Multilingual Status](./multilingual-status.md)

This route helps answer:

1. what is already solid
2. what still needs stronger verification
3. what is release-baseline reality versus future strengthening

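
Paths A through D overlap, and it can help to see which pages recur. The sketch below copies the four ordered page lists from this section and counts how many paths each page appears in. The names `EVAL_PATHS` and `shared_pages` are hypothetical helpers for this illustration, not part of the branch.

```python
from collections import Counter

# Paths A-D as ordered page lists, copied from the links in this section.
EVAL_PATHS = {
    "A": [
        "./blackfan-testing.md",
        "../research/blackfan-audit-baseline.md",
        "../research/packed-master-structure-map.md",
    ],
    "B": [
        "./persona-behavior-checks.md",
        "../research/activation-attenuation-and-reentry.md",
        "../research/runtime-posture-intensity-map.md",
        "../docs/persona-recovery-operations.md",
    ],
    "C": [
        "./multilingual-status.md",
        "../research/matrix-accountability-and-numeric-binding.md",
        "../research/blackfan-audit-baseline.md",
    ],
    "D": [
        "../research/blackfan-audit-baseline.md",
        "./blackfan-testing.md",
        "./persona-behavior-checks.md",
        "./multilingual-status.md",
    ],
}


def shared_pages(paths: dict[str, list[str]]) -> dict[str, int]:
    """Map each page that appears in more than one path to its path count."""
    counts = Counter(page for pages in paths.values() for page in set(pages))
    return {page: n for page, n in counts.items() if n > 1}


if __name__ == "__main__":
    ranked = sorted(shared_pages(EVAL_PATHS).items(), key=lambda kv: -kv[1])
    for page, n in ranked:
        print(n, page)
```

With these lists, the Blackfan Audit Baseline appears in three of the four paths (A, C, and D), so a reader who plans to follow more than one path can read it once and reuse it.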

---

## 🔍 Why eval and research are different

This is important.

The **research** layer asks:

1. what does this structure mean
2. why is this operator necessary
3. how do these layers relate
4. why is this boundary lawful

The **eval** layer asks:

1. did the claimed behavior survive pressure
2. did runtime collapse under use
3. did route integrity actually hold
4. did the branch receive credit it should not receive
5. is the current branch being described honestly

So:

1. research explains structure
2. eval tests claims against pressure

Both matter.
They are not the same job.

---

## 🔍 Why eval and docs are different

The **docs** layer helps people operate the current branch.

The **eval** layer helps people judge the current branch.

For example:

1. docs explain how to recover
2. eval checks whether recovery is actually real

1. docs explain how to tune
2. eval shows whether tuning produced lawful improvement or just prettier outputs

1. docs explain how to start
2. eval shows whether startup clarity survives real branch pressure

This separation is healthy.
It stops usage guidance from quietly turning into self-certification.

---

## 🌍 Why multilingual status belongs here

Language support is easy to overclaim.

A project can say:

1. works in many languages
2. supports multilingual use
3. behaves well cross-lingually

while still having:

1. patchy behavior
2. uneven readiness
3. language-specific drift
4. unclear support boundaries

That is why multilingual status belongs in eval rather than only in product copy.

It is part of branch honesty, not just capability branding.

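
A bounded language claim can be pictured as a statement that names each readiness bucket instead of collapsing everything into "multilingual". The sketch below uses the three buckets this hub mentions (tested, partial, open). The `Readiness` enum, the `bounded_claim` function, and the placeholder labels `lang-a`, `lang-b`, and `lang-c` are hypothetical, and they say nothing about the actual language coverage of the branch.

```python
from enum import Enum


class Readiness(Enum):
    """Hypothetical readiness buckets, mirroring tested / partial / open."""

    TESTED = "tested"
    PARTIAL = "partial"
    OPEN = "open"


def bounded_claim(status: dict[str, Readiness]) -> str:
    """Build a support statement that lists each bucket explicitly."""
    parts = []
    for level in Readiness:  # iterates in definition order
        langs = sorted(lang for lang, r in status.items() if r is level)
        if langs:
            parts.append(f"{level.value}: {', '.join(langs)}")
    return "; ".join(parts) if parts else "no language claims recorded"


if __name__ == "__main__":
    # Placeholder labels only; not real Avatar language coverage.
    example = {
        "lang-a": Readiness.TESTED,
        "lang-b": Readiness.PARTIAL,
        "lang-c": Readiness.OPEN,
    }
    print(bounded_claim(example))
```

Because the statement is generated from the recorded buckets, a language that was never checked cannot end up described as supported.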

---

## 🧪 What this hub does not claim

This hub does **not** claim:

1. that all pressure surfaces are already complete
2. that current eval pages already cover every future branch risk
3. that passing one eval page means the whole system is universally solved
4. that current multilingual status already equals final global support
5. that current behavior checks already replace future replay and audit extensions
6. that current baseline pass means no stronger verification is worth doing later

This hub is a bounded eval center.

That is exactly what it should be.

---

## 🚀 Where to go next

### For public product entry
Go to [✨ Avatar Home](../README.md)

### For startup and commands
Go to [⚡ Quickstart](../docs/quickstart.md) and [⌨️ Boot Commands](../docs/boot-commands.md)

### For reading order and tuning
Go to [📖 How to Read the Avatar Master File](../docs/how-to-read-the-avatar-master-file.md), [🍳 Parameter Tuning Cookbook](../docs/parameter-tuning-cookbook.md), and [🔧 Persona Recovery Operations](../docs/persona-recovery-operations.md)

### For deep structural reading
Go to [🔬 Research Hub](../research/README.md)

### For skeptical pressure
Go to [🧨 Blackfan Testing](./blackfan-testing.md)

### For continuity inspection
Go to [🧭 Persona Behavior Checks](./persona-behavior-checks.md)

### For language readiness
Go to [🌍 Multilingual Status](./multilingual-status.md)

### For audit posture
Go to [🧪 Blackfan Audit Baseline](../research/blackfan-audit-baseline.md)

---

## 🔗 Quick links

### Eval core
- [🧨 Blackfan Testing](./blackfan-testing.md)
- [🧭 Persona Behavior Checks](./persona-behavior-checks.md)
- [🌍 Multilingual Status](./multilingual-status.md)

### Docs
- [✨ Avatar Home](../README.md)
- [⚡ Quickstart](../docs/quickstart.md)
- [⌨️ Boot Commands](../docs/boot-commands.md)
- [📖 How to Read the Avatar Master File](../docs/how-to-read-the-avatar-master-file.md)
- [🍳 Parameter Tuning Cookbook](../docs/parameter-tuning-cookbook.md)
- [🔧 Persona Recovery Operations](../docs/persona-recovery-operations.md)
- [🛠️ Avatar Tuning Workflow](../docs/avatar-tuning-workflow.md)

### Research
- [🔬 Research Hub](../research/README.md)
- [🗺️ Packed Master Structure Map](../research/packed-master-structure-map.md)
- [🔁 Dual Closed-Loop Execution Chain](../research/dual-closed-loop-execution-chain.md)
- [🎛️ Runtime Posture Intensity Map](../research/runtime-posture-intensity-map.md)
- [🧩 Shell-to-Runtime Mapping](../research/shell-to-runtime-mapping.md)
- [🧭 Selector Execution Domain](../research/selector-execution-domain.md)
- [🔄 Activation, Attenuation, and Reentry](../research/activation-attenuation-and-reentry.md)
- [🧬 Structured Imperfection Theory](../research/structured-imperfection-theory.md)
- [🚦 Pre-Emission Floor and Hard Control](../research/pre-emission-floor-and-hard-control.md)
- [🧮 Matrix Accountability and Numeric Binding](../research/matrix-accountability-and-numeric-binding.md)
- [🧪 Blackfan Audit Baseline](../research/blackfan-audit-baseline.md)
- [✂️ Compression and Non-Duplication Law](../research/compression-and-non-duplication-law.md)
- [🏗️ Architecture Overview](../research/architecture-overview.md)
- [🧭 Language Governance](../research/language-governance.md)
- [🧠 WFGY_BRAIN Theory](../research/wfgy-brain-theory.md)

### Up
- [⬆️ Back to Avatar Home](../README.md)
- [⬆️ Back to WFGY Root](../../README.md)