<!--
AI_NOTE_START

Document role:
This page explains how to inspect whether an avatar route still behaves like itself.

What this page is for:
1. Define the main behavioral checks that matter for Avatar routes.
2. Help users inspect whether a route is strengthening, drifting, or collapsing.
3. Turn vague impressions into clearer review questions.
4. Support stronger decisions around tuning, saving, and branching.
5. Keep the page practical, reusable, and easy to apply across different avatars.

What this page is not:
1. Not the full research theory of route identity.
2. Not a universal scoring rubric for all future avatars.
3. Not a replacement for demos, workflow, or blackfan testing.
4. Not a claim that every route can be perfectly measured with simple checklists.
5. Not a guarantee that passing these checks means a route is finished forever.

How to use this page:
1. Run one route on one or more real tasks.
2. Use the checks below to review the route honestly.
3. Look for drift, fake polish, route blur, and loss of reuse.
4. Decide whether the route should be tuned, saved, discarded, or branched.
5. Treat this page as a practical inspection surface, not as a final court of truth.

Important boundary:
These checks are meant to improve clarity and honesty.
They do not replace judgment.
They help users notice route quality more clearly, but they do not fully automate taste, strength, or future usefulness.

AI_NOTE_END
-->
# 🧪 Persona Behavior Checks

This page is for one core question:

**does the avatar route still feel like itself?**

That sounds simple, but it matters a lot.

A route can still produce fluent text while already drifting.

A route can still sound impressive while already losing its center.

A route can still feel emotional, warm, intelligent, or stylish and still become:

- more generic
- more theatrical
- less reusable
- less grounded
- less stable
- less itself

That is why behavior checks matter.

This page gives you a practical way to inspect whether a route is actually holding together.

---
## ✨ Why This Page Exists

Many people judge an avatar too quickly.

They see one nice answer and think:

- this route is strong
- this route is finished
- this route is my final version

That is often too fast.

Better questions are:

**what is this route consistently doing?**

**what is this route starting to lose?**

**what is this route overproducing?**

**is this route becoming easier or harder to reuse?**

Those questions are more useful.

This page exists to make that kind of review easier.

---
## 🧠 What You Are Really Checking

You are not only checking whether the text is “good.”

You are checking things like:

- route recognizability
- route blur
- drift
- over-polish
- emotional distortion
- loss of grounding
- loss of branch identity
- false improvement
- reusability across more than one task

That is a much richer kind of inspection.

It is also much more aligned with what Avatar is trying to build.

---
## 📍 Check 1. Route Recognizability

The first question is simple:

**if I use this route again, does it still feel like the same route?**

A recognizable route usually has:

- a stable opening feel
- a stable level of warmth or sharpness
- a stable degree of grounding
- a recognizable pressure pattern
- some continuity across different tasks

A route becomes less recognizable when:

- every task feels like a different personality
- the core vibe changes too easily
- the output feels random rather than like a consistent route
- the identity depends on only one obvious trick

This is one of the most important checks.

If recognizability is weak, the route is already harder to keep.

---
## 🌫️ Check 2. Generic Drift

A lot of routes become weaker in the same boring way:

they become more generic.

This often looks like:

- safer wording
- flatter tone
- less specific presence
- smoother but emptier rhythm
- more “default assistant” behavior
- less route identity

Generic drift is dangerous because it can feel deceptively polished.

The text may still look clean.

But the route may be losing what made it worth using.

Ask:

- is this route still distinct?
- or is it slowly turning into a nicer version of average AI output?

That difference matters.

---
## ✨ Check 3. Over-Polish Risk

Some routes become weaker not because they are messy, but because they become too polished.

This often looks like:

- too much smoothness
- too much slogan energy
- too much clean closure
- too much “nice line” behavior
- less residue
- less lived texture
- less unpredictable human pressure

This can trick people.

Over-polished output often looks “better” at first glance.

But over time, it may become:

- less believable
- less reusable
- less grounded
- less alive

Ask:

- is this route becoming cleaner in a good way?
- or is it becoming polished in a dead way?

That is a very important distinction.

---
## 🪨 Check 4. Grounding Strength

A strong route usually feels anchored.

That does not mean it is always concrete.

It means the route does not float away too easily.

A grounded route tends to show:

- clearer references to concrete objects
- stronger practical wording
- less abstract fog
- more contact with the actual task
- less decorative framing before the payload

Weak grounding often looks like:

- too much abstraction
- too much atmosphere before substance
- too much summary language
- too much general wisdom without local grip

Ask:

- is this route touching the real task?
- or is it hovering around it elegantly?

Grounding matters a lot for reuse.

---
## ❤️ Check 5. Emotional Shape

Routes often drift emotionally long before users clearly notice it.

A route may become:

- too soft
- too cold
- too sugary
- too distant
- too eager to comfort
- too eager to impress
- too flat to feel human
- too emotionally loud to stay usable

This is one reason emotional shape deserves its own check.

Ask:

- does the warmth feel real?
- does the softness become sugar?
- does the calm become distance?
- does the care become fake intimacy?
- does the force become aggression?

You are not looking for “more emotion.”

You are looking for the right emotional shape for the route.

---
## 🗣️ Check 6. Voice Pressure

Every route has some kind of pressure signature.

For example:

- some routes move fast
- some routes hold back
- some routes push analysis
- some routes protect softness
- some routes hit the point early
- some routes carry more spoken texture
- some routes sound more formal
- some routes sound more public-facing

This is not only about tone.

It is about how the route moves.

Ask:

- does this route still carry its intended pressure?
- or is the force flattening out?
- or is it becoming exaggerated in the wrong direction?

This check is especially useful when comparing two close variants.

---
## 🔁 Check 7. Reusability Across Tasks

A route that only works on one lucky prompt is much weaker than it looks.

That is why reusability matters.

A stronger route should survive:

- more than one task
- more than one subject
- more than one prompt style
- more than one opening condition

It does not need to be universally strong.

But it should not collapse immediately outside one narrow setup.

Ask:

- would I trust this route tomorrow?
- would I use it again for a related task?
- is the route strong, or only the example?

This is one of the best checks for deciding whether a variant deserves to become a saved build.

---
## 🧬 Check 8. Branch Identity

Once you start saving variants, another question appears:

**does this branch actually have its own identity?**

A branch should be more than “slightly different.”

A real branch usually has:

- a clearer direction
- a more legible reason to exist
- a stronger intended use
- a recognizable shift from the parent route

Weak branches often feel like:

- accidental edits
- vague forks
- tiny changes with no real payoff
- noise disguised as experimentation

Ask:

- does this branch deserve its own name?
- or is it still only a draft of the parent?

This check helps keep your build library healthy.

---
## ⚠️ Check 9. False Improvement Risk

This is one of the most important checks.

Sometimes a change feels like an improvement but is not.

Common false improvements:

- louder but weaker
- prettier but emptier
- warmer but more fake
- sharper but less reusable
- more dramatic but less grounded
- more polished but less alive

That is why you should not judge only by your first emotional reaction.

Ask:

- what actually improved?
- what became easier to reuse?
- what became more legible?
- what became less real?

False improvement is one of the biggest traps in avatar work.

---
## 📋 A Simple Practical Review Pass

If you want one fast inspection pass, use these questions:

### Route check
- does it still feel like itself?

### Distinctness check
- is it still different from generic AI output?

### Grounding check
- does it still touch the actual task?

### Emotional check
- is the warmth, calm, force, or softness still in range?

### Reuse check
- would I actually use this route again?

### Branch check
- is this different enough to keep or name?

These six questions already catch many common problems.

---
## 🧪 Suggested Review Format

Here is a simple way to review a route after a run.

```md
## Route Review

### Route
<route name>

### Task
<what you tested>

### Recognizability
<strong / medium / weak>

### Generic Drift
<low / medium / high>

### Over-Polish Risk
<low / medium / high>

### Grounding
<strong / medium / weak>

### Emotional Shape
<in range / drifting / unstable>

### Reusability
<strong / medium / weak>

### Branch Identity
<clear / partial / unclear>

### Notes
<short honest explanation>
```
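If you keep many reviews in a build library, it can help to store them as structured records so every review uses the same rating scales as the template above. The sketch below is a minimal Python example, assuming Python 3.7+ and only the standard library. The `RouteReview` class, its field names, and the validation step are hypothetical illustrations, not part of Avatar. The record only checks that each rating is one of the template's allowed values and re-renders it as markdown; it does not score a route or decide whether to tune, save, discard, or branch it.

```python
from dataclasses import dataclass

# Hypothetical field names mapped to (template heading, allowed values).
# Dict order matches the order of the template sections (Python 3.7+ keeps insertion order).
RATING_SCALES = {
    "recognizability": ("Recognizability", ("strong", "medium", "weak")),
    "generic_drift": ("Generic Drift", ("low", "medium", "high")),
    "over_polish_risk": ("Over-Polish Risk", ("low", "medium", "high")),
    "grounding": ("Grounding", ("strong", "medium", "weak")),
    "emotional_shape": ("Emotional Shape", ("in range", "drifting", "unstable")),
    "reusability": ("Reusability", ("strong", "medium", "weak")),
    "branch_identity": ("Branch Identity", ("clear", "partial", "unclear")),
}


@dataclass
class RouteReview:
    """One review of one route on one task (an illustrative record, not an Avatar API)."""

    route: str
    task: str
    recognizability: str
    generic_drift: str
    over_polish_risk: str
    grounding: str
    emotional_shape: str
    reusability: str
    branch_identity: str
    notes: str

    def __post_init__(self) -> None:
        # Reject values outside the template's scales so reviews stay comparable.
        for field_name, (_, allowed) in RATING_SCALES.items():
            value = getattr(self, field_name)
            if value not in allowed:
                raise ValueError(f"{field_name}={value!r}; expected one of {allowed}")

    def to_markdown(self) -> str:
        # Re-render the record with the same headings as the markdown template.
        sections = [("Route", self.route), ("Task", self.task)]
        for field_name, (title, _) in RATING_SCALES.items():
            sections.append((title, getattr(self, field_name)))
        sections.append(("Notes", self.notes))

        lines = ["## Route Review", ""]
        for title, value in sections:
            lines += [f"### {title}", value, ""]
        return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    review = RouteReview(
        route="example-route",  # hypothetical route name
        task="rewrite a product FAQ",  # hypothetical task
        recognizability="strong",
        generic_drift="low",
        over_polish_risk="medium",
        grounding="strong",
        emotional_shape="in range",
        reusability="medium",
        branch_identity="partial",
        notes="Holds its voice, but closings are getting too neat.",
    )
    print(review.to_markdown())
```

Keeping the ratings as named values rather than numbers is a deliberate choice in this sketch: it avoids implying that the checks can be summed into a single score, which this page does not claim.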