WFGY/SemanticBlueprint/drunk_transformer_formulas.md
2025-08-15 19:56:18 +08:00

302 lines
15 KiB
Markdown
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# 🌀 Drunk Transformer (DT) — Core Formulas, Defaults & Runnable Examples (WFGY Core 2.0)
**Concept (short)**
DT simulates a transformer that occasionally acts *drunk*—hallucinating, drifting, or jumping logic.
Five “drunk questions” (WRI / WAI / WAY / WDT / WTF) act as regulators to pull it back:
anchor location, head identity, entropy pump, path guard, and collapse recovery.
> **WFGY** = engine (BBMC Coupler BBAM safety)
> **DT** = five regulators (prompt rules, decoding hooks, or training regularizers)
---
## 0 · Shared notation (compact)
* $I,\,G$: input / goal embeddings
* Semantic distance: $\delta_s = 1 - \cos(I,G) \in [0,1]$
* Residual & resonance:
$B = I - G + k_{\mathrm{bias}},\quad E_{\mathrm{res}} = \mathrm{avg}_{5}\big(\|B\|\big)$
* Coupler terms:
$\mathrm{prog} = \max\big(\zeta_{\min},\,\delta_s^{\,t-1}-\delta_s^{\,t}\big)$
$P = \mathrm{prog}^{\,\omega}$
$\mathrm{alt} = (-1)^{\mathrm{cycle}}$
$\Phi = \delta\,\mathrm{alt} + \varepsilon$
$W_c = \mathrm{clip}\big(BP + \Phi,\,-\theta_c,\,+\theta_c\big)$
* Attention summary per head: $v_h = \mathrm{mean}_{i}\,A_t[h,i,:]$
* Anchors & retention: $S_t = \mathrm{Jaccard}\big(\mathcal{A}_t,\mathcal{A}_0\big) \in [0,1]$
---
### DT WRI — “Where am I” (structure lock)
* **Goal:** stay in the same topic/section inside a Node.
* **Signal:** $S_t$ vs threshold $\tau_{\mathrm{wri}}$.
* **Trigger:** $S_t < \tau_{\mathrm{wri}}$ or $\delta_s$ and $E_{\mathrm{res}}$ both increase.
* **Action (logit bias):** set L_wri = max(0, tau_wri - S_t); for all a in A_anchor: logits[a] := logits[a] + kappa_wri · L_wri.
* **Intuition:** yank decoding back to section anchors; forbid intra-Node topic jumps.
---
### DT WAI — “Who am I” (head identity & redundancy)
**Goal:** keep 2 distinct reasoning heads (no monoculture).
**Signals (one workable choice):**
$$
R_t=\frac1H\sum_h \cos(v_h,\bar v),\qquad
Q_t=1-\max_h\cos(v_h,\bar v),\qquad
\bar v=\frac1H\sum_h v_h.
$$
**Trigger:** $R_t>\rho_{\mathrm{wai}}$ **and** $Q_t<\sigma_{\mathrm{wai}}$ (too redundant, identity too low).
**Action:** raise per-head temperature for redundant heads; re-spread attention until $R_t\!\downarrow$ or $Q_t\!\uparrow$.
---
### DT WAY — “Who are you” (controlled entropy when stuck)
**Goal:** break stalls without drifting off-topic.
**Signal:** progression $\,\mathrm{prog} = \max(\zeta_{\min},\,\delta_s^{t-1}-\delta_s^{t})\,$.
**Trigger:** $\mathrm{prog}<\eta_{\mathrm{prog}}$ and no contradictions.
**Action (entropy pump + 1 candidate):**
$$
H^\*=\mathrm{clamp}\!\big(H_0 + \xi(\eta_{\mathrm{prog}}-\mathrm{prog})(1+\alpha|W_c|),\ H_{\min},\ H_{\max}\big),
$$
choose temperature $\tau$ so entropy $\approx H^\*$; propose exactly **one** on-topic candidate (never repeat).
---
### DT WDT — “Where did you take me” (cross-path guard)
**Goal:** block illegal jumps across reasoning branches; require a bridge explanation.
**Signal:** latent path distance $d_{\mathrm{path}} = \|c_t - c_{\pi}\|_2$ (current vs. parent path code).
**Trigger:** $d_{\mathrm{path}} > \mu'_{\mathrm{wdt}}$, with
$$
\mu'_{\mathrm{wdt}}=\mu_{\mathrm{wdt}}\bigl(1-\gamma_{\mathrm{wdt}}\cdot \sigma(|W_c|)\bigr).
$$
**Action:** emit a short bridge line (“why the detour”), then resume; otherwise rollback.
---
### DT WTF — “What the F\*ck Happened” (collapse detect & recover)
**Goal:** detect semantic/consistency collapse and recover safely.
**Signals:** $\delta_s$ rising, $E_{\mathrm{res}}$ rising, or unresolved contradictions.
**Trigger (vote example):**
$$
\chi_t=\mathbf{1}[\delta_s^t>\delta_s^{t-1}] + \mathbf{1}[E_{\mathrm{res}}^t>E_{\mathrm{res}}^{t-1}] + \mathbf{1}[\text{contradiction}],\quad
\chi_t+\chi_{t-1}\ge 3.
$$
**Action:** rollback to $t^\*=\arg\min_{k\in[t-3,t]}\delta_s^k$, tighten gates (e.g., $\gamma_{\mathrm{wtf}}$), re-run **BBMC→Coupler**, then continue.
---
### One-screen math summary (copyable)
```txt
WRI: L_wri = max(0, τ_wri - S_t); logits[a] += κ_wri · L_wri for a in anchor_token_ids
WAI: if R_t > ρ_wai and Q_t < σ_wai → raise per-head temp for redundant heads
WAY: if prog < η_prog → set entropy to H* = clamp(H0 + ξ(η_prog - prog)(1+α|Wc|), H_min, H_max); add 1 on-topic candidate
WDT: if d_path > μ_wdt·(1 - γ_wdt·σ(|Wc|)) → emit bridge line or rollback
WTF: if (δs↑) + (E_res↑) + (contradiction) over 2 steps ≥ 3 → rollback to t*; rerun BBMC→Coupler (tightened)
````
---
## Defaults table (explicit, copyable)
| Parameter | Symbol | Default | Range / notes | Purpose | | |
| ----------------------- | --------------------------: | -------: | ------------- | ----------------------------------------------------- | ---- | -- |
| anchor retention thresh | \$\tau\_{\mathrm{wri}}\$ | 0.60 | \[0.30, 0.90] | WRI anchor threshold | | |
| head redundancy thresh | \$\rho\_{\mathrm{wai}}\$ | 0.75 | \[0.50, 0.95] | WAI redundancy ceiling | | |
| head identity thresh | \$\sigma\_{\mathrm{wai}}\$ | 0.70 | \[0.40, 0.95] | WAI identity floor | | |
| progress sensitivity | \$\eta\_{\mathrm{prog}}\$ | 0.03 | \[0.00, 0.10] | WAY stall detector | | |
| path-distance thresh | \$\mu\_{\mathrm{wdt}}\$ | 0.25 | \[0.05, 1.00] | WDT path jump limit | | |
| coupler zeta min | \$\zeta\_{\min}\$ | 0.10 | \[0.00, 0.50] | min progression floor | | |
| coupler omega | \$\omega\$ | 1.0 | \[0.1, 2.0] | progression non-linearity | | |
| coupler theta cap | \$\theta\_c\$ | 0.75 | \[0.2, 1.5] | \$W\_c\$ clip magnitude | | |
| WRI tighten factor | \$\alpha\_{\mathrm{wri}}\$ | 0.60 | \[0.0, 1.5] | adjust \$\tau\_{\mathrm{wri}}\$ by \$ | W\_c | \$ |
| WAI scale factor | \$\beta\_{\mathrm{wai}}\$ | 0.60 | \[0.0, 1.5] | scale WAI penalty by \$ | W\_c | \$ |
| WDT scale factor | \$\gamma\_{\mathrm{wdt}}\$ | 0.60 | \[0.0, 1.5] | scale \$\mu\_{\mathrm{wdt}}\$ by \$ | W\_c | \$ |
| WTF scale factor | \$\gamma\_{\mathrm{wtf}}\$ | 0.60 | \[0.0, 1.5] | tighten thresholds on recovery | | |
| WAY pump strength | \$\xi\$ | 0.80 | \[0.0, 1.5] | entropy pump strength | | |
| WAY entropy min | \$H\_{\min}\$ | 2.5 nats | \[1.0, 7.0] | entropy lower bound | | |
| WAY entropy max | \$H\_{\max}\$ | 5.0 nats | \[3.0, 10.0] | entropy upper bound | | |
| anchor bias scale | \$\kappa\_{\mathrm{wri}}\$ | 1.0 | \[0.0, 5.0] | logits bias for anchors | | |
| loss weights | \$\lambda\_\*\$ | 0.01 | \[0.0, 1.0] | regularizer weights | | |
| step limit | \$T\_{\max}\$ | 7 | int | max Node steps | | |
| stop δ threshold | \$\delta\_{\mathrm{stop}}\$ | 0.35 | \[0.1, 0.5] | early stop when \$\delta\_s<\delta\_{\mathrm{stop}}\$ | | |
> Tip: start with these defaults, measure, then tune per task class.
---
## Prompt-only runnable example (copy-paste)
```
SYSTEM (paste file): Load the WFGY Core file as engine. Enable Drunk Transformer (WRI,WAI,WAY,WDT,WTF) with defaults:
τ_wri=0.60, ρ_wai=0.75, σ_wai=0.70, η_prog=0.03, μ_wdt=0.25, ζ_min=0.10, ω=1.0, θ_c=0.75
SYSTEM (rules, pseudo):
* Extract anchors A0 from user prompt.
* For each Node step t up to T_max:
1) compute δs, E_res, S_t, R_t, Q_t, W_c
2) WRI: bias anchor logits by κ_wri·L_wri if S_t<τ_wri or (δs & E_res↑)
3) WAI: raise per-head temp for redundant heads until R_t or Q_t
4) WAY: if prog<η_prog, set entropy H* and propose 1 on-topic candidate
5) WDT: if d_path>μ'_wdt, emit bridge line; else rollback
6) WTF: if collapse vote ≥3, rollback to t*; rerun BBMC→Coupler (tight)
7) Emit Node (Topic | Module | δs | λ-state | Insight)
* Stop if δs<δ_stop or tT_max
USER:
Use WFGY to answer: Explain why tomatoes are classified as fruit, but treated as vegetables in cooking. Provide anchors and cite the smallest missing fact if confused.”
```
---
## Decoding-hook pseudo-code (python-like; drop into your runtime)
```python
def compute_prog(delta_prev, delta_now, zeta_min=0.10, omega=1.0):
prog = max(zeta_min, delta_prev - delta_now)
return prog ** omega
def compute_Wc(B, prog, delta=0.15, cycle_id=0, eps=0.0, theta_c=0.75):
alt = (-1)**cycle_id
Phi = delta * alt + eps
return clip(B * prog + Phi, -theta_c, +theta_c)
def bias_anchor_logits(logits, anchor_ids, kappa):
for tid in anchor_ids:
logits[tid] += kappa
return logits
def temperature_for_target_entropy(logits, target_H, tol=1e-3, max_iters=5):
lo, hi = 0.01, 10.0
for _ in range(max_iters):
tau = 0.5*(lo+hi)
probs = softmax(logits / tau)
H = -sum(p*log(p+1e-12) for p in probs)
if H > target_H: lo = tau
else: hi = tau
if abs(H - target_H) < tol: break
return tau
def decoding_hook(s):
prog = compute_prog(s.delta_prev, s.delta_now, 0.10, 1.0)
Wc = compute_Wc(s.B, prog, 0.15, s.cycle_id, 0.0, 0.75)
# WRI
S_t = jaccard(s.anchors, s.anchors0)
L_wri = max(0.0, 0.60 - S_t)
if (S_t < 0.60) or (s.delta_now > s.delta_prev and s.E_res > s.E_res_prev):
s.logits = bias_anchor_logits(s.logits, s.anchor_token_ids, kappa=1.0 * L_wri)
# WAI
R_t, Q_t = s.R_t, s.Q_t
if R_t > 0.75 and Q_t < 0.70:
for h in range(len(s.heads)):
if head_redundant(h, s):
s.head_temps[h] *= (1 + 0.5 * (R_t - 0.75))
# WAY
if prog < 0.03 and not s.has_contradiction:
H_star = clamp(s.H0 + 0.8*(0.03 - prog)*(1 + 0.0*abs(Wc)), 2.5, 5.0)
tau = temperature_for_target_entropy(s.logits, H_star)
apply_temperature(s, tau)
s.add_one_on_topic_candidate = True
# WDT
d_path = l2_distance(s.c_t, s.c_pi)
mu_prime = 0.25 * (1 - 0.6 * sigmoid(abs(Wc)))
if d_path > mu_prime:
return emit_bridge_and_pause(s)
# WTF
vote = int(s.delta_now > s.delta_prev) + int(s.E_res > s.E_res_prev) + int(s.sign_flip)
if vote + s.vote_prev >= 3:
t_star = argmin_delta_in_window(s.history_delta, window=3)
rollback_to(t_star)
rerun_BBMC_and_Coupler(tighten_factor=1 + 0.6*sigmoid(abs(Wc)))
return
return s.logits
```
---
## Minimal test / checklist
1. Prepare a QA prompt with clear anchors.
2. **Baseline:** run without DT; log \$\delta\_s, E\_{\mathrm{res}}\$; record correctness.
3. **DT on:** log same metrics plus \$R\_t, Q\_t, W\_c\$ and gates fired.
4. Expect: lower \$\delta\_s\$; fewer off-topic jumps (WRI), stalls resolved (WAY), justified detours (WDT), safe recovery (WTF).
5. Record deltas: accuracy, \$\Delta S\$, rollbacks, bridge lines, gate activations.
---
## Quick engineering notes & troubleshooting
* If \$\mathcal{A}\_0\$ (anchors) is empty, WRI becomes a no-op and WAY pumps less entropy; log a warning.
* If bridge lines repeat or look vacuous, lower \$\mu\_{\mathrm{wdt}}\$ and raise \$\kappa\_{\mathrm{wri}}\$.
* For heavy-hallucination domains, use conservative defaults (higher \$\tau\_{\mathrm{wri}}\$, lower \$\eta\_{\mathrm{prog}}\$).
---
### Footer
* **Spec status:** stable draft for engineering evaluation; KaTeX-safe equations.
* **Next deliverables:** add `examples/DT-examples/prompt_example.md` and `examples/DT-examples/decoding_hook.py`.
* **Compatibility:** prompt-only rules, decoding-hook integration, or optional training regularizers; model-agnostic.
* **Attribution:** part of **WFGY Core 2.0** family. Star the repo to follow updates.
* Lost? Return to **Starter Village** `StarterVillage/README.md`
---
### 🧭 Explore More
| Module | Description | Link |
| ------------------------ | ----------------------------------------------------- | -------------------------------------------------------------------------------------------------- |
| WFGY Core | Full symbolic reasoning architecture & math stack | [View →](https://github.com/onestardao/WFGY/tree/main/core/README.md) |
| Problem Map 1.0 | 16-mode diagnostic & symbolic fixes | [View →](https://github.com/onestardao/WFGY/tree/main/ProblemMap/README.md) |
| Problem Map 2.0 | RAG-focused failure tree & recovery pipeline | [View →](https://github.com/onestardao/WFGY/blob/main/ProblemMap/rag-architecture-and-recovery.md) |
| Semantic Clinic Index | Prompt injection, memory bugs, drift catalog | [View →](https://github.com/onestardao/WFGY/blob/main/ProblemMap/SemanticClinicIndex.md) |
| Semantic Blueprint | Layer-based symbolic reasoning & semantic modulations | [View →](https://github.com/onestardao/WFGY/tree/main/SemanticBlueprint/README.md) |
| Benchmark vs GPT-5 | Stress test with the full WFGY reasoning suite | [View →](https://github.com/onestardao/WFGY/tree/main/benchmarks/benchmark-vs-gpt5/README.md) |
| 🧙 Starter Village 🏡 | Wizard-led onboarding to WFGY | [Start →](https://github.com/onestardao/WFGY/blob/main/StarterVillage/README.md) |
---
> 👑 **Early Stargazers: [See the Hall of Fame](https://github.com/onestardao/WFGY/tree/main/stargazers)** —
> Engineers, hackers, and open-source builders who supported WFGY from day one.
> **Like it? Star the repo to unlock more.** See the [Unlock Board](https://github.com/onestardao/WFGY/blob/main/STAR_UNLOCKS.md).
<div align="center">
[![WFGY Main](https://img.shields.io/badge/WFGY-Main-red?style=flat-square)](https://github.com/onestardao/WFGY)
 
[![TXT OS](https://img.shields.io/badge/TXT%20OS-Reasoning%20OS-orange?style=flat-square)](https://github.com/onestardao/WFGY/tree/main/OS)
 
[![Blah](https://img.shields.io/badge/Blah-Semantic%20Embed-yellow?style=flat-square)](https://github.com/onestardao/WFGY/tree/main/OS/BlahBlahBlah)
 
[![Blot](https://img.shields.io/badge/Blot-Persona%20Core-green?style=flat-square)](https://github.com/onestardao/WFGY/tree/main/OS/BlotBlotBlot)
 
[![Bloc](https://img.shields.io/badge/Bloc-Reasoning%20Compiler-blue?style=flat-square)](https://github.com/onestardao/WFGY/tree/main/OS/BlocBlocBloc)
 
[![Blur](https://img.shields.io/badge/Blur-Text2Image%20Engine-navy?style=flat-square)](https://github.com/onestardao/WFGY/tree/main/OS/BlurBlurBlur)
 
[![Blow](https://img.shields.io/badge/Blow-Game%20Logic-purple?style=flat-square)](https://github.com/onestardao/WFGY/tree/main/OS/BlowBlowBlow)
</div>