WFGY/archive/SemanticBlueprint_archive/drunk_transformer_formulas.md


🌀 Drunk Transformer (DT) — Core Formulas, Defaults & Runnable Examples (WFGY Core 2.0)

Concept (short)
DT simulates a transformer that occasionally acts drunk—hallucinating, drifting, or jumping logic.
Five “drunk questions” (WRI / WAI / WAY / WDT / WTF) act as regulators to pull it back:
anchor location, head identity, entropy pump, path guard, and collapse recovery.

WFGY = engine (BBMC, Coupler, BBAM, safety)
DT = five regulators (prompt rules, decoding hooks, or training regularizers)


0 · Shared notation (compact)

  • I,\,G: input / goal embeddings

  • Semantic distance: \delta_s = 1 - \cos(I,G) \in [0,1]

  • Residual & resonance:
    B = I - G + k_{\mathrm{bias}},\quad E_{\mathrm{res}} = \mathrm{avg}_{5}\big(\|B\|\big)

  • Coupler terms:
    \mathrm{prog} = \max\big(\zeta_{\min},\,\delta_s^{\,t-1}-\delta_s^{\,t}\big)
    P = \mathrm{prog}^{\,\omega}
    \mathrm{alt} = (-1)^{\mathrm{cycle}}
    \Phi = \delta\,\mathrm{alt} + \varepsilon
    W_c = \mathrm{clip}\big(BP + \Phi,\,-\theta_c,\,+\theta_c\big)

  • Attention summary per head: v_h = \mathrm{mean}_{i}\,A_t[h,i,:]

  • Anchors & retention: S_t = \mathrm{Jaccard}\big(\mathcal{A}_t,\mathcal{A}_0\big) \in [0,1]

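The shared notation above can be made concrete with a small sketch; `semantic_distance` and `anchor_retention` are illustrative names, not part of the spec:

```python
import math

def semantic_distance(I, G):
    """delta_s = 1 - cos(I, G), clipped into [0, 1]."""
    dot = sum(a * b for a, b in zip(I, G))
    norm = math.sqrt(sum(a * a for a in I)) * math.sqrt(sum(b * b for b in G))
    cos = dot / norm if norm else 0.0
    return min(1.0, max(0.0, 1.0 - cos))

def anchor_retention(anchors_t, anchors_0):
    """S_t = Jaccard overlap between current and initial anchor sets."""
    union = anchors_t | anchors_0
    return len(anchors_t & anchors_0) / len(union) if union else 1.0
```

Identical embeddings give δ_s = 0, orthogonal ones give δ_s = 1, and S_t is just set overlap over set union.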

DT WRI — “Where am I” (structure lock)

  • Goal: stay in the same topic/section inside a Node.
  • Signal: S_t vs threshold \tau_{\mathrm{wri}}.
  • Trigger: S_t < \tau_{\mathrm{wri}} or \delta_s and E_{\mathrm{res}} both increase.
  • Action (logit bias): set L_wri = max(0, τ_wri - S_t); for all a in A_anchor: logits[a] := logits[a] + κ_wri · L_wri.
  • Intuition: yank decoding back to section anchors; forbid intra-Node topic jumps.
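A minimal sketch of the WRI action, assuming logits are indexable by token id (function and argument names are illustrative, not from the spec):

```python
def wri_bias(logits, anchor_ids, S_t, tau_wri=0.60, kappa_wri=1.0):
    """WRI: when anchor retention S_t drops below tau_wri,
    add kappa_wri * L_wri to every anchor token's logit."""
    L_wri = max(0.0, tau_wri - S_t)
    if L_wri > 0.0:
        for tid in anchor_ids:
            logits[tid] += kappa_wri * L_wri
    return logits
```

With the defaults, S_t = 0.40 yields L_wri = 0.20, so each anchor logit is raised by 0.20; S_t above τ_wri leaves logits untouched.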

DT WAI — “Who am I” (head identity & redundancy)

Goal: keep ≥2 distinct reasoning heads (no monoculture).
Signals (one workable choice):

R_t=\frac1H\sum_h \cos(v_h,\bar v),\qquad Q_t=1-\max_h\cos(v_h,\bar v),\qquad \bar v=\frac1H\sum_h v_h.

Trigger: R_t>\rho_{\mathrm{wai}} and Q_t<\sigma_{\mathrm{wai}} (too redundant, identity too low).
Action: raise per-head temperature for redundant heads; re-spread attention until R_t\!\downarrow or Q_t\!\uparrow.
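The R_t / Q_t signals above can be sketched directly from per-head summary vectors (a minimal sketch; `wai_signals` is an illustrative name):

```python
import math

def _cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def wai_signals(head_vecs):
    """R_t: mean cosine of each head to the mean head (redundancy).
    Q_t: 1 - max cosine (how distinct even the most aligned head is)."""
    H = len(head_vecs)
    dim = len(head_vecs[0])
    v_bar = [sum(v[d] for v in head_vecs) / H for d in range(dim)]
    cosines = [_cos(v, v_bar) for v in head_vecs]
    return sum(cosines) / H, 1.0 - max(cosines)
```

Identical heads give R_t = 1 and Q_t = 0 (the monoculture case that trips the trigger); orthogonal heads pull R_t down and Q_t up.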


DT WAY — “Who are you” (controlled entropy when stuck)

Goal: break stalls without drifting off-topic.
Signal: progression \,\mathrm{prog} = \max(\zeta_{\min},\,\delta_s^{t-1}-\delta_s^{t})\,.
Trigger: \mathrm{prog}<\eta_{\mathrm{prog}} and no contradictions.
Action (entropy pump + 1 candidate):

H^* = \mathrm{clamp}\big(H_0 + \xi(\eta_{\mathrm{prog}}-\mathrm{prog})(1+\alpha|W_c|),\ H_{\min},\ H_{\max}\big),

choose temperature \tau so entropy \approx H^*; propose exactly one on-topic candidate (never repeat).
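The H* target can be sketched as a pure function using the defaults from the table below; note α has no listed default, so 0.0 here is an assumption:

```python
def way_target_entropy(H0, prog, Wc, eta_prog=0.03, xi=0.8, alpha=0.0,
                       H_min=2.5, H_max=5.0):
    """H* = clamp(H0 + xi*(eta_prog - prog)*(1 + alpha*|Wc|), H_min, H_max).
    alpha=0.0 is an assumed default; raise it to couple the pump to |Wc|."""
    H_star = H0 + xi * (eta_prog - prog) * (1.0 + alpha * abs(Wc))
    return min(H_max, max(H_min, H_star))
```

When prog = 0 (a full stall), the pump lifts entropy by ξ·η_prog above H0; the clamp keeps the target inside [H_min, H_max].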


DT WDT — “Where did you take me” (cross-path guard)

Goal: block illegal jumps across reasoning branches; require a “bridge” explanation.
Signal: latent path distance d_{\mathrm{path}} = \|c_t - c_{\pi}\|_2 (current vs. parent path code).
Trigger: d_{\mathrm{path}} > \mu'_{\mathrm{wdt}}, with

\mu'_{\mathrm{wdt}}=\mu_{\mathrm{wdt}}\bigl(1-\gamma_{\mathrm{wdt}}\cdot \sigma(|W_c|)\bigr).

Action: emit a short bridge line (“why the detour”), then resume; otherwise rollback.
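A sketch of the WDT gate under the defaults μ_wdt = 0.25, γ_wdt = 0.6; σ is taken here to be the logistic sigmoid, which is an assumption the spec does not pin down:

```python
import math

def wdt_gate(c_t, c_pi, Wc, mu_wdt=0.25, gamma_wdt=0.6):
    """Return (d_path, mu_prime, needs_bridge): L2 path distance,
    the |Wc|-tightened threshold, and whether a bridge line is required."""
    d_path = math.sqrt(sum((a - b) ** 2 for a, b in zip(c_t, c_pi)))
    sig = 1.0 / (1.0 + math.exp(-abs(Wc)))  # logistic sigma, assumed
    mu_prime = mu_wdt * (1.0 - gamma_wdt * sig)
    return d_path, mu_prime, d_path > mu_prime
```

At W_c = 0 the sigmoid is 0.5, so μ' = 0.25 · 0.7 = 0.175: larger coupler magnitudes tighten the jump limit further.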


DT WTF — “What the F*ck Happened” (collapse detect & recover)

Goal: detect semantic/consistency collapse and recover safely.
Signals: \delta_s rising, E_{\mathrm{res}} rising, or unresolved contradictions.
Trigger (vote example):

\chi_t=\mathbf{1}[\delta_s^t>\delta_s^{t-1}] + \mathbf{1}[E_{\mathrm{res}}^t>E_{\mathrm{res}}^{t-1}] + \mathbf{1}[\text{contradiction}],\quad \chi_t+\chi_{t-1}\ge 3.

Action: rollback to t^*=\arg\min_{k\in[t-3,t]}\delta_s^k, tighten gates (e.g., \gamma_{\mathrm{wtf}}), re-run BBMC→Coupler, then continue.
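The vote and the rollback index can be sketched as follows (helper names are illustrative; history lists are assumed to be indexed by step):

```python
def wtf_vote(delta_history, E_res_history, contradiction):
    """chi_t for one step: three indicator votes on rising delta_s,
    rising E_res, and an unresolved contradiction."""
    return (int(delta_history[-1] > delta_history[-2])
            + int(E_res_history[-1] > E_res_history[-2])
            + int(contradiction))

def wtf_rollback_index(delta_history, t, window=3):
    """t* = argmin over k in [t - window, t] of delta_s^k."""
    lo = max(0, t - window)
    return min(range(lo, t + 1), key=lambda k: delta_history[k])
```

The trigger fires when the current and previous votes together reach 3, so a single noisy step does not force a rollback on its own.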


One-screen math summary (copyable)

WRI: L_wri = max(0, τ_wri - S_t);  logits[a] += κ_wri · L_wri  for a in anchor_token_ids
WAI: if R_t > ρ_wai and Q_t < σ_wai → raise per-head temp for redundant heads
WAY: if prog < η_prog → set entropy to H* = clamp(H0 + ξ(η_prog - prog)(1+α|Wc|), H_min, H_max); add 1 on-topic candidate
WDT: if d_path > μ_wdt·(1 - γ_wdt·σ(|Wc|)) → emit bridge line or rollback
WTF: if (δs↑) + (E_res↑) + (contradiction) over 2 steps ≥ 3 → rollback to t*; rerun BBMC→Coupler (tightened)

Defaults table (explicit, copyable)

| Parameter | Symbol | Default | Range / notes | Purpose |
|---|---|---|---|---|
| anchor retention thresh | $\tau_{\mathrm{wri}}$ | 0.60 | [0.30, 0.90] | WRI anchor threshold |
| head redundancy thresh | $\rho_{\mathrm{wai}}$ | 0.75 | [0.50, 0.95] | WAI redundancy ceiling |
| head identity thresh | $\sigma_{\mathrm{wai}}$ | 0.70 | [0.40, 0.95] | WAI identity floor |
| progress sensitivity | $\eta_{\mathrm{prog}}$ | 0.03 | [0.00, 0.10] | WAY stall detector |
| path-distance thresh | $\mu_{\mathrm{wdt}}$ | 0.25 | [0.05, 1.00] | WDT path jump limit |
| coupler zeta min | $\zeta_{\min}$ | 0.10 | [0.00, 0.50] | min progression floor |
| coupler omega | $\omega$ | 1.0 | [0.1, 2.0] | progression non-linearity |
| coupler theta cap | $\theta_c$ | 0.75 | [0.2, 1.5] | $W_c$ clip magnitude |
| WRI tighten factor | $\alpha_{\mathrm{wri}}$ | 0.60 | [0.0, 1.5] | adjust $\tau_{\mathrm{wri}}$ by $|W_c|$ |
| WAI scale factor | $\beta_{\mathrm{wai}}$ | 0.60 | [0.0, 1.5] | scale WAI penalty by $|W_c|$ |
| WDT scale factor | $\gamma_{\mathrm{wdt}}$ | 0.60 | [0.0, 1.5] | scale $\mu_{\mathrm{wdt}}$ by $|W_c|$ |
| WTF scale factor | $\gamma_{\mathrm{wtf}}$ | 0.60 | [0.0, 1.5] | tighten thresholds on recovery |
| WAY pump strength | $\xi$ | 0.80 | [0.0, 1.5] | entropy pump strength |
| WAY entropy min | $H_{\min}$ | 2.5 nats | [1.0, 7.0] | entropy lower bound |
| WAY entropy max | $H_{\max}$ | 5.0 nats | [3.0, 10.0] | entropy upper bound |
| anchor bias scale | $\kappa_{\mathrm{wri}}$ | 1.0 | [0.0, 5.0] | logit bias for anchors |
| loss weights | $\lambda_*$ | 0.01 | [0.0, 1.0] | regularizer weights |
| step limit | $T_{\max}$ | 7 | int | max Node steps |
| stop δ threshold | $\delta_{\mathrm{stop}}$ | 0.35 | [0.1, 0.5] | early stop when $\delta_s < \delta_{\mathrm{stop}}$ |

Tip: start with these defaults, measure, then tune per task class.
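For runtimes that take a config object, the same defaults as a copyable dict; the ASCII key names are illustrative spellings of the symbols, not spec identifiers:

```python
# Drunk Transformer defaults, mirroring the table above.
DT_DEFAULTS = {
    "tau_wri": 0.60,    # WRI anchor retention threshold
    "rho_wai": 0.75,    # WAI redundancy ceiling
    "sigma_wai": 0.70,  # WAI identity floor
    "eta_prog": 0.03,   # WAY stall detector
    "mu_wdt": 0.25,     # WDT path jump limit
    "zeta_min": 0.10,   # min progression floor
    "omega": 1.0,       # progression non-linearity
    "theta_c": 0.75,    # W_c clip magnitude
    "alpha_wri": 0.60, "beta_wai": 0.60,
    "gamma_wdt": 0.60, "gamma_wtf": 0.60,
    "xi": 0.80,                     # WAY pump strength
    "H_min": 2.5, "H_max": 5.0,     # entropy bounds (nats)
    "kappa_wri": 1.0,               # anchor logit bias scale
    "lambda_reg": 0.01,             # regularizer weight
    "T_max": 7,                     # max Node steps
    "delta_stop": 0.35,             # early-stop delta_s threshold
}
```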


Prompt-only runnable example (copy-paste)

SYSTEM (paste file): Load the WFGY Core file as engine. Enable Drunk Transformer (WRI,WAI,WAY,WDT,WTF) with defaults:
τ_wri=0.60, ρ_wai=0.75, σ_wai=0.70, η_prog=0.03, μ_wdt=0.25, ζ_min=0.10, ω=1.0, θ_c=0.75

SYSTEM (rules, pseudo):

* Extract anchors A0 from user prompt.
* For each Node step t up to T_max:
  1) compute δs, E_res, S_t, R_t, Q_t, W_c
  2) WRI: bias anchor logits by κ_wri·L_wri if S_t<τ_wri or (δs↑ & E_res↑)
  3) WAI: raise per-head temp for redundant heads until R_t↓ or Q_t↑
  4) WAY: if prog<η_prog, set entropy ≈ H* and propose 1 on-topic candidate
  5) WDT: if d_path>μ'_wdt, emit bridge line ("why the detour"); if no valid bridge, rollback
  6) WTF: if collapse vote ≥3, rollback to t*; rerun BBMC→Coupler (tight)
  7) Emit Node (Topic | Module | δs | λ-state | Insight)
* Stop if δs<δ_stop or t≥T_max

USER:
Use WFGY to answer: “Explain why tomatoes are classified as fruit, but treated as vegetables in cooking. Provide anchors and cite the smallest missing fact if confused.”

Decoding-hook pseudo-code (python-like; drop into your runtime)

def compute_prog(delta_prev, delta_now, zeta_min=0.10, omega=1.0):
    prog = max(zeta_min, delta_prev - delta_now)
    return prog ** omega

def compute_Wc(B, prog, delta=0.15, cycle_id=0, eps=0.0, theta_c=0.75):
    alt = (-1)**cycle_id
    Phi = delta * alt + eps
    return clip(B * prog + Phi, -theta_c, +theta_c)

def bias_anchor_logits(logits, anchor_ids, kappa):
    for tid in anchor_ids:
        logits[tid] += kappa
    return logits

def temperature_for_target_entropy(logits, target_H, tol=1e-3, max_iters=5):
    # Bisect on temperature: softmax entropy rises with tau, so if the
    # measured entropy overshoots the target, tau must go DOWN (hi = tau).
    lo, hi = 0.01, 10.0
    tau = 1.0
    for _ in range(max_iters):
        tau = 0.5*(lo+hi)
        probs = softmax(logits / tau)
        H = -sum(p*log(p+1e-12) for p in probs)
        if abs(H - target_H) < tol: break
        if H > target_H: hi = tau
        else: lo = tau
    return tau

def decoding_hook(s):
    prog = compute_prog(s.delta_prev, s.delta_now, 0.10, 1.0)
    Wc = compute_Wc(s.B, prog, 0.15, s.cycle_id, 0.0, 0.75)

    # WRI
    S_t = jaccard(s.anchors, s.anchors0)
    L_wri = max(0.0, 0.60 - S_t)
    if (S_t < 0.60) or (s.delta_now > s.delta_prev and s.E_res > s.E_res_prev):
        s.logits = bias_anchor_logits(s.logits, s.anchor_token_ids, kappa=1.0 * L_wri)

    # WAI
    R_t, Q_t = s.R_t, s.Q_t
    if R_t > 0.75 and Q_t < 0.70:
        for h in range(len(s.heads)):
            if head_redundant(h, s):
                s.head_temps[h] *= (1 + 0.5 * (R_t - 0.75))

    # WAY (alpha is 0.0 here, i.e. the (1 + alpha*|Wc|) term is disabled; raise it to enable)
    if prog < 0.03 and not s.has_contradiction:
        H_star = clamp(s.H0 + 0.8*(0.03 - prog)*(1 + 0.0*abs(Wc)), 2.5, 5.0)
        tau = temperature_for_target_entropy(s.logits, H_star)
        apply_temperature(s, tau)
        s.add_one_on_topic_candidate = True

    # WDT
    d_path = l2_distance(s.c_t, s.c_pi)
    mu_prime = 0.25 * (1 - 0.6 * sigmoid(abs(Wc)))
    if d_path > mu_prime:
        return emit_bridge_and_pause(s)

    # WTF: two-step collapse vote (delta_s rising, E_res rising, contradiction)
    vote = int(s.delta_now > s.delta_prev) + int(s.E_res > s.E_res_prev) + int(s.has_contradiction)
    if vote + s.vote_prev >= 3:
        t_star = argmin_delta_in_window(s.history_delta, window=3)
        rollback_to(t_star)
        rerun_BBMC_and_Coupler(tighten_factor=1 + 0.6*sigmoid(abs(Wc)))
        return

    return s.logits

Minimal test / checklist

  1. Prepare a QA prompt with clear anchors.
  2. Baseline: run without DT; log $\delta_s, E_{\mathrm{res}}$; record correctness.
  3. DT on: log same metrics plus $R_t, Q_t, W_c$ and gates fired.
  4. Expect: lower $\delta_s$; fewer off-topic jumps (WRI), stalls resolved (WAY), justified detours (WDT), safe recovery (WTF).
  5. Record deltas: accuracy, $\Delta S$, rollbacks, bridge lines, gate activations.
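Step 5's bookkeeping can be reduced to a tiny summarizer, assuming each Node step is logged as a dict (the record layout here is an assumption, not part of the spec):

```python
from collections import Counter

def summarize_run(records):
    """records: list of per-step dicts with keys 'delta_s', 'E_res',
    and 'gates' (list of fired gate names, e.g. ['WRI', 'WTF'])."""
    gates = Counter(g for r in records for g in r.get("gates", []))
    return {
        "mean_delta_s": sum(r["delta_s"] for r in records) / len(records),
        "final_delta_s": records[-1]["delta_s"],
        "gate_counts": dict(gates),
    }
```

Running this on a baseline log and a DT log gives directly comparable accuracy, δ_s, and gate-activation deltas.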

Quick engineering notes & troubleshooting

  • If $\mathcal{A}_0$ (anchors) is empty, WRI becomes a no-op and WAY pumps less entropy; log a warning.
  • If bridge lines repeat or look vacuous, lower $\mu_{\mathrm{wdt}}$ and raise $\kappa_{\mathrm{wri}}$.
  • For heavy-hallucination domains, use conservative defaults (higher $\tau_{\mathrm{wri}}$, lower $\eta_{\mathrm{prog}}$).

  • Spec status: stable draft for engineering evaluation; KaTeX-safe equations.
  • Next deliverables: add examples/DT-examples/prompt_example.md and examples/DT-examples/decoding_hook.py.
  • Compatibility: prompt-only rules, decoding-hook integration, or optional training regularizers; model-agnostic.
  • Attribution: part of WFGY Core 2.0 family. Star the repo to follow updates.
  • Lost? Return to Starter Village: StarterVillage/README.md

Explore More

| Layer | Page | What it's for |
|---|---|---|
| Proof | WFGY Recognition Map | External citations, integrations, and ecosystem proof |
| Engine | WFGY 1.0 | Original PDF-based tension engine |
| Engine | WFGY 2.0 | Production tension kernel and math engine for RAG and agents |
| Engine | WFGY 3.0 | TXT-based Singularity tension engine, 131 S-class set |
| Map | Problem Map 1.0 | Flagship 16-problem RAG failure checklist and fix map |
| Map | Problem Map 2.0 | RAG-focused recovery pipeline |
| Map | Problem Map 3.0 | Global Debug Card, image as a debug protocol layer |
| Map | Semantic Clinic | Symptom to family to exact fix |
| Map | Grandmas Clinic | Plain-language stories mapped to Problem Map 1.0 |
| Onboarding | Starter Village | Guided tour for newcomers |
| App | TXT OS | TXT semantic OS, fast boot |
| App | Blah Blah Blah | Abstract and paradox Q&A built on TXT OS |
| App | Blur Blur Blur | Text-to-image with semantic control |
| App | Blow Blow Blow | Reasoning game engine and memory demo |

If this repository helped, starring it improves discovery so more builders can find the docs and tools.