
Deep Dive — Agent Role Drift (Multi-Agent Chaos)

Status: Production-ready guidance with guardrails, tests, and ops hooks. If you have real traces, please still share them — they help us harden thresholds & adapters.



1) What is “Role Drift”?

An agent's operational role or persona silently changes mid-pipeline:

  • A Planner/Scout starts executing (issuing “Write/Deploy/Approve” commands).
  • An Auditor begins generating final prose instead of validating/paraphrasing.
  • Two agents swap personas (Scout ↔ Medic), or the assistant starts speaking as the user.
  • Tools exposed for one role get invoked by the wrong role (privilege escalation).

Why it matters. Roles are safety boundaries. If they drift, you get privilege misuse, broken arbitration, inconsistent memory writes, and incident reports that are hard to reproduce.


2) Signals & fast triage (no LLM needed)

You likely have role drift if:

  • Tool calls appear with an unexpected role_id (e.g., auditor calls exec_sql).
  • System prompt digest changes mid-turn without a config deploy.
  • Cross-agent κ (agreement between Auditor and Scholar) collapses after a tool call.
  • Logs show persona tags missing (no role_hash echo) for a subset of turns.

Deterministic checks (implement now):

  • Every inbound/outbound message carries:

    • agent_id, role_id (stable string), role_hash (digest of role spec), turn, mem_rev, mem_hash.
  • Gate rejects if role_id changed while the turn has not advanced, or if role_hash ≠ the hash bound for this agent at turn start.

  • Tool router verifies agent_id ∈ allowed_callers for the tool; otherwise block & log.
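
The deterministic checks above can also be run offline over existing logs before any gate is deployed. The following is a minimal sketch, assuming each log record is a dict carrying the envelope fields; `triage`, `bound_hashes`, and `tool_acl` are hypothetical names chosen for illustration, not part of any existing adapter.

```python
from typing import Iterable

# Envelope fields every message is expected to carry.
REQUIRED = ("agent_id", "role_id", "role_hash", "turn", "mem_rev", "mem_hash")

def triage(records: Iterable[dict], bound_hashes: dict, tool_acl: dict) -> list:
    """Return (index, reason) pairs for records that look like role drift.

    bound_hashes: agent_id -> role_hash bound at deploy time (assumed input).
    tool_acl:     tool name -> set of agent_ids allowed to call it (assumed input).
    """
    findings = []
    last_role = {}  # agent_id -> (turn, role_id) most recently seen
    for i, rec in enumerate(records):
        missing = [f for f in REQUIRED if f not in rec]
        if missing:
            findings.append((i, "missing echo: " + ",".join(missing)))
            continue
        agent = rec["agent_id"]
        if bound_hashes.get(agent) != rec["role_hash"]:
            findings.append((i, "role_hash != bound hash"))
        prev = last_role.get(agent)
        if prev and prev[0] == rec["turn"] and prev[1] != rec["role_id"]:
            findings.append((i, "role_id changed within a turn"))
        last_role[agent] = (rec["turn"], rec["role_id"])
        call = rec.get("tool_call")
        if call and agent not in tool_acl.get(call["name"], set()):
            findings.append((i, "caller not in allowed_callers"))
    return findings
```

An empty result does not prove the absence of drift; it only means none of these four signals fired on the records scanned.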


3) Minimal reproducible scenario

Goal: force the Planner to start executing.

  1. Start two agents with distinct role specs: planner@v3 (no write tools), executor@v1 (write tools only).
  2. Inject a long chain (≥6 steps) with ambiguous phrasing like “go ahead and finalize”.
  3. Observe planner turns: look for an unauthorized tool call or the absence of role echo.

4) Root causes (and the smell you'll see)

| Root cause | Code smell (what shows up) | Where to fix |
|---|---|---|
| Missing bind/echo of role across hops | No role_hash, or it changes without a deploy | Agent runner, message envelope |
| Symbolic constraint unlock (SCU) (must/only rules drop) | “must not execute” disappeared after rewrite | Prompt templates; add SCU locks |
| Shared memory overwrite | Auditor writes planner state | Memory guard; mem_rev gating |
| Tool router too permissive | Any agent can call any tool | Router policy & signatures |

Related docs: SCU pattern, Memory Desync.


5) Guardrails that actually work (defense-in-depth)

5.1 Role Bind + Echo (deterministic)

  • At turn start, the orchestrator binds (agent_id, role_id, role_hash) for that agent.
  • Every message/tool call echoes these fields.
  • Gateway enforces: if echo ≠ bind → 409 RoleDrift (reject & log).

5.2 Tool-Router ACL

  • Each tool declares allowed_callers: ["executor","auditor"].
  • Router checks both the echoed agent_id and the signature (HMAC over agent_id|role_hash|turn|tool_name).
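
The router can only verify a signature that the orchestrator produced in the same way. Here is a minimal sketch of both sides, assuming a shared secret key and that no field contains the `|` separator; `sign_call` and `verify_call` are hypothetical helper names.

```python
import hashlib
import hmac

def _payload(agent_id: str, role_hash: str, turn: int, tool_name: str) -> bytes:
    # Field order follows the text: agent_id|role_hash|turn|tool_name.
    # Assumption: none of the fields contains "|".
    return f"{agent_id}|{role_hash}|{turn}|{tool_name}".encode()

def sign_call(key: bytes, agent_id: str, role_hash: str, turn: int, tool_name: str) -> str:
    """Orchestrator side: hex HMAC-SHA256 over the bound fields."""
    return hmac.new(key, _payload(agent_id, role_hash, turn, tool_name),
                    hashlib.sha256).hexdigest()

def verify_call(key: bytes, sig: str, agent_id: str, role_hash: str,
                turn: int, tool_name: str) -> bool:
    """Router side: recompute and compare in constant time."""
    expected = sign_call(key, agent_id, role_hash, turn, tool_name)
    return hmac.compare_digest(sig, expected)
```

Because the tool name is part of the signed payload, a signature issued for one tool cannot be replayed against another tool in the same turn.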

5.3 SCU Locks (symbolic constraints)

  • Keep a short constraint header (machine-checkable) at the top of each prompt: role: planner | may: read | must_not: write | output: plan.json
  • The generator wrapper validates output against the header.

5.4 Persona Hash stability

  • role_hash = sha256(role_name + system_prompt + allowed_tools).
  • Changes only via deploy; not by runtime LLM rewrites.
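
One way to compute such a digest is sketched below. It departs from literal string concatenation on purpose: plain `+` is ambiguous (`"ab" + "c"` equals `"a" + "bc"`), so this sketch hashes a canonical JSON encoding instead, and it sorts the tool list on the assumption that tool order carries no meaning. The function name and the `sha256:` prefix mirror the envelope example, but the encoding itself is an assumption.

```python
import hashlib
import json

def role_hash(role_name: str, system_prompt: str, allowed_tools) -> str:
    """Digest of a role spec, stable across processes and tool ordering."""
    spec = {
        "role_name": role_name,
        "system_prompt": system_prompt,
        "allowed_tools": sorted(allowed_tools),  # assumption: order is irrelevant
    }
    # Canonical form: sorted keys, no extra whitespace, UTF-8 bytes.
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Computing the hash once at deploy time and storing it alongside the role spec keeps the runtime gate to a string comparison.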

6) Implementation snippets (drop-in)

6.1 Message envelope (JSON)

{
  "agent_id": "planner",
  "role_id": "planner@v3",
  "role_hash": "sha256:78c2…",
  "turn": 42,
  "mem_rev": 7,
  "mem_hash": "sha256:ab12…",
  "content": "…",
  "tool_call": null,
  "sig": "HMAC(k, agent_id|role_hash|turn)"
}

6.2 Python: gate & router (pseudo-code, stdlib-only)

import hashlib
import hmac

class RoleDrift(Exception):
    """Raised when a message's echoed role does not match its binding."""

def hmac_ok(sig: str, payload: str, key: bytes) -> bool:
    # payload: "agent_id|role_hash|turn", plus "|tool_name" for tool calls
    expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)  # constant-time comparison

def role_gate(bound, msg, secret):
    # bound: {"agent_id", "role_id", "role_hash", "turn"}, fixed at turn start
    for field in ("agent_id", "role_id", "role_hash", "turn"):
        if msg.get(field) != bound[field]:
            raise RoleDrift(f"echo != bind ({field})")
    payload = f'{msg["agent_id"]}|{msg["role_hash"]}|{msg["turn"]}'
    call = msg.get("tool_call")
    if call:  # per 5.2, tool calls also sign the tool name
        payload += f'|{call["name"]}'
    if not hmac_ok(msg.get("sig", ""), payload, secret):
        raise RoleDrift("bad signature")

TOOL_ACL = {
    "exec_sql": {"allowed_callers": {"executor"}},
    "grade_answer": {"allowed_callers": {"auditor"}},
}

def tool_router(msg):
    call = msg.get("tool_call")
    if not call:
        return None
    tool = call["name"]
    # Unknown tools get an empty ACL, so they are denied by default.
    acl = TOOL_ACL.get(tool, {"allowed_callers": set()})
    if msg["agent_id"] not in acl["allowed_callers"]:
        raise RoleDrift(f"agent {msg['agent_id']} not allowed to call {tool}")
    return call

6.3 Node (TypeScript-flavored) — signature check

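import crypto from "crypto";

function hmacOk(sig: string, payload: string, key: Buffer): boolean {
  const mac = crypto.createHmac("sha256", key).update(payload).digest();
  const given = Buffer.from(sig, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return given.length === mac.length && crypto.timingSafeEqual(given, mac);
}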

7) Observability & alerts

Metrics

  • role_drift_reject_total{agent,tool} — gate rejections
  • role_echo_missing_total{agent} — missing echo fields
  • tool_acl_block_total{agent,tool} — router blocks
  • cross_agent_kappa — from κ runner

Suggested alerts

  • increase(role_drift_reject_total[5m]) > 0 → severity: ticket
  • avg_over_time(cross_agent_kappa[30m]) < 0.5 → investigate misalignment
  • increase(role_echo_missing_total[10m]) > 3 → instrumentation broken
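
The source does not say which agreement statistic the κ runner computes. As an illustration only, the sketch below assumes Cohen's kappa over categorical verdicts from two agents (for example Auditor and Scholar labelling the same items); `cohen_kappa` is a hypothetical helper, not the runner itself.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b) -> float:
    """Cohen's kappa for two raters over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)
```

For instance, with Auditor verdicts `["pass", "pass", "fail", "fail"]` and Scholar verdicts `["pass", "pass", "fail", "pass"]`, observed agreement is 0.75, chance agreement is 0.5, and κ is 0.5, which sits right at the alert threshold suggested above.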

8) Tests & acceptance criteria

Unit

  • Planner attempts exec_sql → blocked (router).
  • Echo/bind mismatch → 409 RoleDrift.
  • Signature mismatch → 409 RoleDrift.

E2E

  • 2-agent script (planner/executor) with ambiguous handoff; κ remains ≥ 0.7.
  • Constraints header says must_not: write → any write-like output is rejected.

Acceptance

  • Zero role-drift incidents during canary across 1k turns.
  • κ degradation < 20% vs baseline after guardrails enabled.

9) Rollout plan (safe & boring)

  1. Shadow mode: enable gate in warn-only; record metrics.
  2. Canary on 5–10% of traffic: block on violation for tools only.
  3. Gradually expand to all agents, then enforce for final responses.
  4. Keep a feature flag to bypass in emergencies.
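
The rollout stages map naturally onto a single enforcement mode plus a canary selector. The sketch below is one possible shape under stated assumptions: `Mode`, `in_canary`, and `enforce` are hypothetical names, the counter key is illustrative, and bucketing by agent_id is an assumption (bucketing by request or session would work the same way).

```python
import hashlib
import logging
from enum import Enum
from typing import Optional

log = logging.getLogger("role_gate")

class Mode(Enum):
    OFF = "off"      # emergency bypass (step 4)
    WARN = "warn"    # shadow mode: record, never block (step 1)
    BLOCK = "block"  # enforce (steps 2 and 3)

class RoleDrift(Exception):
    """Raised when a violation is enforced."""

def in_canary(agent_id: str, percent: int) -> bool:
    # Stable bucket in [0, 100) derived from the id, so the same agent
    # stays in or out of the canary across restarts.
    bucket = int(hashlib.sha256(agent_id.encode()).hexdigest(), 16) % 100
    return bucket < percent

def enforce(mode: Mode, violation: Optional[str], counters: dict) -> bool:
    """Return True if the message may proceed; raise RoleDrift when blocking."""
    if mode is Mode.OFF or violation is None:
        return True
    counters["violations"] = counters.get("violations", 0) + 1
    if mode is Mode.WARN:
        log.warning("role drift (warn-only): %s", violation)
        return True
    raise RoleDrift(violation)
```

A caller might pick `Mode.BLOCK` when `in_canary(agent_id, 10)` is true and `Mode.WARN` otherwise, which keeps metrics flowing for all traffic while only the canary slice is actually blocked.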

10) Appendix — Constraint Header (SCU lock)

Embed a tiny, machine-checkable header in prompts and validate output:

role: planner
may: read
must_not: write, execute
output: plan.json

Validator (pseudo):

def validate_output(role_hdr, output):
    if role_hdr.must_not & infer_capabilities(output):
        raise RoleDrift("SCU lock violation")
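
A runnable version of the validator needs a header parser and some definition of `infer_capabilities`, neither of which the text specifies. The sketch below fills both in with loudly labeled placeholders: the keyword table `CAPABILITY_HINTS` is purely hypothetical, and a real deployment would more likely infer capabilities from structured tool calls than from substring matching.

```python
class RoleDrift(Exception):
    """Raised when output violates the constraint header."""

def parse_header(text: str) -> dict:
    """Parse 'key: v1, v2' lines into {key: {v1, v2}}."""
    header = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(":")
        header[key.strip()] = {v.strip() for v in value.split(",") if v.strip()}
    return header

# Hypothetical keyword hints; replace with checks suited to your tools.
CAPABILITY_HINTS = {
    "write": ("write(", "insert into", "update "),
    "execute": ("exec_sql", "deploy"),
}

def infer_capabilities(output: str) -> set:
    # Placeholder heuristic: a capability counts as used if any hint appears.
    lowered = output.lower()
    return {cap for cap, hints in CAPABILITY_HINTS.items()
            if any(h in lowered for h in hints)}

def validate_output(header: dict, output: str) -> None:
    violated = header.get("must_not", set()) & infer_capabilities(output)
    if violated:
        raise RoleDrift("SCU lock violation: " + ", ".join(sorted(violated)))
```

With the planner header shown above, a plan such as `{"steps": ["read schema"]}` passes, while any output containing `exec_sql(...)` raises `RoleDrift`.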

Have traces where roles flip or tools are misused? Please share anonymized logs — they directly improve adapters & thresholds.

