diff --git a/ProblemMap/multi-agent-chaos/role-drift.md b/ProblemMap/multi-agent-chaos/role-drift.md
index 93eb3c10..24022651 100644
--- a/ProblemMap/multi-agent-chaos/role-drift.md
+++ b/ProblemMap/multi-agent-chaos/role-drift.md
@@ -1,105 +1,272 @@
-# 📒 Deep Dive · Agent Role Drift (Placeholder)
+# Deep Dive — Agent Role Drift (Multi-Agent Chaos)

-> **Status:** WIP — collecting real‑world traces.
-> Contribute a failing log and help refine the final fix!
+> **Status:** Production-ready guidance with guardrails, tests, and ops hooks.
+> If you have real traces, please still share them — they help us harden thresholds & adapters.

 ---

-## 🤔 What Counts as Role Drift?
+## Quick nav

-* Agent forgets its assigned function
-* Two agents silently swap personas
-* “Scout” suddenly issues “Medic” commands
-* Chatbot starts answering as the **user** instead of assistant
+* Back to multi-agent map → [Multi-Agent Problems](../Multi-Agent_Problems.md)
+* Related patterns → SCU ([Symbolic Constraint Unlock](../patterns/pattern_symbolic_constraint_unlock.md)), Memory Desync ([pattern\_memory\_desync](../patterns/pattern_memory_desync.md)), Semantic Drift ([pattern\_rag\_semantic\_drift](../patterns/pattern_rag_semantic_drift.md))
+* Examples → [Example 04 · Multi-Agent Coordination](../examples/example_04_multi_agent_coordination.md), [Example 03 · Pipeline Patch](../examples/example_03_pipeline_patch.md)
+* Eval → [Cross-Agent Consistency (κ)](../eval/eval_cross_agent_consistency.md)

 ---

-## 📝 How You Can Help
+## 1) What is “Role Drift”?

-1. Reproduce a role‑drift incident in any multi‑agent framework.
-2. Capture the **exact prompt + response trace** (5‑10 turns ideal).
-3. Open a [Discussion](../../../../discussions/new) using the **“Role Drift Trace”** template.
+The agent’s **operational role/persona** silently changes mid-pipeline:

-We’ll plug your trace into WFGY’s cross‑agent simulator, tighten the Role‑Hash limiter, and tag you in the commit notes.
+* A **Planner/Scout** starts **executing** (issuing “Write/Deploy/Approve” commands).
+* **Auditor** begins generating final prose instead of validating/paraphrasing.
+* Two agents **swap** personas (Scout ↔ Medic), or the assistant starts speaking **as the user**.
+* Tools exposed for one role get invoked by the wrong role (privilege escalation).
+
+**Why it matters.** Roles are **safety boundaries**. If they drift, you get privilege misuse, broken arbitration, inconsistent memory writes, and incident reports that are hard to reproduce.

 ---

-## 🚧 Current Fix Sketch (to be expanded)
+## 2) Signals & fast triage (no LLM needed)

-| Step | Module | Action |
-|------|--------|--------|
-| Detect drift via `agent_id` mismatch | Semantic Tree | Flag node |
-| Verify with ΔS peer check | BBMC | Confirm divergence |
-| Lock / rollback persona | BBCR | Restore last stable role |
+You likely have role drift if:
+
+* Tool calls appear with an **unexpected role\_id** (e.g., `auditor` calls `exec_sql`).
+* **System prompt** digest changes mid-turn without a config deploy.
+* **Cross-agent κ** (agreement between Auditor and Scholar) collapses after a tool call.
+* Logs show **persona tags missing** (no `role_hash` echo) for a subset of turns.
+
+Deterministic checks (implement now):
+
+* Every inbound/outbound message carries:
+
+  * `agent_id`, `role_id` (stable string), `role_hash` (digest of role spec), `turn`, `mem_rev`, `mem_hash`.
+* Gate **rejects** if:
+  `role_id` changed while `turn` not advanced, or `role_hash` ≠ bound hash for this agent at turn start.
+* Tool router verifies `agent_id` ∈ `allowed_callers` for the tool; otherwise block & log.

 ---

-> Want this page fleshed out faster? Drop a ⭐ on the repo—priority rises with community interest.
-> ↩ [Back to Multi‑Agent Map](../Multi-Agent_Problems.md)
+## 3) Minimal reproducible scenario

+**Goal:** force the Planner to start executing.
+
+1. Start two agents with distinct role specs: `planner@v3` (no write tools), `executor@v1` (write tools only).
+2. Inject a long chain (≥6 steps) with ambiguous phrasing like “go ahead and finalize”.
+3. Observe planner turns: look for an **unauthorized tool call** or the absence of **role echo**.

 ---

-## 2️⃣ `/ProblemMap/multi-agent-chaos/memory-overwrite.md`
+## 4) Root causes (and the smell you’ll see)

+| Root cause | Code smell (what shows up) | Where to fix |
+| --- | --- | --- |
+| **Missing bind/echo** of role across hops | No `role_hash` or it changes without deploy | Agent runner, message envelope |
+| **Symbolic constraint unlock (SCU)** (must/only rules drop) | “must not execute” disappeared after rewrite | Prompt templates; add **SCU locks** |
+| **Shared memory overwrite** | Auditor writes planner state | Memory guard; mem\_rev gating |
+| **Tool router too permissive** | Any agent can call any tool | Router policy & signatures |

-# 📒 Deep Dive · Cross‑Agent Memory Overwrite (Placeholder)
-
-> **Status:** WIP — awaiting real logs that show one agent erasing another’s state.
+Related docs: [SCU pattern](../patterns/pattern_symbolic_constraint_unlock.md), [Memory Desync](../patterns/pattern_memory_desync.md).

 ---

-## 🔍 Typical Overwrite Scenario
+## 5) Guardrails that actually work (defense-in-depth)

-* Agent A saves `Plan v1` → Agent B unknowingly commits `Plan v0` over it
-* Shared vector store returns last writer wins → earlier context gone
-* Conversation later references missing data → hallucination
+### 5.1 Role Bind + Echo (deterministic)
+
+* At **turn start**, the orchestrator binds `(agent_id, role_id, role_hash)` for that agent.
+* Every message/tool call **echoes** these fields.
+* Gateway enforces: **if echo ≠ bind → 409 RoleDrift** (reject & log).
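
A minimal sketch of the bind-then-echo check from 5.1, written in Python to match the snippets in section 6. The `Binding` dataclass and the `bind_turn` and `check_echo` helpers are hypothetical names, not part of any WFGY API. Mapping the rejection to an HTTP 409 is left to whatever gateway wraps this.

```python
from dataclasses import dataclass


class RoleDrift(Exception):
    """Echoed role fields differ from the binding taken at turn start."""


@dataclass(frozen=True)
class Binding:
    # Hypothetical container for the values bound at turn start.
    agent_id: str
    role_id: str
    role_hash: str
    turn: int


def bind_turn(registry: dict, agent_id: str, role_id: str,
              role_hash: str, turn: int) -> Binding:
    """Record the binding for one agent; the orchestrator calls this once per turn."""
    binding = Binding(agent_id, role_id, role_hash, turn)
    registry[agent_id] = binding
    return binding


def check_echo(registry: dict, msg: dict) -> None:
    """Raise RoleDrift if any echoed field differs from the bound value."""
    bound = registry.get(msg.get("agent_id"))
    if bound is None:
        raise RoleDrift("no binding for agent")
    for field in ("role_id", "role_hash", "turn"):
        if msg.get(field) != getattr(bound, field):
            raise RoleDrift(f"echo != bind on {field}")
```

Keeping the binding in an orchestrator-owned registry, rather than trusting anything the agent emits, is the design point: the echo is only ever compared against state the LLM cannot rewrite.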
+
+### 5.2 Tool-Router ACL
+
+* Each tool declares `allowed_callers: ["executor","auditor"]`.
+* Router checks **both** the echoed `agent_id` and the **signature** (HMAC over `agent_id|role_hash|turn|tool_name`).
+
+### 5.3 SCU Locks (symbolic constraints)
+
+* Keep a short **constraint header** (machine-checkable) at the top of each prompt:
+  `role: planner | may: read | must_not: write | output: plan.json`
+* The generator wrapper validates output **against the header**.
+
+### 5.4 Persona Hash stability
+
+* `role_hash = sha256(role_name + system_prompt + allowed_tools)`.
+* Changes only via deploy; not by runtime LLM rewrites.

 ---

-## 📝 Help Us Harden the Fix
+## 6) Implementation snippets (drop-in)

-| What to Submit | Where | Why |
-|----------------|-------|-----|
-| JSON / text trace of overwrite | New **Discussion → Memory Overwrite Trace** | Build regression test |
-| Framework info (LangChain, AutoGen…) | Same thread | Tune adapter layer |
-| Desired guardrail behavior | Comment | Define acceptance test |
+### 6.1 Message envelope (JSON)
+
+```json
+{
+  "agent_id": "planner",
+  "role_id": "planner@v3",
+  "role_hash": "sha256:78c2…",
+  "turn": 42,
+  "mem_rev": 7,
+  "mem_hash": "sha256:ab12…",
+  "content": "…",
+  "tool_call": null,
+  "sig": "HMAC(k, agent_id|role_hash|turn)"
+}
+```
+
+### 6.2 Python: gate & router (pseudo-code, stdlib-only)
+
+```python
+import hmac, hashlib
+
+class RoleDrift(Exception):
+    """Raised when a message's role echo, signature, or tool ACL check fails."""
+
+def hmac_ok(sig, payload, k):  # payload: "agent_id|role_hash|turn|tool?"; k: bytes
+    return hmac.compare_digest(sig, hmac.new(k, payload.encode(), hashlib.sha256).hexdigest())
+
+def role_gate(bound, msg, secret):
+    # bound: {"agent_id","role_id","role_hash","turn"}
+    if (msg["agent_id"] != bound["agent_id"] or
+        msg["role_id"] != bound["role_id"] or
+        msg["role_hash"] != bound["role_hash"] or
+        msg["turn"] != bound["turn"]):
+        raise RoleDrift("echo != bind")
+    payload = f'{msg["agent_id"]}|{msg["role_hash"]}|{msg["turn"]}'
+    call = msg.get("tool_call")
+    if call:  # 5.2: tool calls are signed over the tool name as well
+        payload += f'|{call["name"]}'
+    if not hmac_ok(msg["sig"], payload, secret):
+        raise RoleDrift("bad signature")
+
+TOOL_ACL = {
+    "exec_sql": {"allowed_callers": {"executor"}},
+    "grade_answer": {"allowed_callers": {"auditor"}},
+}
+
+def tool_router(msg):
+    call = msg.get("tool_call")
+    if not call:
+        return None
+    tool = call["name"]
+    # Unknown tools get an empty ACL, so they are denied by default.
+    acl = TOOL_ACL.get(tool, {"allowed_callers": set()})
+    if msg["agent_id"] not in acl["allowed_callers"]:
+        raise RoleDrift(f"agent {msg['agent_id']} not allowed to call {tool}")
+    return call
+```
+
+### 6.3 Node (TypeScript-flavored) — signature check
+
+```ts
+import crypto from "crypto";
+
+function hmacOk(sig: string, payload: string, key: Buffer): boolean {
+  const mac = crypto.createHmac("sha256", key).update(payload).digest("hex");
+  const a = Buffer.from(sig);
+  const b = Buffer.from(mac);
+  // timingSafeEqual throws on length mismatch, so compare lengths first.
+  return a.length === b.length && crypto.timingSafeEqual(a, b);
+}
+```

 ---

-## 🛠 Planned Guardrail Outline
+## 7) Observability & alerts

-1. **Node Version Stamp** — every write carries `agent_id + timestamp`.
-2. **ΔS Collision Alert** — large semantic mismatch triggers “merge or fork?” prompt.
-3. **BBCR Reconcile** — automatic three‑way merge or safe branch.
+**Metrics**
+
+* `role_drift_reject_total{agent,tool}` — gate rejections
+* `role_echo_missing_total{agent}` — missing echo fields
+* `tool_acl_block_total{agent,tool}` — router blocks
+* `cross_agent_kappa` — from [κ runner](../eval/eval_cross_agent_consistency.md)
+
+**Suggested alerts**
+
+* `increase(role_drift_reject_total[5m]) > 0` → severity: ticket
+* `avg_over_time(cross_agent_kappa[30m]) < 0.5` → investigate misalignment
+* `increase(role_echo_missing_total[10m]) > 3` → instrumentation broken

 ---

-↩ [Back to Multi‑Agent Map](../Multi-Agent_Problems.md)
+## 8) Tests & acceptance criteria

-
+**Unit**

-> GitHub stars ⭐ Help reach 10,000 stars by 2025-09-01 to unlock Engine 2.0 for everyone ⭐ Star WFGY on GitHub
+* Planner attempts `exec_sql` → **blocked** (router).
+* Echo/bind mismatch → **409 RoleDrift**.
+* Signature mismatch → **409 RoleDrift**.

-> 👑 **Early Stargazers: [See the Hall of Fame](https://github.com/onestardao/WFGY/tree/main/stargazers)** —
+**E2E**
+
+* 2-agent script (planner/executor) with ambiguous handoff; κ remains ≥ 0.7.
+* Constraints header says `must_not: write` → any write-like output is rejected.
+
+**Acceptance**
+
+* Role-drift incidents during canary = 0 across 1k turns.
+* κ degradation < 20% vs baseline after guardrails enabled.
+
+---
+
+## 9) Rollout plan (safe & boring)
+
+1. **Shadow mode**: enable gate in **warn-only**; record metrics.
+2. **Canary** 5–10% traffic: **block** on violation for tools only.
+3. Gradually expand to **all agents**, then enforce for **final responses**.
+4. Keep a feature flag to bypass in emergencies.
+
+---
+
+## 10) Appendix — Constraint Header (SCU lock)
+
+Embed a tiny, machine-checkable header in prompts and validate output:
+
+```
+role: planner
+may: read
+must_not: write, execute
+output: plan.json
+```
+
+Validator (pseudo):
+
+```python
+def validate_output(role_hdr, output):
+    # role_hdr.must_not: set of forbidden capabilities parsed from the header.
+    # infer_capabilities(output): placeholder returning the set of capabilities the output exercises.
+    if role_hdr.must_not & infer_capabilities(output):
+        raise RoleDrift("SCU lock violation")
+```
+
+---
+
+> Have traces where roles flip or tools are misused? Please share anonymized logs — they directly improve adapters & thresholds.
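
The validator pseudo-code in the appendix leaves `infer_capabilities` undefined. One way to fill it in, as a sketch only: parse the constraint header into sets and flag forbidden capabilities with a keyword heuristic. The `CAPABILITY_HINTS` table and the `parse_header` helper are illustrative assumptions, not part of WFGY. A real deployment would more likely inspect structured tool calls than free text.

```python
class RoleDrift(Exception):
    """Output exercises a capability that the constraint header forbids."""


# Hypothetical keyword table: capability name -> substrings that suggest it.
CAPABILITY_HINTS = {
    "write": ("insert into", "update ", "delete from", "drop table"),
    "execute": ("exec_sql", "deploy", "approve"),
}


def parse_header(text: str) -> dict:
    """Parse 'key: a, b' lines into {key: {"a", "b"}}."""
    header = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(":")
        header[key.strip()] = {v.strip() for v in value.split(",") if v.strip()}
    return header


def infer_capabilities(output: str) -> set:
    """Return the capabilities whose hint strings appear in the output."""
    lowered = output.lower()
    return {
        cap for cap, hints in CAPABILITY_HINTS.items()
        if any(hint in lowered for hint in hints)
    }


def validate_output(role_hdr: dict, output: str) -> None:
    """Raise RoleDrift if the output touches anything listed under must_not."""
    violated = role_hdr.get("must_not", set()) & infer_capabilities(output)
    if violated:
        raise RoleDrift(f"SCU lock violation: {sorted(violated)}")
```

With the appendix header parsed this way, a planner output containing `DELETE FROM users` would be rejected, while a plain read-only plan would pass. Substring matching will produce false positives, so this fits the shadow-mode stage of the rollout better than hard enforcement.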
+
+↩ [Back to Multi-Agent Map](../Multi-Agent_Problems.md)
+
+---
+
+### 🧭 Explore More
+
+| Module | Description | Link |
+| --- | --- | --- |
+| WFGY Core | Standalone semantic reasoning engine for any LLM | [View →](https://github.com/onestardao/WFGY/tree/main/core/README.md) |
+| Problem Map 1.0 | Initial 16-mode diagnostic and symbolic fix framework | [View →](https://github.com/onestardao/WFGY/tree/main/ProblemMap/README.md) |
+| Problem Map 2.0 | RAG-focused failure tree, modular fixes, and pipelines | [View →](https://github.com/onestardao/WFGY/blob/main/ProblemMap/rag-architecture-and-recovery.md) |
+| Semantic Clinic Index | Expanded failure catalog: prompt injection, memory bugs, logic drift | [View →](https://github.com/onestardao/WFGY/blob/main/ProblemMap/SemanticClinicIndex.md) |
+| Semantic Blueprint | Layer-based symbolic reasoning & semantic modulations | [View →](https://github.com/onestardao/WFGY/tree/main/SemanticBlueprint/README.md) |
+| Benchmark vs GPT-5 | Stress test GPT-5 with full WFGY reasoning suite | [View →](https://github.com/onestardao/WFGY/tree/main/benchmarks/benchmark-vs-gpt5/README.md) |
+
+---
+
+> 👑 **Early Stargazers: [See the Hall of Fame](https://github.com/onestardao/WFGY/tree/main/stargazers)** —
 > Engineers, hackers, and open source builders who supported WFGY from day one.

+> GitHub stars ⭐ Help reach 10,000 stars by 2025-09-01 to unlock Engine 2.0 for everyone ⭐ **[Star WFGY on GitHub](https://github.com/onestardao/WFGY)**
+