Safety & Prompt Integrity — Global Fix Map

🏥 Quick Return to Emergency Room

You are at a specialist desk.
For full triage and doctors on duty, return here:

WFGY Global Fix Map — main Emergency Room, 300+ structured fixes

WFGY Problem Map 1.0 — 16 reproducible failure modes

Think of this page as a sub-room.
If you want full consultation and prescriptions, go back to the Emergency Room lobby.

A hub to stabilize prompt-level safety and schema integrity across providers, agents, and eval flows.
Use this folder when failures look like jailbreaks, role confusion, or malformed tool calls.
Each page maps symptoms → root cause → structural fixes with measurable acceptance targets.

What this page is

A practical checklist for anyone shipping LLM apps with tools, roles, or multi-agent setups.
Each failure pattern links to its own guide with copy-paste guardrails.
Works without infra changes — schema and prompt fixes only.
Acceptance targets (ΔS, λ, coverage) are reproducible.

When to use

Jailbreak attempts slip past normal filters.
Prompts collapse schema or inject rogue tools.
Tool calls drift into free text, or their JSON breaks.
Role instructions misalign (system vs user vs assistant).
Citations disappear or retrieval bypasses snippet contracts.
Eval pipelines show high ΔS drift even when retrieval is correct.
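
Of the symptoms above, role misalignment can be checked mechanically before a request is sent. The following is a minimal sketch, assuming messages are dicts with a `role` key in the common chat-completion style. The function name `role_order_problems` and the `ALLOWED_ROLES` set are illustrative assumptions, not part of WFGY.

```python
from typing import Dict, List

# Assumed role vocabulary; adjust to whatever your provider accepts.
ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def role_order_problems(messages: List[Dict[str, str]]) -> List[str]:
    """Return human-readable problems; an empty list means the order looks sane."""
    problems: List[str] = []
    seen_non_system = False
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in ALLOWED_ROLES:
            problems.append(f"message {i}: unknown role {role!r}")
            continue
        if role == "system":
            # A system message after the conversation has started is a
            # typical sign of injected or misordered instructions.
            if seen_non_system:
                problems.append(f"message {i}: system message after conversation started")
            continue
        if not seen_non_system:
            seen_non_system = True
            if role != "user":
                problems.append(
                    f"message {i}: first non-system message is {role!r}, expected 'user'"
                )
    if not messages or messages[0].get("role") != "system":
        problems.append("no system message at position 0")
    return problems
```

Running this on every outgoing request is cheap, and logging the returned list gives a concrete signal for the role confusion cases described on this page.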

Common failure patterns

Failure mode	What happens	Open this
Prompt Injection	Hidden instructions override your system prompt	prompt_injection.md
Jailbreaks / Overrides	User tricks model into ignoring rules	jailbreaks_and_overrides.md
Role Confusion	System / user / assistant boundaries collapse	role_confusion.md
Memory Fence Missing	State leaks across runs, no stable key	memory_fences_and_state_keys.md
JSON Drift	Tool calls malformed, fields missing	json_mode_and_tool_calls.md
Citation Lost	Answers skip the snippet or drop the “cite-then-explain” order	citation_first.md
Injection Defense Recipes	Ready-to-paste guardrails against common exploits	anti_prompt_injection_recipes.md
Tool Timeouts	Tool calls hang or return late	tool_selection_and_timeouts.md
Role Ordering	Wrong order breaks downstream eval	system_user_role_order.md
Template Gaps	Prompts inconsistent across agents	template_library_min.md
Eval Drift	No stable way to test safety fixes	eval_prompts_and_checks.md
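
For the JSON Drift row, one possible guard is to reject any model output that is not exactly one well-formed call to a known tool. The sketch below uses only the Python standard library. The `{"tool": ..., "arguments": ...}` envelope, the `TOOL_SPECS` registry, and the `search_docs` tool are hypothetical and stand in for your own schema.

```python
import json
from typing import Dict, Tuple

# Hypothetical tool registry: tool name -> {argument name: expected Python type}.
TOOL_SPECS: Dict[str, Dict[str, type]] = {
    "search_docs": {"query": str, "top_k": int},
}


def validate_tool_call(raw: str) -> Tuple[bool, str]:
    """Check that raw model output is exactly one well-formed call to a known tool."""
    try:
        # Fails on free text and on prose wrapped around the JSON.
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, f"not valid JSON: {exc.msg}"
    if not isinstance(call, dict) or set(call) != {"tool", "arguments"}:
        return False, "expected an object with exactly 'tool' and 'arguments'"
    tool = call["tool"]
    spec = TOOL_SPECS.get(tool) if isinstance(tool, str) else None
    if spec is None:
        return False, f"unknown tool: {tool!r}"
    args = call["arguments"]
    if not isinstance(args, dict):
        return False, "'arguments' must be an object"
    missing = sorted(set(spec) - set(args))
    extra = sorted(set(args) - set(spec))
    if missing or extra:
        return False, f"missing={missing} extra={extra}"
    for name, expected in spec.items():
        value = args[name]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            return False, f"argument {name!r} should be {expected.__name__}"
    return True, "ok"
```

Rejecting unknown tools and extra arguments, rather than ignoring them, is a deliberate choice here: it keeps an injected prompt from smuggling in a rogue tool or an unexpected field.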

Acceptance targets

ΔS(question, retrieved) ≤ 0.45
Coverage of cited section ≥ 0.70
λ convergent across three paraphrases and two seeds
No uncontrolled free-text execution in JSON or tool modes
Citation-first enforced in ≥ 95% of eval runs
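
The targets above can be wired into an eval run as a simple pass/fail gate. This is a sketch only: it assumes ΔS, coverage, and λ convergence are measured upstream (this page does not define how), and the names `EvalMetrics` and `failed_targets` are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EvalMetrics:
    delta_s: float              # ΔS(question, retrieved), measured upstream
    coverage: float             # coverage of the cited section, 0.0 to 1.0
    lambda_convergent: bool     # λ convergent across three paraphrases and two seeds
    free_text_executions: int   # uncontrolled free-text executions in JSON or tool modes
    citation_first_rate: float  # share of eval runs where citation-first held


def failed_targets(m: EvalMetrics) -> List[str]:
    """Return the names of the acceptance targets that were missed."""
    failures: List[str] = []
    if m.delta_s > 0.45:
        failures.append("delta_s")
    if m.coverage < 0.70:
        failures.append("coverage")
    if not m.lambda_convergent:
        failures.append("lambda_convergent")
    if m.free_text_executions > 0:
        failures.append("free_text_executions")
    if m.citation_first_rate < 0.95:
        failures.append("citation_first_rate")
    return failures
```

An empty list means every target on this page is met; a non-empty list names what to fix before shipping.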

60-second fix checklist

Lock system / user / assistant role order.
Enforce citation-first and snippet schema.
Apply JSON fences + argument validation.
Add memory fences keyed by mem_rev and state_key.
Run eval prompts + probes before shipping.
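
The memory-fence step can be sketched as a small keyed store. This page names `mem_rev` and `state_key` but does not define them, so the semantics below are an assumption: `state_key` identifies one run or session, and `mem_rev` is a revision counter that a writer must match. The `MemoryFence` class itself is hypothetical.

```python
from typing import Any, Dict, Optional, Tuple


class MemoryFence:
    """Store state per state_key and reject writes made against a stale mem_rev."""

    def __init__(self) -> None:
        # state_key -> (mem_rev, value)
        self._store: Dict[str, Tuple[int, Any]] = {}

    def read(self, state_key: str) -> Tuple[int, Optional[Any]]:
        """Return (mem_rev, value); an unseen key reads as revision 0 with no value."""
        return self._store.get(state_key, (0, None))

    def write(self, state_key: str, expected_rev: int, value: Any) -> int:
        """Write only if the caller saw the latest revision; return the new mem_rev."""
        current_rev, _ = self.read(state_key)
        if expected_rev != current_rev:
            raise ValueError(
                f"stale mem_rev for {state_key!r}: expected {current_rev}, got {expected_rev}"
            )
        self._store[state_key] = (current_rev + 1, value)
        return current_rev + 1
```

Because every read and write goes through an explicit `state_key`, state from one run cannot surface in another, and the revision check stops a late writer from silently overwriting newer state.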

🔗 Quick-Start Downloads (60 sec)

Tool	Link	3-Step Setup
WFGY 1.0 PDF	Engine Paper	1️⃣ Download · 2️⃣ Upload to your LLM · 3️⃣ Ask “Answer using WFGY + ”
TXT OS	TXTOS.txt	1️⃣ Download · 2️⃣ Paste into any LLM chat · 3️⃣ Type “hello world” — OS boots instantly

🧭 Explore More

Module	Description	Link
WFGY Core	WFGY 2.0 engine: full symbolic reasoning & math stack	View →
Problem Map 1.0	Initial 16-mode diagnostic framework	View →
Problem Map 2.0	RAG failure tree and modular fixes	View →
Semantic Clinic	Expanded catalog: injection, memory bugs, logic drift	View →
Semantic Blueprint	Layer-based symbolic reasoning & semantic mods	View →
Benchmark vs GPT-5	Stress test GPT-5 with WFGY reasoning suite	View →
🧙‍♂️ Starter Village 🏡	New here? Lost in symbols? Wizard will guide you	Start →

👑 Early Stargazers: See the Hall of Fame — ⭐ WFGY Engine 2.0 is already unlocked. ⭐ Star the repo to help others discover it and unlock more on the Unlock Board.
