Safety & Prompt Integrity — Global Fix Map

A hub to stabilize prompt-level safety and schema integrity across providers, agents, and eval flows.
Use this folder when failures look like jailbreaks, role confusion, or malformed tool calls. Each page maps symptoms → structural fixes with measurable acceptance targets.

Quick routes to per-page guides

Prompt injection patterns → prompt_injection.md
Jailbreaks and override attempts → jailbreaks_and_overrides.md
Role confusion between system / user / assistant → role_confusion.md
Memory fences and state keys → memory_fences_and_state_keys.md
JSON mode and tool call guardrails → json_mode_and_tool_calls.md
Citation-first enforcement → citation_first.md
Anti-injection recipes (ready-to-paste) → anti_prompt_injection_recipes.md
Tool selection and timeouts → tool_selection_and_timeouts.md
System vs user role ordering → system_user_role_order.md
Minimal template library → template_library_min.md
Eval prompts and integrity checks → eval_prompts_and_checks.md

When to use this folder

Jailbreak attempts slip past standard filters.
Prompts collapse schema or inject rogue tools.
Tool calls drift into free text, or the JSON output breaks.
Role instructions are misaligned (system vs user vs assistant).
Citations disappear, or retrieval steps bypass snippet contracts.
Eval pipelines show high ΔS drift even when retrieval is correct.

Acceptance targets

ΔS(question, retrieved) ≤ 0.45
Coverage of cited section ≥ 0.70
λ remains convergent across three paraphrases and two seeds
No uncontrolled free-text execution in JSON or tool modes
Citation-first enforced in ≥ 95% of eval runs
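
The targets above can be wired into a simple pass/fail gate. The following is a minimal sketch, not the project's own tooling: it assumes ΔS, coverage, and the λ state are already measured by your eval pipeline and passed in as plain values. The EvalRun class, its field names, and check_acceptance are hypothetical names introduced for illustration; only the thresholds come from this page.

```python
from dataclasses import dataclass

# Thresholds copied from the acceptance targets above.
MAX_DELTA_S = 0.45
MIN_COVERAGE = 0.70
MIN_CITATION_FIRST_RATE = 0.95


@dataclass
class EvalRun:
    """One eval run. All field names are hypothetical."""
    delta_s: float            # ΔS(question, retrieved), measured upstream
    coverage: float           # coverage of the cited section, 0.0 to 1.0
    lambda_convergent: bool   # λ judged convergent for this paraphrase/seed
    citation_first: bool      # citation appeared before the answer text
    free_text_leak: bool      # uncontrolled free text in JSON or tool mode


def check_acceptance(runs):
    """Return a list of failure messages; an empty list means the batch passes."""
    if not runs:
        return ["no eval runs supplied"]
    failures = []
    for i, run in enumerate(runs):
        if run.delta_s > MAX_DELTA_S:
            failures.append(f"run {i}: ΔS {run.delta_s:.2f} > {MAX_DELTA_S}")
        if run.coverage < MIN_COVERAGE:
            failures.append(f"run {i}: coverage {run.coverage:.2f} < {MIN_COVERAGE}")
        if not run.lambda_convergent:
            failures.append(f"run {i}: λ not convergent")
        if run.free_text_leak:
            failures.append(f"run {i}: free text in JSON or tool mode")
    # Citation-first is a rate over the whole batch, not a per-run target.
    cited = sum(run.citation_first for run in runs) / len(runs)
    if cited < MIN_CITATION_FIRST_RATE:
        failures.append(f"citation-first rate {cited:.2f} < {MIN_CITATION_FIRST_RATE}")
    return failures
```

A batch would typically hold one EvalRun per paraphrase and seed combination, so the λ check spans the three paraphrases and two seeds named above. That mapping is an assumption about how the runs are organized.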

60-second fix checklist

Lock system / user / assistant role order.
Enforce citation-first and snippet schema.
Apply JSON fences + argument validation.
Add memory fences keyed by mem_rev and state_key.
Run eval prompts + structural probes before shipping.
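
Two of the checklist items, JSON fences with argument validation and memory fences, can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the recipes from the per-page guides: the tool registry, the tool name search_docs, and both function names are hypothetical, and mem_rev is assumed to be a revision counter that increases by one on every accepted write to a given state_key.

```python
import json

# Hypothetical tool registry: tool name -> {argument name: expected type}.
ALLOWED_TOOLS = {
    "search_docs": {"query": str, "top_k": int},
}


def parse_tool_call(raw: str) -> dict:
    """JSON fence: accept only a JSON object naming a known tool with typed arguments."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(call, dict) or set(call) != {"tool", "arguments"}:
        raise ValueError("expected an object with exactly 'tool' and 'arguments'")
    tool, args = call["tool"], call["arguments"]
    if not isinstance(tool, str) or tool not in ALLOWED_TOOLS:
        raise ValueError(f"unknown tool: {tool!r}")
    spec = ALLOWED_TOOLS[tool]
    if not isinstance(args, dict) or set(args) != set(spec):
        raise ValueError(f"arguments must be exactly {sorted(spec)}")
    for name, expected in spec.items():
        # Exact type match, so a JSON boolean is not accepted where an int is expected.
        if type(args[name]) is not expected:
            raise ValueError(f"argument {name!r} must be {expected.__name__}")
    return call


def fenced_write(store: dict, state_key: str, mem_rev: int, value):
    """Memory fence: write only if the caller holds the latest revision for state_key."""
    current_rev, _ = store.get(state_key, (0, None))
    if mem_rev != current_rev:
        raise ValueError(f"stale mem_rev {mem_rev} for {state_key!r}, current is {current_rev}")
    store[state_key] = (current_rev + 1, value)
    return current_rev + 1
```

Rejecting anything that is not an exact match (extra keys, unknown tools, wrong types) keeps free text from reaching tool execution, and the revision check stops a stale agent turn from overwriting newer state.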

🔗 Quick-Start Downloads (60 sec)

Tool	Link	3-Step Setup
WFGY 1.0 PDF	Engine Paper	1️⃣ Download · 2️⃣ Upload to your LLM · 3️⃣ Ask “Answer using WFGY + ”
TXT OS (plain-text OS)	TXTOS.txt	1️⃣ Download · 2️⃣ Paste into any LLM chat · 3️⃣ Type “hello world” — OS boots instantly

🧭 Explore More

Module	Description	Link
WFGY Core	WFGY 2.0 engine is live: full symbolic reasoning architecture and math stack	View →
Problem Map 1.0	Initial 16-mode diagnostic and symbolic fix framework	View →
Problem Map 2.0	RAG-focused failure tree, modular fixes, and pipelines	View →
Semantic Clinic Index	Expanded failure catalog: prompt injection, memory bugs, logic drift	View →
Semantic Blueprint	Layer-based symbolic reasoning & semantic modulations	View →
Benchmark vs GPT-5	Stress test GPT-5 with full WFGY reasoning suite	View →
🧙‍♂️ Starter Village 🏡	New here? Lost in symbols? Click here and let the wizard guide you through	Start →

👑 Early Stargazers: See the Hall of Fame —
Engineers, hackers, and open source builders who supported WFGY from day one.

⭐ WFGY Engine 2.0 is already unlocked. ⭐ Star the repo to help others discover it and unlock more on the Unlock Board.
