
Safety & Prompt Integrity — Global Fix Map

🏥 Quick Return to Emergency Room

You are at a specialist desk. Think of this page as a sub-room.
For full triage, doctors on duty, consultation, and prescriptions, go back to the Emergency Room lobby.

A hub to stabilize prompt-level safety and schema integrity across providers, agents, and eval flows.
Use this folder when failures look like jailbreaks, role confusion, or malformed tool calls.
Each page maps symptoms → root cause → structural fixes with measurable acceptance targets.


What this page is

  • A practical checklist for anyone shipping LLM apps with tools, roles, or multi-agent setups.
  • Each failure pattern links to its own guide with copy-paste guardrails.
  • Works without infrastructure changes: schema and prompt fixes only.
  • Acceptance targets (ΔS, λ, coverage) are reproducible.

When to use

  • Jailbreak attempts slip past normal filters.
  • Prompts collapse the schema or inject rogue tools.
  • Tool calls drift into free text, or the JSON breaks.
  • Role instructions misalign (system vs user vs assistant).
  • Citations disappear or retrieval bypasses snippet contracts.
  • Eval pipelines show high ΔS drift even when retrieval is correct.

Common failure patterns

| Failure mode | What happens | Open this |
|---|---|---|
| Prompt Injection | Hidden instructions override your system prompt | prompt_injection.md |
| Jailbreaks / Overrides | User tricks the model into ignoring rules | jailbreaks_and_overrides.md |
| Role Confusion | System / user / assistant boundaries collapse | role_confusion.md |
| Memory Fence Missing | State leaks across runs, no stable key | memory_fences_and_state_keys.md |
| JSON Drift | Tool calls malformed, fields missing | json_mode_and_tool_calls.md |
| Citation Lost | Answers skip the snippet or drop the “cite-then-explain” order | citation_first.md |
| Injection Defense Recipes | Ready-to-paste guardrails against common exploits | anti_prompt_injection_recipes.md |
| Tool Timeouts | Tool calls hang or return late | tool_selection_and_timeouts.md |
| Role Ordering | Wrong order breaks downstream eval | system_user_role_order.md |
| Template Gaps | Prompts inconsistent across agents | template_library_min.md |
| Eval Drift | No stable way to test safety fixes | eval_prompts_and_checks.md |

Acceptance targets

  • ΔS(question, retrieved) ≤ 0.45
  • Coverage of cited section ≥ 0.70
  • λ convergent across three paraphrases and two seeds
  • No uncontrolled free-text execution in JSON or tool modes
  • Citation-first enforced in ≥ 95% of eval runs

60-second fix checklist

  1. Lock system / user / assistant role order.
  2. Enforce citation-first and snippet schema.
  3. Apply JSON fences + argument validation.
  4. Add memory fences keyed by mem_rev and state_key.
  5. Run eval prompts + probes before shipping.
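
Steps 1 and 3 of the checklist can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions rather than the guardrails from the linked guides: the locked order is assumed to be one system message followed by alternating user and assistant turns, the tool-call shape (`name` plus `arguments`) is assumed, and `TOOL_SCHEMAS` with its `search_docs` entry is a hypothetical registry.

```python
import json
from typing import Any, Dict, List


def check_role_order(messages: List[Dict[str, str]]) -> None:
    """Raise ValueError unless roles run system, then user/assistant in turn."""
    if not messages or messages[0].get("role") != "system":
        raise ValueError("first message must be the system message")
    expected = "user"
    for i, msg in enumerate(messages[1:], start=1):
        role = msg.get("role")
        if role != expected:
            raise ValueError(f"message {i}: expected {expected!r}, got {role!r}")
        expected = "assistant" if expected == "user" else "user"


# Hypothetical tool registry: tool name -> {argument name: required type}.
TOOL_SCHEMAS: Dict[str, Dict[str, type]] = {
    "search_docs": {"query": str, "top_k": int},
}


def parse_tool_call(raw: str) -> Dict[str, Any]:
    """Parse a tool call strictly; reject free text, unknown tools, bad arguments."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(call, dict) or set(call) != {"name", "arguments"}:
        raise ValueError("tool call must be an object with 'name' and 'arguments' only")
    name, args = call["name"], call["arguments"]
    if not isinstance(name, str) or name not in TOOL_SCHEMAS:
        raise ValueError(f"unknown tool: {name!r}")
    schema = TOOL_SCHEMAS[name]
    if not isinstance(args, dict) or set(args) != set(schema):
        raise ValueError(f"arguments must have exactly these keys: {sorted(schema)}")
    for key, expected_type in schema.items():
        # Exact type match, so True is not accepted where an int is required.
        if type(args[key]) is not expected_type:
            raise ValueError(f"argument {key!r} must be {expected_type.__name__}")
    return call
```

Both functions fail closed: anything that is not an exact match raises `ValueError`, so a reply that drifts into free text never reaches tool execution.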

🔗 Quick-Start Downloads (60 sec)

| Tool | Link | 3-Step Setup |
|---|---|---|
| WFGY 1.0 PDF | Engine Paper | 1 Download · 2 Upload to your LLM · 3 Ask “Answer using WFGY + ” |
| TXT OS | TXTOS.txt | 1 Download · 2 Paste into any LLM chat · 3 Type “hello world”; the OS boots instantly |

Explore More

| Layer | Page | What it's for |
|---|---|---|
| Proof | WFGY Recognition Map | External citations, integrations, and ecosystem proof |
| ⚙️ Engine | WFGY 1.0 | Original PDF tension engine and early logic sketch (legacy reference) |
| ⚙️ Engine | WFGY 2.0 | Production tension kernel for RAG and agent systems |
| ⚙️ Engine | WFGY 3.0 | TXT-based Singularity tension engine (131 S class set) |
| 🗺️ Map | Problem Map 1.0 | Flagship 16-problem RAG failure taxonomy and fix map |
| 🗺️ Map | Problem Map 2.0 | Global Debug Card for RAG and agent pipeline diagnosis |
| 🗺️ Map | Problem Map 3.0 | Global AI troubleshooting atlas and failure pattern map |
| 🧰 App | TXT OS | .txt semantic OS with fast bootstrap |
| 🧰 App | Blah Blah Blah | Abstract and paradox Q&A built on TXT OS |
| 🧰 App | Blur Blur Blur | Text-to-image generation with semantic control |
| 🏡 Onboarding | Starter Village | Guided entry point for new users |

If this repository helped, starring it improves discovery so more builders can find the docs and tools.