
Safety & Prompt Integrity — Global Fix Map

🏥 Quick Return to Emergency Room

You are in a specialist desk, a sub-room of the Emergency Room.
For full triage, consultation, and prescriptions from the doctors on duty, return to the Emergency Room lobby.

A hub to stabilize prompt-level safety and schema integrity across providers, agents, and eval flows.
Use this folder when failures look like jailbreaks, role confusion, or malformed tool calls.
Each page maps symptoms → root cause → structural fixes with measurable acceptance targets.


What this page is

  • A practical checklist for anyone shipping LLM apps with tools, roles, or multi-agent setups.
  • Each failure pattern links to its own guide with copy-paste guardrails.
  • Works without infrastructure changes: schema and prompt fixes only.
  • Acceptance targets (ΔS, λ, coverage) are reproducible.

When to use

  • Jailbreak attempts slip past normal filters.
  • Prompts collapse schema or inject rogue tools.
  • Tool calls drift into free text or JSON breaks.
  • Role instructions misalign (system vs user vs assistant).
  • Citations disappear or retrieval bypasses snippet contracts.
  • Eval pipelines show high ΔS drift even when retrieval is correct.
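For the tool-call symptoms above (free text or broken JSON), one quick triage step is to try parsing the raw tool-call text before deciding which guide to open. The Python sketch below is illustrative only: `classify_tool_output`, its status labels, and the field names used in the example (`query`, `top_k`) are hypothetical and not part of WFGY.

```python
import json
from typing import Any


def classify_tool_output(raw: str, required: set[str]) -> tuple[str, list[str]]:
    """Triage a model's raw tool-call text (hypothetical helper).

    Returns (status, missing_fields), where status is one of:
      "ok"          - a JSON object containing every required field
      "missing"     - a JSON object, but some required fields are absent
      "not_object"  - valid JSON that is not an object (e.g. a list or string)
      "unparseable" - not valid JSON: drifted into prose or truncated/broken
    """
    try:
        parsed: Any = json.loads(raw)
    except json.JSONDecodeError:
        return "unparseable", sorted(required)
    if not isinstance(parsed, dict):
        return "not_object", sorted(required)
    missing = sorted(required - set(parsed))
    return ("missing" if missing else "ok"), missing
```

For example, `classify_tool_output('Sure, searching now.', {"query"})` returns `("unparseable", ["query"])`, which points toward json_mode_and_tool_calls.md rather than a retrieval fix.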

Common failure patterns

| Failure mode | What happens | Open this |
|---|---|---|
| Prompt Injection | Hidden instructions override your system prompt | prompt_injection.md |
| Jailbreaks / Overrides | User tricks the model into ignoring rules | jailbreaks_and_overrides.md |
| Role Confusion | System / user / assistant boundaries collapse | role_confusion.md |
| Memory Fence Missing | State leaks across runs; no stable key | memory_fences_and_state_keys.md |
| JSON Drift | Tool calls are malformed or missing fields | json_mode_and_tool_calls.md |
| Citation Lost | Answers skip the snippet or drop “cite-then-explain” | citation_first.md |
| Injection Defense Recipes | Ready-to-paste guardrails against common exploits | anti_prompt_injection_recipes.md |
| Tool Timeouts | Tool calls hang or return late | tool_selection_and_timeouts.md |
| Role Ordering | Wrong role order breaks downstream eval | system_user_role_order.md |
| Template Gaps | Prompts are inconsistent across agents | template_library_min.md |
| Eval Drift | No stable way to test safety fixes | eval_prompts_and_checks.md |
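The two role rows (Role Confusion, Role Ordering) can often be caught mechanically before a request is sent. Below is a minimal Python sketch assuming a hypothetical policy: exactly one `system` message, placed first, followed by strictly alternating `user` / `assistant` turns starting with `user`. The function name and the policy are illustrative; the linked guides define the actual rules.

```python
from typing import Iterable


def check_role_order(roles: Iterable[str]) -> list[str]:
    """Return role-order violations; an empty list means the order is locked.

    Hypothetical policy: one "system" message first, then turns that
    alternate "user" -> "assistant", starting with "user".
    """
    roles = list(roles)
    problems: list[str] = []
    if not roles or roles[0] != "system":
        problems.append("first message must be 'system'")
    if roles.count("system") > 1:
        problems.append("'system' appears more than once")
    expected = "user"
    for i, role in enumerate(roles):
        if role == "system":
            continue  # placement and count were checked above
        if role not in ("user", "assistant"):
            problems.append(f"unknown role {role!r} at index {i}")
            continue
        if role != expected:
            problems.append(f"expected {expected!r} at index {i}, got {role!r}")
        # Alternate from the role actually seen, so one slip is reported once.
        expected = "assistant" if role == "user" else "user"
    return problems
```

Running this in CI over every prompt template is one way to keep role order from silently regressing between agents.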

Acceptance targets

  • ΔS(question, retrieved) ≤ 0.45
  • Coverage of cited section ≥ 0.70
  • λ convergent across three paraphrases and two seeds
  • No uncontrolled free-text execution in JSON or tool modes
  • Citation-first enforced in ≥ 95% of eval runs
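The targets above can be checked per batch of eval runs. This Python sketch rests on two assumptions: ΔS is computed as 1 minus the cosine similarity of two embedding vectors, and λ is recorded per run as a label such as `"convergent"`. `EvalRun`, `cosine_delta_s`, and `meets_targets` are hypothetical names, and “three paraphrases and two seeds” is read here as at least six runs.

```python
import math
from dataclasses import dataclass


def cosine_delta_s(a: list[float], b: list[float]) -> float:
    """ΔS sketch, ASSUMED here to be 1 - cosine similarity of two embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("zero-length embedding")
    return 1.0 - dot / norm


@dataclass
class EvalRun:
    """One eval run (hypothetical record shape)."""
    delta_s: float                # ΔS(question, retrieved)
    coverage: float               # fraction of the cited section covered
    lambda_state: str             # λ label recorded for this run, e.g. "convergent"
    free_text_in_tool_mode: bool  # True if free text leaked into JSON/tool mode
    citation_first: bool          # True if the answer cited before explaining


def meets_targets(runs: list[EvalRun]) -> dict[str, bool]:
    """Return pass/fail per acceptance target; an empty batch fails everything."""
    n = len(runs)
    return {
        "delta_s<=0.45": n > 0 and all(r.delta_s <= 0.45 for r in runs),
        "coverage>=0.70": n > 0 and all(r.coverage >= 0.70 for r in runs),
        # three paraphrases x two seeds -> at least six runs, all convergent
        "lambda_convergent": n >= 6 and all(r.lambda_state == "convergent" for r in runs),
        "no_free_text": n > 0 and not any(r.free_text_in_tool_mode for r in runs),
        "citation_first>=95%": n > 0 and sum(r.citation_first for r in runs) / n >= 0.95,
    }
```

Returning one flag per target, rather than a single pass/fail, shows which fix page to reopen when a batch fails.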

60-second fix checklist

  1. Lock system / user / assistant role order.
  2. Enforce citation-first and snippet schema.
  3. Apply JSON fences + argument validation.
  4. Add memory fences keyed by mem_rev and state_key.
  5. Run eval prompts + probes before shipping.
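Step 4 can be sketched as a store that only answers when both keys match, so a stale `mem_rev` or a foreign `state_key` gets nothing back instead of leaked state. This is a minimal in-process Python illustration under assumed types (`state_key` as a string, `mem_rev` as an integer); `MemoryFence` is a hypothetical class, not a WFGY API.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class MemoryFence:
    """Hypothetical store: an entry is visible only under the exact
    (state_key, mem_rev) pair it was written with."""

    _store: dict[tuple[str, int], dict[str, Any]] = field(default_factory=dict)

    def write(self, state_key: str, mem_rev: int, data: dict[str, Any]) -> None:
        # Copy on write so callers cannot mutate stored state by reference.
        self._store[(state_key, mem_rev)] = dict(data)

    def read(self, state_key: str, mem_rev: int) -> Optional[dict[str, Any]]:
        # A mismatched key or stale revision gets None instead of leaked state.
        entry = self._store.get((state_key, mem_rev))
        return dict(entry) if entry is not None else None
```

Under this design, bumping `mem_rev` at the start of a run is what fences off the previous run: reads under the new revision see nothing until the new run writes.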

🔗 Quick-Start Downloads (60 sec)

| Tool | Link | 3-Step Setup |
|---|---|---|
| WFGY 1.0 PDF | Engine Paper | 1. Download · 2. Upload to your LLM · 3. Ask “Answer using WFGY + ” |
| TXT OS | TXTOS.txt | 1. Download · 2. Paste into any LLM chat · 3. Type “hello world” and the OS boots instantly |

Explore More

| Layer | Page | What it's for |
|---|---|---|
| Proof | WFGY Recognition Map | External citations, integrations, and ecosystem proof |
| Engine | WFGY 1.0 | Original PDF-based tension engine |
| Engine | WFGY 2.0 | Production tension kernel and math engine for RAG and agents |
| Engine | WFGY 3.0 | TXT-based Singularity tension engine, 131 S-class set |
| Map | Problem Map 1.0 | Flagship 16-problem RAG failure checklist and fix map |
| Map | Problem Map 2.0 | RAG-focused recovery pipeline |
| Map | Problem Map 3.0 | Global Debug Card, image as a debug protocol layer |
| Map | Semantic Clinic | Symptom to family to exact fix |
| Map | Grandma's Clinic | Plain-language stories mapped to Problem Map 1.0 |
| Onboarding | Starter Village | Guided tour for newcomers |
| App | TXT OS | TXT semantic OS, fast boot |
| App | Blah Blah Blah | Abstract and paradox Q&A built on TXT OS |
| App | Blur Blur Blur | Text-to-image with semantic control |
| App | Blow Blow Blow | Reasoning game engine and memory demo |

If this repository helped, starring it improves discovery so more builders can find the docs and tools.