WFGY/ProblemMap/GlobalFixMap/Safety_PromptIntegrity/README.md
PSBigBig × MiniPS d23ee4524a
Update README.md
2026-02-10 16:30:53 +08:00

Safety & Prompt Integrity — Global Fix Map

🏥 Quick Return to Emergency Room

You are at a specialist desk.
For full triage and doctors on duty, return here:

Think of this page as a sub-room.
If you want full consultation and prescriptions, go back to the Emergency Room lobby.

A hub to stabilize prompt-level safety and schema integrity across providers, agents, and eval flows.
Use this folder when failures look like jailbreaks, role confusion, or malformed tool calls.
Each page maps symptoms → root cause → structural fixes with measurable acceptance targets.


What this page is

  • A practical checklist for anyone shipping LLM apps with tools, roles, or multi-agent setups.
  • Each failure pattern links to its own guide with copy-paste guardrails.
  • Works without infra changes — schema and prompt fixes only.
  • Acceptance targets (ΔS, λ, coverage) are reproducible.

When to use

  • Jailbreak attempts slip past normal filters.
  • Prompts collapse schema or inject rogue tools.
  • Tool calls drift into free text or JSON breaks.
  • Role instructions misalign (system vs user vs assistant).
  • Citations disappear or retrieval bypasses snippet contracts.
  • Eval pipelines show high ΔS drift even when retrieval is correct.

Common failure patterns

| Failure mode | What happens | Open this |
|---|---|---|
| Prompt Injection | Hidden instructions override your system prompt | prompt_injection.md |
| Jailbreaks / Overrides | User tricks model into ignoring rules | jailbreaks_and_overrides.md |
| Role Confusion | System / user / assistant boundaries collapse | role_confusion.md |
| Memory Fence Missing | State leaks across runs, no stable key | memory_fences_and_state_keys.md |
| JSON Drift | Tool calls malformed, fields missing | json_mode_and_tool_calls.md |
| Citation Lost | Answers skip snippet or no “cite-then-explain” | citation_first.md |
| Injection Defense Recipes | Ready-to-paste guardrails against common exploits | anti_prompt_injection_recipes.md |
| Tool Timeouts | Tool calls hang or return late | tool_selection_and_timeouts.md |
| Role Ordering | Wrong order breaks downstream eval | system_user_role_order.md |
| Template Gaps | Prompts inconsistent across agents | template_library_min.md |
| Eval Drift | No stable way to test safety fixes | eval_prompts_and_checks.md |

Acceptance targets

  • ΔS(question, retrieved) ≤ 0.45
  • Coverage of cited section ≥ 0.70
  • λ convergent across three paraphrases and two seeds
  • No uncontrolled free-text execution in JSON or tool modes
  • Citation-first enforced in ≥ 95% of eval runs
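
The targets above can be turned into a simple pass/fail gate. The sketch below is only an illustration: the `EvalRun` record, its field names, and `check_acceptance` are hypothetical and not part of WFGY. It assumes you already measure ΔS, coverage, and the λ state with your own tooling, and that the ΔS and coverage targets apply to every run rather than to an average.

```python
from dataclasses import dataclass

# Thresholds copied from the acceptance targets listed above.
MAX_DELTA_S = 0.45
MIN_COVERAGE = 0.70
MIN_CITATION_FIRST_RATE = 0.95


@dataclass
class EvalRun:
    """One eval run. Every field name here is hypothetical."""
    delta_s: float                # ΔS(question, retrieved), measured elsewhere
    coverage: float               # fraction of the cited section covered
    lambda_state: str             # assumed labels, e.g. "convergent" / "divergent"
    citation_first: bool          # True if the answer cited before explaining
    free_text_in_tool_mode: bool  # True if free text leaked into a JSON/tool call


def check_acceptance(runs: list[EvalRun]) -> dict[str, bool]:
    """Return one pass/fail flag per acceptance target for a batch of runs."""
    if not runs:
        raise ValueError("need at least one eval run")
    citation_rate = sum(r.citation_first for r in runs) / len(runs)
    return {
        "delta_s": all(r.delta_s <= MAX_DELTA_S for r in runs),
        "coverage": all(r.coverage >= MIN_COVERAGE for r in runs),
        "lambda_convergent": all(r.lambda_state == "convergent" for r in runs),
        "no_free_text": not any(r.free_text_in_tool_mode for r in runs),
        "citation_first": citation_rate >= MIN_CITATION_FIRST_RATE,
    }
```

To cover the "three paraphrases and two seeds" target, pass in one `EvalRun` per paraphrase and seed combination (six runs) and ship only if every flag is `True`.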

60-second fix checklist

  1. Lock system / user / assistant role order.
  2. Enforce citation-first and snippet schema.
  3. Apply JSON fences + argument validation.
  4. Add memory fences keyed by mem_rev and state_key.
  5. Run eval prompts + probes before shipping.
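
As a rough illustration of steps 1, 3, and 4, the sketch below shows one way these guards could look in plain Python. All names (`role_order_ok`, `TOOL_SCHEMAS`, `parse_tool_call`, `MemoryFence`) are hypothetical, the message and tool-call shapes are assumptions, and `mem_rev` / `state_key` are treated here simply as a revision number and a state identifier. The linked guides remain the reference for the actual recipes.

```python
import json


def role_order_ok(messages: list[dict]) -> bool:
    """Step 1 (assumed rule): one leading system message, then user/assistant turns."""
    roles = [m.get("role") for m in messages]
    if not roles or roles[0] != "system":
        return False
    expected = ["user", "assistant"]
    return all(role == expected[i % 2] for i, role in enumerate(roles[1:]))


# Hypothetical allow-list: tool name -> required argument names and types.
TOOL_SCHEMAS = {"search": {"query": str, "top_k": int}}


def parse_tool_call(raw: str) -> dict:
    """Step 3: reject free text, unknown tools, and mismatched arguments."""
    call = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on free text
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")
    tool = call.get("tool")
    if not isinstance(tool, str) or tool not in TOOL_SCHEMAS:
        raise ValueError(f"unknown tool: {tool!r}")
    schema = TOOL_SCHEMAS[tool]
    args = call.get("arguments")
    if not isinstance(args, dict) or set(args) != set(schema):
        raise ValueError("arguments do not match the schema")
    for name, expected_type in schema.items():
        # Exact type match so that True/False is not accepted as an int.
        if type(args[name]) is not expected_type:
            raise ValueError(f"argument {name!r} must be {expected_type.__name__}")
    return call


class MemoryFence:
    """Step 4: state is only readable under the exact (state_key, mem_rev) pair."""

    def __init__(self) -> None:
        self._store: dict[tuple[str, int], object] = {}

    def write(self, state_key: str, mem_rev: int, value: object) -> None:
        self._store[(state_key, mem_rev)] = value

    def read(self, state_key: str, mem_rev: int) -> object:
        # A missing pair raises KeyError instead of silently returning old state.
        return self._store[(state_key, mem_rev)]
```

Failing closed is the design choice in all three guards: a message list, tool call, or memory read that does not match the expected shape is rejected rather than repaired, so drift shows up as an error in eval runs instead of as silent free text.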

🔗 Quick-Start Downloads (60 sec)

| Tool | Link | 3-Step Setup |
|---|---|---|
| WFGY 1.0 PDF | Engine Paper | 1 Download · 2 Upload to your LLM · 3 Ask “Answer using WFGY + ” |
| TXT OS | TXTOS.txt | 1 Download · 2 Paste into any LLM chat · 3 Type “hello world” — OS boots instantly |

🧭 Explore More

| Module | Description | Link |
|---|---|---|
| WFGY Core | WFGY 2.0 engine: full symbolic reasoning & math stack | View → |
| Problem Map 1.0 | Initial 16-mode diagnostic framework | View → |
| Problem Map 2.0 | RAG failure tree and modular fixes | View → |
| Semantic Clinic | Expanded catalog: injection, memory bugs, logic drift | View → |
| Semantic Blueprint | Layer-based symbolic reasoning & semantic mods | View → |
| Benchmark vs GPT-5 | Stress test GPT-5 with WFGY reasoning suite | View → |
| 🧙‍♂️ Starter Village 🏡 | New here? Lost in symbols? Wizard will guide you | Start → |

👑 Early Stargazers: See the Hall of Fame. WFGY Engine 2.0 is already unlocked. Star the repo to help others discover it and unlock more on the Unlock Board.
