2025-09-03 23:53:47 +08:00


Safety & Prompt Integrity — Global Fix Map

🏥 Quick Return to Emergency Room

You are at a specialist desk.
For full triage and the doctors on duty, return to the Emergency Room lobby.

Think of this page as a sub-room.
If you want full consultation and prescriptions, go back to the Emergency Room lobby.

A hub to stabilize prompt-level safety and schema integrity across providers, agents, and eval flows.
Use this folder when failures look like jailbreaks, role confusion, or malformed tool calls.
Each page maps symptoms → root cause → structural fixes with measurable acceptance targets.


What this page is

  • A practical checklist for anyone shipping LLM apps with tools, roles, or multi-agent setups.
  • Each failure pattern links to its own guide with copy-paste guardrails.
  • Works without infra changes — schema and prompt fixes only.
  • Acceptance targets (ΔS, λ, coverage) are reproducible.

When to use

  • Jailbreak attempts slip past normal filters.
  • Prompts collapse schema or inject rogue tools.
  • Tool calls drift into free text or JSON breaks.
  • Role instructions misalign (system vs user vs assistant).
  • Citations disappear or retrieval bypasses snippet contracts.
  • Eval pipelines show high ΔS drift even when retrieval is correct.
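The citation symptom in this list can be probed automatically before any deeper triage. The sketch below is a minimal Python check. It assumes a hypothetical citation format in which the answer opens with bracketed snippet ids such as [doc3#p2]. The helper name is_citation_first and that id format are illustrative only; they are not part of WFGY.

```python
import re
from typing import Iterable

# Hypothetical citation format (assumption): the answer opens with one or more
# bracketed snippet ids, e.g. "[doc3#p2][doc7#p1] Explanation ..."
LEADING_CITES = re.compile(r"^\s*((?:\[[^\[\]]+\]\s*)+)")
CITE_ID = re.compile(r"\[([^\[\]]+)\]")


def is_citation_first(answer: str, retrieved_ids: Iterable[str]) -> bool:
    """True only if the answer starts with citations that all name retrieved snippets."""
    match = LEADING_CITES.match(answer)
    if match is None:
        return False  # the explanation came before any citation
    cited = CITE_ID.findall(match.group(1))
    allowed = set(retrieved_ids)
    # Every leading citation must point at a snippet that was actually retrieved.
    return bool(cited) and all(c in allowed for c in cited)
```

For example, is_citation_first("[doc3#p2] The limit is 10 requests.", ["doc3#p2"]) passes. The same sentence fails if the citation is moved to the end, and it also fails if it cites an id that retrieval never returned.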

Common failure patterns

| Failure mode | What happens | Open this |
|---|---|---|
| Prompt Injection | Hidden instructions override your system prompt | prompt_injection.md |
| Jailbreaks / Overrides | User tricks the model into ignoring rules | jailbreaks_and_overrides.md |
| Role Confusion | System / user / assistant boundaries collapse | role_confusion.md |
| Memory Fence Missing | State leaks across runs; no stable key | memory_fences_and_state_keys.md |
| JSON Drift | Tool calls are malformed or missing fields | json_mode_and_tool_calls.md |
| Citation Lost | Answers skip the snippet or drop “cite-then-explain” | citation_first.md |
| Injection Defense Recipes | Ready-to-paste guardrails against common exploits | anti_prompt_injection_recipes.md |
| Tool Timeouts | Tool calls hang or return late | tool_selection_and_timeouts.md |
| Role Ordering | Wrong role order breaks downstream eval | system_user_role_order.md |
| Template Gaps | Prompts are inconsistent across agents | template_library_min.md |
| Eval Drift | No stable way to test safety fixes | eval_prompts_and_checks.md |
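The Role Confusion and Role Ordering rows can be partly guarded by a pre-flight check on the message list before each call. This is a minimal sketch that assumes an OpenAI-style list of {"role", "content"} dicts. The helper check_role_order and the ROLE_SPOOF pattern are hypothetical. The regex is only a heuristic for user text that imitates a role header; it is not a complete injection defense.

```python
import re
from typing import Dict, List

Message = Dict[str, str]  # assumed shape: {"role": ..., "content": ...}

# Heuristic (assumption): user text starting a line with "system:" or
# "assistant:" is trying to impersonate a higher-priority role.
ROLE_SPOOF = re.compile(r"^\s*(system|assistant)\s*:", re.IGNORECASE | re.MULTILINE)
ALLOWED_ROLES = ("system", "user", "assistant", "tool")


def check_role_order(messages: List[Message]) -> List[str]:
    """Return a list of problems; an empty list means the role order looks locked."""
    problems: List[str] = []
    if not messages or messages[0].get("role") != "system":
        problems.append("first message must be the system prompt")
    if len(messages) > 1 and messages[1].get("role") != "user":
        problems.append("first turn after the system prompt must come from the user")
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role == "system" and i > 0:
            problems.append(f"extra system message at index {i}")
        elif role not in ALLOWED_ROLES:
            problems.append(f"unknown role {role!r} at index {i}")
        elif role == "user" and ROLE_SPOOF.search(msg.get("content", "")):
            problems.append(f"user message at index {i} imitates a role header")
    return problems
```

Rejecting late system messages keeps a single source of rules. Flagging spoofed headers, instead of silently stripping them, leaves the decision to your own policy.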

Acceptance targets

  • ΔS(question, retrieved) ≤ 0.45
  • Coverage of cited section ≥ 0.70
  • λ convergent across three paraphrases and two seeds
  • No uncontrolled free-text execution in JSON or tool modes
  • Citation-first enforced in ≥ 95% of eval runs
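These targets can be turned into a pass/fail gate for CI or pre-release checks. The sketch makes several assumptions:

- ΔS is computed as 1 minus the cosine similarity of the question and retrieved-text embeddings.
- Each eval run records a λ state label such as "convergent".
- The counts come from your own eval harness.

EvalReport, delta_s_cosine, and acceptance_failures are hypothetical names, and the thresholds are copied from the list above.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


def delta_s_cosine(q_vec: Sequence[float], r_vec: Sequence[float]) -> float:
    """ΔS sketched as 1 - cosine similarity (an assumption, not a confirmed definition)."""
    dot = sum(a * b for a, b in zip(q_vec, r_vec))
    norm = math.sqrt(sum(a * a for a in q_vec)) * math.sqrt(sum(b * b for b in r_vec))
    if norm == 0:
        return 1.0  # treat zero vectors as maximally distant
    return 1.0 - dot / norm


@dataclass
class EvalReport:
    delta_s: float              # ΔS(question, retrieved)
    coverage: float             # coverage of the cited section, 0..1
    lambda_states: List[str]    # one λ label per paraphrase x seed run
    free_text_leaks: int        # JSON/tool outputs that fell back to free text
    citation_first_runs: int    # runs where the answer cited before explaining
    total_runs: int


def acceptance_failures(r: EvalReport) -> List[str]:
    """Return every target that was missed; an empty list means ship-ready."""
    failures: List[str] = []
    if r.delta_s > 0.45:
        failures.append(f"ΔS {r.delta_s:.2f} > 0.45")
    if r.coverage < 0.70:
        failures.append(f"coverage {r.coverage:.2f} < 0.70")
    # Three paraphrases x two seeds = six runs, all of which must converge.
    if len(r.lambda_states) < 6 or any(s != "convergent" for s in r.lambda_states):
        failures.append("λ not convergent across 3 paraphrases x 2 seeds")
    if r.free_text_leaks > 0:
        failures.append(f"{r.free_text_leaks} uncontrolled free-text outputs")
    if r.total_runs == 0 or r.citation_first_runs / r.total_runs < 0.95:
        failures.append("citation-first below 95% of eval runs")
    return failures
```

The gate reports every missed target rather than stopping at the first one, so one eval pass shows all the work still needed.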

60-second fix checklist

  1. Lock system / user / assistant role order.
  2. Enforce citation-first and snippet schema.
  3. Apply JSON fences + argument validation.
  4. Add memory fences keyed by mem_rev and state_key.
  5. Run eval prompts + probes before shipping.
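Steps 3 and 4 can be sketched in a few lines of Python. The tool schema, the {"name", "arguments"} call shape, and both helper names are assumptions for illustration. The JSON fence rejects anything that is not a schema-valid tool call. The memory fence derives one namespace per (mem_rev, state_key) pair so that two runs never read each other's state.

```python
import hashlib
import json
from typing import Any, Dict, Tuple

# Hypothetical tool schema: argument name -> required Python type.
TOOL_SCHEMAS: Dict[str, Dict[str, type]] = {
    "search_docs": {"query": str, "top_k": int},
}


def parse_tool_call(raw: str) -> Tuple[str, Dict[str, Any]]:
    """Step 3: accept only well-formed JSON that matches a known tool schema."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"tool call is not valid JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")
    name, args = call.get("name"), call.get("arguments")
    if not isinstance(name, str) or name not in TOOL_SCHEMAS:
        raise ValueError(f"unknown tool {name!r}")
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    schema = TOOL_SCHEMAS[name]
    extra = set(args) - set(schema)
    if extra:
        raise ValueError(f"unexpected arguments: {sorted(extra)}")
    for field, expected in schema.items():
        if field not in args:
            raise ValueError(f"missing argument {field!r}")
        value = args[field]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            raise ValueError(f"argument {field!r} must be {expected.__name__}")
    return name, args


def memory_fence_key(mem_rev: str, state_key: str) -> str:
    """Step 4: one stable namespace per (mem_rev, state_key) pair."""
    # json.dumps keeps the pair unambiguous even if either part contains ':'.
    payload = json.dumps([mem_rev, state_key]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]
```

Raising on the first violation keeps free text from ever reaching a tool. Bumping mem_rev is then enough to retire all state written under the old revision.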

🔗 Quick-Start Downloads (60 sec)

| Tool | Link | 3-Step Setup |
|---|---|---|
| WFGY 1.0 PDF | Engine Paper | 1 Download · 2 Upload to your LLM · 3 Ask “Answer using WFGY + ” |
| TXT OS | TXTOS.txt | 1 Download · 2 Paste into any LLM chat · 3 Type “hello world” and the OS boots instantly |

🧭 Explore More

| Module | Description | Link |
|---|---|---|
| WFGY Core | WFGY 2.0 engine: full symbolic reasoning & math stack | View → |
| Problem Map 1.0 | Initial 16-mode diagnostic framework | View → |
| Problem Map 2.0 | RAG failure tree and modular fixes | View → |
| Semantic Clinic | Expanded catalog: injection, memory bugs, logic drift | View → |
| Semantic Blueprint | Layer-based symbolic reasoning & semantic mods | View → |
| Benchmark vs GPT-5 | Stress test GPT-5 with WFGY reasoning suite | View → |
| 🧙‍♂️ Starter Village 🏡 | New here? Lost in symbols? The wizard will guide you | Start → |

👑 Early Stargazers: See the Hall of Fame
WFGY Engine 2.0 is already unlocked. Star the repo to unlock more.
