WFGY/ProblemMap/GlobalFixMap/Safety_PromptIntegrity
2026-03-04 06:26:57 +00:00
..
checklists Create .gitkeep 2025-08-25 19:14:55 +08:00
eval Create .gitkeep 2025-08-25 19:15:49 +08:00
mvp_demo Create .gitkeep 2025-08-25 19:15:37 +08:00
ops Create .gitkeep 2025-08-25 19:16:10 +08:00
patterns Create .gitkeep 2025-08-25 19:14:31 +08:00
playbooks Create .gitkeep 2025-08-25 19:15:24 +08:00
tools Create .gitkeep 2025-08-25 19:14:42 +08:00
.gitkeep Create .gitkeep 2025-08-25 19:14:20 +08:00
anti_prompt_injection_recipes.md chore: add WFGY_FOOTER_START/END markers around Explore More footer blocks 2026-03-04 06:26:57 +00:00
citation_first.md chore: add WFGY_FOOTER_START/END markers around Explore More footer blocks 2026-03-04 06:26:57 +00:00
eval_prompts_and_checks.md chore: add WFGY_FOOTER_START/END markers around Explore More footer blocks 2026-03-04 06:26:57 +00:00
jailbreaks_and_overrides.md chore: add WFGY_FOOTER_START/END markers around Explore More footer blocks 2026-03-04 06:26:57 +00:00
json_mode_and_tool_calls.md chore: add WFGY_FOOTER_START/END markers around Explore More footer blocks 2026-03-04 06:26:57 +00:00
memory_fences_and_state_keys.md chore: add WFGY_FOOTER_START/END markers around Explore More footer blocks 2026-03-04 06:26:57 +00:00
prompt_injection.md chore: add WFGY_FOOTER_START/END markers around Explore More footer blocks 2026-03-04 06:26:57 +00:00
README.md chore: add WFGY_FOOTER_START/END markers around Explore More footer blocks 2026-03-04 06:26:57 +00:00
role_confusion.md chore: add WFGY_FOOTER_START/END markers around Explore More footer blocks 2026-03-04 06:26:57 +00:00
system_user_role_order.md chore: add WFGY_FOOTER_START/END markers around Explore More footer blocks 2026-03-04 06:26:57 +00:00
template_library_min.md chore: add WFGY_FOOTER_START/END markers around Explore More footer blocks 2026-03-04 06:26:57 +00:00
tool_selection_and_timeouts.md chore: add WFGY_FOOTER_START/END markers around Explore More footer blocks 2026-03-04 06:26:57 +00:00

Safety & Prompt Integrity — Global Fix Map

🏥 Quick Return to Emergency Room

You are in a specialist desk.
For full triage and doctors on duty, return here:

Think of this page as a sub-room.
If you want full consultation and prescriptions, go back to the Emergency Room lobby.

A hub to stabilize prompt-level safety and schema integrity across providers, agents, and eval flows.
Use this folder when failures look like jailbreaks, role confusion, or malformed tool calls.
Each page maps symptoms → root cause → structural fixes with measurable acceptance targets.


What this page is

  • A practical checklist for anyone shipping LLM apps with tools, roles, or multi-agent setups.
  • Each failure pattern links to its own guide with copy-paste guardrails.
  • Works without infra changes — schema and prompt fixes only.
  • Acceptance targets (ΔS, λ, coverage) are reproducible.

When to use

  • Jailbreak attempts slip past normal filters.
  • Prompts collapse schema or inject rogue tools.
  • Tool calls drift into free text or JSON breaks.
  • Role instructions misalign (system vs user vs assistant).
  • Citations disappear or retrieval bypasses snippet contracts.
  • Eval pipelines show high ΔS drift even when retrieval is correct.

Common failure patterns

Failure mode What happens Open this
Prompt Injection Hidden instructions override your system prompt prompt_injection.md
Jailbreaks / Overrides User tricks model into ignoring rules jailbreaks_and_overrides.md
Role Confusion System / user / assistant boundaries collapse role_confusion.md
Memory Fence Missing State leaks across runs, no stable key memory_fences_and_state_keys.md
JSON Drift Tool calls malformed, fields missing json_mode_and_tool_calls.md
Citation Lost Answers skip snippet or no “cite-then-explain” citation_first.md
Injection Defense Recipes Ready-to-paste guardrails against common exploits anti_prompt_injection_recipes.md
Tool Timeouts Tool calls hang or return late tool_selection_and_timeouts.md
Role Ordering Wrong order breaks downstream eval system_user_role_order.md
Template Gaps Prompts inconsistent across agents template_library_min.md
Eval Drift No stable way to test safety fixes eval_prompts_and_checks.md

Acceptance targets

  • ΔS(question, retrieved) ≤ 0.45
  • Coverage of cited section ≥ 0.70
  • λ convergent across three paraphrases and two seeds
  • No uncontrolled free-text execution in JSON or tool modes
  • Citation-first enforced in ≥ 95% of eval runs

60-second fix checklist

  1. Lock system / user / assistant role order.
  2. Enforce citation-first and snippet schema.
  3. Apply JSON fences + argument validation.
  4. Add memory fences keyed by mem_rev and state_key.
  5. Run eval prompts + probes before shipping.

🔗 Quick-Start Downloads (60 sec)

Tool Link 3-Step Setup
WFGY 1.0 PDF Engine Paper 1 Download · 2 Upload to your LLM · 3 Ask “Answer using WFGY + ”
TXT OS TXTOS.txt 1 Download · 2 Paste into any LLM chat · 3 Type “hello world” — OS boots instantly

Explore More

Module Description Link
WFGY Core Canonical framework entry point View
Problem Map Diagnostic map and navigation hub View
Tension Universe Experiments MVP experiment field View
Recognition Where WFGY is referenced or adopted View
AI Guide Anti-hallucination reading protocol for tools View

If this repository helps, starring it improves discovery for other builders.
GitHub Repo stars