mirror of
https://github.com/onestardao/WFGY.git
synced 2026-04-30 12:39:55 +00:00
Safety & Prompt Integrity — Global Fix Map
A hub to stabilize prompt-level safety and schema integrity across providers, agents, and eval flows.
Use this folder when failures look like jailbreaks, role confusion, or malformed tool calls. Each page maps symptoms → structural fixes with measurable acceptance targets.
Quick routes to per-page guides
- Prompt injection patterns → prompt_injection.md
- Jailbreaks and override attempts → jailbreaks_and_overrides.md
- Role confusion between system / user / assistant → role_confusion.md
- Memory fences and state keys → memory_fences_and_state_keys.md
- JSON mode and tool call guardrails → json_mode_and_tool_calls.md
- Citation-first enforcement → citation_first.md
- Anti-injection recipes (ready-to-paste) → anti_prompt_injection_recipes.md
- Tool selection and timeouts → tool_selection_and_timeouts.md
- System vs user role ordering → system_user_role_order.md
- Minimal template library → template_library_min.md
- Eval prompts and integrity checks → eval_prompts_and_checks.md
When to use this folder
- Jailbreak attempts slip past standard filters.
- Prompts collapse schema or inject rogue tools.
- Tool calls drift into free text, or JSON output breaks.
- Role instructions are misaligned (system vs user vs assistant).
- Citations disappear, or retrieval steps bypass snippet contracts.
- Eval pipelines show high ΔS drift even when retrieval is correct.
Acceptance targets
- ΔS(question, retrieved) ≤ 0.45
- Coverage of cited section ≥ 0.70
- λ remains convergent across three paraphrases and two seeds
- No uncontrolled free-text execution in JSON or tool modes
- Citation-first enforced in ≥ 95% of eval runs
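The targets above can be turned into a simple gate for an eval run. The following is a minimal sketch, not part of WFGY: it assumes the metrics (ΔS, coverage, λ convergence, citation-first rate) have already been measured elsewhere, and `EvalSummary` and `failed_targets` are hypothetical names introduced here for illustration. Only the thresholds come from the list above.

```python
from dataclasses import dataclass
from typing import List

# Thresholds copied from the acceptance targets above.
MAX_DELTA_S = 0.45
MIN_COVERAGE = 0.70
MIN_CITATION_FIRST_RATE = 0.95


@dataclass
class EvalSummary:
    """Hypothetical container for already-measured metrics (not a WFGY API)."""
    delta_s: float              # ΔS(question, retrieved)
    coverage: float             # coverage of the cited section, 0..1
    lambda_convergent: bool     # λ converged on all paraphrase/seed runs
    free_text_executions: int   # uncontrolled free-text executions observed
    citation_first_rate: float  # share of eval runs that cited first, 0..1


def failed_targets(s: EvalSummary) -> List[str]:
    """Return the names of the acceptance targets this run misses."""
    failures = []
    if s.delta_s > MAX_DELTA_S:
        failures.append("delta_s")
    if s.coverage < MIN_COVERAGE:
        failures.append("coverage")
    if not s.lambda_convergent:
        failures.append("lambda")
    if s.free_text_executions > 0:
        failures.append("free_text")
    if s.citation_first_rate < MIN_CITATION_FIRST_RATE:
        failures.append("citation_first")
    return failures
```

An empty list means every target is met; values exactly on a threshold count as passing, matching the ≤ and ≥ in the list.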
60-second fix checklist
- Lock system / user / assistant role order.
- Enforce citation-first and snippet schema.
- Apply JSON fences + argument validation.
- Add memory fences keyed by `mem_rev` and `state_key`.
- Run eval prompts + structural probes before ship.
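Two of the checklist items, JSON fences with argument validation and memory fences, can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the repository's implementation: the tool registry, the `{"tool": ..., "args": ...}` call shape, and both function names are hypothetical, and the memory fence assumes that a write is accepted only when its `mem_rev` and `state_key` equal the stored values.

```python
import json

# Hypothetical registry: tool name -> {argument name: expected type}.
ALLOWED_TOOLS = {"search_docs": {"query": str, "top_k": int}}


def parse_tool_call(raw: str) -> dict:
    """Return the parsed call, or raise ValueError if it breaks the contract."""
    try:
        call = json.loads(raw)  # free text around the JSON fails here
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(call, dict) or set(call) != {"tool", "args"}:
        raise ValueError("expected an object with exactly 'tool' and 'args'")
    tool, args = call["tool"], call["args"]
    if not isinstance(tool, str) or tool not in ALLOWED_TOOLS:
        raise ValueError(f"unknown tool: {tool!r}")
    schema = ALLOWED_TOOLS[tool]
    if not isinstance(args, dict) or set(args) != set(schema):
        raise ValueError(f"arguments must be exactly {sorted(schema)}")
    for name, expected in schema.items():
        if type(args[name]) is not expected:  # strict: True is not an int here
            raise ValueError(f"{name} must be {expected.__name__}")
    return call


def fence_allows_write(stored: dict, incoming: dict) -> bool:
    """Accept a memory write only when both fence keys match the stored ones."""
    return all(incoming.get(k) == stored[k] for k in ("mem_rev", "state_key"))
```

Rejecting anything that is not exactly one JSON object with known keys keeps free text and rogue tool names from reaching execution; the strict type check is a design choice that can be relaxed if a tool accepts several argument types.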
🔗 Quick-Start Downloads (60 sec)
| Tool | Link | 3-Step Setup |
|---|---|---|
| WFGY 1.0 PDF | Engine Paper | 1️⃣ Download · 2️⃣ Upload to your LLM · 3️⃣ Ask “Answer using WFGY + ” |
| TXT OS (plain-text OS) | TXTOS.txt | 1️⃣ Download · 2️⃣ Paste into any LLM chat · 3️⃣ Type “hello world” — OS boots instantly |
🧭 Explore More
| Module | Description | Link |
|---|---|---|
| WFGY Core | WFGY 2.0 engine is live: full symbolic reasoning architecture and math stack | View → |
| Problem Map 1.0 | Initial 16-mode diagnostic and symbolic fix framework | View → |
| Problem Map 2.0 | RAG-focused failure tree, modular fixes, and pipelines | View → |
| Semantic Clinic Index | Expanded failure catalog: prompt injection, memory bugs, logic drift | View → |
| Semantic Blueprint | Layer-based symbolic reasoning & semantic modulations | View → |
| Benchmark vs GPT-5 | Stress test GPT-5 with full WFGY reasoning suite | View → |
| 🧙‍♂️ Starter Village 🏡 | New here? Lost in symbols? Click here and let the wizard guide you through | Start → |
👑 Early Stargazers: See the Hall of Fame —
Engineers, hackers, and open source builders who supported WFGY from day one.
⭐ WFGY Engine 2.0 is already unlocked. ⭐ Star the repo to help others discover it and unlock more on the Unlock Board.