mirror of
https://github.com/onestardao/WFGY.git
synced 2026-04-28 11:40:07 +00:00
7.6 KiB
7.6 KiB
🔐 Privacy & Governance for RAG Systems — Practical Runbook
Build trustworthy AI: minimize data, control exposure, and prove compliance without killing velocity.
Quick Nav
Data Contracts · Ops · RAG Map 2.0 · Patterns: Memory Desync · SCU
Not legal advice. Use this as a technical baseline and align with your counsel.
0) Principles
- Minimize: ingest only what you truly need; redact at source.
- Fence: per-source prompt fences; cite-then-explain.
- Prove: log every decision via data contracts; keep tight retention.
- Control: least-privilege access; encrypt at rest/in transit.
1) PII taxonomy & redaction
- Categories: identifiers (name, email, gov ID), contact, location, financial, health, biometric, free-text PII.
- Redact at ingest with deterministic tags:
{"text":"Contact Alice at alice@example.com","redactions":[{"span":[8,13],"type":"person"},{"span":[24,43],"type":"email"}]}
- Keep reversible vault only if business requires it; otherwise irreversible.
2) Storage & access control
- Encryption: TLS in transit; AES-GCM at rest.
- Access: service accounts per component; forbid shared tokens; rotate keys.
- Retention: default 30–90 days for logs; 7–30 days for raw prompts unless required longer.
- Deletion: implement DSR (data subject request) over
doc_idoruser_id.
3) Model provider governance
- Confirm data usage (training vs. inference only).
- Disable logging on hosted APIs if must not leave boundary.
- For self-hosted models, pin container images and track model checksum.
4) Prompt governance (SCU-safe)
- Lock schema: system → task → constraints → citations → answer.
- Forbid cross-source merges; require line-level citation IDs.
- Add guard prompts to avoid reproducing secrets or PII unless necessary and consented.
5) Audit & reproducibility
- Use envelope fields (
trace_id,mem_rev,mem_hash) in every record. - Keep answer → prompt → citations → chunks chain navigable.
- Export metrics pack per release (ΔS, λ rates, nDCG, recall).
6) Config template (YAML)
privacy:
redact_at_ingest: true
redactors: [pii_email, pii_phone, pii_name]
reversible_vault: false
retention_days:
prompts: 14
logs: 60
embeddings: 180
access:
roles:
retriever: [read_chunks]
reranker: [read_chunks]
llm: [read_prompts]
analyst: [read_metrics]
secrets:
provider: "aws-kms" # or gcp-kms, vault
rotation_days: 90
providers:
openai:
share_for_training: false
claude:
share_for_training: false
7) Risk scenarios → mitigations
| Scenario | Risk | Mitigation |
|---|---|---|
| User uploads PII-heavy PDFs | Accidental exposure | Redact at ingest; block high-risk types; allow override with consent |
| Multi-tenant leakage | Cross-account data bleed | Tenant IDs in chunk keys; per-tenant indices; access policies |
| Citations reveal secrets | SCU or over-inclusion | Reduce context window; per-source fences; require justification |
| Vendor logs prompts | Data leaves boundary | Use no-log endpoints; self-host; encrypt locally |
Acceptance criteria
- ✅ PII redaction rate ≥ 95% on test corpus; no residual PII in prompts unless approved.
- ✅ Trace chain present for 100% of answers (citations included).
- ✅ Secrets rotated within policy; provider log-sharing disabled.
- ✅ Retention job passes dry-run audit monthly.
🔗 Quick-Start Downloads (60 sec)
| Tool | Link | 3-Step Setup |
|---|---|---|
| WFGY 1.0 PDF | Engine Paper | 1️⃣ Download · 2️⃣ Upload to your LLM · 3️⃣ Ask “Answer using WFGY + <your question>” |
| TXT OS (plain-text OS) | TXTOS.txt | 1️⃣ Download · 2️⃣ Paste into any LLM chat · 3️⃣ Type “hello world” — OS boots instantly |
🧭 Explore More
| Module | Description | Link |
|---|---|---|
| WFGY Core | WFGY 2.0 engine is live: full symbolic reasoning architecture and math stack | View → |
| Problem Map 1.0 | Initial 16-mode diagnostic and symbolic fix framework | View → |
| Problem Map 2.0 | RAG-focused failure tree, modular fixes, and pipelines | View → |
| Semantic Clinic Index | Expanded failure catalog: prompt injection, memory bugs, logic drift | View → |
| Semantic Blueprint | Layer-based symbolic reasoning & semantic modulations | View → |
| Benchmark vs GPT-5 | Stress test GPT-5 with full WFGY reasoning suite | View → |
| 🧙♂️ Starter Village 🏡 | New here? Lost in symbols? Click here and let the wizard guide you through | Start → |
👑 Early Stargazers: See the Hall of Fame —
Engineers, hackers, and open source builders who supported WFGY from day one.
⭐ WFGY Engine 2.0 is already unlocked. ⭐ Star the repo to help others discover it and unlock more on the Unlock Board.