WFGY/ProblemMap/privacy-and-governance.md
2025-08-15 23:22:02 +08:00

🔐 Privacy & Governance for RAG Systems — Practical Runbook

Build trustworthy AI: minimize data, control exposure, and prove compliance without killing velocity.

Not legal advice. Use this as a technical baseline and align with your counsel.


0) Principles

  1. Minimize: ingest only what you truly need; redact at source.
  2. Fence: per-source prompt fences; cite-then-explain.
  3. Prove: log every decision via data contracts; keep tight retention.
  4. Control: least-privilege access; encrypt at rest/in transit.

1) PII taxonomy & redaction

  • Categories: identifiers (name, email, gov ID), contact, location, financial, health, biometric, free-text PII.
  • Redact at ingest with deterministic tags:
{"text":"Contact Alice at alice@example.com","redactions":[{"span":[8,13],"type":"person"},{"span":[17,34],"type":"email"}]}
  • Keep reversible vault only if business requires it; otherwise irreversible.
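
A minimal sketch of redact-at-ingest, assuming spans are half-open `[start, end)` character offsets. The regex and the `find_redactions` / `apply_redactions` helpers are hypothetical; the sketch only detects emails, and a real redactor would also need a detector for names and the other categories above.

```python
import json
import re

# Hypothetical detector: a deliberately simple email pattern, not a full PII detector.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_redactions(text):
    """Return redaction records with [start, end) character spans."""
    return [
        {"span": [m.start(), m.end()], "type": "email"}
        for m in EMAIL_RE.finditer(text)
    ]


def apply_redactions(text, redactions):
    """Replace each span with a deterministic tag such as [EMAIL]."""
    out = text
    # Work right to left so earlier offsets stay valid after each replacement.
    for r in sorted(redactions, key=lambda r: r["span"][0], reverse=True):
        start, end = r["span"]
        out = out[:start] + "[" + r["type"].upper() + "]" + out[end:]
    return out


record_text = "Contact Alice at alice@example.com"
redactions = find_redactions(record_text)
print(json.dumps({"text": record_text, "redactions": redactions}))
print(apply_redactions(record_text, redactions))
```

Because the tag depends only on the PII type, the same input always yields the same redacted text, which keeps downstream hashes and diffs stable.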

2) Storage & access control

  • Encryption: TLS in transit; AES-GCM at rest.
  • Access: service accounts per component; forbid shared tokens; rotate keys.
  • Retention: default 30 to 90 days for logs; 7 to 30 days for raw prompts unless required longer.
  • Deletion: implement DSR (data subject request) over doc_id or user_id.
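
A sketch of the retention and deletion rules, assuming records are plain dicts with hypothetical `kind`, `created_at`, `user_id`, and `doc_id` fields. The window values are placeholders; a real job would run against your actual stores rather than an in-memory list.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention windows in days, per record kind.
RETENTION_DAYS = {"prompts": 14, "logs": 60}


def is_expired(record, now):
    """True when a record is older than the window for its kind."""
    limit = timedelta(days=RETENTION_DAYS[record["kind"]])
    return now - record["created_at"] > limit


def purge_expired(records, now):
    """Keep only records still inside their retention window."""
    return [r for r in records if not is_expired(r, now)]


def delete_for_subject(records, user_id=None, doc_id=None):
    """Handle a DSR: drop every record matching the given user_id or doc_id."""
    if user_id is None and doc_id is None:
        raise ValueError("a DSR needs a user_id or a doc_id")
    return [
        r for r in records
        if not (
            (user_id is not None and r.get("user_id") == user_id)
            or (doc_id is not None and r.get("doc_id") == doc_id)
        )
    ]
```

Passing `now` in explicitly, instead of reading the clock inside the function, makes a dry-run audit reproducible.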

3) Model provider governance

  • Confirm data usage (training vs. inference only).
  • Disable logging on hosted APIs if data must not leave your boundary.
  • For self-hosted models, pin container images and track model checksum.
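
One way to track a model checksum, sketched with the standard library only. The function names are hypothetical, and the pinned value is assumed to be a hex SHA-256 digest recorded at release time.

```python
import hashlib
import hmac


def file_sha256(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large weight files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model(path, expected_sha256):
    """Compare the on-disk checksum against the pinned value."""
    return hmac.compare_digest(file_sha256(path), expected_sha256.lower())
```

Running `verify_model` at container start and refusing to serve on a mismatch is one possible policy; the right response to a mismatch depends on your deployment.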

4) Prompt governance (SCU-safe)

  • Lock schema: system → task → constraints → citations → answer.
  • Forbid cross-source merges; require line-level citation IDs.
  • Add guard prompts to avoid reproducing secrets or PII unless it is necessary and the data subject has consented.
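
The schema lock and citation rules can be checked mechanically. This is a sketch under assumptions: the `[source:Lnn]` citation format, the `## name` section headers, and all function names are hypothetical, not something this runbook prescribes.

```python
import re

# Fixed section order taken from the schema above.
SECTION_ORDER = ["system", "task", "constraints", "citations", "answer"]

# Hypothetical line-level citation ID format, e.g. [doc1:L12].
CITATION_RE = re.compile(r"\[([A-Za-z0-9_-]+):L(\d+)\]")


def build_prompt(sections):
    """Assemble a prompt in the locked order; reject missing or extra sections."""
    if set(sections) != set(SECTION_ORDER):
        raise ValueError("prompt sections must be exactly: " + ", ".join(SECTION_ORDER))
    return "\n\n".join(f"## {name}\n{sections[name]}" for name in SECTION_ORDER)


def uncited_lines(answer):
    """Return non-empty answer lines that carry no citation ID."""
    return [ln for ln in answer.splitlines() if ln.strip() and not CITATION_RE.search(ln)]


def mixes_sources(line):
    """True when one line cites more than one source (a cross-source merge)."""
    return len({m.group(1) for m in CITATION_RE.finditer(line)}) > 1
```

An answer could be rejected or regenerated when `uncited_lines` is non-empty or any line makes `mixes_sources` return true.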

5) Audit & reproducibility

  • Use envelope fields (trace_id, mem_rev, mem_hash) in every record.
  • Keep answer → prompt → citations → chunks chain navigable.
  • Export metrics pack per release (ΔS, λ rates, nDCG, recall).
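
A sketch of an envelope builder. The source does not define the fields precisely, so this assumes `mem_rev` is a revision counter supplied by the caller and `mem_hash` is a SHA-256 over a canonical JSON form of the memory state; `make_envelope` and `memory_hash` are hypothetical names.

```python
import hashlib
import json
import uuid


def memory_hash(memory_state):
    """Hash a canonical JSON form so equal states always give equal hashes."""
    canonical = json.dumps(memory_state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_envelope(payload, memory_state, mem_rev, trace_id=None):
    """Wrap a payload with the trace_id, mem_rev, and mem_hash fields."""
    return {
        "trace_id": trace_id or uuid.uuid4().hex,
        "mem_rev": mem_rev,
        "mem_hash": memory_hash(memory_state),
        "payload": payload,
    }
```

Reusing one `trace_id` across the answer, prompt, citation, and chunk records is what lets an auditor walk the chain from either end.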

6) Config template (YAML)

privacy:
  redact_at_ingest: true
  redactors: [pii_email, pii_phone, pii_name]
  reversible_vault: false
  retention_days:
    prompts: 14
    logs: 60
    embeddings: 180
  access:
    roles:
      retriever: [read_chunks]
      reranker: [read_chunks]
      llm: [read_prompts]
      analyst: [read_metrics]
  secrets:
    provider: "aws-kms"   # or gcp-kms, vault
    rotation_days: 90
providers:
  openai:
    share_for_training: false
  claude:
    share_for_training: false
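
To show how the `access.roles` block might be enforced, here is a deny-by-default check. The role map is copied by hand from the template above to keep the sketch dependency-free; `is_allowed` and `require` are hypothetical helpers, and a real service would load the map from the parsed config.

```python
# Role-to-permission map copied from the access.roles block above.
ROLES = {
    "retriever": ["read_chunks"],
    "reranker": ["read_chunks"],
    "llm": ["read_prompts"],
    "analyst": ["read_metrics"],
}


def is_allowed(role, permission, roles=ROLES):
    """Deny by default: unknown roles and unlisted permissions both fail."""
    return permission in roles.get(role, [])


def require(role, permission):
    """Raise instead of returning False so a missed check cannot pass silently."""
    if not is_allowed(role, permission):
        raise PermissionError(f"{role!r} may not {permission!r}")
```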

7) Risk scenarios → mitigations

| Scenario | Risk | Mitigation |
|---|---|---|
| User uploads PII-heavy PDFs | Accidental exposure | Redact at ingest; block high-risk types; allow override with consent |
| Multi-tenant leakage | Cross-account data bleed | Tenant IDs in chunk keys; per-tenant indices; access policies |
| Citations reveal secrets | SCU or over-inclusion | Reduce context window; per-source fences; require justification |
| Vendor logs prompts | Data leaves boundary | Use no-log endpoints; self-host; encrypt locally |
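
The "tenant IDs in chunk keys" mitigation can be sketched as follows. The `tenant:doc:chunk` key layout and both function names are assumptions chosen for illustration.

```python
def chunk_key(tenant_id, doc_id, chunk_no):
    """Build a chunk key that always starts with the tenant ID."""
    for part in (tenant_id, doc_id):
        # Banning the separator stops tenant "a" from matching keys of tenant "a:b".
        if not part or ":" in part:
            raise ValueError("IDs must be non-empty and must not contain ':'")
    return f"{tenant_id}:{doc_id}:{chunk_no}"


def filter_for_tenant(chunks, tenant_id):
    """Drop any retrieved chunk whose key belongs to another tenant."""
    prefix = tenant_id + ":"
    return [c for c in chunks if c["key"].startswith(prefix)]
```

Filtering after retrieval is a second line of defense; per-tenant indices, as the table notes, keep other tenants' chunks out of the candidate set in the first place.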

Acceptance criteria

  • PII redaction rate ≥ 95% on test corpus; no residual PII in prompts unless approved.
  • Trace chain present for 100% of answers (citations included).
  • Secrets rotated within policy; provider log-sharing disabled.
  • Retention job passes dry-run audit monthly.
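
The redaction-rate criterion needs a definition before it can be tested. This sketch assumes it means recall over hand-labeled PII spans: a labeled span counts as redacted only when a single found span fully covers it. Function names are hypothetical.

```python
def redaction_rate(expected_spans, found_spans):
    """Share of labeled PII spans that the redactor fully covered."""
    if not expected_spans:
        return 1.0
    covered = sum(
        1
        for (start, end) in expected_spans
        if any(fs <= start and end <= fe for (fs, fe) in found_spans)
    )
    return covered / len(expected_spans)


def meets_threshold(rate, threshold=0.95):
    """Check a measured rate against the acceptance threshold."""
    return rate >= threshold
```

For example, with labeled spans `[(8, 13), (17, 34)]` and only `(17, 34)` found, the rate is 0.5 and the criterion fails.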

🔗 Quick-Start Downloads (60 sec)

| Tool | Link | 3-Step Setup |
|---|---|---|
| WFGY 1.0 PDF | Engine Paper | 1 Download · 2 Upload to your LLM · 3 Ask “Answer using WFGY + <your question>” |
| TXT OS (plain-text OS) | TXTOS.txt | 1 Download · 2 Paste into any LLM chat · 3 Type “hello world”; the OS boots instantly |

🧭 Explore More

| Module | Description | Link |
|---|---|---|
| WFGY Core | WFGY 2.0 engine is live: full symbolic reasoning architecture and math stack | View → |
| Problem Map 1.0 | Initial 16-mode diagnostic and symbolic fix framework | View → |
| Problem Map 2.0 | RAG-focused failure tree, modular fixes, and pipelines | View → |
| Semantic Clinic Index | Expanded failure catalog: prompt injection, memory bugs, logic drift | View → |
| Semantic Blueprint | Layer-based symbolic reasoning & semantic modulations | View → |
| Benchmark vs GPT-5 | Stress test GPT-5 with full WFGY reasoning suite | View → |
| 🧙‍♂️ Starter Village 🏡 | New here? Lost in symbols? Click here and let the wizard guide you through | Start → |
