🔐 Privacy & Governance for RAG Systems — Practical Runbook

Build trustworthy AI: minimize data, control exposure, and prove compliance without killing velocity.

Quick Nav
Data Contracts · Ops · RAG Map 2.0 · Patterns: Memory Desync · SCU

Not legal advice. Use this as a technical baseline and align with your counsel.


0) Principles

  1. Minimize: ingest only what you truly need; redact at source.
  2. Fence: per-source prompt fences; cite-then-explain.
  3. Prove: log every decision via data contracts; keep tight retention.
  4. Control: least-privilege access; encrypt at rest/in transit.

1) PII taxonomy & redaction

  • Categories: identifiers (name, email, gov ID), contact, location, financial, health, biometric, free-text PII.
  • Redact at ingest with deterministic tags (spans are half-open character offsets):
{"text":"Contact Alice at alice@example.com","redactions":[{"span":[8,13],"type":"person"},{"span":[17,34],"type":"email"}]}
  • Keep a reversible vault only if the business requires it; otherwise make redaction irreversible.
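
The redaction step above can be sketched roughly as follows. This is a minimal illustration, not a production detector: the email regex is deliberately simplified, and `known_names`, `find_redactions`, and `apply_tags` are hypothetical names introduced here. Real name detection usually needs an NER model or a dictionary service rather than a fixed list.

```python
import re

# Simplified email pattern for illustration only; real PII detection needs more.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_redactions(text, known_names=()):
    """Return redaction records with half-open [start, end) spans, sorted by start."""
    redactions = []
    # Hypothetical stand-in for name detection: exact, case-sensitive matches.
    for name in known_names:
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            redactions.append({"span": [m.start(), m.end()], "type": "person"})
    for m in EMAIL_RE.finditer(text):
        redactions.append({"span": [m.start(), m.end()], "type": "email"})
    return sorted(redactions, key=lambda r: r["span"][0])


def apply_tags(text, redactions):
    """Replace each span with a deterministic tag such as [PERSON] or [EMAIL]."""
    out, cursor = [], 0
    for r in redactions:
        start, end = r["span"]
        if start < cursor:
            continue  # skip a span that overlaps one already replaced
        out.append(text[cursor:start])
        out.append("[" + r["type"].upper() + "]")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)
```

Because the tags depend only on the detected type, the same input always produces the same redacted output, which keeps downstream chunk hashes stable. A reversible vault, if the business needs one, would store the original span text keyed by record and span instead of discarding it.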

2) Storage & access control

  • Encryption: TLS in transit; AES-GCM at rest.
  • Access: service accounts per component; forbid shared tokens; rotate keys.
  • Retention: default 30 to 90 days for logs; 7 to 30 days for raw prompts unless required longer.
  • Deletion: implement DSR (data subject request) over doc_id or user_id.
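
As a rough sketch of the retention and deletion bullets, the snippet below works on in-memory records. The record shape (`kind`, `created_at`, `user_id`, `doc_id`) and the function names are assumptions made for illustration; the day counts are taken from the config template in section 6. A real job would run the same logic against your stores and log each deletion.

```python
from datetime import datetime, timedelta, timezone

# Retention windows in days, mirroring the config template in section 6.
RETENTION_DAYS = {"prompts": 14, "logs": 60, "embeddings": 180}


def split_expired(records, now):
    """Partition records into (kept, expired) by their kind's retention window."""
    kept, expired = [], []
    for rec in records:
        limit = timedelta(days=RETENTION_DAYS[rec["kind"]])
        (expired if now - rec["created_at"] > limit else kept).append(rec)
    return kept, expired


def apply_dsr(records, user_id=None, doc_id=None):
    """Partition records into (kept, deleted) for a data subject request."""
    if user_id is None and doc_id is None:
        raise ValueError("a DSR needs a user_id or a doc_id")
    kept, deleted = [], []
    for rec in records:
        hit = (user_id is not None and rec.get("user_id") == user_id) or (
            doc_id is not None and rec.get("doc_id") == doc_id
        )
        (deleted if hit else kept).append(rec)
    return kept, deleted
```

Returning the deleted set rather than dropping it silently makes a dry run possible: the caller can report what would be removed before committing the change.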

3) Model provider governance

  • Confirm data usage (training vs. inference only).
  • Disable logging on hosted APIs if data must not leave your boundary.
  • For self-hosted models, pin container images and track model checksum.
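
For the self-hosted case, one way to track a model checksum is to hash the weights file at startup and compare it with a pinned digest. The sketch below uses only the standard library; `sha256_of_file` and `verify_model` are hypothetical helper names, and where the pinned digest is stored (config, registry, release manifest) is left to you.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file in chunks so large weight files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model(path, pinned_sha256):
    """Return the actual digest, or raise if it differs from the pinned one."""
    actual = sha256_of_file(path)
    if actual != pinned_sha256:
        raise ValueError(f"model checksum mismatch: expected {pinned_sha256}, got {actual}")
    return actual
```

Failing closed at startup is the point of the check: a serving process that cannot prove which weights it loaded should not answer requests.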

4) Prompt governance (SCU-safe)

  • Lock schema: system → task → constraints → citations → answer.
  • Forbid cross-source merges; require line-level citation IDs.
  • Add guard prompts to avoid reproducing secrets or PII unless necessary and consented.
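
A minimal sketch of the locked schema and the line-level citation rule might look like this. The section order comes from the first bullet above; the `[SECTION]` header rendering, the `[c1]` citation ID format, and both function names are assumptions chosen for the example.

```python
import re

# Locked order from the schema: system → task → constraints → citations → answer.
SECTION_ORDER = ("system", "task", "constraints", "citations", "answer")
# Assumed citation ID format for this sketch: [c1], [c2], ...
CITATION_RE = re.compile(r"\[(c\d+)\]")


def build_prompt(sections):
    """Render sections in the locked order; reject missing or unknown sections."""
    missing = [name for name in SECTION_ORDER if name not in sections]
    unknown = [name for name in sections if name not in SECTION_ORDER]
    if missing or unknown:
        raise ValueError(f"schema violation: missing={missing}, unknown={unknown}")
    return "\n\n".join(f"[{name.upper()}]\n{sections[name]}" for name in SECTION_ORDER)


def uncited_lines(answer, known_ids):
    """Return answer lines that have no citation ID or cite an unknown ID."""
    bad = []
    for line in answer.splitlines():
        if not line.strip():
            continue
        ids = CITATION_RE.findall(line)
        if not ids or any(i not in known_ids for i in ids):
            bad.append(line)
    return bad
```

Running `uncited_lines` on the model output before returning it gives a cheap gate: any non-empty result means the answer is rejected or regenerated.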

5) Audit & reproducibility

  • Use envelope fields (trace_id, mem_rev, mem_hash) in every record.
  • Keep answer → prompt → citations → chunks chain navigable.
  • Export metrics pack per release (ΔS, λ rates, nDCG, recall).
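
The envelope fields could be attached along these lines. This runbook does not define the fields precisely, so the sketch assumes `mem_rev` is a revision counter for the memory state and `mem_hash` is a SHA-256 over a canonical JSON serialization of that state; `make_record` and the `payload` key are hypothetical.

```python
import hashlib
import json
import uuid


def mem_hash(memory):
    """Hash a canonical JSON form so key order does not change the digest."""
    canonical = json.dumps(memory, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_record(payload, memory, mem_rev, trace_id=None):
    """Wrap a payload with the envelope fields used for audit and replay."""
    return {
        "trace_id": trace_id or uuid.uuid4().hex,
        "mem_rev": mem_rev,
        "mem_hash": mem_hash(memory),
        "payload": payload,
    }
```

Passing the same `trace_id` to the records for the answer, prompt, citations, and chunks is what keeps the chain navigable: one lookup by ID returns every step.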

6) Config template (YAML)

privacy:
  redact_at_ingest: true
  redactors: [pii_email, pii_phone, pii_name]
  reversible_vault: false
  retention_days:
    prompts: 14
    logs: 60
    embeddings: 180
  access:
    roles:
      retriever: [read_chunks]
      reranker: [read_chunks]
      llm: [read_prompts]
      analyst: [read_metrics]
  secrets:
    provider: "aws-kms"   # or gcp-kms, vault
    rotation_days: 90
providers:
  openai:
    share_for_training: false
  claude:
    share_for_training: false

7) Risk scenarios → mitigations

| Scenario | Risk | Mitigation |
|---|---|---|
| User uploads PII-heavy PDFs | Accidental exposure | Redact at ingest; block high-risk types; allow override with consent |
| Multi-tenant leakage | Cross-account data bleed | Tenant IDs in chunk keys; per-tenant indices; access policies |
| Citations reveal secrets | SCU or over-inclusion | Reduce context window; per-source fences; require justification |
| Vendor logs prompts | Data leaves boundary | Use no-log endpoints; self-host; encrypt locally |
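
The multi-tenant mitigation (tenant IDs in chunk keys) can be sketched as below. The `tenant:doc:chunk` key layout, the dict-backed `store`, and the function names are assumptions for illustration; a real deployment would enforce the same check in the retriever and ideally again in per-tenant indices.

```python
def chunk_key(tenant_id, doc_id, chunk_no):
    """Build a tenant-scoped key; ':' is reserved as the separator."""
    if ":" in tenant_id or ":" in doc_id:
        raise ValueError("tenant_id and doc_id must not contain ':'")
    return f"{tenant_id}:{doc_id}:{chunk_no}"


def fetch_chunks(store, tenant_id, keys):
    """Return chunks for the keys, refusing any key outside the caller's tenant."""
    prefix = tenant_id + ":"
    for key in keys:
        if not key.startswith(prefix):
            raise PermissionError(f"key {key!r} is outside tenant {tenant_id!r}")
    return [store[key] for key in keys]
```

Checking every key before reading any chunk means a single cross-tenant key rejects the whole request instead of returning a partial result.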

Acceptance criteria

  • PII redaction rate ≥ 95% on test corpus; no residual PII in prompts unless approved.
  • Trace chain present for 100% of answers (citations included).
  • Secrets rotated within policy; provider log-sharing disabled.
  • Retention job passes dry-run audit monthly.
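
One way to measure the first criterion is to treat "redaction rate" as recall over a labeled test corpus: the share of gold PII spans that the redactor also found. That reading is an assumption, as are the tuple shape `(doc_id, start, end, type)` and the function names; exact span matching is strict, and you may prefer overlap-based matching.

```python
def redaction_rate(gold, predicted):
    """Fraction of gold PII spans that were redacted (exact span and type match)."""
    gold_set = set(gold)
    if not gold_set:
        return 1.0  # nothing to redact counts as fully redacted
    return len(gold_set & set(predicted)) / len(gold_set)


def meets_threshold(gold, predicted, threshold=0.95):
    """True when the redaction rate satisfies the acceptance criterion."""
    return redaction_rate(gold, predicted) >= threshold
```

Recall alone does not penalize over-redaction, so it is worth tracking precision next to it if redacted text hurts answer quality.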

🔗 Quick-Start Downloads (60 sec)

| Tool | Link | 3-Step Setup |
|---|---|---|
| WFGY 1.0 | PDF Engine Paper | 1 Download · 2 Upload to your LLM · 3 Ask “Answer using WFGY + <your question>” |
| TXT OS (plain-text OS) | TXTOS.txt | 1 Download · 2 Paste into any LLM chat · 3 Type “hello world”; the OS boots instantly |

Explore More

| Layer | Page | What it's for |
|---|---|---|
| Proof | WFGY Recognition Map | External citations, integrations, and ecosystem proof |
| Engine | WFGY 1.0 | Original PDF-based tension engine |
| Engine | WFGY 2.0 | Production tension kernel and math engine for RAG and agents |
| Engine | WFGY 3.0 | TXT-based Singularity tension engine, 131 S class set |
| Map | Problem Map 1.0 | Flagship 16-problem RAG failure checklist and fix map |
| Map | Problem Map 2.0 | RAG-focused recovery pipeline |
| Map | Problem Map 3.0 | Global Debug Card, image as a debug protocol layer |
| Map | Semantic Clinic | Symptom to family to exact fix |
| Map | Grandmas Clinic | Plain-language stories mapped to Problem Map 1.0 |
| Onboarding | Starter Village | Guided tour for newcomers |
| App | TXT OS | TXT semantic OS, fast boot |
| App | Blah Blah Blah | Abstract and paradox Q and A built on TXT OS |
| App | Blur Blur Blur | Text to image with semantic control |
| App | Blow Blow Blow | Reasoning game engine and memory demo |

If this repository helped, starring it improves discovery so more builders can find the docs and tools.