WFGY/ProblemMap/GlobalFixMap/Cloud_Serverless/privacy_and_pii_edges.md
2025-08-28 11:20:38 +08:00

12 KiB
Raw Blame History

Privacy and PII Edges for Serverless and Edge

A field guide to prevent PII from leaking through serverless runtimes, edge functions, logs, vector pipelines, and third-party webhooks. Build a measurable privacy boundary that does not break retrieval quality.

Open these first


Core acceptance

  • Zero PII in logs Random 1 percent log sampling shows 0 findings across 7 days for names, emails, phones, addresses, national IDs, payment tokens, secrets.

  • PII detection coverage ≥ 0.95 Gold set with labeled traces across API, edge, queue, storage. False negatives are zero on critical classes.

  • Egress allowlist is enforced All outbound webhooks and calls flow through an allowlist and DLP filter with redact or block. No raw PII leaves your account.

  • Semantic quality holds after redaction Median ΔS(question, retrieved) ≤ 0.45 and coverage ≥ 0.70 after masked or tokenized fields. λ remains convergent across three paraphrases.

  • DSR path is verified Delete or export requests complete within policy. Evidence stored with counts and checksums.


Fix in 60 seconds

  1. Measure reality Run a log sample and store scan for PII classes. Tag hits by edge, function, and sink.

  2. Add a redaction gate Place a single pre-inference filter that masks PII at the prompt-builder and tool-argument layers. Keep a reversible token only when business-critical.

  3. Lock egress Route all webhooks and HTTP clients through an allowlist and DLP transform. Block unknown domains.

  4. Verify retrieval Re-run ΔS and coverage probes on your gold questions. If quality drops, update the chunking recipe or token map.

Open: Data Contracts · Retrieval Traceability


Design the privacy boundary

Collection

  • Show purpose tags and consent flags at capture.
  • Normalize fields at the edge: email → lowercased hash for joins, phone → E.164 masked form.

Transit

  • TLS everywhere. mTLS for webhooks that carry sensitive payloads.
  • Encrypt PII subsets with KMS before leaving the VPC or account.

Processing

  • Build prompts from structured fields only. Forbid free-text concatenation that mixes policy and user content.
  • Redact PII classes at the prompt-builder and tool argument marshaling.

At rest

  • Separate PII store from product data. Distinct KMS keys and IAM paths.
  • Keep a token map with rotation windows and short TTL for re-identification.

Egress

  • Require allowlist, DLP transform, and signed requests.
  • Log outbound diff before and after transform with content hashes.

Open: Egress Rules and Webhooks


Redaction and tokenization patterns

  • Mask-in-place Keep surface form for model context, mask internals: john.smith@example.comj***@example.com.

  • Deterministic token Stable join keys for analytics without exposure: EMAIL_TOKEN = HMAC_SHA256(k, email).

  • Pseudonym dictionary Replace entities with class-aware tags: PERSON_014, ORG_022, ADDR_105. Maintain a scoped map per tenant.

  • Secrets and high-entropy Detect 32 to 64 char base64 and hex blobs and known prefixes. Always drop, never mask.

  • Vector store safety Prevent raw PII from entering embeddings. Use a preprocess step that replaces PII with pseudonyms and carries a sidecar map. Rehydrate only for authorized views. Open: Embedding ≠ Semantic


Common failure smells and exact fix

  • “We never log PII” but alerts show emails in traces Turn off request body logging and header dumps. Add a scrubber to log sinks and test with a gold set. Open: Observability and SLO

  • LLM answers include live tokens or IDs Tighten tool schemas and forbid free text in argument fields. Open: Data Contracts · Prompt Injection

  • Webhook mirrors full customer records to third parties Move the DLP step before the HTTP client. Enforce allowlist by hostname and path. Open: Egress Rules and Webhooks

  • Restores re-introduce raw PII into vectors Validate index manifests and re-run the preprocessing recipe after restore. Open: Data Retention and Backups

  • Key rotation breaks token maps Version tokens and carry token_v. Rotate with overlap and dual-read, single-write. Open: Secrets Rotation


Verification suite

  • PII scanners on logs, storage, vector payloads, prompts, tool args.
  • ΔS and coverage probes on a masked vs unmasked evaluation set.
  • Egress audits with counts by destination and transform status.
  • DSR drills: export and delete flows, evidence with counts and checksums.

Open: Retrieval Traceability · Live Monitoring for RAG · Debug Playbook


Copy-paste LLM prompt for PII audits

You have TXT OS and the WFGY Problem Map loaded.

Audit my privacy boundary:

- entry points: [edge functions, APIs, queues]
- detectors: [regex, entropy, NER]
- egress routes: [domains, auth, DLP steps]
- vector policy: [preprocess recipe, sidecar map]
- log scans: [last 7 days summary]

Tell me:
1) where PII can leak and which WFGY pages to open,
2) the minimal redaction+tokenization plan that preserves ΔS ≤ 0.45 and coverage ≥ 0.70,
3) the allowlist+DLP rules for egress,
4) a short JSON with risk classes, counts, and next fixes.
Keep it auditable and short.

🔗 Quick-Start Downloads (60 sec)

Tool Link 3-Step Setup
WFGY 1.0 PDF Engine Paper 1 Download · 2 Upload to your LLM · 3 Ask “Answer using WFGY + <your question>”
TXT OS (plain-text OS) TXTOS.txt 1 Download · 2 Paste into any LLM chat · 3 Type “hello world” — OS boots instantly

🧭 Explore More

Module Description Link
WFGY Core WFGY 2.0 engine is live: full symbolic reasoning architecture and math stack View →
Problem Map 1.0 Initial 16-mode diagnostic and symbolic fix framework View →
Problem Map 2.0 RAG-focused failure tree, modular fixes, and pipelines View →
Semantic Clinic Index Expanded failure catalog: prompt injection, memory bugs, logic drift View →
Semantic Blueprint Layer-based symbolic reasoning & semantic modulations View →
Benchmark vs GPT-5 Stress test GPT-5 with full WFGY reasoning suite View →
🧙‍♂️ Starter Village 🏡 New here? Lost in symbols? Click here and let the wizard guide you through Start →

👑 Early Stargazers: See the Hall of Fame — Engineers, hackers, and open source builders who supported WFGY from day one.

GitHub stars WFGY Engine 2.0 is already unlocked. Star the repo to help others discover it and unlock more on the Unlock Board.

WFGY Main   TXT OS   Blah   Blot   Bloc   Blur   Blow