WFGY/ProblemMap/GlobalFixMap/Cloud_Serverless/privacy_and_pii_edges.md

11 KiB
Raw Blame History

Privacy and PII Edges for Serverless and Edge

🧭 Quick Return to Map

You are in a sub-page of Cloud_Serverless.
To reorient, go back here:

Think of this page as a desk within a ward.
If you need the full triage and all prescriptions, return to the Emergency Room lobby.

A field guide to prevent PII from leaking through serverless runtimes, edge functions, logs, vector pipelines, and third-party webhooks. Build a measurable privacy boundary that does not break retrieval quality.

Open these first


Core acceptance

  • Zero PII in logs Random 1 percent log sampling shows 0 findings across 7 days for names, emails, phones, addresses, national IDs, payment tokens, secrets.

  • PII detection coverage ≥ 0.95 Gold set with labeled traces across API, edge, queue, storage. False negatives are zero on critical classes.

  • Egress allowlist is enforced All outbound webhooks and calls flow through an allowlist and DLP filter with redact or block. No raw PII leaves your account.

  • Semantic quality holds after redaction Median ΔS(question, retrieved) ≤ 0.45 and coverage ≥ 0.70 after masked or tokenized fields. λ remains convergent across three paraphrases.

  • DSR path is verified Delete or export requests complete within policy. Evidence stored with counts and checksums.


Fix in 60 seconds

  1. Measure reality Run a log sample and store scan for PII classes. Tag hits by edge, function, and sink.

  2. Add a redaction gate Place a single pre-inference filter that masks PII at the prompt-builder and tool-argument layers. Keep a reversible token only when business-critical.

  3. Lock egress Route all webhooks and HTTP clients through an allowlist and DLP transform. Block unknown domains.

  4. Verify retrieval Re-run ΔS and coverage probes on your gold questions. If quality drops, update the chunking recipe or token map.

Open: Data Contracts · Retrieval Traceability


Design the privacy boundary

Collection

  • Show purpose tags and consent flags at capture.
  • Normalize fields at the edge: email → lowercased hash for joins, phone → E.164 masked form.

Transit

  • TLS everywhere. mTLS for webhooks that carry sensitive payloads.
  • Encrypt PII subsets with KMS before leaving the VPC or account.

Processing

  • Build prompts from structured fields only. Forbid free-text concatenation that mixes policy and user content.
  • Redact PII classes at the prompt-builder and tool argument marshaling.

At rest

  • Separate PII store from product data. Distinct KMS keys and IAM paths.
  • Keep a token map with rotation windows and short TTL for re-identification.

Egress

  • Require allowlist, DLP transform, and signed requests.
  • Log outbound diff before and after transform with content hashes.

Open: Egress Rules and Webhooks


Redaction and tokenization patterns

  • Mask-in-place Keep surface form for model context, mask internals: john.smith@example.comj***@example.com.

  • Deterministic token Stable join keys for analytics without exposure: EMAIL_TOKEN = HMAC_SHA256(k, email).

  • Pseudonym dictionary Replace entities with class-aware tags: PERSON_014, ORG_022, ADDR_105. Maintain a scoped map per tenant.

  • Secrets and high-entropy Detect 32 to 64 char base64 and hex blobs and known prefixes. Always drop, never mask.

  • Vector store safety Prevent raw PII from entering embeddings. Use a preprocess step that replaces PII with pseudonyms and carries a sidecar map. Rehydrate only for authorized views. Open: Embedding ≠ Semantic


Common failure smells and exact fix

  • “We never log PII” but alerts show emails in traces Turn off request body logging and header dumps. Add a scrubber to log sinks and test with a gold set. Open: Observability and SLO

  • LLM answers include live tokens or IDs Tighten tool schemas and forbid free text in argument fields. Open: Data Contracts · Prompt Injection

  • Webhook mirrors full customer records to third parties Move the DLP step before the HTTP client. Enforce allowlist by hostname and path. Open: Egress Rules and Webhooks

  • Restores re-introduce raw PII into vectors Validate index manifests and re-run the preprocessing recipe after restore. Open: Data Retention and Backups

  • Key rotation breaks token maps Version tokens and carry token_v. Rotate with overlap and dual-read, single-write. Open: Secrets Rotation


Verification suite

  • PII scanners on logs, storage, vector payloads, prompts, tool args.
  • ΔS and coverage probes on a masked vs unmasked evaluation set.
  • Egress audits with counts by destination and transform status.
  • DSR drills: export and delete flows, evidence with counts and checksums.

Open: Retrieval Traceability · Live Monitoring for RAG · Debug Playbook


Copy-paste LLM prompt for PII audits

You have TXT OS and the WFGY Problem Map loaded.

Audit my privacy boundary:

- entry points: [edge functions, APIs, queues]
- detectors: [regex, entropy, NER]
- egress routes: [domains, auth, DLP steps]
- vector policy: [preprocess recipe, sidecar map]
- log scans: [last 7 days summary]

Tell me:
1) where PII can leak and which WFGY pages to open,
2) the minimal redaction+tokenization plan that preserves ΔS ≤ 0.45 and coverage ≥ 0.70,
3) the allowlist+DLP rules for egress,
4) a short JSON with risk classes, counts, and next fixes.
Keep it auditable and short.

🔗 Quick-Start Downloads (60 sec)

Tool Link 3-Step Setup
WFGY 1.0 PDF Engine Paper 1 Download · 2 Upload to your LLM · 3 Ask “Answer using WFGY + <your question>”
TXT OS (plain-text OS) TXTOS.txt 1 Download · 2 Paste into any LLM chat · 3 Type “hello world” — OS boots instantly

Explore More

Layer Page What its for
Proof WFGY Recognition Map External citations, integrations, and ecosystem proof
Engine WFGY 1.0 Original PDF based tension engine
Engine WFGY 2.0 Production tension kernel and math engine for RAG and agents
Engine WFGY 3.0 TXT based Singularity tension engine, 131 S class set
Map Problem Map 1.0 Flagship 16 problem RAG failure checklist and fix map
Map Problem Map 2.0 RAG focused recovery pipeline
Map Problem Map 3.0 Global Debug Card, image as a debug protocol layer
Map Semantic Clinic Symptom to family to exact fix
Map Grandmas Clinic Plain language stories mapped to Problem Map 1.0
Onboarding Starter Village Guided tour for newcomers
App TXT OS TXT semantic OS, fast boot
App Blah Blah Blah Abstract and paradox Q and A built on TXT OS
App Blur Blur Blur Text to image with semantic control
App Blow Blow Blow Reasoning game engine and memory demo

If this repository helped, starring it improves discovery so more builders can find the docs and tools. GitHub Repo stars