Privacy and PII Edges for Serverless and Edge
A field guide to keeping PII from leaking through serverless runtimes, edge functions, logs, vector pipelines, and third-party webhooks. The goal is a measurable privacy boundary that does not break retrieval quality.
Open these first
- Boundary schemas: Data Contracts · Retrieval Traceability
- Adversarial inputs: Prompt Injection · Bluffing Controls
- Ops companions: Egress Rules and Webhooks · Secrets Rotation · Observability and SLO
- Data lifecycle: Data Retention and Backups · Edge Cache Invalidation
Core acceptance
- Zero PII in logs: Random 1 percent log sampling shows 0 findings across 7 days for names, emails, phones, addresses, national IDs, payment tokens, secrets.
- PII detection coverage ≥ 0.95: Gold set with labeled traces across API, edge, queue, storage. False negatives are zero on critical classes.
- Egress allowlist is enforced: All outbound webhooks and calls flow through an allowlist and DLP filter with redact or block. No raw PII leaves your account.
- Semantic quality holds after redaction: Median ΔS(question, retrieved) ≤ 0.45 and coverage ≥ 0.70 after masked or tokenized fields. λ remains convergent across three paraphrases.
- DSR path is verified: Delete or export requests complete within policy. Evidence stored with counts and checksums.
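The numeric criteria above can be turned into a pass/fail gate. The following is a minimal sketch, assuming "detection coverage" means recall over labeled PII spans in the gold set and that ΔS and coverage values are computed elsewhere and passed in. The names `detection_gate`, `semantic_gate`, and `CRITICAL_CLASSES` are hypothetical, as is the choice of which classes count as critical.

```python
from statistics import median

# Assumption: which PII classes count as "critical" is a policy decision.
CRITICAL_CLASSES = {"national_id", "payment_token", "secret"}


def detection_gate(gold, min_recall=0.95):
    """gold: list of (pii_class, detected) pairs, one per labeled PII span."""
    total = len(gold)
    found = sum(1 for _, detected in gold if detected)
    recall = found / total if total else 1.0
    # Any miss on a critical class fails the gate regardless of overall recall.
    critical_misses = [c for c, detected in gold
                       if c in CRITICAL_CLASSES and not detected]
    return recall >= min_recall and not critical_misses


def semantic_gate(delta_s_values, coverage, max_delta_s=0.45, min_coverage=0.70):
    """delta_s_values: precomputed ΔS(question, retrieved) per gold question."""
    return median(delta_s_values) <= max_delta_s and coverage >= min_coverage
```

Running both gates in CI after every change to the redaction recipe keeps the thresholds from drifting silently.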
Fix in 60 seconds
- Measure reality: Run a log sample and store scan for PII classes. Tag hits by edge, function, and sink.
- Add a redaction gate: Place a single pre-inference filter that masks PII at the prompt-builder and tool-argument layers. Keep a reversible token only when business-critical.
- Lock egress: Route all webhooks and HTTP clients through an allowlist and DLP transform. Block unknown domains.
- Verify retrieval: Re-run ΔS and coverage probes on your gold questions. If quality drops, update the chunking recipe or token map.
Open: Data Contracts · Retrieval Traceability
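The "measure reality" step can start as a small sampled scan. This is a rough sketch, assuming log records are dicts with hypothetical `message`, `edge`, `function`, and `sink` keys. The two regexes are illustrative only and cover just emails and `+`-prefixed phone numbers; a production detector set would also need NER and entropy checks.

```python
import random
import re
from collections import Counter

# Illustrative detectors only; real coverage needs more classes.
DETECTORS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+\d{8,15}\b"),
}


def scan_sample(records, rate=0.01, seed=None):
    """Scan a random fraction of records and count hits per
    (pii_class, edge, function, sink)."""
    rng = random.Random(seed)
    hits = Counter()
    for rec in records:
        if rng.random() >= rate:
            continue  # record not selected for this sample
        for pii_class, pattern in DETECTORS.items():
            n = len(pattern.findall(rec["message"]))
            if n:
                key = (pii_class, rec["edge"], rec["function"], rec["sink"])
                hits[key] += n
    return hits
```

Grouping by edge, function, and sink shows which component to fix first rather than only how much PII leaked.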
Design the privacy boundary
Collection
- Show purpose tags and consent flags at capture.
- Normalize fields at the edge: email → hash of the lowercased address for joins, phone → masked E.164 form.
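One way to read the normalization rule is sketched below. It assumes the join hash is keyed (HMAC-SHA256, matching the deterministic token pattern later in this guide), since an unkeyed hash of an email address can be reversed by guessing common addresses. It also assumes the phone input already carries a country code and that "masked" means keeping only the last two digits. The function names are hypothetical.

```python
import hashlib
import hmac
import re


def email_join_key(email: str, key: bytes) -> str:
    """Lowercase and trim the address, then derive a keyed hash for joins."""
    normalized = email.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_phone_e164(raw: str, keep_last: int = 2) -> str:
    """Strip formatting, keep the '+' and the last digits, mask the rest."""
    digits = re.sub(r"\D", "", raw)
    # E.164 allows at most 15 digits; the lower bound of 8 is an assumption.
    if not 8 <= len(digits) <= 15:
        raise ValueError("not a plausible E.164 number")
    return "+" + "*" * (len(digits) - keep_last) + digits[-keep_last:]
```

For example, `mask_phone_e164("+1 (415) 555-0123")` yields a `+` followed by nine asterisks and `23`.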
Transit
- TLS everywhere. mTLS for webhooks that carry sensitive payloads.
- Encrypt PII subsets with KMS before leaving the VPC or account.
Processing
- Build prompts from structured fields only. Forbid free-text concatenation that mixes policy and user content.
- Redact PII classes at the prompt-builder and tool argument marshaling.
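The two Processing rules can be sketched as a prompt builder that accepts only typed fields and redacts every user-controlled one. This is an illustration under assumptions: `PromptFields`, `build_prompt`, and `marshal_tool_args` are hypothetical names, the section markers are arbitrary, and the single email regex stands in for a full detector set.

```python
import re
from dataclasses import dataclass

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


@dataclass(frozen=True)
class PromptFields:
    policy: str               # trusted, authored by the operator
    question: str             # user-controlled
    context: tuple[str, ...]  # retrieved chunks, treated as untrusted


def redact(text: str) -> str:
    """Replace detected emails with a class placeholder."""
    return EMAIL_RE.sub("[EMAIL]", text)


def build_prompt(fields: PromptFields) -> str:
    # Policy and user content live in separate, labeled sections and are
    # never concatenated into one free-text string.
    parts = [
        "### POLICY", fields.policy,
        "### CONTEXT", *(redact(c) for c in fields.context),
        "### QUESTION", redact(fields.question),
    ]
    return "\n".join(parts)


def marshal_tool_args(args: dict) -> dict:
    """Redact string arguments before they reach a tool call."""
    return {k: redact(v) if isinstance(v, str) else v for k, v in args.items()}
```

Keeping redaction inside the builder and the marshaler means there is one place to test, rather than one per call site.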
At rest
- Separate PII store from product data. Distinct KMS keys and IAM paths.
- Keep a token map with rotation windows and short TTL for re-identification.
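A token map with a short re-identification TTL might look like the in-memory sketch below. The `TokenMap` class, the `tok_` prefix, and the 300-second default are assumptions for illustration. A real deployment would back this with the separate PII store and its own KMS key rather than a Python dict.

```python
import secrets
import time


class TokenMap:
    """Maps random tokens to raw values; entries expire after a short TTL."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # token -> (value, expires_at)

    def tokenize(self, value):
        token = "tok_" + secrets.token_hex(8)
        self._entries[token] = (value, self._clock() + self._ttl)
        return token

    def reidentify(self, token):
        entry = self._entries.get(token)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[token]  # expired: drop the mapping
            return None
        return value
```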
Egress
- Require allowlist, DLP transform, and signed requests.
- Log content hashes of each outbound payload before and after the transform, so the diff is auditable without storing the payload itself.
Open: Egress Rules and Webhooks
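The egress requirements can be combined into one gate that runs before any HTTP client. The sketch below prepares a request but does not send it. The allowlist entry, the `X-Signature` header name, and the function names are hypothetical, and the DLP transform is reduced to a single email regex for brevity.

```python
import hashlib
import hmac
import json
import re
from urllib.parse import urlsplit

# Hypothetical allowlist of (hostname, path) pairs.
ALLOWLIST = {("hooks.example.com", "/v1/events")}
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def content_hash(body: bytes) -> str:
    return hashlib.sha256(body).hexdigest()


def prepare_egress(url: str, payload: dict, signing_key: bytes):
    """Return (body, headers, audit_record) or raise if the target is not allowed."""
    parts = urlsplit(url)
    if parts.scheme != "https" or (parts.hostname, parts.path) not in ALLOWLIST:
        raise PermissionError(f"egress blocked: {parts.hostname}{parts.path}")
    raw = json.dumps(payload, sort_keys=True).encode("utf-8")
    # DLP transform: redact before the payload can reach any client.
    clean = EMAIL_RE.sub("[EMAIL]", raw.decode("utf-8")).encode("utf-8")
    signature = hmac.new(signing_key, clean, hashlib.sha256).hexdigest()
    audit = {
        "host": parts.hostname,
        "path": parts.path,
        "hash_before": content_hash(raw),
        "hash_after": content_hash(clean),
        "transformed": raw != clean,
    }
    return clean, {"X-Signature": signature}, audit
```

The audit record holds only hashes and a flag, so it can be logged without reintroducing the PII that the transform removed.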
Redaction and tokenization patterns
- Mask-in-place: Keep surface form for model context, mask internals: `john.smith@example.com` → `j***@example.com`.
- Deterministic token: Stable join keys for analytics without exposure: `EMAIL_TOKEN = HMAC_SHA256(k, email)`.
- Pseudonym dictionary: Replace entities with class-aware tags: `PERSON_014`, `ORG_022`, `ADDR_105`. Maintain a scoped map per tenant.
- Secrets and high-entropy: Detect 32 to 64 char base64 and hex blobs and known prefixes. Always drop, never mask.
- Vector store safety: Prevent raw PII from entering embeddings. Use a preprocess step that replaces PII with pseudonyms and carries a sidecar map. Rehydrate only for authorized views. Open: Embedding ≠ Semantic
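The patterns in this list can be sketched together as follows. All function and class names are hypothetical. The secret detector is a loose assumption: it treats any run of 32 or more base64 or hex characters as a secret (deliberately open-ended so that longer blobs are not partially left behind), and the prefix list is a small illustrative sample. One `Pseudonymizer` instance is meant to be created per tenant.

```python
import hashlib
import hmac
import re
from collections import defaultdict


def mask_email(email: str) -> str:
    """Mask-in-place: keep first character and domain."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"


def email_token(key: bytes, email: str) -> str:
    """Deterministic token: HMAC-SHA256 over the lowercased address."""
    return hmac.new(key, email.lower().encode("utf-8"), hashlib.sha256).hexdigest()


class Pseudonymizer:
    """Class-aware tags with a sidecar map for authorized rehydration."""
    TAG_RE = re.compile(r"\b[A-Z]+_\d{3,}\b")

    def __init__(self):
        self._forward = {}                 # (class, value) -> tag
        self._counters = defaultdict(int)  # per-class running number
        self.sidecar = {}                  # tag -> value

    def tag(self, pii_class: str, value: str) -> str:
        key = (pii_class, value)
        if key not in self._forward:
            self._counters[pii_class] += 1
            tag = f"{pii_class}_{self._counters[pii_class]:03d}"
            self._forward[key] = tag
            self.sidecar[tag] = value
        return self._forward[key]

    def rehydrate(self, text: str, authorized: bool) -> str:
        if not authorized:
            return text
        return self.TAG_RE.sub(lambda m: self.sidecar.get(m.group(0), m.group(0)), text)


# Illustrative prefixes only; extend with the providers actually in use.
PREFIX_RE = re.compile(r"\b(?:sk_live_|ghp_|AKIA)[A-Za-z0-9_]{8,}")
BLOB_RE = re.compile(r"[A-Za-z0-9+/]{32,}={0,2}")


def drop_secrets(text: str) -> str:
    """Secrets are removed outright, never masked."""
    return BLOB_RE.sub("", PREFIX_RE.sub("", text))
```

For vector pipelines, text would pass through `drop_secrets` and the pseudonymizer before embedding, with `sidecar` stored apart from the index.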
Common failure smells and exact fix
- “We never log PII” but alerts show emails in traces: Turn off request body logging and header dumps. Add a scrubber to log sinks and test with a gold set. Open: Observability and SLO
- LLM answers include live tokens or IDs: Tighten tool schemas and forbid free text in argument fields. Open: Data Contracts · Prompt Injection
- Webhook mirrors full customer records to third parties: Move the DLP step before the HTTP client. Enforce allowlist by hostname and path. Open: Egress Rules and Webhooks
- Restores re-introduce raw PII into vectors: Validate index manifests and re-run the preprocessing recipe after restore. Open: Data Retention and Backups
- Key rotation breaks token maps: Version tokens and carry `token_v`. Rotate with overlap and dual-read, single-write. Open: Secrets Rotation
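The key-rotation fix can be illustrated with versioned HMAC tokens. This is a sketch under assumptions: the key values are placeholders, the store is a plain dict keyed by `(token_v, token)`, and `make_token` and `lookup` are hypothetical names. New tokens are written only with the newest key (single-write), while lookups try every key still inside the overlap window (dual-read).

```python
import hashlib
import hmac

# Placeholder keys; real keys come from a KMS or secrets manager.
KEYS = {1: b"old-key", 2: b"new-key"}
WRITE_VERSION = 2        # single-write: only the newest key mints tokens
READ_VERSIONS = (2, 1)   # dual-read: newest first, then the overlapping old key


def make_token(email: str, version: int = WRITE_VERSION) -> dict:
    digest = hmac.new(KEYS[version], email.lower().encode("utf-8"),
                      hashlib.sha256).hexdigest()
    return {"token": digest, "token_v": version}


def lookup(store: dict, email: str):
    """Find a record written under any key version still in the read window."""
    for version in READ_VERSIONS:
        token = make_token(email, version)["token"]
        if (version, token) in store:
            return store[(version, token)]
    return None
```

Once every record has been rewritten under the new version, the old version can be removed from `READ_VERSIONS` and its key retired.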
Verification suite
- PII scanners on logs, storage, vector payloads, prompts, tool args.
- ΔS and coverage probes on a masked vs unmasked evaluation set.
- Egress audits with counts by destination and transform status.
- DSR drills: export and delete flows, evidence with counts and checksums.
Open: Retrieval Traceability · Live Monitoring for RAG · Debug Playbook
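For the DSR drills, the evidence record might be built as in the sketch below. It assumes the checksum is taken over sorted record identifiers rather than record contents, so the evidence itself carries no PII. The function name and the record layout are hypothetical.

```python
import hashlib


def dsr_evidence(request_id: str, action: str, ids_by_store: dict) -> dict:
    """Build an evidence record with per-store counts and checksums.

    ids_by_store: {store_name: [record_id, ...]} touched by the request.
    """
    stores = {}
    for store, ids in sorted(ids_by_store.items()):
        # Sort so the checksum does not depend on enumeration order.
        canonical = "\n".join(sorted(ids)).encode("utf-8")
        stores[store] = {
            "count": len(ids),
            "sha256": hashlib.sha256(canonical).hexdigest(),
        }
    return {
        "request_id": request_id,
        "action": action,  # "delete" or "export"
        "stores": stores,
        "total": sum(s["count"] for s in stores.values()),
    }
```

Re-running the same enumeration after a delete and getting a count of zero for every store is the simplest check that the drill succeeded.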
Copy-paste LLM prompt for PII audits
You have TXT OS and the WFGY Problem Map loaded.
Audit my privacy boundary:
- entry points: [edge functions, APIs, queues]
- detectors: [regex, entropy, NER]
- egress routes: [domains, auth, DLP steps]
- vector policy: [preprocess recipe, sidecar map]
- log scans: [last 7 days summary]
Tell me:
1) where PII can leak and which WFGY pages to open,
2) the minimal redaction+tokenization plan that preserves ΔS ≤ 0.45 and coverage ≥ 0.70,
3) the allowlist+DLP rules for egress,
4) a short JSON with risk classes, counts, and next fixes.
Keep it auditable and short.
🔗 Quick-Start Downloads (60 sec)
| Tool | Link | 3-Step Setup |
|---|---|---|
| WFGY 1.0 PDF | Engine Paper | 1️⃣ Download · 2️⃣ Upload to your LLM · 3️⃣ Ask “Answer using WFGY + <your question>” |
| TXT OS (plain-text OS) | TXTOS.txt | 1️⃣ Download · 2️⃣ Paste into any LLM chat · 3️⃣ Type “hello world” — OS boots instantly |