Privacy and PII Edges for Serverless and Edge

A field guide to prevent PII from leaking through serverless runtimes, edge functions, logs, vector pipelines, and third-party webhooks. Build a measurable privacy boundary that does not break retrieval quality.

Open these first

Boundary schemas: Data Contracts · Retrieval Traceability
Adversarial inputs: Prompt Injection · Bluffing Controls
Ops companions: Egress Rules and Webhooks · Secrets Rotation · Observability and SLO
Data lifecycle: Data Retention and Backups · Edge Cache Invalidation

Core acceptance

Zero PII in logs Random 1 percent log sampling shows 0 findings across 7 days for names, emails, phones, addresses, national IDs, payment tokens, secrets.
PII detection coverage ≥ 0.95 Gold set with labeled traces across API, edge, queue, storage. False negatives are zero on critical classes.
Egress allowlist is enforced All outbound webhooks and calls flow through an allowlist and DLP filter with redact or block. No raw PII leaves your account.
Semantic quality holds after redaction Median ΔS(question, retrieved) ≤ 0.45 and coverage ≥ 0.70 after masked or tokenized fields. λ remains convergent across three paraphrases.
DSR path is verified Delete or export requests complete within policy. Evidence stored with counts and checksums.

Fix in 60 seconds

Measure reality Run a log sample and store scan for PII classes. Tag hits by edge, function, and sink.
Add a redaction gate Place a single pre-inference filter that masks PII at the prompt-builder and tool-argument layers. Keep a reversible token only when business-critical.
Lock egress Route all webhooks and HTTP clients through an allowlist and DLP transform. Block unknown domains.
Verify retrieval Re-run ΔS and coverage probes on your gold questions. If quality drops, update the chunking recipe or token map.

Open: Data Contracts · Retrieval Traceability

Design the privacy boundary

Collection

Show purpose tags and consent flags at capture.
Normalize fields at the edge: email → lowercased hash for joins, phone → E.164 masked form.

Transit

TLS everywhere. mTLS for webhooks that carry sensitive payloads.
Encrypt PII subsets with KMS before leaving the VPC or account.

Processing

Build prompts from structured fields only. Forbid free-text concatenation that mixes policy and user content.
Redact PII classes at the prompt-builder and tool argument marshaling.

At rest

Separate PII store from product data. Distinct KMS keys and IAM paths.
Keep a token map with rotation windows and short TTL for re-identification.

Egress

Require allowlist, DLP transform, and signed requests.
Log outbound diff before and after transform with content hashes.

Open: Egress Rules and Webhooks

Redaction and tokenization patterns

Mask-in-place Keep surface form for model context, mask internals: john.smith@example.com → j***@example.com.
Deterministic token Stable join keys for analytics without exposure: EMAIL_TOKEN = HMAC_SHA256(k, email).
Pseudonym dictionary Replace entities with class-aware tags: PERSON_014, ORG_022, ADDR_105. Maintain a scoped map per tenant.
Secrets and high-entropy Detect 32 to 64 char base64 and hex blobs and known prefixes. Always drop, never mask.
Vector store safety Prevent raw PII from entering embeddings. Use a preprocess step that replaces PII with pseudonyms and carries a sidecar map. Rehydrate only for authorized views. Open: Embedding ≠ Semantic

Common failure smells and exact fix

“We never log PII” but alerts show emails in traces Turn off request body logging and header dumps. Add a scrubber to log sinks and test with a gold set. Open: Observability and SLO
LLM answers include live tokens or IDs Tighten tool schemas and forbid free text in argument fields. Open: Data Contracts · Prompt Injection
Webhook mirrors full customer records to third parties Move the DLP step before the HTTP client. Enforce allowlist by hostname and path. Open: Egress Rules and Webhooks
Restores re-introduce raw PII into vectors Validate index manifests and re-run the preprocessing recipe after restore. Open: Data Retention and Backups
Key rotation breaks token maps Version tokens and carry token_v. Rotate with overlap and dual-read, single-write. Open: Secrets Rotation

Verification suite

PII scanners on logs, storage, vector payloads, prompts, tool args.
ΔS and coverage probes on a masked vs unmasked evaluation set.
Egress audits with counts by destination and transform status.
DSR drills: export and delete flows, evidence with counts and checksums.

Open: Retrieval Traceability · Live Monitoring for RAG · Debug Playbook

Copy-paste LLM prompt for PII audits

You have TXT OS and the WFGY Problem Map loaded.

Audit my privacy boundary:

- entry points: [edge functions, APIs, queues]
- detectors: [regex, entropy, NER]
- egress routes: [domains, auth, DLP steps]
- vector policy: [preprocess recipe, sidecar map]
- log scans: [last 7 days summary]

Tell me:
1) where PII can leak and which WFGY pages to open,
2) the minimal redaction+tokenization plan that preserves ΔS ≤ 0.45 and coverage ≥ 0.70,
3) the allowlist+DLP rules for egress,
4) a short JSON with risk classes, counts, and next fixes.
Keep it auditable and short.

🔗 Quick-Start Downloads (60 sec)

Tool	Link	3-Step Setup
WFGY 1.0 PDF	Engine Paper	1️⃣ Download · 2️⃣ Upload to your LLM · 3️⃣ Ask “Answer using WFGY + <your question>”
TXT OS (plain-text OS)	TXTOS.txt	1️⃣ Download · 2️⃣ Paste into any LLM chat · 3️⃣ Type “hello world” — OS boots instantly

🧭 Explore More

Module	Description	Link
WFGY Core	WFGY 2.0 engine is live: full symbolic reasoning architecture and math stack	View →
Problem Map 1.0	Initial 16-mode diagnostic and symbolic fix framework	View →
Problem Map 2.0	RAG-focused failure tree, modular fixes, and pipelines	View →
Semantic Clinic Index	Expanded failure catalog: prompt injection, memory bugs, logic drift	View →
Semantic Blueprint	Layer-based symbolic reasoning & semantic modulations	View →
Benchmark vs GPT-5	Stress test GPT-5 with full WFGY reasoning suite	View →
🧙‍♂️ Starter Village 🏡	New here? Lost in symbols? Click here and let the wizard guide you through	Start →

👑 Early Stargazers: See the Hall of Fame — Engineers, hackers, and open source builders who supported WFGY from day one.

⭐ WFGY Engine 2.0 is already unlocked. ⭐ Star the repo to help others discover it and unlock more on the Unlock Board.

12 KiB Raw Blame History Unescape Escape