WFGY/ProblemMap/GlobalFixMap/Governance/README.md
2025-08-25 20:32:04 +08:00

174 lines
8.8 KiB
Markdown
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Governance & Privacy — Global Fix Map
Keep users safe while keeping your pipeline observable. Stop PII leaks, tame traces, and make audits reproducible.
## What this page is
- A compact policy-to-engineering bridge for RAG and agents
- Redaction-first recipes that do not break retrieval quality
- A clean way to audit every answer without storing raw secrets
## When to use
- You log prompts, snippets, or tool outputs and worry about PII
- You embed raw PDFs that may contain emails, IDs, or health data
- Your traces are useful for debugging but not compliant for storage
- Security asks for “prove why this answer was produced”
## Open these first
- Policy and controls: [Privacy & Governance](https://github.com/onestardao/WFGY/blob/main/ProblemMap/privacy-and-governance.md)
- Snippet and citation schema: [Data Contracts](https://github.com/onestardao/WFGY/blob/main/ProblemMap/data-contracts.md)
- Why this snippet: [Retrieval Traceability](https://github.com/onestardao/WFGY/blob/main/ProblemMap/retrieval-traceability.md)
- Live health and ops runbook: [Live Monitoring for RAG](https://github.com/onestardao/WFGY/blob/main/ProblemMap/ops/live_monitoring_rag.md)
- Incident flow: [Ops Debug Playbook](https://github.com/onestardao/WFGY/blob/main/ProblemMap/ops/debug_playbook.md)
- Locale and tokenizer issues: [Multilingual Guide](https://github.com/onestardao/WFGY/blob/main/ProblemMap/multilingual-guide.md)
---
## Common failure patterns
- **PII-in-embeddings** raw personal data embedded and stored in a third party index
- **Trace overreach** logs capture tool outputs with tokens, addresses, or keys
- **Late redaction** masking happens after embedding which leaves residue
- **Linkable telemetry** user IDs or emails appear in metrics and spans
- **Jurisdiction drift** vectors and docs cross regions without policy tags
- **Unbounded retention** traces live forever without a purge plan
- **Prompt-injection exfil** adversarial inputs force the model to echo secrets
- **Opaque answers** no snippet table which blocks audits and right-to-know
---
## Fix in 60 seconds
1) **Adopt a Data Contract**
- Define fields for `snippet_id, source_id, section_id, pii_flags, redact_ops, citations`
- Require this contract before any storage or logging
2) **Redact before embed**
- Run PII detectors on raw text and produce a redacted view for embedding
- Keep a pointer to the original in a secure vault, not in the vectorstore
3) **Mask identifiers in telemetry**
- Hash or tokenize user and session IDs
- Strip emails, phone numbers, and free-form personal lines from logs
4) **Tag data with policy**
- `region, residency, retention_days, do_not_train`
- Enforce write paths by tag and block cross-region writes without a waiver
5) **Store a trace table, not raw blobs**
- For each answer save `{question, snippet_ids[], citation_lines[], ΔS, λ_state}`
- This is enough for audits without leaking content
6) **Add a safe export**
- On user request produce `what we used and why` with citations and data map
- Never export raw embeddings or secrets
7) **Test injections**
- Run a small suite of exfil prompts
- Require that traces capture the block event and that no PII enters context
---
## Copy paste prompt
```
You have TXT OS and the WFGY Problem Map.
Goal
Harden a RAG pipeline for privacy: redact before embedding, minimize trace risk, and keep audits reproducible.
Tasks
1. Build a Data Contract that includes:
* snippet\_id, source\_id, section\_id
* pii\_flags (email, phone, gov\_id, credit\_card, geo)
* redact\_ops applied
* citation\_lines
* policy tags: {region, residency, retention\_days, do\_not\_train}
2. Transform a sample document and show:
* raw text → pii\_flags → redacted text for embedding
* ΔS(question, redacted\_context) and λ\_observe at retrieval
* confirm ΔS ≤ 0.45 and λ remains convergent after redaction
3. Logging plan
* produce a Trace Table: {question, snippet\_ids\[], citation\_lines\[], ΔS, λ}
* mask user/session identifiers and strip free-form personal lines
4. Export plan
* given a user email, produce a report with the sources and citations used
* include policy tags and retention dates, exclude embeddings
Output
* Data Contract (yaml or json)
* Redaction example with before/after
* Trace Table for 3 queries
* A short READY line {privacy\_ready\:true, median\_ΔS, λ:"→"}
```
---
## Minimal checklist
- Redaction happens **before** embedding and storage
- Data Contract fields present on every snippet and trace row
- Telemetry strips or hashes linkable identifiers
- Policy tags enforce region, residency, and retention
- Trace Table saves citations and ΔS/λ instead of raw content
- Injection test suite stored and passing
## Acceptance targets
- No PII appears in vectorstore payloads or retrieved context for standard prompts
- ΔS(question, redacted_context) median ≤ **0.45** on smoke set, λ stays **convergent**
- Traces reconstruct “why this answer” with snippet IDs and citation lines
- Export flow returns sources and policy tags without exposing raw embeddings
- Retention jobs purge traces and cached snippets on schedule
---
### 🔗 Quick-Start Downloads (60 sec)
| Tool | Link | 3-Step Setup |
|------|------|--------------|
| **WFGY 1.0 PDF** | [Engine Paper](https://github.com/onestardao/WFGY/blob/main/I_am_not_lizardman/WFGY_All_Principles_Return_to_One_v1.0_PSBigBig_Public.pdf) | 1⃣ Download · 2⃣ Upload to your LLM · 3⃣ Ask “Answer using WFGY + \<your question>” |
| **TXT OS (plain-text OS)** | [TXTOS.txt](https://github.com/onestardao/WFGY/blob/main/OS/TXTOS.txt) | 1⃣ Download · 2⃣ Paste into any LLM chat · 3⃣ Type “hello world” — OS boots instantly |
---
### 🧭 Explore More
| Module | Description | Link |
|-----------------------|----------------------------------------------------------|----------|
| WFGY Core | WFGY 2.0 engine is live: full symbolic reasoning architecture and math stack | [View →](https://github.com/onestardao/WFGY/tree/main/core/README.md) |
| Problem Map 1.0 | Initial 16-mode diagnostic and symbolic fix framework | [View →](https://github.com/onestardao/WFGY/tree/main/ProblemMap/README.md) |
| Problem Map 2.0 | RAG-focused failure tree, modular fixes, and pipelines | [View →](https://github.com/onestardao/WFGY/blob/main/ProblemMap/rag-architecture-and-recovery.md) |
| Semantic Clinic Index | Expanded failure catalog: prompt injection, memory bugs, logic drift | [View →](https://github.com/onestardao/WFGY/blob/main/ProblemMap/SemanticClinicIndex.md) |
| Semantic Blueprint | Layer-based symbolic reasoning & semantic modulations | [View →](https://github.com/onestardao/WFGY/tree/main/SemanticBlueprint/README.md) |
| Benchmark vs GPT-5 | Stress test GPT-5 with full WFGY reasoning suite | [View →](https://github.com/onestardao/WFGY/tree/main/benchmarks/benchmark-vs-gpt5/README.md) |
| 🧙‍♂️ Starter Village 🏡 | New here? Lost in symbols? Click here and let the wizard guide you through | [Start →](https://github.com/onestardao/WFGY/blob/main/StarterVillage/README.md) |
---
> 👑 **Early Stargazers: [See the Hall of Fame](https://github.com/onestardao/WFGY/tree/main/stargazers)** —
> Engineers, hackers, and open source builders who supported WFGY from day one.
> <img src="https://img.shields.io/github/stars/onestardao/WFGY?style=social" alt="GitHub stars"> ⭐ [WFGY Engine 2.0](https://github.com/onestardao/WFGY/blob/main/core/README.md) is already unlocked. ⭐ Star the repo to help others discover it and unlock more on the [Unlock Board](https://github.com/onestardao/WFGY/blob/main/STAR_UNLOCKS.md).
<div align="center">
[![WFGY Main](https://img.shields.io/badge/WFGY-Main-red?style=flat-square)](https://github.com/onestardao/WFGY)
&nbsp;
[![TXT OS](https://img.shields.io/badge/TXT%20OS-Reasoning%20OS-orange?style=flat-square)](https://github.com/onestardao/WFGY/tree/main/OS)
&nbsp;
[![Blah](https://img.shields.io/badge/Blah-Semantic%20Embed-yellow?style=flat-square)](https://github.com/onestardao/WFGY/tree/main/OS/BlahBlahBlah)
&nbsp;
[![Blot](https://img.shields.io/badge/Blot-Persona%20Core-green?style=flat-square)](https://github.com/onestardao/WFGY/tree/main/OS/BlotBlotBlot)
&nbsp;
[![Bloc](https://img.shields.io/badge/Bloc-Reasoning%20Compiler-blue?style=flat-square)](https://github.com/onestardao/WFGY/tree/main/OS/BlocBlocBloc)
&nbsp;
[![Blur](https://img.shields.io/badge/Blur-Text2Image%20Engine-navy?style=flat-square)](https://github.com/onestardao/WFGY/tree/main/OS/BlurBlurBlur)
&nbsp;
[![Blow](https://img.shields.io/badge/Blow-Game%20Logic-purple?style=flat-square)](https://github.com/onestardao/WFGY/tree/main/OS/BlowBlowBlow)
&nbsp;
</div>