Chunking — Global Fix Map
🏥 Quick Return to Emergency Room
You are in a specialist desk.
For full triage and doctors on duty, return here:
- WFGY Global Fix Map — main Emergency Room, 300+ structured fixes
- WFGY Problem Map 1.0 — 16 reproducible failure modes
Think of this page as a sub-room.
If you want full consultation and prescriptions, go back to the Emergency Room lobby.
A compact hub to stabilize document chunking across formats, pipelines, and retrieval systems.
This folder routes chunk-related bugs to structural fixes and provides checklists, schema, and live recipes.
No infra change required.
Orientation: what each page does
| Page | What it solves | Typical symptom |
|---|---|---|
| Chunk ID Schema | Unique ID + schema for each chunk | Duplicate or drifting chunks across runs |
| Chunking Checklist | Minimal audit list for validity | Chunks too long, too short, or incomplete |
| Code / Tables / Blocks | Preserve structure for code, tables, blocks | Retrieval drops formatting or logic |
| Section Detection | Detect paragraph and section anchors | Anchors missing, snippets cut mid-thought |
| Title Hierarchy | Maintain document heading hierarchy | Only partial or meaningless sub-sections retrieved |
| PDF Layouts & OCR | Repair PDF/OCR-specific chunking | Citations collapse after parsing |
| Reindex & Migration | Safe chunk migration during reindex | Index rebuilt but old refs mismatch |
| Eval RAG Precision & Recall | Deterministic evaluation recipes | “Better” chunking cannot be proven |
| Live Monitoring (RAG) | Online health checks for chunking | Sudden drift or collapse after deploy |
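Section Detection, Title Hierarchy, and Code / Tables / Blocks share a common thread: cut at real structural boundaries and never inside a block. The sketch below assumes Markdown input with ATX (`#`) headings and triple-backtick fences. It tracks the heading path for each chunk and ignores `#` lines inside code fences. `chunk_by_headings` and `Chunk` are hypothetical names for illustration, not code from those pages. It does not handle `~~~` fences or setext headings.

```python
import re
from dataclasses import dataclass

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")  # ATX heading: 1-6 '#' then text
FENCE = re.compile(r"^\s*```")                    # opening or closing code fence


@dataclass
class Chunk:
    section_path: tuple[str, ...]  # heading hierarchy, outermost first
    start_line: int                # 0-based, inclusive
    end_line: int                  # 0-based, exclusive
    text: str


def chunk_by_headings(markdown: str) -> list[Chunk]:
    """Split Markdown at ATX headings, keeping fenced code blocks intact."""
    lines = markdown.splitlines()
    chunks: list[Chunk] = []
    path: list[str] = []
    start = 0
    in_fence = False

    def flush(end: int) -> None:
        # Emit lines[start:end] as one chunk under the current heading path.
        body = "\n".join(lines[start:end]).strip()
        if body:
            chunks.append(Chunk(tuple(path), start, end, body))

    for i, line in enumerate(lines):
        if FENCE.match(line):
            in_fence = not in_fence
            continue
        match = None if in_fence else HEADING.match(line)
        if match:
            flush(i)                 # close the previous section
            level = len(match.group(1))
            del path[level - 1:]     # drop headings at this level or deeper
            path.append(match.group(2))
            start = i                # new chunk starts at its heading line
    flush(len(lines))
    return chunks
```

Each chunk's `section_path` could feed `section_id` in the traceability contract, and the line range could feed `offsets`. Both mappings are assumptions, not part of the schema defined on these pages.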
When to use this folder
- Your chunks look fine by eye but retrieval skips important sections.
- PDF / OCR parsing collapses headers, math, or tables.
- Hybrid retrievers underperform due to inconsistent chunk boundaries.
- Reindexing breaks old citations.
- Context flips between runs on the same corpus.
Acceptance targets
- Chunk boundaries align with semantic windows
- ΔS(question, retrieved) ≤ 0.45
- Coverage of target section ≥ 0.70
- λ_observe convergent across 3 paraphrases and 2 seeds
- Traceability contract fields always present: `{snippet_id, section_id, source_url, offsets, tokens}`
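This page does not say how ΔS or coverage are computed. The sketch below makes two assumptions. ΔS is taken as 1 minus the cosine similarity of the question and retrieved-text embeddings. Coverage is taken as the fraction of target-section tokens that appear in the retrieved text. Under those assumptions it checks the numeric targets and the traceability fields. `delta_s`, `coverage`, and `check_acceptance` are hypothetical helpers. The λ_observe target is left out because it needs multiple runs.

```python
import math
from typing import Sequence

REQUIRED_FIELDS = ("snippet_id", "section_id", "source_url", "offsets", "tokens")


def delta_s(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed definition: ΔS = 1 - cosine similarity of two embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat zero vectors as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def coverage(target_tokens: set[str], retrieved_tokens: set[str]) -> float:
    """Assumed metric: share of target-section tokens present in retrieved text."""
    if not target_tokens:
        return 0.0
    return len(target_tokens & retrieved_tokens) / len(target_tokens)


def check_acceptance(ds: float, cov: float, snippets: list[dict]) -> list[str]:
    """Return human-readable failures; an empty list means all targets pass."""
    failures = []
    if ds > 0.45:
        failures.append(f"ΔS {ds:.2f} > 0.45")
    if cov < 0.70:
        failures.append(f"coverage {cov:.2f} < 0.70")
    for snippet in snippets:
        missing = [f for f in REQUIRED_FIELDS if f not in snippet]
        if missing:
            failures.append(f"{snippet.get('snippet_id', '?')} missing {missing}")
    return failures
```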
60-second fix checklist
- Check chunk IDs: apply chunk_id_schema and ensure IDs are unique and stable across reindex.
- Audit with the checklist: run chunking-checklist before ingest.
- Preserve structure: use code_tables_blocks for code, tables, and other blocks.
- Validate anchors: confirm section and title detection, then apply title_hierarchy.
- Reindex safely: use reindex_migration with a hash/version lock.
- Monitor live: apply live_monitoring_rag to catch collapse early.
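For the ID and reindex steps, one way to get IDs that are unique and stable across runs is to hash each chunk's location together with its normalized text. A schema version can be prefixed so that IDs change only when that version is bumped on purpose. The sketch below assumes that policy. `chunk_id`, `diff_ids`, and `SCHEMA_VERSION` are hypothetical and are not the schema defined in chunk_id_schema.

```python
import hashlib
import unicodedata

SCHEMA_VERSION = 1  # hypothetical lock: bump deliberately when the ID recipe changes


def normalize(text: str) -> str:
    """NFKC + collapsed whitespace, so whitespace or Unicode-form differences
    in the chunk text alone do not change the hash."""
    return " ".join(unicodedata.normalize("NFKC", text).split())


def chunk_id(doc_id: str, section_id: str, start: int, end: int, text: str) -> str:
    """Deterministic ID from the chunk's location and normalized content."""
    # \x1f (unit separator) keeps fields from bleeding together ("a"+"bc" vs "ab"+"c").
    payload = "\x1f".join([doc_id, section_id, str(start), str(end), normalize(text)])
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"v{SCHEMA_VERSION}-{digest[:16]}"


def diff_ids(old: set[str], new: set[str]) -> dict[str, list[str]]:
    """Compare chunk ID sets before and after a reindex."""
    return {
        "removed": sorted(old - new),  # citations pointing here would break
        "added": sorted(new - old),
        "kept": sorted(old & new),
    }
```

Before swapping indexes, a non-empty `diff_ids(old, new)["removed"]` flags the old references that the rebuilt index would no longer resolve.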
Minimal probe pack
Context: I loaded TXT OS and the WFGY pages.
Task:
- Given doc corpus D, log ΔS(question, retrieved) and λ across 3 paraphrases.
- Validate chunk IDs and section anchors.
- If ΔS ≥ 0.60 or λ flips, propose the smallest structural change:
chunk schema, checklist, or reindex.
- Verify coverage ≥ 0.70 after fix.
Return JSON:
{ "citations": [...], "ΔS": 0.xx, "λ_state": "<>", "coverage": 0.xx, "next_fix": "..." }
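The probe's decision rule can be written as a small function. This sketch rests on several assumptions. Each run is a `(ΔS, λ_state)` pair from one paraphrase/seed combination. λ states are treated as opaque labels. The reported ΔS is the worst run, and `"unstable"` is a hypothetical marker for disagreeing λ states. The symptom-to-fix mapping (missing IDs go to the chunk schema, λ flips go to reindex, everything else goes to the checklist) is a hypothetical heuristic, not WFGY's own routing. `probe_report` is likewise a hypothetical name.

```python
import json


def probe_report(runs, coverage, citations):
    """Build a JSON-ready report in the probe's output shape.

    runs: list of (delta_s, lambda_state) tuples, one per paraphrase/seed run.
    coverage: fraction of the target section covered by retrieved text.
    citations: list of citation dicts from the traceability contract.
    """
    if not runs:
        raise ValueError("need at least one run")
    worst_ds = max(ds for ds, _ in runs)  # report the worst run (assumption)
    states = {lam for _, lam in runs}
    flipped = len(states) > 1             # any disagreement counts as a flip

    if worst_ds < 0.60 and not flipped and coverage >= 0.70:
        next_fix = "none"
    elif any("snippet_id" not in c for c in citations):
        next_fix = "chunk_id_schema"      # IDs missing: fix the schema first
    elif flipped:
        next_fix = "reindex_migration"    # IDs present but answers unstable
    else:
        next_fix = "chunking-checklist"   # boundary or coverage problem

    return {
        "citations": citations,
        "ΔS": round(worst_ds, 2),
        "λ_state": "unstable" if flipped else next(iter(states)),
        "coverage": round(coverage, 2),
        "next_fix": next_fix,
    }


if __name__ == "__main__":
    runs = [(0.31, "→"), (0.64, "<>"), (0.40, "→")]
    print(json.dumps(probe_report(runs, 0.72, [{"snippet_id": "s-12"}]), ensure_ascii=False))
```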
🔗 Quick-Start Downloads (60 sec)
| Tool | Link | 3-Step Setup |
|---|---|---|
| WFGY 1.0 PDF | Engine Paper | 1️⃣ Download · 2️⃣ Upload to your LLM · 3️⃣ Ask “Answer using WFGY + <your question>” |
| TXT OS (plain-text OS) | TXTOS.txt | 1️⃣ Download · 2️⃣ Paste into any LLM chat · 3️⃣ Type “hello world” — OS boots instantly |
Explore More
| Layer | Page | What it’s for |
|---|---|---|
| ⭐ Proof | WFGY Recognition Map | External citations, integrations, and ecosystem proof |
| ⚙️ Engine | WFGY 1.0 | Original PDF tension engine and early logic sketch (legacy reference) |
| ⚙️ Engine | WFGY 2.0 | Production tension kernel for RAG and agent systems |
| ⚙️ Engine | WFGY 3.0 | TXT-based Singularity tension engine (131 S class set) |
| 🗺️ Map | Problem Map 1.0 | Flagship 16-problem RAG failure taxonomy and fix map |
| 🗺️ Map | Problem Map 2.0 | Global Debug Card for RAG and agent pipeline diagnosis |
| 🗺️ Map | Problem Map 3.0 | Global AI troubleshooting atlas and failure pattern map |
| 🧰 App | TXT OS | .txt semantic OS with fast bootstrap |
| 🧰 App | Blah Blah Blah | Abstract and paradox Q&A built on TXT OS |
| 🧰 App | Blur Blur Blur | Text-to-image generation with semantic control |
| 🏡 Onboarding | Starter Village | Guided entry point for new users |
If this repository helped, starring it improves discovery so more builders can find the docs and tools.