Chunking — Global Fix Map
🏥 Quick Return to Emergency Room
You are at a specialist desk.
For full triage and doctors on duty, return here:
- WFGY Global Fix Map — main Emergency Room, 300+ structured fixes
- WFGY Problem Map 1.0 — 16 reproducible failure modes
Think of this page as a sub-room.
If you want full consultation and prescriptions, go back to the Emergency Room lobby.
A compact hub to stabilize document chunking across formats, pipelines, and retrieval systems.
This folder routes chunk-related bugs to structural fixes and provides checklists, schema, and live recipes.
No infra change required.
Orientation: what each page does
| Page | What it solves | Typical symptom |
|---|---|---|
| Chunk ID Schema | Unique ID + schema for each chunk | Duplicate or drifting chunks across runs |
| Chunking Checklist | Minimal audit list for validity | Chunks too long, too short, or incomplete |
| Code / Tables / Blocks | Preserve structure for code, tables, blocks | Retrieval drops formatting or logic |
| Section Detection | Detect paragraph and section anchors | Anchors missing, snippets cut mid-thought |
| Title Hierarchy | Maintain document heading hierarchy | Only partial or meaningless sub-sections retrieved |
| PDF Layouts & OCR | Repair PDF/OCR-specific chunking | Citations collapse after parsing |
| Reindex & Migration | Safe chunk migration during reindex | Index rebuilt but old refs mismatch |
| Eval RAG Precision & Recall | Deterministic evaluation recipes | “Better” chunking cannot be proven |
| Live Monitoring (RAG) | Online health checks for chunking | Sudden drift or collapse after deploy |
When to use this folder
- Your chunks look fine by eye but retrieval skips important sections.
- PDF / OCR parsing collapses headers, math, or tables.
- Hybrid retrievers underperform due to inconsistent chunk boundaries.
- Reindexing breaks old citations.
- Context flips between runs with the same corpus.
Acceptance targets
- Chunk boundaries align with semantic windows
- ΔS(question, retrieved) ≤ 0.45
- Coverage of target section ≥ 0.70
- λ_observe convergent across 3 paraphrases and 2 seeds
- Traceability contract fields always present:
{snippet_id, section_id, source_url, offsets, tokens}
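The targets above can be checked mechanically once ΔS and coverage have been measured. The following is a minimal sketch, not the project's own tooling: it assumes chunks are plain dictionaries and that ΔS and coverage are computed elsewhere and passed in as floats. The function names `missing_contract_fields` and `meets_targets` are hypothetical, and the λ_observe convergence target is not covered here.

```python
from typing import List, Mapping, Sequence

# Field names taken from the traceability contract listed above.
REQUIRED_FIELDS = ("snippet_id", "section_id", "source_url", "offsets", "tokens")


def missing_contract_fields(chunk: Mapping) -> List[str]:
    """Return the contract fields that are absent or None in one chunk record."""
    return [field for field in REQUIRED_FIELDS if chunk.get(field) is None]


def meets_targets(delta_s: float, coverage: float, chunks: Sequence[Mapping]) -> bool:
    """Check ΔS <= 0.45, coverage >= 0.70, and a complete contract on every chunk.

    delta_s and coverage are assumed to be measured by the caller; this
    sketch only compares them against the thresholds stated on this page.
    """
    contract_ok = all(not missing_contract_fields(c) for c in chunks)
    return delta_s <= 0.45 and coverage >= 0.70 and contract_ok
```

A chunk that lacks any contract field fails the check even when both numeric thresholds pass, which keeps untraceable snippets from being counted as acceptable.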
60-second fix checklist
- Check chunk IDs: apply chunk_id_schema. Ensure IDs are unique and stable across reindex.
- Audit with checklist: run the chunking-checklist before ingest.
- Preserve structure: use code_tables_blocks for code, tables, blocks.
- Validate anchors: confirm section and title detection. Apply title_hierarchy.
- Reindex safely: use reindex_migration with hash/version lock.
- Monitor live: apply live_monitoring_rag to catch collapse early.
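One way to get chunk IDs that are unique and stable across reindex is to derive them from content and position rather than from insertion order. The sketch below is an illustration under assumptions, not the scheme defined on the Chunk ID Schema page: the function name `make_chunk_id`, the `SCHEMA_VERSION` tag, the choice of SHA-256, and the 16-character truncation are all hypothetical.

```python
import hashlib

# Hypothetical version tag; bump it when the ID recipe changes so that
# old and new IDs are never confused during a migration.
SCHEMA_VERSION = "v1"


def make_chunk_id(source_url: str, section_id: str, start: int, end: int, text: str) -> str:
    """Derive a deterministic ID from the chunk's source, position, and text."""
    # Collapse whitespace so cosmetic re-parsing differences do not change the ID.
    normalized = " ".join(text.split())
    # The unit separator keeps adjacent fields from running together.
    payload = "\x1f".join(
        [SCHEMA_VERSION, source_url, section_id, str(start), str(end), normalized]
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
```

Because the ID depends only on its inputs, re-running ingest over an unchanged corpus yields the same IDs, while a change in offsets or text yields a new ID that a migration step can detect.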
Minimal probe pack
Context: I loaded TXT OS and the WFGY pages.
Task:
- Given doc corpus D, log ΔS(question, retrieved) and λ across 3 paraphrases.
- Validate chunk IDs and section anchors.
- If ΔS ≥ 0.60 or λ flips, propose the smallest structural change:
chunk schema, checklist, or reindex.
- Verify coverage ≥ 0.70 after fix.
Return JSON:
{ "citations": [...], "ΔS": 0.xx, "λ_state": "<>", "coverage": 0.xx, "next_fix": "..." }
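The JSON returned by the probe can be screened automatically before anyone reads it. The sketch below assumes one JSON string per paraphrase, uses the ΔS ≥ 0.60 and coverage ≥ 0.70 thresholds from the task above, and treats any disagreement between `λ_state` values as a flip. The function name `review_probes` and the output keys are hypothetical.

```python
import json
from typing import Dict, List

# Keys named in the "Return JSON" template above.
EXPECTED_KEYS = {"citations", "ΔS", "λ_state", "coverage", "next_fix"}


def review_probes(raw_results: List[str]) -> Dict[str, object]:
    """Summarize probe outputs collected across paraphrases of one question."""
    results = [json.loads(raw) for raw in raw_results]
    # Any expected key missing from any run is reported once.
    missing = sorted({key for r in results for key in EXPECTED_KEYS - set(r)})
    # A flip here means the runs did not all report the same λ_state.
    lambda_flips = len({r.get("λ_state") for r in results}) > 1
    # Missing numbers default to the failing side of each threshold.
    worst_delta_s = max(r.get("ΔS", 1.0) for r in results)
    min_coverage = min(r.get("coverage", 0.0) for r in results)
    return {
        "missing_keys": missing,
        "lambda_flips": lambda_flips,
        "needs_fix": worst_delta_s >= 0.60 or lambda_flips,
        "coverage_ok": min_coverage >= 0.70,
    }
```

Taking the worst ΔS and the lowest coverage across runs is a conservative design choice: a single bad paraphrase is enough to flag the corpus for a structural fix.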
🔗 Quick-Start Downloads (60 sec)
| Tool | Link | 3-Step Setup |
|---|---|---|
| WFGY 1.0 PDF | Engine Paper | 1️⃣ Download · 2️⃣ Upload to your LLM · 3️⃣ Ask “Answer using WFGY + <your question>” |
| TXT OS (plain-text OS) | TXTOS.txt | 1️⃣ Download · 2️⃣ Paste into any LLM chat · 3️⃣ Type “hello world” — OS boots instantly |
🧭 Explore More
| Module | Description | Link |
|---|---|---|
| WFGY Core | WFGY 2.0 engine is live: full symbolic reasoning architecture and math stack | View → |
| Problem Map 1.0 | Initial 16-mode diagnostic and symbolic fix framework | View → |
| Problem Map 2.0 | RAG-focused failure tree, modular fixes, and pipelines | View → |
| Semantic Clinic Index | Expanded failure catalog: prompt injection, memory bugs, logic drift | View → |
| Semantic Blueprint | Layer-based symbolic reasoning & semantic modulations | View → |
| Benchmark vs GPT-5 | Stress test GPT-5 with full WFGY reasoning suite | View → |
| 🧙‍♂️ Starter Village 🏡 | New here? Lost in symbols? Click here and let the wizard guide you through | Start → |