Chunking — Global Fix Map

🏥 Quick Return to Emergency Room

You are at a specialist desk; think of this page as a sub-room.
For full triage, consultation, and prescriptions from the doctors on duty, go back to the Emergency Room lobby.

A compact hub to stabilize document chunking across formats, pipelines, and retrieval systems.
This folder routes chunk-related bugs to structural fixes and provides checklists, schemas, and live recipes.
No infrastructure change is required.


Orientation: what each page does

| Page | What it solves | Typical symptom |
|---|---|---|
| Chunk ID Schema | Unique ID + schema for each chunk | Duplicate or drifting chunks across runs |
| Chunking Checklist | Minimal audit list for validity | Chunks too long, too short, or incomplete |
| Code / Tables / Blocks | Preserve structure for code, tables, blocks | Retrieval drops formatting or logic |
| Section Detection | Detect paragraph and section anchors | Anchors missing, snippets cut mid-thought |
| Title Hierarchy | Maintain document heading hierarchy | Only partial or meaningless sub-sections retrieved |
| PDF Layouts & OCR | Repair PDF/OCR-specific chunking | Citations collapse after parsing |
| Reindex & Migration | Safe chunk migration during reindex | Index rebuilt but old refs mismatch |
| Eval RAG Precision & Recall | Deterministic evaluation recipes | “Better” chunking cannot be proven |
| Live Monitoring (RAG) | Online health checks for chunking | Sudden drift or collapse after deploy |

When to use this folder

  • Your chunks look fine by eye but retrieval skips important sections.
  • PDF / OCR parsing collapses headers, math, or tables.
  • Hybrid retrievers underperform due to inconsistent chunk boundaries.
  • Reindexing breaks old citations.
  • Context flips between runs with the same corpus.

Acceptance targets

  • Chunk boundaries align with semantic windows
  • ΔS(question, retrieved) ≤ 0.45
  • Coverage of target section ≥ 0.70
  • λ_observe convergent across 3 paraphrases and 2 seeds
  • Traceability contract fields always present: {snippet_id, section_id, source_url, offsets, tokens}
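The measurable targets above can be turned into a small gate. The sketch below is illustrative only: it assumes ΔS and coverage have already been computed elsewhere and arrive as floats, and the names `REQUIRED_FIELDS`, `missing_trace_fields`, and `meets_targets` are hypothetical, not part of WFGY. It does not cover semantic-window alignment or λ_observe convergence.

```python
# Hypothetical gate for the numeric acceptance targets listed above.
# ΔS and coverage are assumed to be computed by your own pipeline.

REQUIRED_FIELDS = {"snippet_id", "section_id", "source_url", "offsets", "tokens"}

MAX_DELTA_S = 0.45   # target: ΔS(question, retrieved) ≤ 0.45
MIN_COVERAGE = 0.70  # target: coverage of target section ≥ 0.70


def missing_trace_fields(chunk: dict) -> set:
    """Return the traceability fields that are absent from one chunk record."""
    return REQUIRED_FIELDS - set(chunk)


def meets_targets(delta_s: float, coverage: float, chunks: list) -> dict:
    """Check each target separately and report an overall pass flag."""
    report = {
        "delta_s_ok": delta_s <= MAX_DELTA_S,
        "coverage_ok": coverage >= MIN_COVERAGE,
        "trace_ok": all(not missing_trace_fields(c) for c in chunks),
    }
    report["pass"] = all(report.values())
    return report
```

Reporting each check separately, rather than a single boolean, makes it easier to see which target failed before choosing a fix page.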

60-second fix checklist

  1. Check chunk IDs
    Apply chunk_id_schema. Ensure IDs are unique and stable across reindex runs.

  2. Audit with checklist
    Run the chunking-checklist before ingest.

  3. Preserve structure
    Use code_tables_blocks for code, tables, blocks.

  4. Validate anchors
    Confirm section and title detection. Apply title_hierarchy.

  5. Reindex safely
    Use reindex_migration with hash/version lock.

  6. Monitor live
    Apply live_monitoring_rag to catch collapse early.
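As a rough illustration of steps 1 and 5, one way to get IDs that stay unique and stable across reindex runs is to derive them from content rather than from position. This is a minimal sketch under assumptions: the actual chunk_id_schema page is not reproduced here, and the function names, the `version` parameter, and the 16-character truncation are all hypothetical choices.

```python
import hashlib
from collections import Counter


def chunk_id(source_url: str, section_id: str, text: str, version: str = "v1") -> str:
    """Derive a deterministic ID from content (hypothetical scheme).

    Whitespace is normalized so that cosmetic re-parsing does not change the ID;
    `version` acts as an explicit lock that is bumped on a deliberate migration.
    """
    normalized = " ".join(text.split())
    payload = "\x1f".join([version, source_url, section_id, normalized])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def duplicate_ids(ids: list) -> list:
    """Return every ID that occurs more than once, sorted for stable output."""
    return sorted(i for i, n in Counter(ids).items() if n > 1)
```

Running `duplicate_ids` over the IDs of a fresh ingest, and comparing the ID set against the previous index, gives a quick check before old citations are trusted.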


Minimal probe pack

Context: I loaded TXT OS and the WFGY pages.

Task:
- Given doc corpus D, log ΔS(question, retrieved) and λ across 3 paraphrases.
- Validate chunk IDs and section anchors.
- If ΔS ≥ 0.60 or λ flips, propose the smallest structural change:
  chunk schema, checklist, or reindex.
- Verify coverage ≥ 0.70 after fix.

Return JSON:
{ "citations": [...], "ΔS": 0.xx, "λ_state": "<>", "coverage": 0.xx, "next_fix": "..." }
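The decision rule in the probe can be sketched in code once the model's JSON comes back. The following is an assumption-laden illustration: the helper names are hypothetical, "λ flips" is interpreted here as the λ state differing across paraphrase runs, and the state labels used in any example are placeholders rather than WFGY's own notation.

```python
import json

RISK_DELTA_S = 0.60  # probe rule: ΔS ≥ 0.60 calls for a structural change
MIN_COVERAGE = 0.70  # probe rule: verify coverage ≥ 0.70 after the fix

EXPECTED_KEYS = {"citations", "ΔS", "λ_state", "coverage", "next_fix"}


def lambda_flips(states: list) -> bool:
    """Assumed reading of 'λ flips': the state is not identical across runs."""
    return len(set(states)) > 1


def needs_fix(delta_s_runs: list, lambda_states: list) -> bool:
    """Apply the probe rule to the ΔS values and λ states of the paraphrase runs."""
    return max(delta_s_runs) >= RISK_DELTA_S or lambda_flips(lambda_states)


def parse_probe(raw: str) -> dict:
    """Parse the returned JSON and fail loudly if a contract key is missing."""
    result = json.loads(raw)
    missing = EXPECTED_KEYS - set(result)
    if missing:
        raise ValueError(f"probe output missing keys: {sorted(missing)}")
    return result
```

After a fix is applied, the parsed `coverage` value can be compared against `MIN_COVERAGE` to confirm the last step of the probe.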

🔗 Quick-Start Downloads (60 sec)

| Tool | Link | 3-Step Setup |
|---|---|---|
| WFGY 1.0 PDF | Engine Paper | 1 Download · 2 Upload to your LLM · 3 Ask “Answer using WFGY + <your question>” |
| TXT OS (plain-text OS) | TXTOS.txt | 1 Download · 2 Paste into any LLM chat · 3 Type “hello world” and the OS boots instantly |

Explore More

| Layer | Page | What it's for |
|---|---|---|
| Proof | WFGY Recognition Map | External citations, integrations, and ecosystem proof |
| ⚙️ Engine | WFGY 1.0 | Original PDF tension engine and early logic sketch (legacy reference) |
| ⚙️ Engine | WFGY 2.0 | Production tension kernel for RAG and agent systems |
| ⚙️ Engine | WFGY 3.0 | TXT-based Singularity tension engine (131 S class set) |
| 🗺️ Map | Problem Map 1.0 | Flagship 16-problem RAG failure taxonomy and fix map |
| 🗺️ Map | Problem Map 2.0 | Global Debug Card for RAG and agent pipeline diagnosis |
| 🗺️ Map | Problem Map 3.0 | Global AI troubleshooting atlas and failure pattern map |
| 🧰 App | TXT OS | .txt semantic OS with fast bootstrap |
| 🧰 App | Blah Blah Blah | Abstract and paradox Q&A built on TXT OS |
| 🧰 App | Blur Blur Blur | Text-to-image generation with semantic control |
| 🏡 Onboarding | Starter Village | Guided entry point for new users |

If this repository helped, starring it improves discovery so more builders can find the docs and tools.