WFGY/ProblemMap/GlobalFixMap/Chunking/ops_debug_playbook.md

11 KiB
Raw Blame History

RAG ops debug playbook

🧭 Quick Return to Map

You are in a sub-page of Chunking.
To reorient, go back here:

Think of this page as a desk within a ward.
If you need the full triage and all prescriptions, return to the Emergency Room lobby.

A fast triage guide for incidents after you change chunking, OCR, embedding, or index settings. The goal is to localize the failing layer in minutes and apply a reversible fix.

Open these first

Golden acceptance

  • ΔS(question, retrieved) ≤ 0.45
  • Coverage ≥ 0.70 to the target section
  • λ_observe convergent across three paraphrases and two seeds
  • Citation offsets within 30 bytes of the ground block

Symptom to fix map

Symptom Quick probe Likely root Open this Minimal fix
Coverage drops after index rebuild Check index_hash change with same build id Bad boot sequence or partial ingest reindex_migration.md Rebuild with frozen normalizers, fence ingestion, re-point alias after eval pass
Citations point to wrong offsets Validate 30 byte window around cited chunk OCR or layout normalization drift pdf_layouts_and_ocr.md Re-run layout pass and regenerate chunk ids with stable scheme
High similarity yet wrong meaning Compare ΔS to anchor section and to decoy Metric or analyzer mismatch embedding-vs-semantic.md Switch metric or normalize text, add rerank pass
Answers flip between reruns Three paraphrase test and λ flip count Prompt header reorder or rerank shuffle rerankers.md Lock header order and rerank seeds, clamp variance
Tables or code never cited Check block type in top k Block typing lost during chunking code_tables_blocks.md Preserve block types, add type-aware rerank feature
One doc dominates retrieval Top k doc entropy and author field Fragmentation or duplicate shards reindex_migration.md Rebalance shards, dedupe, enable cross doc rerank
Tool loops or JSON fails Inspect tool schema and free text fields Contract too loose, injection data-contracts.md, prompt-injection.md Tighten schema, add cite first and role fences

Seven step incident routine

  1. Freeze context
    Capture build, index_hash, metric, analyzer, embed_model, retriever params, reranker.

  2. Reproduce
    Run three paraphrases and two seeds. Log ΔS per candidate, λ states, coverage, citation offsets.

  3. Verify structure
    Check chunk id format from chunk_id_schema.md and title tree from title_hierarchy.md.

  4. Boundary audit
    Confirm the cited block sits inside one detected section from section_detection.md.

  5. Content type audit
    Ensure tables and code blocks survive extraction per code_tables_blocks.md.

  6. Meaning check
    If ΔS stays high on every k, suspect metric or index mismatch. Open embedding-vs-semantic.md and rerankers.md.

  7. Decide fix module
    Retrieval drift → BBMC with contracts
    Reasoning collapse → BBCR bridge plus BBAM clamp
    Dead ends in long chains → BBPF alternate path

Copy probes you can paste

SQL like probe for vector stores

-- sample ten queries that failed coverage in the last hour
select qid, question, topk_ids, topk_scores, index_hash, embed_model
from rag_logs
where ts > now() - interval '1 hour'
  and coverage = false
limit 10;

LLM triage prompt

You have TXTOS and WFGY Problem Map.

Given logs for {N} queries with ΔS lists, λ states, citations, and index fingerprints:
1) Name the failing layer: boundary, typing, metric, rerank, OCR, contract.
2) Return exact pages to open next.
3) Propose a minimal reversible fix and a verification test.
Return JSON {layer, pages[], fix, test}.

Rollback and canary

Postmortem template

  • Incident summary
  • Impact window and scope
  • Root layer and evidence
  • Fix that shipped and verification
  • Prevention items: contracts, monitors, checklists

Prevention checklist

  • Stable chunk ids and title tree are present in every snippet payload
  • Cite first prompting and strict data contracts are enforced
  • OCR and layout normalizers are frozen for production builds
  • Rerank seed and header order are locked during canary
  • Live probes for ΔS, λ, coverage, citation accuracy are enabled

🔗 Quick-Start Downloads (60 sec)

Tool Link 3-Step Setup
WFGY 1.0 PDF Engine Paper 1 Download · 2 Upload to your LLM · 3 Ask “Answer using WFGY + <your question>”
TXT OS (plain-text OS) TXTOS.txt 1 Download · 2 Paste into any LLM chat · 3 Type “hello world” — OS boots instantly

Explore More

Layer Page What its for
Proof WFGY Recognition Map External citations, integrations, and ecosystem proof
⚙️ Engine WFGY 1.0 Original PDF tension engine and early logic sketch (legacy reference)
⚙️ Engine WFGY 2.0 Production tension kernel for RAG and agent systems
⚙️ Engine WFGY 3.0 TXT based Singularity tension engine (131 S class set)
🗺️ Map Problem Map 1.0 Flagship 16 problem RAG failure taxonomy and fix map
🗺️ Map Problem Map 2.0 Global Debug Card for RAG and agent pipeline diagnosis
🗺️ Map Problem Map 3.0 Global AI troubleshooting atlas and failure pattern map
🧰 App TXT OS .txt semantic OS with fast bootstrap
🧰 App Blah Blah Blah Abstract and paradox Q&A built on TXT OS
🧰 App Blur Blur Blur Text to image generation with semantic control
🏡 Onboarding Starter Village Guided entry point for new users

If this repository helped, starring it improves discovery so more builders can find the docs and tools.
GitHub Repo stars

下一頁建議:ProblemMap/GlobalFixMap/Chunking/chunking_checklist.md 這一頁是現場交付的簡潔檢查表,會把上面所有規則壓成二十條可勾選項,給運維和標準化使用。