RAG ops debug playbook

🧭 Quick Return to Map

You are in a sub-page of Chunking.
To reorient, go back here:

Think of this page as a desk within a ward.
If you need the full triage and all prescriptions, return to the Emergency Room lobby.

A fast triage guide for incidents after you change chunking, OCR, embedding, or index settings. The goal is to localize the failing layer in minutes and apply a reversible fix.

Open these first

Golden acceptance

  • ΔS(question, retrieved) ≤ 0.45
  • Coverage ≥ 0.70 to the target section
  • λ_observe convergent across three paraphrases and two seeds
  • Citation offsets within 30 bytes of the ground block
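The four targets above can be checked mechanically once each probe run is logged. The following is a minimal sketch, not part of WFGY itself: the `ProbeResult` record, its field names, and the `"convergent"` label for λ are assumptions made for illustration, and the six required runs come from three paraphrases times two seeds.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    """One run of one paraphrase with one seed (hypothetical record shape)."""
    delta_s: float         # ΔS(question, retrieved)
    coverage: float        # fraction of the target section covered
    lambda_state: str      # assumed labels, e.g. "convergent" or "divergent"
    citation_offset: int   # byte offset reported in the citation
    ground_offset: int     # byte offset of the ground-truth block


def passes_golden_acceptance(runs, max_delta_s=0.45, min_coverage=0.70,
                             max_offset_gap=30, required_runs=6):
    """Return (ok, reasons); reasons lists every violated target."""
    reasons = []
    if len(runs) < required_runs:
        reasons.append(f"need {required_runs} runs, got {len(runs)}")
    for i, run in enumerate(runs):
        if run.delta_s > max_delta_s:
            reasons.append(f"run {i}: delta_s {run.delta_s:.2f} > {max_delta_s}")
        if run.coverage < min_coverage:
            reasons.append(f"run {i}: coverage {run.coverage:.2f} < {min_coverage}")
        if run.lambda_state != "convergent":
            reasons.append(f"run {i}: lambda_state is {run.lambda_state!r}")
        gap = abs(run.citation_offset - run.ground_offset)
        if gap > max_offset_gap:
            reasons.append(f"run {i}: citation off by {gap} bytes")
    return (not reasons, reasons)
```

Returning the list of reasons rather than a bare boolean keeps the failing target visible, which helps when mapping the incident to a symptom.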

Symptom to fix map

| Symptom | Quick probe | Likely root | Open this | Minimal fix |
|---|---|---|---|---|
| Coverage drops after index rebuild | Check index_hash change with same build id | Bad boot sequence or partial ingest | reindex_migration.md | Rebuild with frozen normalizers, fence ingestion, re-point alias after eval pass |
| Citations point to wrong offsets | Validate 30 byte window around cited chunk | OCR or layout normalization drift | pdf_layouts_and_ocr.md | Re-run layout pass and regenerate chunk ids with stable scheme |
| High similarity yet wrong meaning | Compare ΔS to anchor section and to decoy | Metric or analyzer mismatch | embedding-vs-semantic.md | Switch metric or normalize text, add rerank pass |
| Answers flip between reruns | Three paraphrase test and λ flip count | Prompt header reorder or rerank shuffle | rerankers.md | Lock header order and rerank seeds, clamp variance |
| Tables or code never cited | Check block type in top k | Block typing lost during chunking | code_tables_blocks.md | Preserve block types, add type-aware rerank feature |
| One doc dominates retrieval | Top k doc entropy and author field | Fragmentation or duplicate shards | reindex_migration.md | Rebalance shards, dedupe, enable cross doc rerank |
| Tool loops or JSON fails | Inspect tool schema and free text fields | Contract too loose, injection | data-contracts.md, prompt-injection.md | Tighten schema, add cite first and role fences |
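The "top k doc entropy" probe for the one-doc-dominates symptom can be approximated with Shannon entropy over the document ids in the top-k list. This is an illustrative sketch; the function names are hypothetical and the playbook does not prescribe a specific formula or threshold.

```python
import math
from collections import Counter


def topk_doc_entropy(doc_ids):
    """Shannon entropy (bits) of the document distribution in a top-k list."""
    if not doc_ids:
        return 0.0
    total = len(doc_ids)
    counts = Counter(doc_ids)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def normalized_doc_entropy(doc_ids):
    """Entropy scaled to [0, 1]; 1.0 means every hit comes from a different doc."""
    k = len(doc_ids)
    if k <= 1:
        return 0.0
    return topk_doc_entropy(doc_ids) / math.log2(k)


# Example: four hits, all from the same document, versus four distinct documents.
dominated = ["doc_a", "doc_a", "doc_a", "doc_a"]
spread = ["doc_a", "doc_b", "doc_c", "doc_d"]
print(normalized_doc_entropy(dominated), normalized_doc_entropy(spread))
```

A value near zero across many queries suggests one document is crowding out the rest, which is the cue to look at duplicate shards or fragmentation.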

Seven step incident routine

  1. Freeze context
    Capture build, index_hash, metric, analyzer, embed_model, retriever params, reranker.

  2. Reproduce
    Run three paraphrases and two seeds. Log ΔS per candidate, λ states, coverage, citation offsets.

  3. Verify structure
    Check chunk id format from chunk_id_schema.md and title tree from title_hierarchy.md.

  4. Boundary audit
    Confirm the cited block sits inside one detected section from section_detection.md.

  5. Content type audit
    Ensure tables and code blocks survive extraction per code_tables_blocks.md.

  6. Meaning check
    If ΔS stays high on every k, suspect metric or index mismatch. Open embedding-vs-semantic.md and rerankers.md.

  7. Decide fix module
    Retrieval drift → BBMC with contracts
    Reasoning collapse → BBCR bridge plus BBAM clamp
    Dead ends in long chains → BBPF alternate path
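Steps 1 and 2 of the routine lend themselves to small helpers. The sketch below is one possible shape, under stated assumptions: the settings dictionary, the key names (taken from the fields listed in step 1), and the short SHA-256 fingerprint are illustrative choices, not an existing WFGY API. The flip counter assumes λ states are logged as comparable labels in run order.

```python
import hashlib
import json

# Fields named in step 1; the exact key spellings are assumptions.
FROZEN_FIELDS = ("build", "index_hash", "metric", "analyzer",
                 "embed_model", "retriever_params", "reranker")


def freeze_context(settings):
    """Snapshot the incident context and attach a stable fingerprint."""
    missing = [f for f in FROZEN_FIELDS if f not in settings]
    if missing:
        raise KeyError(f"missing context fields: {missing}")
    snapshot = {f: settings[f] for f in FROZEN_FIELDS}
    # sort_keys makes the fingerprint independent of dict ordering
    blob = json.dumps(snapshot, sort_keys=True, ensure_ascii=False).encode("utf-8")
    snapshot["fingerprint"] = hashlib.sha256(blob).hexdigest()[:16]
    return snapshot


def lambda_flip_count(states):
    """Count how often the λ state changes between consecutive runs."""
    return sum(1 for a, b in zip(states, states[1:]) if a != b)
```

Comparing fingerprints before and after a change gives a quick answer to whether two runs were really made under the same configuration.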

Copy probes you can paste

SQL-like probe for vector stores

-- sample ten queries that failed coverage in the last hour
select qid, question, topk_ids, topk_scores, index_hash, embed_model
from rag_logs
where ts > now() - interval '1 hour'
  and coverage = false
limit 10;
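Once the sampled rows are exported, grouping them by index fingerprint shows whether the failures cluster on one build. A minimal sketch, assuming each row arrives as a dictionary keyed by the column names selected in the probe; `cluster_failures` is a hypothetical helper name.

```python
from collections import Counter


def cluster_failures(rows):
    """Count failed queries per (index_hash, embed_model), most frequent first."""
    counts = Counter((row["index_hash"], row["embed_model"]) for row in rows)
    return counts.most_common()


# Example with made-up rows in the shape returned by the probe.
rows = [
    {"qid": 1, "index_hash": "h2", "embed_model": "m1"},
    {"qid": 2, "index_hash": "h2", "embed_model": "m1"},
    {"qid": 3, "index_hash": "h1", "embed_model": "m1"},
]
for (index_hash, embed_model), n in cluster_failures(rows):
    print(index_hash, embed_model, n)
```

If nearly all failures share one `index_hash`, the rebuild is the first suspect; an even spread points toward metric, rerank, or prompt issues instead.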

LLM triage prompt

You have TXTOS and WFGY Problem Map.

Given logs for {N} queries with ΔS lists, λ states, citations, and index fingerprints:
1) Name the failing layer: boundary, typing, metric, rerank, OCR, contract.
2) Return exact pages to open next.
3) Propose a minimal reversible fix and a verification test.
Return JSON {layer, pages[], fix, test}.
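Because the prompt asks for a fixed JSON shape, the reply can be validated before anyone acts on it. The following is a sketch of such a check, assuming the model returns a single JSON object as text; the function name and the strictness of the rules are illustrative choices, while the allowed layer names are copied from the prompt.

```python
import json

# Layer names as listed in the triage prompt.
ALLOWED_LAYERS = {"boundary", "typing", "metric", "rerank", "OCR", "contract"}


def parse_triage_reply(raw):
    """Parse and validate a {layer, pages[], fix, test} reply; raise ValueError if invalid."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    missing = {"layer", "pages", "fix", "test"} - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if data["layer"] not in ALLOWED_LAYERS:
        raise ValueError(f"unknown layer: {data['layer']!r}")
    pages = data["pages"]
    if not isinstance(pages, list) or not all(isinstance(p, str) for p in pages):
        raise ValueError("pages must be a list of strings")
    for key in ("fix", "test"):
        if not isinstance(data[key], str) or not data[key].strip():
            raise ValueError(f"{key} must be a non-empty string")
    return data
```

Rejecting malformed replies early follows the same "tighten schema" advice the symptom map gives for tool loops and JSON failures.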

Rollback and canary

Postmortem template

  • Incident summary
  • Impact window and scope
  • Root layer and evidence
  • Fix that shipped and verification
  • Prevention items: contracts, monitors, checklists

Prevention checklist

  • Stable chunk ids and title tree are present in every snippet payload
  • Cite first prompting and strict data contracts are enforced
  • OCR and layout normalizers are frozen for production builds
  • Rerank seed and header order are locked during canary
  • Live probes for ΔS, λ, coverage, citation accuracy are enabled

🔗 Quick-Start Downloads (60 sec)

| Tool | Link | 3-Step Setup |
|---|---|---|
| WFGY 1.0 PDF | Engine Paper | 1 Download · 2 Upload to your LLM · 3 Ask “Answer using WFGY + <your question>” |
| TXT OS (plain-text OS) | TXTOS.txt | 1 Download · 2 Paste into any LLM chat · 3 Type “hello world”; the OS boots instantly |

Explore More

| Layer | Page | What it's for |
|---|---|---|
| Proof | WFGY Recognition Map | External citations, integrations, and ecosystem proof |
| Engine | WFGY 1.0 | Original PDF based tension engine |
| Engine | WFGY 2.0 | Production tension kernel and math engine for RAG and agents |
| Engine | WFGY 3.0 | TXT based Singularity tension engine, 131 S class set |
| Map | Problem Map 1.0 | Flagship 16 problem RAG failure checklist and fix map |
| Map | Problem Map 2.0 | RAG focused recovery pipeline |
| Map | Problem Map 3.0 | Global Debug Card, image as a debug protocol layer |
| Map | Semantic Clinic | Symptom to family to exact fix |
| Map | Grandmas Clinic | Plain language stories mapped to Problem Map 1.0 |
| Onboarding | Starter Village | Guided tour for newcomers |
| App | TXT OS | TXT semantic OS, fast boot |
| App | Blah Blah Blah | Abstract and paradox Q and A built on TXT OS |
| App | Blur Blur Blur | Text to image with semantic control |
| App | Blow Blow Blow | Reasoning game engine and memory demo |

If this repository helped, starring it improves discovery so more builders can find the docs and tools.

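Suggested next page: ProblemMap/GlobalFixMap/Chunking/chunking_checklist.md. That page is a concise field checklist that condenses all of the rules above into twenty checkable items, for use by operations teams and for standardization.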