WFGY/ProblemMap/GlobalFixMap/Chunking/ops_debug_playbook.md

10 KiB
Raw Blame History

RAG ops debug playbook

🧭 Quick Return to Map

You are in a sub-page of Chunking.
To reorient, go back here:

Think of this page as a desk within a ward.
If you need the full triage and all prescriptions, return to the Emergency Room lobby.

A fast triage guide for incidents after you change chunking, OCR, embedding, or index settings. The goal is to localize the failing layer in minutes and apply a reversible fix.

Open these first

Golden acceptance

  • ΔS(question, retrieved) ≤ 0.45
  • Coverage ≥ 0.70 to the target section
  • λ_observe convergent across three paraphrases and two seeds
  • Citation offsets within 30 bytes of the ground block

Symptom to fix map

Symptom Quick probe Likely root Open this Minimal fix
Coverage drops after index rebuild Check index_hash change with same build id Bad boot sequence or partial ingest reindex_migration.md Rebuild with frozen normalizers, fence ingestion, re-point alias after eval pass
Citations point to wrong offsets Validate 30 byte window around cited chunk OCR or layout normalization drift pdf_layouts_and_ocr.md Re-run layout pass and regenerate chunk ids with stable scheme
High similarity yet wrong meaning Compare ΔS to anchor section and to decoy Metric or analyzer mismatch embedding-vs-semantic.md Switch metric or normalize text, add rerank pass
Answers flip between reruns Three paraphrase test and λ flip count Prompt header reorder or rerank shuffle rerankers.md Lock header order and rerank seeds, clamp variance
Tables or code never cited Check block type in top k Block typing lost during chunking code_tables_blocks.md Preserve block types, add type-aware rerank feature
One doc dominates retrieval Top k doc entropy and author field Fragmentation or duplicate shards reindex_migration.md Rebalance shards, dedupe, enable cross doc rerank
Tool loops or JSON fails Inspect tool schema and free text fields Contract too loose, injection data-contracts.md, prompt-injection.md Tighten schema, add cite first and role fences

Seven step incident routine

  1. Freeze context
    Capture build, index_hash, metric, analyzer, embed_model, retriever params, reranker.

  2. Reproduce
    Run three paraphrases and two seeds. Log ΔS per candidate, λ states, coverage, citation offsets.

  3. Verify structure
    Check chunk id format from chunk_id_schema.md and title tree from title_hierarchy.md.

  4. Boundary audit
    Confirm the cited block sits inside one detected section from section_detection.md.

  5. Content type audit
    Ensure tables and code blocks survive extraction per code_tables_blocks.md.

  6. Meaning check
    If ΔS stays high on every k, suspect metric or index mismatch. Open embedding-vs-semantic.md and rerankers.md.

  7. Decide fix module
    Retrieval drift → BBMC with contracts
    Reasoning collapse → BBCR bridge plus BBAM clamp
    Dead ends in long chains → BBPF alternate path

Copy probes you can paste

SQL like probe for vector stores

-- sample ten queries that failed coverage in the last hour
select qid, question, topk_ids, topk_scores, index_hash, embed_model
from rag_logs
where ts > now() - interval '1 hour'
  and coverage = false
limit 10;

LLM triage prompt

You have TXTOS and WFGY Problem Map.

Given logs for {N} queries with ΔS lists, λ states, citations, and index fingerprints:
1) Name the failing layer: boundary, typing, metric, rerank, OCR, contract.
2) Return exact pages to open next.
3) Propose a minimal reversible fix and a verification test.
Return JSON {layer, pages[], fix, test}.

Rollback and canary

Postmortem template

  • Incident summary
  • Impact window and scope
  • Root layer and evidence
  • Fix that shipped and verification
  • Prevention items: contracts, monitors, checklists

Prevention checklist

  • Stable chunk ids and title tree are present in every snippet payload
  • Cite first prompting and strict data contracts are enforced
  • OCR and layout normalizers are frozen for production builds
  • Rerank seed and header order are locked during canary
  • Live probes for ΔS, λ, coverage, citation accuracy are enabled

🔗 Quick-Start Downloads (60 sec)

Tool Link 3-Step Setup
WFGY 1.0 PDF Engine Paper 1 Download · 2 Upload to your LLM · 3 Ask “Answer using WFGY + <your question>”
TXT OS (plain-text OS) TXTOS.txt 1 Download · 2 Paste into any LLM chat · 3 Type “hello world” — OS boots instantly

Explore More

Module Description Link
WFGY Core Canonical framework entry point View
Problem Map Diagnostic map and navigation hub View
Tension Universe Experiments MVP experiment field View
Recognition Where WFGY is referenced or adopted View
AI Guide Anti-hallucination reading protocol for tools View

If this repository helps, starring it improves discovery for other builders.
GitHub Repo stars

下一頁建議:ProblemMap/GlobalFixMap/Chunking/chunking_checklist.md 這一頁是現場交付的簡潔檢查表,會把上面所有規則壓成二十條可勾選項,給運維和標準化使用。