RAG ops debug playbook

A fast triage guide for incidents after you change chunking, OCR, embedding, or index settings. The goal is to localize the failing layer in minutes and apply a reversible fix.

Open these first

Chunk ids and stability: chunk_id_schema.md
Title tree numbering: title_hierarchy.md
Section boundary rules: section_detection.md
Typed blocks (code, tables, figures): code_tables_blocks.md
PDF layout and OCR normalization: pdf_layouts_and_ocr.md
Rebuild without breaking citations: reindex_migration.md
Eval harness and gates: eval_rag_precision_recall.md
Live probes and alerts: live_monitoring_rag.md
Retrieval trace schema: retrieval-traceability.md
Payload contracts: data-contracts.md
Reranker controls: rerankers.md
Similarity vs meaning: embedding-vs-semantic.md
Prompt injection: prompt-injection.md
Visual recovery map: rag-architecture-and-recovery.md

Golden acceptance

ΔS(question, retrieved) ≤ 0.45
Coverage of the target section ≥ 0.70
λ_observe convergent across three paraphrases and two seeds
Citation offsets within 30 bytes of the ground block
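
The four gates above can be checked mechanically once the metrics are logged. The following is a minimal sketch, not a reference implementation: it assumes ΔS, coverage, and λ states are computed upstream, that a convergent run is logged with the label "convergent", and that one record exists per paraphrase and seed pair. `RunResult` and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Thresholds copied from the golden acceptance list.
MAX_DELTA_S = 0.45
MIN_COVERAGE = 0.70
MAX_OFFSET_DRIFT_BYTES = 30
REQUIRED_RUNS = 6  # three paraphrases x two seeds


@dataclass
class RunResult:
    """One probe run. Field names are hypothetical, not a fixed schema."""
    delta_s: float        # ΔS(question, retrieved), computed upstream
    coverage: float       # fraction of the target section that was covered
    lambda_state: str     # assumed label, e.g. "convergent" or "divergent"
    cited_offset: int     # byte offset reported in the citation
    ground_offset: int    # byte offset of the ground block


def failed_gates(runs: Sequence[RunResult]) -> List[str]:
    """Return the names of the gates that fail; an empty list means pass."""
    failures = []
    if len(runs) < REQUIRED_RUNS:
        failures.append("run_count")
    if any(r.delta_s > MAX_DELTA_S for r in runs):
        failures.append("delta_s")
    if any(r.coverage < MIN_COVERAGE for r in runs):
        failures.append("coverage")
    if any(r.lambda_state != "convergent" for r in runs):
        failures.append("lambda_observe")
    if any(abs(r.cited_offset - r.ground_offset) > MAX_OFFSET_DRIFT_BYTES
           for r in runs):
        failures.append("citation_offset")
    return failures
```

Returning the failing gate names rather than a single boolean keeps the result usable as the first input to the symptom map.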

Symptom to fix map

Symptom	Quick probe	Likely root	Open this	Minimal fix
Coverage drops after index rebuild	Check whether `index_hash` changed under the same build id	Bad boot sequence or partial ingest	reindex_migration.md	Rebuild with frozen normalizers, fence ingestion, re-point the alias after the eval pass
Citations point to wrong offsets	Validate the 30-byte window around the cited chunk	OCR or layout normalization drift	pdf_layouts_and_ocr.md	Re-run the layout pass and regenerate chunk ids with a stable scheme
High similarity yet wrong meaning	Compare ΔS to the anchor section and to a decoy	Metric or analyzer mismatch	embedding-vs-semantic.md	Switch the metric or normalize text, add a rerank pass
Answers flip between reruns	Three-paraphrase test and λ flip count	Prompt header reorder or rerank shuffle	rerankers.md	Lock header order and rerank seeds, clamp variance
Tables or code never cited	Check block `type` in the top k	Block typing lost during chunking	code_tables_blocks.md	Preserve block types, add a type-aware rerank feature
One doc dominates retrieval	Top-k doc entropy and author field	Fragmentation or duplicate shards	reindex_migration.md	Rebalance shards, dedupe, enable cross-doc rerank
Tool loops or JSON fails	Inspect tool schema and free-text fields	Contract too loose, injection	data-contracts.md, prompt-injection.md	Tighten the schema, add cite-first prompting and role fences
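
The "top-k doc entropy" probe for the one-doc-dominates symptom can be computed directly from the document ids in a result list. This is a rough sketch under one assumption: each retrieved chunk carries the id of its parent document. The function name is illustrative, and no alert threshold is implied; pick one from your own baseline.

```python
import math
from collections import Counter
from typing import Sequence


def topk_doc_entropy(doc_ids: Sequence[str]) -> float:
    """Normalized Shannon entropy of parent-document ids in a top-k list.

    0.0 means every hit comes from one document. 1.0 means every hit comes
    from a different document. Values near 0.0 suggest one doc dominates.
    """
    counts = Counter(doc_ids)
    k = len(doc_ids)
    if k <= 1 or len(counts) == 1:
        return 0.0
    entropy = -sum((c / k) * math.log2(c / k) for c in counts.values())
    # Divide by the maximum possible entropy for k hits.
    return entropy / math.log2(k)
```

Tracking this value per query before and after a rebuild makes a sudden drop toward zero easy to spot.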

Seven-step incident routine

1. Freeze context
   Capture build, index_hash, metric, analyzer, embed_model, retriever params, reranker.
2. Reproduce
   Run three paraphrases and two seeds. Log ΔS per candidate, λ states, coverage, citation offsets.
3. Verify structure
   Check chunk id format from chunk_id_schema.md and title tree from title_hierarchy.md.
4. Boundary audit
   Confirm the cited block sits inside one detected section from section_detection.md.
5. Content type audit
   Ensure tables and code blocks survive extraction per code_tables_blocks.md.
6. Meaning check
   If ΔS stays high on every k, suspect metric or index mismatch. Open embedding-vs-semantic.md and rerankers.md.
7. Decide fix module
   Retrieval drift → BBMC with contracts
   Reasoning collapse → BBCR bridge plus BBAM clamp
   Dead ends in long chains → BBPF alternate path
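
The first two steps of the routine, freezing context and reproducing, could be sketched as follows. This is illustrative only: `run_query` is a hypothetical hook into your own pipeline that returns a λ state label for one question and seed, and "flip count" is assumed here to mean the number of runs that disagree with the most common λ state.

```python
import hashlib
import itertools
import json
from typing import Callable, Dict, List, Sequence


def context_fingerprint(context: Dict[str, object]) -> str:
    """Hash the frozen settings so every log line can carry the same tag."""
    canonical = json.dumps(context, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def reproduce(
    run_query: Callable[[str, int], str],
    paraphrases: Sequence[str],
    seeds: Sequence[int],
) -> List[dict]:
    """Run every paraphrase with every seed and record the λ state."""
    return [
        {"question": q, "seed": s, "lambda_state": run_query(q, s)}
        for q, s in itertools.product(paraphrases, seeds)
    ]


def lambda_flip_count(runs: Sequence[dict]) -> int:
    """Count runs whose λ state differs from the most common state."""
    states = [r["lambda_state"] for r in runs]
    if not states:
        return 0
    majority = max(set(states), key=states.count)
    return sum(1 for s in states if s != majority)
```

Sorting keys before hashing means two captures of the same settings produce the same fingerprint regardless of field order, which makes it safe to compare incidents across tools.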

Probes you can copy and paste

SQL-like probe for vector stores

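-- sample ten queries that failed the coverage gate in the last hour
-- assumes `coverage` is stored as a numeric score; if your logs store a pass/fail flag, filter on `coverage = false` instead
select qid, question, topk_ids, topk_scores, index_hash, embed_model
from rag_logs
where ts > now() - interval '1 hour'
  and coverage < 0.70
limit 10;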
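
Once the probe returns rows, grouping them by index fingerprint shows quickly whether the failures cluster on one build. A small sketch, assuming each row arrives as a mapping with the `index_hash` and `embed_model` columns selected above; the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Tuple


def failures_by_fingerprint(
    rows: Iterable[Mapping[str, object]],
) -> List[Tuple[Tuple[str, str], int]]:
    """Count failing queries per (index_hash, embed_model), largest first."""
    counts = Counter(
        (str(r["index_hash"]), str(r["embed_model"])) for r in rows
    )
    return counts.most_common()
```

If nearly all failures share one `index_hash`, the rebuild row of the symptom map is the natural place to start.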

LLM triage prompt

You have TXTOS and WFGY Problem Map.

Given logs for {N} queries with ΔS lists, λ states, citations, and index fingerprints:
1) Name the failing layer: boundary, typing, metric, rerank, OCR, contract.
2) Return exact pages to open next.
3) Propose a minimal reversible fix and a verification test.
Return JSON {layer, pages[], fix, test}.
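
Because the prompt asks for a fixed JSON shape, it is worth validating the reply before acting on it. The sketch below assumes the reply is a bare JSON object with no surrounding text and treats the six layer names in the prompt as the only valid values, compared case-insensitively. `parse_triage_reply` is a hypothetical helper.

```python
import json
from typing import Any, Dict

# Layer names taken from step 1 of the triage prompt, lowercased.
ALLOWED_LAYERS = {"boundary", "typing", "metric", "rerank", "ocr", "contract"}


def parse_triage_reply(raw: str) -> Dict[str, Any]:
    """Parse the model reply and reject anything that breaks the contract."""
    data = json.loads(raw)  # raises ValueError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    missing = {"layer", "pages", "fix", "test"} - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if str(data["layer"]).lower() not in ALLOWED_LAYERS:
        raise ValueError(f"unknown layer: {data['layer']!r}")
    pages = data["pages"]
    if not isinstance(pages, list) or not all(isinstance(p, str) for p in pages):
        raise ValueError("pages must be a list of strings")
    for key in ("fix", "test"):
        if not isinstance(data[key], str) or not data[key].strip():
            raise ValueError(f"{key} must be a non-empty string")
    return data
```

Rejecting an unknown layer early prevents a free-text guess from being routed to a fix module.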

Rollback and canary

Roll back if two of the live gates from live_monitoring_rag.md fire in two consecutive windows.
Canary the new index at five percent. Promote only if coverage and citation accuracy meet the gates from eval_rag_precision_recall.md.
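
The rollback and promotion rules can be expressed as two small predicates. This sketch reads the rollback rule as "at least two gates fire in each of two consecutive windows", which is one interpretation and may differ from live_monitoring_rag.md. The promotion thresholds are passed in as parameters because the actual values live in eval_rag_precision_recall.md.

```python
from typing import Sequence, Set


def should_roll_back(fired_per_window: Sequence[Set[str]],
                     min_gates: int = 2) -> bool:
    """True when two consecutive windows each have `min_gates` or more
    live gates firing. Each set holds the names of gates that fired."""
    return any(
        len(prev) >= min_gates and len(curr) >= min_gates
        for prev, curr in zip(fired_per_window, fired_per_window[1:])
    )


def should_promote(coverage: float, citation_accuracy: float,
                   min_coverage: float,
                   min_citation_accuracy: float) -> bool:
    """True when the canary meets both eval gates."""
    return (coverage >= min_coverage
            and citation_accuracy >= min_citation_accuracy)
```

Requiring the condition in adjacent windows keeps a single noisy window from triggering a rollback.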

Postmortem template

Incident summary
Impact window and scope
Root layer and evidence
Fix that shipped and verification
Prevention items: contracts, monitors, checklists

Prevention checklist

Stable chunk ids and title tree are present in every snippet payload
Cite-first prompting and strict data contracts are enforced
OCR and layout normalizers are frozen for production builds
Rerank seed and header order are locked during canary
Live probes for ΔS, λ, coverage, citation accuracy are enabled
