RAG ops debug playbook
A fast triage guide for incidents after you change chunking, OCR, embedding, or index settings. The goal is to localize the failing layer in minutes and apply a reversible fix.
Open these first
- Chunk ids and stability: chunk_id_schema.md
- Title tree numbering: title_hierarchy.md
- Section boundary rules: section_detection.md
- Typed blocks (code, tables, figures): code_tables_blocks.md
- PDF layout and OCR normalization: pdf_layouts_and_ocr.md
- Rebuild without breaking citations: reindex_migration.md
- Eval harness and gates: eval_rag_precision_recall.md
- Live probes and alerts: live_monitoring_rag.md
- Retrieval trace schema: retrieval-traceability.md
- Payload contracts: data-contracts.md
- Reranker controls: rerankers.md
- Similarity vs meaning: embedding-vs-semantic.md
- Prompt injection: prompt-injection.md
- Visual recovery map: rag-architecture-and-recovery.md
Golden acceptance
- ΔS(question, retrieved) ≤ 0.45
- Coverage ≥ 0.70 to the target section
- λ_observe convergent across three paraphrases and two seeds
- Citation offsets within 30 bytes of the ground block
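The four gates above can be checked mechanically once each probe run is logged. The following is a minimal sketch, not the project's implementation: the `ProbeRun` record, its field names, and the `"convergent"` state label are assumptions made for illustration, and ΔS and coverage are taken as already computed by your own tooling.

```python
from dataclasses import dataclass
from typing import Sequence

# Thresholds copied from the acceptance list above.
MAX_DELTA_S = 0.45
MIN_COVERAGE = 0.70
MAX_OFFSET_DRIFT_BYTES = 30
EXPECTED_RUNS = 3 * 2  # three paraphrases times two seeds


@dataclass(frozen=True)
class ProbeRun:
    """One run of a question (one paraphrase, one seed). Hypothetical record shape."""
    delta_s: float      # precomputed ΔS(question, retrieved)
    coverage: float     # fraction of the target section covered
    lambda_state: str   # observed λ state label (assumed to be a string)
    cited_offset: int   # byte offset reported by the citation
    truth_offset: int   # byte offset of the ground block


def acceptance_failures(runs: Sequence[ProbeRun]) -> list[str]:
    """Return the names of the gates that fail; an empty list means pass."""
    failures = []
    if len(runs) < EXPECTED_RUNS:
        failures.append("incomplete_grid")
    if any(r.delta_s > MAX_DELTA_S for r in runs):
        failures.append("delta_s")
    if any(r.coverage < MIN_COVERAGE for r in runs):
        failures.append("coverage")
    if any(r.lambda_state != "convergent" for r in runs):
        failures.append("lambda")
    if any(abs(r.cited_offset - r.truth_offset) > MAX_OFFSET_DRIFT_BYTES for r in runs):
        failures.append("citation_offset")
    return failures
```

Returning the failed gate names rather than a single boolean keeps the result usable as a pointer into the symptom table.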
Symptom to fix map
| Symptom | Quick probe | Likely root | Open this | Minimal fix |
|---|---|---|---|---|
| Coverage drops after index rebuild | Check index_hash change with same build id | Bad boot sequence or partial ingest | reindex_migration.md | Rebuild with frozen normalizers, fence ingestion, re-point alias after eval pass |
| Citations point to wrong offsets | Validate 30 byte window around cited chunk | OCR or layout normalization drift | pdf_layouts_and_ocr.md | Re-run layout pass and regenerate chunk ids with stable scheme |
| High similarity yet wrong meaning | Compare ΔS to anchor section and to decoy | Metric or analyzer mismatch | embedding-vs-semantic.md | Switch metric or normalize text, add rerank pass |
| Answers flip between reruns | Three paraphrase test and λ flip count | Prompt header reorder or rerank shuffle | rerankers.md | Lock header order and rerank seeds, clamp variance |
| Tables or code never cited | Check block type in top k | Block typing lost during chunking | code_tables_blocks.md | Preserve block types, add type-aware rerank feature |
| One doc dominates retrieval | Top k doc entropy and author field | Fragmentation or duplicate shards | reindex_migration.md | Rebalance shards, dedupe, enable cross doc rerank |
| Tool loops or JSON fails | Inspect tool schema and free text fields | Contract too loose, injection | data-contracts.md, prompt-injection.md | Tighten schema, add cite first and role fences |
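The "top k doc entropy" probe in the table can be computed in a few lines. This is one possible sketch: it assumes you can map each retrieved chunk to a document id (how that mapping works depends on your chunk id scheme), and the function name is hypothetical. The source does not give an alert threshold, so none is set here.

```python
import math
from collections import Counter
from typing import Sequence


def topk_doc_entropy(doc_ids: Sequence[str]) -> float:
    """Normalized Shannon entropy of the document ids in a top-k list.

    Returns 0.0 when one document fills every slot and 1.0 when every
    slot comes from a different document.
    """
    k = len(doc_ids)
    if k <= 1:
        return 0.0
    counts = Counter(doc_ids)
    # Each term is p * log2(1/p) with p = c / k, so every term is non-negative.
    entropy = sum((c / k) * math.log2(k / c) for c in counts.values())
    return entropy / math.log2(k)
```

A value near zero across many queries is consistent with the "one doc dominates retrieval" symptom; compare it against a known-good build rather than against a fixed number.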
Seven step incident routine
1. Freeze context
   Capture build, index_hash, metric, analyzer, embed_model, retriever params, reranker.
2. Reproduce
   Run three paraphrases and two seeds. Log ΔS per candidate, λ states, coverage, citation offsets.
3. Verify structure
   Check chunk id format from chunk_id_schema.md and title tree from title_hierarchy.md.
4. Boundary audit
   Confirm the cited block sits inside one detected section from section_detection.md.
5. Content type audit
   Ensure tables and code blocks survive extraction per code_tables_blocks.md.
6. Meaning check
   If ΔS stays high on every k, suspect metric or index mismatch. Open embedding-vs-semantic.md and rerankers.md.
7. Decide fix module
   - Retrieval drift → BBMC with contracts
   - Reasoning collapse → BBCR bridge plus BBAM clamp
   - Dead ends in long chains → BBPF alternate path
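For the "Freeze context" step, it helps to store the captured fields as an immutable record and diff it against the last known-good record. The sketch below assumes every field can be kept as a string; the `BuildContext` class and `context_diff` helper are hypothetical names, not part of any referenced page.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class BuildContext:
    """Fields named in the 'Freeze context' step; the class itself is hypothetical."""
    build: str
    index_hash: str
    metric: str
    analyzer: str
    embed_model: str
    retriever_params: str  # serialized retriever settings
    reranker: str


def context_diff(before: BuildContext, after: BuildContext) -> dict[str, tuple[str, str]]:
    """Return {field: (old, new)} for every field that changed."""
    old, new = asdict(before), asdict(after)
    return {name: (old[name], new[name]) for name in old if old[name] != new[name]}
```

If the diff shows a new index_hash while build is unchanged, that matches the quick probe for "coverage drops after index rebuild" in the symptom table.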
Copy probes you can paste
SQL like probe for vector stores
-- sample ten queries that failed coverage in the last hour
select qid, question, topk_ids, topk_scores, index_hash, embed_model
from rag_logs
where ts > now() - interval '1 hour'
and coverage = false
limit 10;
LLM triage prompt
You have TXTOS and WFGY Problem Map.
Given logs for {N} queries with ΔS lists, λ states, citations, and index fingerprints:
1) Name the failing layer: boundary, typing, metric, rerank, OCR, contract.
2) Return exact pages to open next.
3) Propose a minimal reversible fix and a verification test.
Return JSON {layer, pages[], fix, test}.
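Since the prompt asks for a fixed JSON shape, the reply can be validated before anyone acts on it. This is a minimal sketch using only the standard library; the function name is hypothetical, and matching the layer name case-insensitively is an assumption, not something the prompt specifies.

```python
import json

# Layer names listed in step 1 of the triage prompt, lowercased for comparison.
ALLOWED_LAYERS = {"boundary", "typing", "metric", "rerank", "ocr", "contract"}


def parse_triage_reply(raw: str) -> dict:
    """Parse the model's reply and reject anything outside the requested shape."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on non-JSON text
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    if set(data) != {"layer", "pages", "fix", "test"}:
        raise ValueError(f"unexpected keys: {sorted(data)}")
    layer = data["layer"]
    if not isinstance(layer, str) or layer.lower() not in ALLOWED_LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    pages = data["pages"]
    if not isinstance(pages, list) or not all(isinstance(p, str) for p in pages):
        raise ValueError("pages must be a list of strings")
    for key in ("fix", "test"):
        if not isinstance(data[key], str) or not data[key].strip():
            raise ValueError(f"{key} must be a non-empty string")
    return data
```

Rejecting extra keys and free-form layer names follows the same "tighten schema" advice the symptom table gives for tool loops and JSON failures.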
Rollback and canary
- Roll back if two of the live gates from live_monitoring_rag.md fire in two consecutive windows.
- Canary new index at five percent. Promote only if coverage and citation accuracy meet gates from eval_rag_precision_recall.md.
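Both rules can be expressed as small pure functions. The sketch below makes two assumptions that the text leaves open: "two of the live gates" is read as any two gates firing in a window, not necessarily the same two in both windows, and the five percent split is done by hashing a query id. All names here are hypothetical.

```python
import hashlib
from typing import Sequence

GATES_PER_WINDOW = 2     # "two of the live gates"
CONSECUTIVE_WINDOWS = 2  # "two consecutive windows"
CANARY_PERCENT = 5       # "canary new index at five percent"


def should_roll_back(fired_per_window: Sequence[set[str]]) -> bool:
    """fired_per_window holds, oldest first, the gate names that fired in each window.

    Only the most recent windows are considered.
    """
    recent = fired_per_window[-CONSECUTIVE_WINDOWS:]
    return len(recent) == CONSECUTIVE_WINDOWS and all(
        len(fired) >= GATES_PER_WINDOW for fired in recent
    )


def routes_to_canary(qid: str) -> bool:
    """Deterministically send about CANARY_PERCENT of query ids to the new index."""
    digest = hashlib.sha256(qid.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < CANARY_PERCENT
```

Hashing the query id keeps a given query on the same index across reruns, so paraphrase and seed tests are not disturbed by the traffic split.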
Postmortem template
- Incident summary
- Impact window and scope
- Root layer and evidence
- Fix that shipped and verification
- Prevention items: contracts, monitors, checklists
Prevention checklist
- Stable chunk ids and title tree are present in every snippet payload
- Cite first prompting and strict data contracts are enforced
- OCR and layout normalizers are frozen for production builds
- Rerank seed and header order are locked during canary
- Live probes for ΔS, λ, coverage, citation accuracy are enabled
🔗 Quick-Start Downloads (60 sec)
| Tool | Link | 3-Step Setup |
|---|---|---|
| WFGY 1.0 PDF | Engine Paper | 1️⃣ Download · 2️⃣ Upload to your LLM · 3️⃣ Ask “Answer using WFGY + <your question>” |
| TXT OS (plain-text OS) | TXTOS.txt | 1️⃣ Download · 2️⃣ Paste into any LLM chat · 3️⃣ Type “hello world” — OS boots instantly |
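Suggested next page: ProblemMap/GlobalFixMap/Chunking/chunking_checklist.md
That page is a concise field checklist that condenses all the rules above into twenty checkable items for ops and standardization work.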