
Retrieval Playbook

🧭 Quick Return to Map

You are in a sub-page of Retrieval.
To reorient, go back to the Retrieval map.

Think of this page as a desk within a ward.
If you need the full triage and all prescriptions, return to the Emergency Room lobby.

A practical, store-agnostic playbook to stabilize retrieval quality. Use this page to route symptoms to the right structural fix, apply measurable targets, and keep read/write parity across pipelines.

When to use

  • High similarity yet wrong meaning
  • Missing or unstable citations
  • Hybrid retrieval performs worse than a single retriever
  • Results flip across runs or paraphrases
  • New deploy returns empty or partial context

Acceptance targets

  • ΔS(question, retrieved) ≤ 0.45
  • Coverage ≥ 0.70 for the intended section
  • λ remains convergent across 3 paraphrases and 2 seeds
  • E_resonance stays flat on long windows
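
These targets can be gated in code. A minimal sketch, assuming ΔS is approximated as 1 minus cosine similarity between the question and retrieved-context embeddings (the helper names are illustrative, not the WFGY API):

import numpy as np

def delta_s(q_vec: np.ndarray, ctx_vec: np.ndarray) -> float:
    # ΔS approximated as 1 - cosine similarity: 0 is aligned, higher is more semantic stress
    q = q_vec / np.linalg.norm(q_vec)
    c = ctx_vec / np.linalg.norm(ctx_vec)
    return float(1.0 - q @ c)

def meets_targets(ds: float, coverage: float, lambda_states: list[str]) -> bool:
    # Targets from this page: ΔS ≤ 0.45, coverage ≥ 0.70, λ convergent on every probe
    return ds <= 0.45 and coverage >= 0.70 and all(s == "convergent" for s in lambda_states)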


60-second fix path

  1. Probe
    Run ΔS(question, retrieved) at k = 5, 10, 20. Log λ for each paraphrase. A probe sketch follows this list.
    Tool: deltaS_probes.md

  2. Lock schema
    Enforce cite-then-explain, and require snippet_id, section_id, source_url, offsets, tokens.
    Spec: Data Contracts

  3. Repair the failing layer

    • Wrong meaning with high similarity → see Metric and analyzer parity below
    • Missing or shaky citations → install Traceability schema
    • Hybrid worse than single → run Hybrid weighting and Query parsing split
    • Flips across runs → clamp with Rerankers and parity checks
  4. Verify
    Coverage ≥ 0.70 on 3 paraphrases; λ convergent on 2 seeds; ΔS ≤ 0.45.
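
A sketch of the step 1 probe, reusing delta_s from the sketch above and assuming a retriever with a search(query, k) method and an embed() function as hypothetical stand-ins for your stack:

def probe(question: str, paraphrases: list[str], retriever, embed) -> dict:
    """Run ΔS at k = 5, 10, 20 and log λ across paraphrases (step 1 of the fix path)."""
    report = {}
    for k in (5, 10, 20):
        hits = retriever.search(question, k=k)
        ctx = " ".join(h["text"] for h in hits)
        report[f"k{k}"] = delta_s(embed(question), embed(ctx))
    # Treat λ as convergent when every paraphrase lands in the same top section
    top_sections = [retriever.search(p, k=5)[0]["section_id"] for p in paraphrases]
    report["lambda"] = "convergent" if len(set(top_sections)) == 1 else "divergent"
    return report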


Root-cause map → exact fixes

1) Metric and analyzer parity

Symptoms: high similarity yet wrong meaning, language or casing skew, mixed punctuation behavior.

Actions

  • Align dense and sparse analyzers. Keep lowercasing, accent folding, and token boundaries consistent.
  • Normalize vectors at write and read. Keep pooling identical.
  • Rebuild with explicit metric and dimension logged in traces.
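
A sketch of read/write parity, assuming one normalize_text shared by the sparse analyzer and the dense embedder, and one normalize_vector applied at both write and read (names are illustrative):

import unicodedata
import numpy as np

def normalize_text(s: str) -> str:
    # One analyzer spec for dense and sparse: lowercase, fold accents, collapse whitespace
    s = unicodedata.normalize("NFKD", s)
    s = "".join(ch for ch in s if not unicodedata.combining(ch))
    return " ".join(s.lower().split())

def normalize_vector(v: np.ndarray) -> np.ndarray:
    # L2-normalize on both paths so cosine and inner product rank identically
    return v / np.linalg.norm(v)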


2) Traceability and citation locks

Symptoms: answer looks right but citations are missing, wrong section id, or not reproducible.

Actions

  • Require snippet_id, section_id, source_url, offsets, tokens in every hop.
  • Forbid cross-section reuse unless explicitly whitelisted.
  • Enforce cite-then-explain in prompts.
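
A minimal validator for that contract; the required field list comes from this page, and the function itself is a sketch rather than the Data Contracts spec:

REQUIRED_FIELDS = ("snippet_id", "section_id", "source_url", "offsets", "tokens")

def validate_citations(citations: list[dict]) -> list[str]:
    # Returns one message per failing field; an empty list means the contract holds
    failures = []
    for i, c in enumerate(citations):
        for field in REQUIRED_FIELDS:
            if not c.get(field):
                failures.append(f"hop {i}: missing {field}")
    return failures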


3) Hybrid retrieval that underperforms

Symptoms: BM25 + dense gives worse order than either alone; relevant docs appear far down; order flips.

Actions

  • Separate query parsing from retrieval. Fix the parse.
  • Weight dense and sparse explicitly. Add a deterministic tiebreak.
  • Add a rerank step with a fixed cross-encoder and seed.
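
A sketch of explicit weighting with a deterministic tiebreak, assuming dense and sparse scores have already been scaled to comparable ranges; the 0.6 dense weight is an illustrative starting point to tune and log:

def fuse(dense: dict[str, float], sparse: dict[str, float],
         meta: dict[str, dict], w_dense: float = 0.6) -> list[str]:
    # Weighted hybrid score; ties broken on (section_id, snippet_id) so order never flips
    scored = []
    for doc_id in set(dense) | set(sparse):
        score = w_dense * dense.get(doc_id, 0.0) + (1 - w_dense) * sparse.get(doc_id, 0.0)
        m = meta[doc_id]
        scored.append((-score, m["section_id"], m["snippet_id"], doc_id))
    scored.sort()
    return [doc_id for *_, doc_id in scored]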


4) Fragmentation or contamination

Symptoms: facts exist but never show; duplicates or stale shards; inconsistent analyzers by batch.

Actions

  • Rebuild a clean index with a single write path.
  • Stamp index_hash, log embedding model id and normalization.
  • Run a small gold set to verify recall.
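
A sketch of the stamp, assuming the hash should cover everything that can silently drift between write batches:

import hashlib
import json

def index_hash(analyzer_spec: dict, embed_model: str, normalized: bool,
               chunk_window: int, overlap: int) -> str:
    # Any change to analyzer, model, normalization, or chunking yields a new hash,
    # which makes mixed-batch contamination visible in traces
    payload = json.dumps({
        "analyzer": analyzer_spec, "embed_model": embed_model, "normalized": normalized,
        "chunk_window": chunk_window, "overlap": overlap,
    }, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:16]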



Guardrails to install in any pipeline

Write path

  • One tokenizer and analyzer spec. Log it.
  • One embedding model and pooling policy. Log it.
  • Chunk window and overlap recorded in metadata.
  • Field schema: doc_id, section_id, snippet_id, source_url, offsets, tokens, index_hash, embed_model, analyzer.
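
The same schema as a sketch; treating offsets as a (start, end) character span is an assumption, not part of the spec:

from dataclasses import dataclass

@dataclass
class Snippet:
    doc_id: str
    section_id: str
    snippet_id: str
    source_url: str
    offsets: tuple[int, int]  # assumed convention: character span in the source
    tokens: int
    index_hash: str
    embed_model: str
    analyzer: str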

Read path

  • Same analyzer, same normalization.
  • k sweep at 5, 10, 20 for ΔS probes.
  • Deterministic tiebreak on (score, section_id, snippet_id).

Prompt contract

  • Cite first, then explain.
  • Enforce JSON with citations and λ state.
  • Forbid cross-section reuse unless allowed.



Copy-paste prompt block for the reasoning step

You have TXTOS and the WFGY Problem Map loaded.

Retrieval inputs:
- question: "{Q}"
- k sweep results: {k5:..., k10:..., k20:...}
- citations: [{snippet_id, section_id, source_url, offsets, tokens}, ...]

Do:
1) Validate cite-then-explain. If any citation is missing or mismatched, return the failing field and stop.
2) Report ΔS(question, retrieved) and λ state. If ΔS ≥ 0.60 or λ divergent, return the minimal structural fix:
   - metric/analyzer parity
   - hybrid weighting and rerank
   - traceability schema
3) Output JSON:
   { "answer": "...", "citations": [...], "ΔS": 0.xx, "λ": "<state>", "next_fix": "<page to open>" }
Keep it auditable and short.
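
On the caller side, a sketch that rejects any reply which skipped the contract; the key names mirror the JSON in the prompt above:

import json

def parse_reasoning_output(raw: str) -> dict:
    out = json.loads(raw)
    for key in ("answer", "citations", "ΔS", "λ"):
        if key not in out:
            raise ValueError(f"contract violation: missing {key}")
    # Mirror rule 2 of the prompt: high ΔS must come with a structural fix to open
    if out["ΔS"] >= 0.60 and not out.get("next_fix"):
        raise ValueError("ΔS in the risk zone but no next_fix returned")
    return out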

Evaluation loop

  • Gold questions per section: 3 to 5
  • For each question: run 3 paraphrases, 2 seeds
  • Metrics to log: coverage, ΔS, λ, recall@k, MAP@k, citation match rate
  • Recipes → retrieval_eval_recipes.md
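
For recall@k and MAP@k, a sketch assuming gold labels are sets of relevant snippet_ids per question (the full loop lives in the recipes file):

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    # Fraction of the gold snippets that appear in the top k
    return len(set(retrieved[:k]) & relevant) / len(relevant) if relevant else 0.0

def average_precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    # Precision accumulated at each hit position, normalized by the best possible hit count
    score, hits = 0.0, 0
    for rank, doc_id in enumerate(retrieved[:k], start=1):
        if doc_id in relevant:
            hits += 1
            score += hits / rank
    return score / min(len(relevant), k) if relevant else 0.0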

Store-specific adapters

If a symptom points to a store quirk or feature gap, jump to the matching store adapter page.


🔗 Quick-Start Downloads (60 sec)

Tool                     Link           3-Step Setup
WFGY 1.0 PDF             Engine Paper   1 Download · 2 Upload to your LLM · 3 Ask “Answer using WFGY + <your question>”
TXT OS (plain-text OS)   TXTOS.txt      1 Download · 2 Paste into any LLM chat · 3 Type “hello world” and the OS boots instantly

Explore More

Layer        Page                   What it's for
Proof        WFGY Recognition Map   External citations, integrations, and ecosystem proof
Engine       WFGY 1.0               Original PDF-based tension engine
Engine       WFGY 2.0               Production tension kernel and math engine for RAG and agents
Engine       WFGY 3.0               TXT-based Singularity tension engine, 131 S-class set
Map          Problem Map 1.0        Flagship 16-problem RAG failure checklist and fix map
Map          Problem Map 2.0        RAG-focused recovery pipeline
Map          Problem Map 3.0        Global Debug Card, image as a debug protocol layer
Map          Semantic Clinic        Symptom to family to exact fix
Map          Grandma's Clinic       Plain-language stories mapped to Problem Map 1.0
Onboarding   Starter Village        Guided tour for newcomers
App          TXT OS                 TXT semantic OS, fast boot
App          Blah Blah Blah         Abstract and paradox Q&A built on TXT OS
App          Blur Blur Blur         Text-to-image with semantic control
App          Blow Blow Blow         Reasoning game engine and memory demo

If this repository helped, starring it improves discovery so more builders can find the docs and tools.