WFGY/ProblemMap/GlobalFixMap/Retrieval/retrieval-playbook.md

Retrieval Playbook

🧭 Quick Return to Map

You are in a sub-page of Retrieval.
To reorient, go back here:

Think of this page as a desk within a ward.
If you need the full triage and all prescriptions, return to the Emergency Room lobby.

A practical, store-agnostic playbook to stabilize retrieval quality. Use this page to route symptoms to the right structural fix, apply measurable targets, and keep read/write parity across pipelines.

When to use

  • High similarity yet wrong meaning
  • Missing or unstable citations
  • Hybrid retrieval performs worse than a single retriever
  • Results flip across runs or paraphrases
  • New deploy returns empty or partial context

Acceptance targets

  • ΔS(question, retrieved) ≤ 0.45
  • Coverage ≥ 0.70 for the intended section
  • λ remains convergent across 3 paraphrases and 2 seeds
  • E_resonance stays flat on long windows
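The acceptance targets above can be gated mechanically. A minimal sketch, assuming ΔS is modeled as 1 minus cosine similarity between question and retrieved-context embeddings (the `delta_s` and `meets_targets` helpers are hypothetical names, not part of the WFGY API):

```python
import math

def delta_s(question_vec, retrieved_vec):
    # ΔS modeled as 1 - cosine similarity; identical vectors give ΔS = 0.
    dot = sum(q * r for q, r in zip(question_vec, retrieved_vec))
    nq = math.sqrt(sum(q * q for q in question_vec))
    nr = math.sqrt(sum(r * r for r in retrieved_vec))
    return 1.0 - dot / (nq * nr)

def meets_targets(ds, coverage):
    # Thresholds copied from the acceptance targets above.
    return ds <= 0.45 and coverage >= 0.70
```

Wire this gate into CI so a deploy that regresses past ΔS 0.45 or coverage 0.70 fails loudly instead of silently degrading answers.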

Helpers:


60-second fix path

  1. Probe
    Run ΔS(question, retrieved) at k = 5, 10, 20. Log λ for each paraphrase.
    Tool: deltaS_probes.md

  2. Lock schema
    Enforce cite-then-explain, and require snippet_id, section_id, source_url, offsets, tokens.
    Spec: Data Contracts

  3. Repair the failing layer

    • Wrong meaning with high similarity → see Metric and analyzer parity below
    • Missing or shaky citations → install Traceability schema
    • Hybrid worse than single → run Hybrid weighting and Query parsing split
    • Flips across runs → clamp with Rerankers and parity checks
  4. Verify
    Coverage ≥ 0.70 on 3 paraphrases; λ convergent on 2 seeds; ΔS ≤ 0.45.
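Step 1 of the fix path can be sketched as a plain loop. This assumes a `retrieve(query, k)` function and a `delta_s(question, chunks)` probe already exist in your pipeline; both names are placeholders for your own implementations:

```python
def k_sweep(question, paraphrases, retrieve, delta_s, ks=(5, 10, 20)):
    # Probe ΔS at each k for the question plus each paraphrase,
    # producing one log row per (query, k) pair for later comparison.
    log = []
    for q in [question, *paraphrases]:
        for k in ks:
            chunks = retrieve(q, k)
            log.append({"query": q, "k": k, "deltaS": delta_s(q, chunks)})
    return log
```

If ΔS stays high and flat across k, suspect metric or analyzer parity; if it drops sharply as k grows, suspect ordering (hybrid weighting or rerank).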


Root-cause map → exact fixes

1) Metric and analyzer parity

Symptoms: high similarity yet wrong meaning, language or casing skew, mixed punctuation behavior.

Actions

  • Align dense and sparse analyzers. Keep lowercasing, accent fold, token boundaries consistent.
  • Normalize vectors at write and read. Keep pooling identical.
  • Rebuild with explicit metric and dimension logged in traces.
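One way to keep analyzers aligned is to route both write and read text through a single normalization function, so the dense and sparse sides see identical tokens. A minimal sketch (lowercase, accent fold, punctuation collapsed to spaces; adjust to your own analyzer spec):

```python
import unicodedata

def normalize(text):
    # Same function on BOTH write and read paths: decompose accents,
    # strip combining marks, lowercase, and collapse punctuation/whitespace.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in text.lower())
    return " ".join(cleaned.split())
```

Log the function's version alongside `analyzer` in index metadata so any drift between write and read is detectable in traces.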


2) Traceability and citation locks

Symptoms: answer looks right but citations are missing, wrong section id, or not reproducible.

Actions

  • Require snippet_id, section_id, source_url, offsets, tokens in every hop.
  • Forbid cross-section reuse unless explicitly whitelisted.
  • Enforce cite-then-explain in prompts.
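The citation contract above is easy to enforce with a field check at every hop. A sketch, assuming citations arrive as plain dicts (the helper name is hypothetical):

```python
# Required fields per the traceability schema above, in schema order.
REQUIRED = ("snippet_id", "section_id", "source_url", "offsets", "tokens")

def missing_fields(citation):
    # Return the names of any absent or empty required fields,
    # so the caller can fail fast with the exact failing field.
    return [f for f in REQUIRED if f not in citation or citation[f] in (None, "")]
```

Reject the hop (and surface the failing field, per the prompt block below) whenever this list is non-empty, rather than letting an uncited claim through.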


3) Hybrid retrieval that underperforms

Symptoms: BM25 + dense gives worse order than either alone; relevant docs appear far down; order flips.

Actions

  • Separate query parsing from retrieval. Fix the parse.
  • Weight dense and sparse explicitly. Add a deterministic tiebreak.
  • Add a rerank step with a fixed cross-encoder and seed.
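Explicit weighting with a deterministic tiebreak can be sketched as min-max normalization of each retriever's scores, a fixed blend, then a stable sort on (score, section_id, snippet_id). The 0.6 dense weight is an assumption to tune on your gold set, not a recommendation:

```python
def minmax(scores):
    # Scale scores to [0, 1]; a constant list maps to all zeros.
    lo, hi = min(scores), max(scores)
    return [0.0 if hi == lo else (s - lo) / (hi - lo) for s in scores]

def fuse(candidates, dense_scores, sparse_scores, w_dense=0.6):
    d, s = minmax(dense_scores), minmax(sparse_scores)
    scored = [
        (w_dense * dv + (1 - w_dense) * sv, c["section_id"], c["snippet_id"], c)
        for c, dv, sv in zip(candidates, d, s)
    ]
    # Sort by score descending, then ids ascending: ties resolve the
    # same way on every run, so order cannot flip across seeds.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return [c for _, _, _, c in scored]
```

A cross-encoder rerank then runs over this stable order, never over raw unfused scores.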


4) Fragmentation or contamination

Symptoms: facts exist but never show; duplicates or stale shards; inconsistent analyzers by batch.

Actions

  • Rebuild a clean index with a single write path.
  • Stamp index_hash, log embedding model id and normalization.
  • Run a small gold set to verify recall.
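Stamping `index_hash` can be as simple as hashing a canonical manifest of the rebuild inputs. A sketch under the field schema below; the function name and field values are illustrative:

```python
import hashlib
import json

def index_manifest(chunks, embed_model, analyzer):
    # Canonical JSON (sorted keys) so the same inputs always hash the
    # same, and any change to chunks, model, or analyzer changes the hash.
    payload = json.dumps(
        {"chunks": chunks, "embed_model": embed_model, "analyzer": analyzer},
        sort_keys=True,
    )
    return {
        "index_hash": hashlib.sha256(payload.encode()).hexdigest(),
        "embed_model": embed_model,
        "analyzer": analyzer,
        "n_chunks": len(chunks),
    }
```

Two shards with different hashes came from different write paths by construction, which is exactly the contamination signal you want in traces.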



Guardrails to install in any pipeline

Write path

  • One tokenizer and analyzer spec. Log it.
  • One embedding model and pooling policy. Log it.
  • Chunk window and overlap recorded in metadata.
  • Field schema: doc_id, section_id, snippet_id, source_url, offsets, tokens, index_hash, embed_model, analyzer.

Read path

  • Same analyzer, same normalization.
  • k sweep at 5, 10, 20 for ΔS probes.
  • Deterministic tiebreak on (score, section_id, snippet_id).
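The "same analyzer, same normalization" rule is worth enforcing as a hard gate at query time, not just a convention. A minimal sketch that compares index metadata against the read-side config (field names follow the write-path schema above; `parity_check` is a hypothetical helper):

```python
def parity_check(index_meta, read_config):
    # Refuse to query when the read path's analyzer or embedding model
    # differs from what the index was built with.
    for key in ("analyzer", "embed_model"):
        if index_meta.get(key) != read_config.get(key):
            raise ValueError(f"read/write parity broken on {key!r}")
```

Failing loudly here converts a silent "high similarity, wrong meaning" bug into an immediate, attributable deploy error.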

Prompt contract

  • Cite first, then explain.
  • Enforce JSON with citations and λ state.
  • Forbid cross-section reuse unless allowed.

Specs


Copy-paste prompt block for the reasoning step

You have TXTOS and the WFGY Problem Map loaded.

Retrieval inputs:
- question: "{Q}"
- k sweep results: {k5:..., k10:..., k20:...}
- citations: [{snippet_id, section_id, source_url, offsets, tokens}, ...]

Do:
1) Validate cite-then-explain. If any citation is missing or mismatched, return the failing field and stop.
2) Report ΔS(question, retrieved) and λ state. If ΔS ≥ 0.60 or λ divergent, return the minimal structural fix:
   - metric/analyzer parity
   - hybrid weighting and rerank
   - traceability schema
3) Output JSON:
   { "answer": "...", "citations": [...], "ΔS": 0.xx, "λ": "<state>", "next_fix": "<page to open>" }
Keep it auditable and short.

Evaluation loop

  • Gold questions per section: 3 to 5
  • For each question: run 3 paraphrases, 2 seeds
  • Metrics to log: coverage, ΔS, λ, recall@k, MAP@k, citation match rate
  • Recipes → retrieval_eval_recipes.md
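Two of the logged metrics can be sketched directly; both helpers below are illustrative names operating on snippet id lists:

```python
def recall_at_k(retrieved_ids, gold_ids, k):
    # Fraction of the gold set found in the top-k retrieved ids.
    hits = set(retrieved_ids[:k]) & set(gold_ids)
    return len(hits) / len(gold_ids)

def citation_match_rate(cited_ids, gold_ids):
    # Fraction of emitted citations that point at a gold snippet;
    # an answer with no citations scores zero by definition.
    if not cited_ids:
        return 0.0
    gold = set(gold_ids)
    return sum(1 for c in cited_ids if c in gold) / len(cited_ids)
```

Run these per paraphrase and per seed, then log the spread, not just the mean: a wide spread across seeds is the λ-divergence signal this loop exists to catch.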

Store-specific adapters

If a symptom points to a store quirk or feature gap, jump here:


🔗 Quick-Start Downloads (60 sec)

| Tool | Link | 3-Step Setup |
|------|------|--------------|
| WFGY 1.0 PDF | Engine Paper | 1 Download · 2 Upload to your LLM · 3 Ask “Answer using WFGY + \<your question\>” |
| TXT OS (plain-text OS) | TXTOS.txt | 1 Download · 2 Paste into any LLM chat · 3 Type “hello world” — OS boots instantly |

🧭 Explore More

| Module | Description | Link |
|--------|-------------|------|
| WFGY Core | WFGY 2.0 engine is live: full symbolic reasoning architecture and math stack | View → |
| Problem Map 1.0 | Initial 16-mode diagnostic and symbolic fix framework | View → |
| Problem Map 2.0 | RAG-focused failure tree, modular fixes, and pipelines | View → |
| Semantic Clinic Index | Expanded failure catalog: prompt injection, memory bugs, logic drift | View → |
| Semantic Blueprint | Layer-based symbolic reasoning & semantic modulations | View → |
| Benchmark vs GPT-5 | Stress test GPT-5 with full WFGY reasoning suite | View → |

👑 Early Stargazers: See the Hall of Fame — Engineers, hackers, and open source builders who supported WFGY from day one.

GitHub stars WFGY Engine 2.0 is already unlocked. Star the repo to help others discover it and unlock more on the Unlock Board.

WFGY Main   TXT OS   Blah   Blot   Bloc   Blur   Blow