# 🔎 Retrieval Playbook — Practical, Measurable, Fix-first > The goal: **consistent, explainable** retrieval that makes reasoning easy. > This playbook gives you a **minimal, testable** setup across OCR → chunk → embed → index → retrieve → prompt, with **failure probes** and **repair steps**. No hype—only what ships. --- > **Quick Nav** > [OCR/Parsing Checklist](./ocr-parsing-checklist.md) · > [Chunking Checklist](./chunking-checklist.md) · > [Embedding vs Semantic](./embedding-vs-semantic.md) · > [Traceability](./retrieval-traceability.md) · > [Rerankers](./rerankers.md) · > Patterns: [Query Parsing Split](./patterns/pattern_query_parsing_split.md) · > [Vectorstore Fragmentation](./patterns/pattern_vectorstore_fragmentation.md) · > [Symbolic Constraint Unlock](./patterns/pattern_symbolic_constraint_unlock.md) --- ## 0) Executive summary 1. **Generate clean candidates** (dense/sparse/hybrid) with traceable IDs. 2. **Measure** with ΔS + recall@k before adding complexity. 3. **Add reranking** only when first-stage recall is **consistently ≥0.85** for your task. 4. **Lock prompt schema** (cite → explain) and **forbid cross-source merges**. 5. **Regression-guard** with small golden sets (10–50 Q/A). --- ## 1) Candidate generation (first-stage) ### 1.1 Choose a primary retriever - **Dense (embeddings)**: good default for semantic matches across paraphrases. - **Sparse (BM25/SPLADE)**: strong for exact terms, code, and rare tokens. - **Hybrid**: *reciprocal rank fusion (RRF)* of dense + sparse is robust to query style. **Rule of thumb** - Start **dense** for general docs; add **BM25** for code/legal/IDs. - If hybrid hurts recall, check **tokenization & analyzer drift** → see *Query Parsing Split*. ### 1.2 Index hygiene (FAISS/Elasticsearch/Qdrant/Chroma) - **Normalize** vectors on **both write & read** if using cosine. - Pin the **metric type** (cosine vs inner product) in code & metadata. - Persist **doc_id / section_id / line_span** with each vector. - Verify **index cardinality = sum(chunks)**; add a one-liner count check in CI. --- ## 2) Minimal reference pipelines ### 2.1 Python — FAISS (dense) + BM25 (hybrid via RRF) ```python # pip install sentence-transformers rank_bm25 faiss-cpu numpy from sentence_transformers import SentenceTransformer from rank_bm25 import BM25Okapi import faiss, numpy as np # 1) data chunks = [...] # list[str], pre-chunked sentences/sections meta = [...] # list[dict], each with {doc_id, section_id, span} # 2) encoder (cosine → L2-normalize) enc = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2") X = enc.encode(chunks, normalize_embeddings=True) index = faiss.IndexFlatIP(X.shape[1]) # inner product == cosine on normalized vectors index.add(X.astype(np.float32)) # 3) sparse side tokenized = [c.split() for c in chunks] bm25 = BM25Okapi(tokenized) def rrf(ranks, k=60): # ranks: list of lists of (idx, rank_position starting at 1) from collections import defaultdict score = defaultdict(float) for rr in ranks: for idx, rp in rr: score[idx] += 1.0 / (k + rp) return sorted(score.items(), key=lambda x: -x[1]) def search(query, topk_dense=50, topk_sparse=50, out_k=20): qv = enc.encode([query], normalize_embeddings=True).astype(np.float32) d_s, d_i = index.search(qv, topk_dense) # ranks as (idx, rankpos) dense_rank = [(int(i), r+1) for r, i in enumerate(d_i[0])] sparse_rank = [] for r, idx in enumerate(bm25.get_top_n(query.split(), list(range(len(chunks))), n=topk_sparse)): sparse_rank.append((idx, r+1)) fused = rrf([dense_rank, sparse_rank]) out = [] for idx, _ in fused[:out_k]: out.append({"text": chunks[idx], "meta": meta[idx], "source":"hybrid"}) return out res = search("how to reset billing cycle") ```` **Sanity checks** * After encoding: assert `np.abs(np.linalg.norm(X[0]) - 1.0) < 1e-3` * After add: `index.ntotal == len(chunks)` * For sparse-only: if **IDs/code** outperform dense, keep hybrid. ### 2.2 Node (TypeScript) — Elastic BM25 + Dense store ```ts // pnpm add @elastic/elasticsearch @huggingface/inference cosine-similarity import { Client } from "@elastic/elasticsearch"; import { HfInference } from "@huggingface/inference"; import cosine from "cosine-similarity"; const es = new Client({ node: process.env.ES_URL! }); const hf = new HfInference(process.env.HF_TOKEN!); // 1) BM25 query async function bm25(query: string, topk=50) { const { hits } = await es.search({ index: "chunks", size: topk, query: { match: { text: { query, operator: "and" } } }, _source: ["text", "doc_id", "section_id", "span"] }); return hits.hits.map((h, i) => ({ id: h._id, score: hits.max_score, r:i+1, src: h._source })); } // 2) Dense side (use same model for write+read) async function embed(s: string) { const out = await hf.featureExtraction({ model: "sentence-transformers/all-MiniLM-L6-v2", inputs: s }); // L2 normalize const v = (out as number[]); const norm = Math.sqrt(v.reduce((a,b)=>a+b*b,0)); return v.map(x=>x/norm); } ``` > Keep dense vectors in your KV/DB keyed by `chunk_id`; at query time compute cosine vs a small candidate pool (top-200 BM25), then RRF. --- ## 3) Retrieval observability (ΔS & λ) * **ΔS(question, retrieved)** = `1 − cos(I, G)` with **I** = retrieved snippet embedding, **G** = anchor (title/expected section or gold answer). * **Thresholds**: `<0.40` stable · `0.40–0.60` transitional · `≥0.60` action. * **λ states**: `→` convergent · `←` divergent · `<>` recursive · `×` chaotic. **Probe recipe** 1. Vary `k ∈ {5, 10, 20}`; plot ΔS vs k. 2. If curve **flat & high** → metric/normalization/index mismatch. 3. If **sharp drop** at higher k → retriever filter too strict; consider MMR or hybrid. --- ## 4) Prompt assembly: cite → explain (lock constraints) * Keep **per-source fences** (no cross-source merges). * **Order**: *system → task → constraints → citations → answer*. * Force **cite-first**; explanation **must reference** citation IDs/lines. * See: [Traceability](./retrieval-traceability.md) and *SCU Pattern*. --- ## 5) Reranking: when & how Add a reranker only if: * **Recall\@50 ≥ 0.85**, but Top-5 precision is weak, or * You need **tight citation alignment** across near-duplicates. Start with: * **Cross-encoder** (bge-reranker-mini/base) for accuracy; * Or **LLM rerank** for low volume, high precision needs. See: [Rerankers](./rerankers.md). --- ## 6) Acceptance criteria * **Retrieval sanity**: ΔS(question, top-ctx) ≤ 0.45, coverage ≥ 0.70 of target section. * **Traceability**: snippet ↔ citation table reproducible. * **Stability**: same inputs over 3 paraphrases keep λ → convergent. * **No SCU**: who-said-what does not merge across sources. --- ## 7) Common failures → repair | Symptom | Likely cause | Fix | | -------------------------- | --------------------------------- | ---------------------------------------------------------------------------------------------------------------- | | Hybrid worse than single | **Analyzer/tokenizer split** | Align analyzers; log per-retriever queries; see [Query Parsing Split](./patterns/pattern_query_parsing_split.md) | | Some facts never retrieved | **Fragmented store / id skew** | Rebuild + shard audit; see [Vectorstore Fragmentation](./patterns/pattern_vectorstore_fragmentation.md) | | Citations cross-bleed | **Prompt schema unlocked** | Per-source fences + cite-first; see [SCU](./patterns/pattern_symbolic_constraint_unlock.md) | | ΔS flat & high vs k | **Metric/normalization mismatch** | Normalize embeddings; pin FAISS metric; see [Embedding vs Semantic](./embedding-vs-semantic.md) | --- ## 8) Tiny gold set (do this!) Create **10–50** realistic Q/A with citation lines. Commit a `goldset.jsonl`: ```json {"q":"How to reset billing cycle?","doc_id":"a","section":"billing","lines":[120,145],"a":"..."} ``` Run **recall\@50**, **nDCG\@10**, and **ΔS** on each PR. --- ### 🔗 Quick-Start Downloads (60 sec) | Tool | Link | 3-Step Setup | |------|------|--------------| | **WFGY 1.0 PDF** | [Engine Paper](https://github.com/onestardao/WFGY/blob/main/I_am_not_lizardman/WFGY_All_Principles_Return_to_One_v1.0_PSBigBig_Public.pdf) | 1️⃣ Download · 2️⃣ Upload to your LLM · 3️⃣ Ask “Answer using WFGY + \” | | **TXT OS (plain-text OS)** | [TXTOS.txt](https://github.com/onestardao/WFGY/blob/main/OS/TXTOS.txt) | 1️⃣ Download · 2️⃣ Paste into any LLM chat · 3️⃣ Type “hello world” — OS boots instantly | --- ### 🧭 Explore More | Module | Description | Link | |-----------------------|----------------------------------------------------------|----------| | WFGY Core | WFGY 2.0 engine is live: full symbolic reasoning architecture and math stack | [View →](https://github.com/onestardao/WFGY/tree/main/core/README.md) | | Problem Map 1.0 | Initial 16-mode diagnostic and symbolic fix framework | [View →](https://github.com/onestardao/WFGY/tree/main/ProblemMap/README.md) | | Problem Map 2.0 | RAG-focused failure tree, modular fixes, and pipelines | [View →](https://github.com/onestardao/WFGY/blob/main/ProblemMap/rag-architecture-and-recovery.md) | | Semantic Clinic Index | Expanded failure catalog: prompt injection, memory bugs, logic drift | [View →](https://github.com/onestardao/WFGY/blob/main/ProblemMap/SemanticClinicIndex.md) | | Semantic Blueprint | Layer-based symbolic reasoning & semantic modulations | [View →](https://github.com/onestardao/WFGY/tree/main/SemanticBlueprint/README.md) | | Benchmark vs GPT-5 | Stress test GPT-5 with full WFGY reasoning suite | [View →](https://github.com/onestardao/WFGY/tree/main/benchmarks/benchmark-vs-gpt5/README.md) | | 🧙‍♂️ Starter Village 🏡 | New here? Lost in symbols? Click here and let the wizard guide you through | [Start →](https://github.com/onestardao/WFGY/blob/main/StarterVillage/README.md) | --- > 👑 **Early Stargazers: [See the Hall of Fame](https://github.com/onestardao/WFGY/tree/main/stargazers)** — > Engineers, hackers, and open source builders who supported WFGY from day one. > GitHub stars ⭐ [WFGY Engine 2.0](https://github.com/onestardao/WFGY/blob/main/core/README.md) is already unlocked. ⭐ Star the repo to help others discover it and unlock more on the [Unlock Board](https://github.com/onestardao/WFGY/blob/main/STAR_UNLOCKS.md).
[![WFGY Main](https://img.shields.io/badge/WFGY-Main-red?style=flat-square)](https://github.com/onestardao/WFGY)   [![TXT OS](https://img.shields.io/badge/TXT%20OS-Reasoning%20OS-orange?style=flat-square)](https://github.com/onestardao/WFGY/tree/main/OS)   [![Blah](https://img.shields.io/badge/Blah-Semantic%20Embed-yellow?style=flat-square)](https://github.com/onestardao/WFGY/tree/main/OS/BlahBlahBlah)   [![Blot](https://img.shields.io/badge/Blot-Persona%20Core-green?style=flat-square)](https://github.com/onestardao/WFGY/tree/main/OS/BlotBlotBlot)   [![Bloc](https://img.shields.io/badge/Bloc-Reasoning%20Compiler-blue?style=flat-square)](https://github.com/onestardao/WFGY/tree/main/OS/BlocBlocBloc)   [![Blur](https://img.shields.io/badge/Blur-Text2Image%20Engine-navy?style=flat-square)](https://github.com/onestardao/WFGY/tree/main/OS/BlurBlurBlur)   [![Blow](https://img.shields.io/badge/Blow-Game%20Logic-purple?style=flat-square)](https://github.com/onestardao/WFGY/tree/main/OS/BlowBlowBlow)