# 🧮 Rerankers — When to Use Them, How to Tune, How to Prove It
Reranking boosts **precision@k** by re-scoring a candidate set from first-stage retrieval.
Used correctly, it **tightens citations** and reduces “looks-right-but-wrong” answers. Used blindly, it burns latency and money.

---
> **Quick Nav**
> [Retrieval Playbook](./retrieval-playbook.md) ·
> [Embedding vs Semantic](./embedding-vs-semantic.md) ·
> [Traceability](./retrieval-traceability.md) ·
> Patterns: [Query Parsing Split](./patterns/pattern_query_parsing_split.md) ·
> [Symbolic Constraint Unlock](./patterns/pattern_symbolic_constraint_unlock.md)
---
## 0) TL;DR — Decision table
| Situation | Use | Why |
|---|---|---|
| First-stage **recall@50 < 0.85** | **Do NOT** add a reranker yet | A reranker only reorders the candidate pool; if the gold passage is not in it, there is nothing to promote. Fix candidate generation first |
| Recall is good but **Top-5 irrelevant** | Add **cross-encoder** reranker | Cross-attends Q–D; best precision |
| Need **tight citations** across near-duplicates | Cross-encoder or **ColBERT** style | Fine-grained token interactions |
| Very low volume, high stakes | **LLM-as-reranker** | Expensive but accurate, great for audits |
| High QPS, tight budget | **Light cross-encoder** (mini) or **linear fusion** | 80/20 precision for minimal cost |
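
The "linear fusion" option in the last row can be as simple as min-max normalizing each retriever's scores within one query and taking a weighted sum. This is a minimal sketch, assuming scores arrive as `{doc_id: score}` dicts; the function names and the `alpha` weight are illustrative, not tuned values from this playbook.

```python
def minmax(scores):
    """Min-max normalize a {doc_id: score} dict to [0, 1] within one query."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def linear_fusion(sparse, dense, alpha=0.5, k=10):
    """Weighted sum of normalized sparse (e.g. BM25) and dense scores.

    alpha is an ASSUMED weight on the sparse side; a doc missing from one
    retriever contributes 0 from that side.
    """
    ns, nd = minmax(sparse), minmax(dense)
    fused = {d: alpha * ns.get(d, 0.0) + (1 - alpha) * nd.get(d, 0.0) for d in set(ns) | set(nd)}
    return sorted(fused.items(), key=lambda x: x[1], reverse=True)[:k]

bm25 = {"a": 10.0, "b": 5.0}
dense = {"b": 0.9, "c": 0.1}
print([d for d, _ in linear_fusion(bm25, dense, alpha=0.7)])  # ['a', 'b', 'c']
```

Min-max is only one normalization choice; it is sensitive to outliers in a single query's score list, which is part of why a cross-encoder usually beats fusion on precision.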
---
## 1) Families of rerankers
1. **Cross-encoder** (e.g., bge-reranker, ms-marco MiniLM)
   - Jointly encodes **[query ⊕ doc]** and outputs a single relevance score.
   - **Pros**: best precision. **Cons**: one forward pass per candidate, so cost grows as O(k).
2. **Late-interaction** (e.g., ColBERT-style)
   - Scores with token-level max-sim between query token embeddings and precomputed document token embeddings; faster than a full cross-encoder.
   - **Pros**: scalable. **Cons**: heavier infrastructure than a cross-encoder (it needs a per-token index).
3. **LLM-as-reranker**
   - Ask the model to score or rank candidates against a fixed output schema.
   - **Pros**: reasoning-aware. **Cons**: latency and cost; needs a **strict judging prompt**.

**Start point**: a mini or base cross-encoder; upgrade only if the metrics call for it.
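
To make the late-interaction idea concrete, here is a pure-Python sketch of ColBERT-style max-sim scoring: each query token vector is matched to its most similar document token vector, and those maxima are summed. The tiny hand-written vectors stand in for real model embeddings, and `maxsim_score` is an illustrative name, not a library API.

```python
import math

def _l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def maxsim_score(query_tokens, doc_tokens):
    """Sum over query tokens of the max cosine similarity to any document token."""
    if not query_tokens or not doc_tokens:
        return 0.0
    q = [_l2_normalize(t) for t in query_tokens]
    d = [_l2_normalize(t) for t in doc_tokens]
    return sum(max(sum(a * b for a, b in zip(qt, dt)) for dt in d) for qt in q)

# Toy 2-d token embeddings; real systems use higher-dimensional model outputs.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.1], [0.1, 1.0]]  # has a close match for both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.2]]  # only matches the first query token well
print(maxsim_score(query, doc_a) > maxsim_score(query, doc_b))  # True
```

Because document token vectors do not depend on the query, they can be computed offline; only the cheap max/sum step runs per query, which is where the speed advantage over a cross-encoder comes from.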
---
## 2) Minimal implementations
### 2.1 Python — Cross-encoder (bge-reranker)
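```python
# pip install FlagEmbedding
import numpy as np
from FlagEmbedding import FlagReranker

# Load once at startup; use_fp16=True speeds up GPU inference at a small precision cost.
reranker = FlagReranker('BAAI/bge-reranker-base', use_fp16=True)

def rerank_topk(query, candidates, out_k=10):
    """Re-score first-stage candidates with the cross-encoder; return the top out_k.

    candidates: list[{"text": ..., "meta": {...}}]
    """
    if not candidates:
        return []
    pairs = [(query, c["text"]) for c in candidates]
    # normalize=True maps each raw logit to [0, 1] with a sigmoid
    scores = reranker.compute_score(pairs, normalize=True)
    scores = np.atleast_1d(scores)  # a single pair comes back as a bare float
    ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
    out = []
    for c, s in ranked[:out_k]:
        c = dict(c)  # shallow copy: do not mutate the caller's candidates
        c["rerank_score"] = float(s)
        c["source"] = c.get("source", "") + "|ce"
        out.append(c)
    return out
```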
**Tips**
* Use **normalize=True** to map raw logits into `[0,1]` with a sigmoid; the ranking is unchanged, but a fixed acceptance threshold becomes meaningful.
* Batch size 16–64 depending on VRAM/CPU.
### 2.2 Node — LLM-as-reranker (schema-locked)
```ts
// Example sketch using any chat LLM SDK
const SYSTEM = `You are a strict retrieval judge.
Return JSON array of {id,score,reason} with score in [0,1].
Score by factual support for the query; do not invent.`;

type Candidate = { id: string; text: string };

function judgingPrompt(query: string, cands: Candidate[]): string {
  // Number each candidate and expose its id so the judge can refer back to it
  const body = cands.map((c, i) => `[${i}] id=${c.id}\n${c.text}`).join("\n\n");
  return `Query: ${query}\n\nCandidates:\n${body}\n\nRules:\n- Cite terms that match\n- Penalize off-topic\n- Prefer exact sections\n\nNow return JSON only.`;
}

// Call your LLM with SYSTEM + judgingPrompt(...) and parse the JSON;
// accept the top-k with score ≥ threshold and keep each reason in logs.
```
**Guardrails**
* **JSON-only** response.
* Cap the response with **max tokens**, and never send full document bodies; pass snippets only.
* Never let LLM **rewrite** the snippet; judge only.
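
The guardrails above mostly live on the parsing side. Below is a minimal Python sketch (kept in Python for consistency with the other examples) that truncates candidates to snippets and then strictly validates the judge's JSON reply. `truncate_snippet`, `parse_judgment`, and the `max_chars=800` and `threshold=0.5` defaults are hypothetical choices, not values from this playbook.

```python
import json

def truncate_snippet(text, max_chars=800):
    """Cut a candidate to a bounded snippet before it goes into the judge prompt."""
    return text if len(text) <= max_chars else text[:max_chars].rstrip() + " ..."

def parse_judgment(raw, valid_ids, threshold=0.5, top_k=8):
    """Strictly parse the judge's JSON reply and return accepted items, best first.

    Raises ValueError (json.JSONDecodeError is a subclass) if the reply is not a
    JSON array; drops items with unknown/duplicate ids or scores outside [0, 1].
    """
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("judge must return a JSON array")
    accepted, seen = [], set()
    for item in data:
        if not isinstance(item, dict):
            continue
        cid, score = item.get("id"), item.get("score")
        # Unknown ids are treated as hallucinated candidates
        if not isinstance(cid, str) or cid not in valid_ids or cid in seen:
            continue
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(score, bool) or not isinstance(score, (int, float)):
            continue
        if not 0.0 <= score <= 1.0:
            continue
        seen.add(cid)
        if score >= threshold:
            accepted.append({"id": cid, "score": float(score), "reason": str(item.get("reason", ""))})
    accepted.sort(key=lambda r: r["score"], reverse=True)
    return accepted[:top_k]

reply = '[{"id":"a","score":0.9,"reason":"exact term"},{"id":"b","score":0.2,"reason":"off-topic"},{"id":"zz","score":0.99}]'
print(parse_judgment(reply, valid_ids={"a", "b"}))
# [{'id': 'a', 'score': 0.9, 'reason': 'exact term'}]
```

Rejecting the whole reply on non-JSON output, rather than trying to repair it, keeps the judge honest; a retry with the same prompt is usually cheaper than debugging a lenient parser.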
---
## 3) Tuning knobs that actually matter
* **Candidate pool size (`k_in`)**: 50–200 typical. Small pool → missed gold; huge pool → latency.
* **Output size (`k_out`)**: 5–20. For grounded QA, 6–8 is a sweet spot.
* **Score calibration**: Normalize CE outputs to `[0,1]`; keep **per-query z-scores** for audit.
* **Hybrid gate**: If the BM25 and dense top-5 lists barely overlap, log both top-5 lists and check [Query Parsing Split](./patterns/pattern_query_parsing_split.md).
* **Dedup by doc/section**: Keep at most **N** chunks per section so near-duplicates do not crowd out other evidence.
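
Three of these knobs reduce to small helper functions. A minimal sketch follows, assuming each chunk dict stores its section under `meta["section"]` and using Jaccard overlap of top-k ids as one possible "disagreement" signal; the names, the `max_per_section=2` default, and the choice of Jaccard are illustrative assumptions.

```python
import statistics
from collections import defaultdict

def zscores(scores):
    """Per-query z-scores of reranker scores (for audit logs; ranking is unchanged)."""
    if len(scores) < 2:
        return [0.0] * len(scores)
    mu, sd = statistics.fmean(scores), statistics.pstdev(scores)
    return [0.0 if sd == 0 else (s - mu) / sd for s in scores]

def cap_per_section(ranked, max_per_section=2):
    """Keep at most max_per_section chunks per section, preserving rank order."""
    counts, kept = defaultdict(int), []
    for c in ranked:
        section = c.get("meta", {}).get("section")
        if section is None:  # chunks without a section are never capped
            kept.append(c)
            continue
        if counts[section] < max_per_section:
            counts[section] += 1
            kept.append(c)
    return kept

def topk_overlap(ids_a, ids_b, k=5):
    """Jaccard overlap of two retrievers' top-k ids; low values flag disagreement."""
    a, b = set(ids_a[:k]), set(ids_b[:k])
    return len(a & b) / len(a | b) if (a or b) else 1.0

print(topk_overlap(["a", "b", "c"], ["b", "c", "d"]))  # 0.5
```

The overlap threshold that should trigger logging is a per-corpus judgment call; start by logging the value for every query and inspecting the low tail.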
---
## 4) Verification (don’t skip)
**Metrics**
* **nDCG\@10**, **MRR\@10**, **Recall\@50**, and **ΔS(question, top-ctx)**.
* Expect **ΔS ≤ 0.45** after rerank on accepted top-ctx.
* Track **citation hit rate** (does the final answer cite a reranked chunk?).
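
For the ranking metrics listed above, a minimal pure-Python sketch follows. It uses the linear-gain form of nDCG (one common variant; some toolkits use `2^rel - 1` instead). The `delta_s` helper rests on an assumption: that ΔS is computed as 1 minus the cosine similarity of the two embeddings. Confirm that against the project's own definition before relying on it.

```python
import math

def ndcg_at_k(ranked_ids, gains, k=10):
    """nDCG@k with linear gains; gains maps doc_id -> graded relevance (missing = 0)."""
    dcg = sum(gains.get(d, 0) / math.log2(i + 2) for i, d in enumerate(ranked_ids[:k]))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

def mrr_at_k(ranked_ids, relevant, k=10):
    """Reciprocal rank of the first relevant id within the top k (0 if none)."""
    for i, d in enumerate(ranked_ids[:k]):
        if d in relevant:
            return 1.0 / (i + 1)
    return 0.0

def recall_at_k(ranked_ids, relevant, k=50):
    """Share of relevant ids that appear in the top k."""
    rel = set(relevant)
    if not rel:
        return 0.0
    return len(set(ranked_ids[:k]) & rel) / len(rel)

def delta_s(u, v):
    """ASSUMPTION: ΔS = 1 - cosine(u, v) over two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("zero-length embedding")
    return 1.0 - dot / norm

ranked = ["d3", "d1", "d7"]
print(round(ndcg_at_k(ranked, {"d1": 1}), 3), mrr_at_k(ranked, {"d1"}))  # 0.631 0.5
```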
**A/B checklist**
1. Freeze the first-stage retriever.
2. Compare **with vs without** reranker on the same gold set.
3. Record latency p95 and cost/query.
4. If nDCG\@10 improves by less than **+0.05** while p95 latency doubles, the reranker is not worth it.
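
Steps 2 to 4 can be wired into a small summary function. A minimal sketch, assuming you already have per-query nDCG\@10 values and per-query latencies for both runs on the same gold set; `p95` uses the nearest-rank method, and `ab_summary` is a hypothetical name.

```python
import statistics

def p95(latencies_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    xs = sorted(latencies_ms)
    rank = (95 * len(xs) + 99) // 100  # integer ceil(0.95 * n), avoids float rounding
    return xs[rank - 1]

def ab_summary(ndcg_base, ndcg_rerank, lat_base_ms, lat_rerank_ms, min_gain=0.05):
    """Compare runs without / with the reranker (same gold set, frozen retriever)."""
    gain = statistics.fmean(ndcg_rerank) - statistics.fmean(ndcg_base)
    p95_ratio = p95(lat_rerank_ms) / p95(lat_base_ms)
    return {
        "ndcg_gain": gain,
        "p95_ratio": p95_ratio,
        # Step 4: a small gain that costs 2x p95 latency is not worth it
        "not_worth_it": gain < min_gain and p95_ratio >= 2.0,
    }

print(ab_summary([0.50, 0.50], [0.52, 0.52], [100, 100], [250, 250])["not_worth_it"])  # True
```

Cost per query belongs in the same report; it is left out here because it depends on your pricing model.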
---
## 5) Failure modes → fixes
| Symptom | Likely cause | Fix |
| ---------------------------------------- | ------------------------------------- | ---------------------------------------------------------------------------------------------------------------- |
| Reranker prefers off-topic “fluent” text | Judge prompt vague / CE miscalibrated | Tighten judging schema; penalize missing query terms; normalize scores |
| Great demo, but prod recall tanks | k\_in too small / drift | Increase k\_in to 100–200; re-check recall\@50 |
| Citations merge across sources | Prompt schema unlocked | Enforce per-source fences; see [SCU](./patterns/pattern_symbolic_constraint_unlock.md) |
| Hybrid suddenly worse than dense | Tokenizers diverged | Align analyzers; log per-retriever queries; see [Query Parsing Split](./patterns/pattern_query_parsing_split.md) |
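
One way to "penalize missing query terms" for the first row is to shrink a normalized cross-encoder score when few query terms appear in the chunk. This is a sketch under loud assumptions: the blend formula and `weight=0.3` are illustrative, and the naive `\w+` tokenization counts stopwords too, so a real setup would reuse your sparse retriever's analyzer.

```python
import re

def term_coverage(query, text):
    """Fraction of distinct lowercase word tokens in the query that also occur in text."""
    q_terms = set(re.findall(r"\w+", query.lower()))
    t_terms = set(re.findall(r"\w+", text.lower()))
    return len(q_terms & t_terms) / len(q_terms) if q_terms else 1.0

def penalized_score(ce_score, query, text, weight=0.3):
    """ASSUMED blend: keep (1 - weight) of a [0,1] CE score unconditionally and
    scale the remaining `weight` share by query-term coverage."""
    return ce_score * ((1.0 - weight) + weight * term_coverage(query, text))

print(round(penalized_score(0.8, "reset password", "How to reset a password"), 2))  # 0.8
print(round(penalized_score(0.8, "reset password", "A fluent essay on weather"), 2))  # 0.56
```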
---
## 6) Cost model (back-of-envelope)
* Cross-encoder base: \~**3–6 ms**/doc on an A10G-class GPU, slower on CPU.
* If **k\_in=100** puts p95 at **\~500 ms** on CPU, consider:
  * shrinking text with **sentence-windowing**,
  * switching to a **mini** model,
  * pre-filtering to **BM25 top-60**, then CE top-10.
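
The arithmetic behind these bullets is simple enough to keep in a helper. A back-of-envelope sketch, assuming one sequential forward pass per candidate (batching can amortize per-doc cost, so treat the output as a ceiling); `ms_per_doc=5.0` is just what \~500 ms for 100 candidates implies, and `usd_per_hour` is whatever your hardware costs.

```python
def ce_latency_ms(n_docs, ms_per_doc):
    """Sequential estimate: one cross-encoder forward pass per candidate."""
    return n_docs * ms_per_doc

def plan_latencies(ms_per_doc, k_in=100, prefilter_k=60):
    """Latency for scoring all k_in candidates vs. BM25-prefiltering to prefilter_k first."""
    return {
        "full": ce_latency_ms(k_in, ms_per_doc),
        "prefiltered": ce_latency_ms(min(prefilter_k, k_in), ms_per_doc),
    }

def cost_per_query_usd(latency_ms, usd_per_hour):
    """Hardware cost if the device is dedicated to this query for latency_ms."""
    return usd_per_hour * latency_ms / 3_600_000

print(plan_latencies(ms_per_doc=5.0))  # {'full': 500.0, 'prefiltered': 300.0}
```

Switching to a mini model or windowing text shows up here as a smaller `ms_per_doc`; measure it rather than guessing.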
---
## 7) Acceptance criteria
* **nDCG\@10** improves by **≥ +0.05** vs baseline.
* **Recall\@50** unchanged (±0.02) after adding reranker (candidate pool must remain wide).
* **ΔS(question, top-ctx) ≤ 0.45** and λ stays **convergent** on 3 paraphrases.
* **Traceability**: store `{query, cand_id, pre_score, post_score, reason}`.
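
The traceability record and the numeric criteria can be encoded directly. A minimal sketch: `TraceRecord` mirrors the `{query, cand_id, pre_score, post_score, reason}` fields listed above, while the JSON Lines format and the function names are assumptions. The λ convergence check on paraphrases is not covered, since its computation is not defined in this section.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class TraceRecord:
    """One traceability row per (query, candidate) pair."""
    query: str
    cand_id: str
    pre_score: float   # first-stage retrieval score
    post_score: float  # reranker score
    reason: str = ""   # judge justification, if any

def to_jsonl(records):
    """Serialize trace records as JSON Lines for later audit."""
    return "\n".join(json.dumps(asdict(r), ensure_ascii=False) for r in records)

def passes_acceptance(ndcg_gain, recall50_base, recall50_rerank, delta_s_top,
                      min_gain=0.05, recall_tol=0.02, max_delta_s=0.45):
    """Check the numeric criteria above (nDCG gain, Recall@50 stability, ΔS)."""
    return (
        ndcg_gain >= min_gain
        and abs(recall50_rerank - recall50_base) <= recall_tol
        and delta_s_top <= max_delta_s
    )

row = TraceRecord("reset password", "doc42#s3", pre_score=12.7, post_score=0.91, reason="exact steps")
print(to_jsonl([row]))
print(passes_acceptance(0.07, 0.90, 0.89, 0.38))  # True
```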
---
## 8) Example pipeline glue
```python
def answer(query):
    # search, build_prompt and call_llm are defined elsewhere (see retrieval-playbook)
    cands = search(query, topk_dense=80, topk_sparse=80, out_k=60)  # from retrieval-playbook
    reranked = rerank_topk(query, cands, out_k=8)  # CE/LLM reranker
    prompt = build_prompt(query, reranked)  # cite → explain, fenced by section
    return call_llm(prompt)
```
* **Do not** exceed **8–10** context chunks for QA; past that point, precision tends to drop sharply.
* Always **log** which reranker selected which chunk.
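
The `build_prompt` step ("cite → explain, fenced by section") and the logging rule can be sketched together. This is a hypothetical implementation, not the playbook's own: it assumes each chunk carries `id`, `text`, an optional `meta["section"]`, and the `source`/`rerank_score` fields that `rerank_topk` adds; the `<<<SECTION ...>>>` fence markers are an illustrative convention.

```python
import logging

log = logging.getLogger("rerank.trace")

def build_prompt(query, reranked, max_chunks=8):
    """HYPOTHETICAL build_prompt: fence each chunk by section, cite first, then explain."""
    blocks = []
    for c in reranked[:max_chunks]:  # hard cap on context chunks
        section = c.get("meta", {}).get("section", "unknown")
        # Record which reranker picked this chunk ("|ce" marks the cross-encoder)
        log.info("chunk=%s section=%s source=%s score=%s",
                 c.get("id"), section, c.get("source", ""), c.get("rerank_score"))
        blocks.append(f"<<<SECTION {section} | id={c.get('id')}>>>\n{c['text']}\n<<<END SECTION {section}>>>")
    context = "\n\n".join(blocks)
    return (
        "Answer using only the fenced sections below. "
        "First cite the section ids you rely on, then explain.\n\n"
        f"{context}\n\nQuestion: {query}"
    )

chunks = [{"id": "c1", "text": "Open Settings > Security.", "meta": {"section": "2.1"},
           "source": "|ce", "rerank_score": 0.91}]
print(build_prompt("How do I reset my password?", chunks))
```

Keeping the section id inside both fence markers lets you check afterwards that every citation in the answer points at a fence that was actually in the prompt.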
---
### 🧭 Explore More
| Module | Description | Link |
|-----------------------|----------------------------------------------------------|----------|
| WFGY Core | WFGY 2.0 engine is live: full symbolic reasoning architecture and math stack | [View →](https://github.com/onestardao/WFGY/tree/main/core/README.md) |
| Problem Map 1.0 | Initial 16-mode diagnostic and symbolic fix framework | [View →](https://github.com/onestardao/WFGY/tree/main/ProblemMap/README.md) |
| Problem Map 2.0 | RAG-focused failure tree, modular fixes, and pipelines | [View →](https://github.com/onestardao/WFGY/blob/main/ProblemMap/rag-architecture-and-recovery.md) |
| Semantic Clinic Index | Expanded failure catalog: prompt injection, memory bugs, logic drift | [View →](https://github.com/onestardao/WFGY/blob/main/ProblemMap/SemanticClinicIndex.md) |
| Semantic Blueprint | Layer-based symbolic reasoning & semantic modulations | [View →](https://github.com/onestardao/WFGY/tree/main/SemanticBlueprint/README.md) |
| Benchmark vs GPT-5 | Stress test GPT-5 with full WFGY reasoning suite | [View →](https://github.com/onestardao/WFGY/tree/main/benchmarks/benchmark-vs-gpt5/README.md) |
| 🧙‍♂️ Starter Village 🏡 | New here? Lost in symbols? Click here and let the wizard guide you through | [Start →](https://github.com/onestardao/WFGY/blob/main/StarterVillage/README.md) |
---
> 👑 **Early Stargazers: [See the Hall of Fame](https://github.com/onestardao/WFGY/tree/main/stargazers)** —
> Engineers, hackers, and open source builders who supported WFGY from day one.
> ⭐ [WFGY Engine 2.0](https://github.com/onestardao/WFGY/blob/main/core/README.md) is already unlocked. ⭐ Star the repo to help others discover it and unlock more on the [Unlock Board](https://github.com/onestardao/WFGY/blob/main/STAR_UNLOCKS.md).