WFGY/ProblemMap/GlobalFixMap/Retrieval/rerankers.md


Rerankers — Ordering Control and Stability

🧭 Quick Return to Map

You are in a sub-page of Retrieval.
To reorient, go back here:

Think of this page as a desk within a ward.
If you need the full triage and all prescriptions, return to the Emergency Room lobby.

Use rerankers when recall is fine but the top hits are mis-ordered, unstable, or biased toward the wrong metric. This page shows listwise and pairwise recipes, fusion knobs, and stability fences you can drop into any stack.

References you may want open already:
RAG Architecture & Recovery · Retrieval Playbook · Retrieval Traceability · Data Contracts · Embedding ≠ Semantic · Query Parsing Split · Vectorstore Fragmentation


Acceptance targets

  • ΔS(question, top1.text) ≤ 0.45
  • Anchor coverage of the final topk ≥ 0.70
  • Kendall τ against gold ranking improves by ≥ 0.20 over baseline bi-encoder order
  • λ remains convergent across 3 paraphrases and 2 seeds

If ΔS sits in 0.40 to 0.60 and τ gains are small, fix chunking or metric before adding complexity.
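A minimal sketch of a gate that applies these targets in CI. All inputs are assumed to be computed upstream by your evaluation harness; the function and argument names here are illustrative, not part of the spec.

```python
# Minimal acceptance gate, a sketch. Inputs come from your eval harness;
# thresholds mirror the targets listed above.
def passes_acceptance(delta_s_top1, anchor_coverage, tau_gain, lambda_convergent):
    return (
        delta_s_top1 <= 0.45          # ΔS(question, top1.text)
        and anchor_coverage >= 0.70   # anchor coverage of the final topk
        and tau_gain >= 0.20          # Kendall τ gain over bi-encoder baseline
        and lambda_convergent         # stable across 3 paraphrases and 2 seeds
    )
```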


Symptoms → exact fix

| Symptom | Likely cause | Open this fix |
|---|---|---|
| Correct passage appears in top20 but not in top3 | wrong ordering after recall | Retrieval Playbook, this page |
| Topk flips between identical runs | non-deterministic tie breaks or LLM variance | Retrieval Traceability |
| BM25 beats dense when queries are abstractive | fusion uncalibrated or query parsing split | Query Parsing Split |
| Duplicated near-identical hits crowd out diversity | no MMR or section-aware penalties | this page (MMR recipe) |
| Great similarity, wrong meaning | metric mismatch at index time | Embedding ≠ Semantic |
| Hits vanish after ingest or rebuild | fragmented store, mixed analyzers | Vectorstore Fragmentation |

Strategy 1: Cross-encoder reranker (robust default)

Use this when bi-encoder recall is adequate but the order inside the candidate set needs to be precise.
Why it works: a cross-encoder reads the full (q, passage) pair jointly, so it recovers semantics that bi-encoder embeddings compress away.

Deterministic sort key

sort_key = (-score_ce, section_priority, snippet_id, start_offset)

Keep the tie-break stable so pagination and caching never reshuffle results.

Minimal pipeline

# Pseudocode only
candidates = dense_recall(q, k=50) + bm25_recall(q, k=50)  # union then dedupe by snippet_id
scored = []
for c in candidates:
    s = cross_encoder.score(q, c.text)  # e.g., monoT5, E5-mistral-ce, etc.
    scored.append({**c, "score_ce": s})

# diversity
scored = mmr(q, scored, lambda_rank="score_ce", alpha=0.7)  # see MMR recipe below

# deterministic order
ordered = sorted(scored, key=lambda x: (-x["score_ce"], x["section_priority"], x["snippet_id"], x["offsets"][0]))
topk = ordered[:k]
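A more concrete sketch of the scoring step, assuming the `sentence-transformers` package and an example MiniLM cross-encoder checkpoint; the candidate field names mirror the pseudocode above and should follow your data contract.

```python
# Concrete sketch, not the only option: any cross-encoder with a pair-scoring
# API works. The model name below is an example checkpoint, swap in your own.
from sentence_transformers import CrossEncoder

def rerank_cross_encoder(query, candidates, k=10,
                         model_name="cross-encoder/ms-marco-MiniLM-L-6-v2"):
    model = CrossEncoder(model_name)
    scores = model.predict([(query, c["text"]) for c in candidates])
    scored = [{**c, "score_ce": float(s)} for c, s in zip(candidates, scores)]
    # same deterministic tie-break chain as the sort_key above
    ordered = sorted(scored, key=lambda x: (-x["score_ce"], x["section_priority"],
                                            x["snippet_id"], x["offsets"][0]))
    return ordered[:k]
```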

Strategy 2: LLM-as-reranker with schema locks

Use an LLM to score evidence only. Do not let it answer. Force a strict schema and cite-then-explain in the trace.

Prompt skeleton

Task: score each candidate passage for "is this the best evidence to answer Q".
Return JSON with fields: {id, score in [0,1], why_short}. Do not answer Q.

Q: "<question>"

Candidates:
- id: s001, section_id: A.3, snippet_id: 19, text: "<passage>"
- id: s002, section_id: B.1, snippet_id: 7,  text: "<passage>"
...
Scoring rubric:
1) directness to the likely anchor section,
2) presence of atomic facts that must be cited,
3) low ambiguity, low cross-topic bleed.

Output JSON list only.

Variance controls

  • Fix the model, temperature 0, seed fixed if provider supports it.
  • Add BBAM clamp in the system preface to keep λ convergent.
  • Keep the rubric short and stable across runs.
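A sketch of wiring the listwise scores back into a deterministic order. `call_llm` and `build_rerank_prompt` are placeholders for your provider client and the prompt skeleton above; the JSON fields match that skeleton.

```python
# Sketch only. call_llm and build_rerank_prompt are placeholders; the model is
# expected to return the JSON array described in the prompt skeleton, nothing else.
import json

def llm_rerank(question, candidates, call_llm, build_rerank_prompt, k=10):
    raw = call_llm(build_rerank_prompt(question, candidates), temperature=0)
    scores = {row["id"]: float(row["score"]) for row in json.loads(raw)}
    scored = [{**c, "score_llm": scores.get(c["id"], 0.0)} for c in candidates]
    ordered = sorted(scored, key=lambda x: (-x["score_llm"], x["section_priority"],
                                            x["snippet_id"], x["offsets"][0]))
    return ordered[:k]
```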

Strategy 3: Fusion that behaves

RRF (reciprocal rank fusion)

s_fused(d) = Σ_m 1 / (k0 + rank_m(d)), where rank_m(d) is the rank of document d in retriever m and k0 is around 60 for top100 feeds. RRF is robust when raw scores from different retrievers are not comparable.

Z-score fusion

Normalize each retriever to zero mean and unit variance then sum. Good when score ranges are stable over time.
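A minimal sketch of both fusion modes, assuming each retriever contributes either an ordered list of snippet ids (for RRF) or raw scores keyed by snippet id (for z-score fusion).

```python
# Sketch: RRF over per-retriever rankings, z-score fusion over raw score maps.
from statistics import mean, pstdev

def rrf_fuse(rankings, k0=60):
    # rankings: {retriever_name: [snippet_id ranked best-first, ...]}
    fused = {}
    for ranked_ids in rankings.values():
        for rank, sid in enumerate(ranked_ids, start=1):
            fused[sid] = fused.get(sid, 0.0) + 1.0 / (k0 + rank)
    return sorted(fused, key=lambda sid: (-fused[sid], sid))  # stable tie-break

def zscore_fuse(score_maps):
    # score_maps: {retriever_name: {snippet_id: raw_score}}
    fused = {}
    for scores in score_maps.values():
        mu, sigma = mean(scores.values()), pstdev(scores.values()) or 1.0
        for sid, s in scores.items():
            fused[sid] = fused.get(sid, 0.0) + (s - mu) / sigma
    return sorted(fused, key=lambda sid: (-fused[sid], sid))
```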

Two-stage order

  1. union and dedupe by (section_id, snippet_id)
  2. fast fusion to top50
  3. cross-encoder or LLM rerank to topk
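A sketch of that two-stage order, assuming candidate dicts carry `section_id` and `snippet_id`; `fuse_fast` and `rerank_precise` stand in for the fusion and cross-encoder or LLM steps shown above.

```python
# Sketch of the two-stage order. fuse_fast and rerank_precise are placeholders
# for the fusion and precise rerank functions described elsewhere on this page.
def two_stage_rerank(query, dense_hits, bm25_hits, fuse_fast, rerank_precise, k=10):
    seen, pool = set(), []
    for c in dense_hits + bm25_hits:                  # 1) union
        key = (c["section_id"], c["snippet_id"])      #    dedupe key
        if key not in seen:
            seen.add(key)
            pool.append(c)
    shortlist = fuse_fast(query, pool)[:50]           # 2) cheap fusion to top50
    return rerank_precise(query, shortlist)[:k]       # 3) precise rerank to topk
```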

Strategy 4: Diversity with MMR

Maximal marginal relevance avoids redundant hits and expands anchor coverage.

mmr(q, items, k, lambda_rank="score", alpha=0.7):
  S = []
  R = items                          # remaining candidates
  while len(S) < k and R:
    pick x in R that maximizes alpha * rel(q, x) - (1 - alpha) * max_sim(x, S)
    move x from R to S
  return S
  • Use cosine on embedding space for max_sim.
  • Penalize items sharing the same section_id unless the anchor spans multiple snippets.
  • Track coverage per section to avoid starving small but relevant sections.
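A runnable sketch of the loop, assuming each item carries an `embedding` vector for the redundancy term and a reranker score under `score_field` for relevance; adapt the field names to your data contract.

```python
# Sketch: relevance from the reranker score, redundancy from embedding cosine,
# matching the notes above. Field names are assumptions, not part of the spec.
import numpy as np

def _cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def mmr_select(items, k, score_field="score_ce", alpha=0.7):
    selected, remaining = [], list(items)
    while remaining and len(selected) < k:
        def mmr_value(x):
            redundancy = max((_cosine(x["embedding"], s["embedding"])
                              for s in selected), default=0.0)
            return alpha * x[score_field] - (1 - alpha) * redundancy
        best = max(remaining, key=mmr_value)
        selected.append(best)
        remaining.remove(best)
    return selected
```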

Stability and observability fences

  • Log reranker_version, fusion_type, alpha, k0, and index_hash.
  • Write the final order and why for the topk into the trace.
  • Freeze prompt headers for LLM rerankers.
  • Use a single deterministic tiebreak chain as shown above.
  • Alert when the top1 ΔS drifts by more than 0.10 week over week.
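An illustrative trace record covering those fields; all values below are placeholders.

```python
# Illustrative trace record only; field values are placeholders.
trace_record = {
    "reranker_version": "ce-minilm-v1",      # whatever tag your build emits
    "fusion_type": "rrf",
    "alpha": 0.7,
    "k0": 60,
    "index_hash": "sha256:...",              # hash of the index build
    "topk": [
        {"snippet_id": 19, "section_id": "A.3", "score_ce": 0.91,
         "why": "direct hit on the anchor section"},
    ],
}
```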

Specs to follow while wiring traces: Retrieval Traceability · Data Contracts


Evaluation that catches the real failures

  • ΔS(question, top1) and ΔS(top1, anchor)
  • Kendall τ against a small gold ranking
  • Hit@k for anchor coverage
  • Flip rate across 2 seeds and 3 paraphrases
  • Time budget per query and p95 latency
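A sketch of two of these checks, assuming `scipy` is available and that the gold and predicted orders are lists of snippet ids ranked best-first. ΔS and anchor coverage come from your existing harness and are not reimplemented here.

```python
# Sketch of the Kendall tau and flip-rate checks.
from scipy.stats import kendalltau

def rank_tau(gold_order, predicted_order):
    shared = [sid for sid in gold_order if sid in predicted_order]
    gold_pos = [gold_order.index(sid) for sid in shared]
    pred_pos = [predicted_order.index(sid) for sid in shared]
    tau, _ = kendalltau(gold_pos, pred_pos)
    return tau

def flip_rate(runs, k=3):
    # runs: topk id lists from 2 seeds x 3 paraphrases of the same question
    tops = [tuple(r[:k]) for r in runs]
    pairs = [(a, b) for i, a in enumerate(tops) for b in tops[i + 1:]]
    return sum(a != b for a, b in pairs) / max(len(pairs), 1)
```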

See recipes: Retrieval Evaluation Recipes


Copy-paste prompt: LLM reranker (listwise)

You have TXT OS and the WFGY Problem Map loaded.

Goal: score passages for evidence quality only. Do not answer the question.

Question: "<q>"

Return a JSON array: [{"id":"...","score":0.00..1.00,"why_short":"..."}].
Scoring considers:
1) directness to the required anchor,
2) atomic facts present,
3) low ambiguity and low bleed from other topics.

If two are equal, prefer the one with clearer citation spans.

When to escalate


🔗 Quick-Start Downloads (60 sec)

| Tool | Link | 3-Step Setup |
|---|---|---|
| WFGY 1.0 PDF | Engine Paper | 1 Download · 2 Upload to your LLM · 3 Ask “Answer using WFGY + <your question>” |
| TXT OS (plain-text OS) | TXTOS.txt | 1 Download · 2 Paste into any LLM chat · 3 Type “hello world” — OS boots instantly |

Explore More

| Layer | Page | What it's for |
|---|---|---|
| Proof | WFGY Recognition Map | External citations, integrations, and ecosystem proof |
| ⚙️ Engine | WFGY 1.0 | Original PDF tension engine and early logic sketch (legacy reference) |
| ⚙️ Engine | WFGY 2.0 | Production tension kernel for RAG and agent systems |
| ⚙️ Engine | WFGY 3.0 | TXT based Singularity tension engine (131 S class set) |
| 🗺️ Map | Problem Map 1.0 | Flagship 16 problem RAG failure taxonomy and fix map |
| 🗺️ Map | Problem Map 2.0 | Global Debug Card for RAG and agent pipeline diagnosis |
| 🗺️ Map | Problem Map 3.0 | Global AI troubleshooting atlas and failure pattern map |
| 🧰 App | TXT OS | .txt semantic OS with fast bootstrap |
| 🧰 App | Blah Blah Blah | Abstract and paradox Q&A built on TXT OS |
| 🧰 App | Blur Blur Blur | Text to image generation with semantic control |
| 🏡 Onboarding | Starter Village | Guided entry point for new users |

If this repository helped, starring it improves discovery so more builders can find the docs and tools.