
Hybrid Retriever Weights — Guardrails and Fix Pattern

🧭 Quick Return to Map

You are in a sub-page of RAG_VectorDB.
To reorient, go back here:

Think of this page as a desk within a ward.
If you need the full triage and all prescriptions, return to the Emergency Room lobby.

Use this page when hybrid retrieval underperforms a single retriever or when results look noisy after fusing BM25, dense vectors, HyDE, or filters. Failures usually come from score scale mismatch, duplicate dominance, or query-type priors not reflected in weights.


Open these first


Core acceptance

  • ΔS(question, retrieved) ≤ 0.45 on 3 paraphrases and 2 seeds.
  • Coverage ≥ 0.70 to the target section after fusion and rerank.
  • λ remains convergent when weights are perturbed within ±10 percent.
  • Jaccard overlap against the best single retriever's top-k ≥ 0.60.
  • No single source type or domain exceeds 40 percent of the final top-k unless configured.
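A minimal sketch of these acceptance checks in plain Python follows. The ΔS values, coverage number, and candidate id lists are assumed to come from your own evaluation harness; only the thresholds and the Jaccard and source-mix rules come from the bullets above.

```python
# Minimal sketch of the acceptance gate above. The ΔS values, coverage
# number, and candidate lists are assumed to come from your own harness;
# only the thresholds and rules mirror the bullets in this section.
from collections import Counter

def jaccard(a, b):
    """Jaccard overlap between two sets of top-k snippet ids."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def source_mix_ok(items, max_share=0.40):
    """No single source type may exceed max_share of the final top-k."""
    if not items:
        return False
    counts = Counter(item["source"] for item in items)
    return max(counts.values()) / len(items) <= max_share

def accept(delta_s_values, coverage, fused_ids, best_single_ids, fused_items):
    """Gate the fused result against the acceptance targets above."""
    return (
        max(delta_s_values) <= 0.45                      # ΔS on all paraphrases and seeds
        and coverage >= 0.70                             # coverage to the target section
        and jaccard(fused_ids, best_single_ids) >= 0.60  # overlap with best single retriever
        and source_mix_ok(fused_items)                   # no source dominates the top-k
    )
```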

Symptoms → likely cause → open this


Fix in 60 seconds

  1. Normalize each retriever's scores inside the candidate pool
    Use one of: min-max to [0, 1] per retriever, z-score per retriever, or pure rank-based RRF. A code sketch of steps 1 through 4 follows this list.

  2. De-duplicate by snippet identity
    Collapse near-duplicates using stable keys: {doc_id, section_id, hash_64}.

  3. Fuse with a simple, auditable rule
    Start with RRF: score = Σ 1 / (rank_i + k) with k ∈ [50, 100].
    Then try weighted sum on normalized scores: S = w_dense·s_dense + w_bm25·s_bm25 + w_hyde·s_hyde.

  4. Rerank with a cross-encoder
    Rerank top 50–100 to top 10–20. Enforce cite-then-explain in the prompt.

  5. Measure ΔS and λ
    If λ flips when weights move by ±10 percent, clamp with BBAM and lock schema headers.
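Below is a minimal sketch of steps 1 through 4 in plain Python, assuming each retriever returns a ranked list of (doc_id, score) pairs. Function names are illustrative, not part of any WFGY or vendor API, and the cross-encoder rerank is deliberately left out since any pairwise reranker can follow the fusion step.

```python
# Minimal sketch of steps 1 through 4: per-retriever normalization, dedupe,
# and RRF or weighted-sum fusion. The cross-encoder rerank is left as a stub
# on purpose; any pairwise reranker can slot in after rrf_fuse / weighted_fuse.
import statistics
from collections import defaultdict

def z_norm(results):
    """results: ranked list of (doc_id, raw_score) from one retriever."""
    scores = [s for _, s in results]
    mu = statistics.mean(scores)
    sigma = statistics.pstdev(scores) or 1.0
    return [(doc_id, (s - mu) / sigma) for doc_id, s in results]

def rrf_fuse(ranked_lists, k=60):
    """Reciprocal rank fusion: score = Σ 1 / (rank_i + k) across retrievers."""
    fused = defaultdict(float)
    for results in ranked_lists:
        for rank, (doc_id, _) in enumerate(results, start=1):
            fused[doc_id] += 1.0 / (rank + k)
    return sorted(fused.items(), key=lambda x: x[1], reverse=True)

def weighted_fuse(normalized_lists, weights):
    """Weighted sum on normalized scores: S = Σ w_i * s_i."""
    fused = defaultdict(float)
    for results, w in zip(normalized_lists, weights):
        for doc_id, s in results:
            fused[doc_id] += w * s
    return sorted(fused.items(), key=lambda x: x[1], reverse=True)

def dedupe(candidates, key_fn):
    """Collapse near-duplicates by a stable snippet key, keeping the first
    (highest-scored) hit, so pass candidates sorted by fused score."""
    seen, kept = set(), []
    for doc_id, score in candidates:
        key = key_fn(doc_id)
        if key not in seen:
            seen.add(key)
            kept.append((doc_id, score))
    return kept
```

In practice, run `rrf_fuse` first because it is scale-free, then move to `weighted_fuse` only after the per-retriever normalization looks sane in the logs.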


Minimal reference recipe


```yaml
retrievers:
  - name: dense
    k: 60
    norm: z
    weight: 0.55
  - name: bm25
    k: 200
    norm: rank          # convert to ranks 1..k
    weight: 0.35
  - name: hyde
    k: 60
    norm: z
    weight: 0.10

fusion:
  method: RRF
  rrf_k: 60
  dedupe: snippet_id    # or doc_id+section_id+hash64

rerank:
  model: cross-encoder-v2
  take_top: 15

accept:
  deltaS_max: 0.45
  coverage_min: 0.70
  jitter_weight: 0.10   # weights +/- 10 percent must keep λ convergent
```


Weighting heuristics that actually work

  • Short factual queries
    Increase dense weight to 0.6–0.7. Keep BM25 at 0.3–0.4. HyDE optional.

  • Long verbose queries or code
    Push BM25 to 0.5. Keep dense at 0.4. Use reranker to clean length bias.

  • Multilingual or mixed-script
    Reduce BM25 weight if tokenizer mismatch is suspected. Verify casing and analyzer.

  • Highly structured data
    Use BM25 boost on fielded terms. Keep dense for semantic recall.

  • Safety or policy queries
    HyDE at most 0.15. Prefer deterministic BM25 plus strict reranker.
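One way to make these priors auditable is a small preset table keyed by a rough query class, as in the hypothetical sketch below. The class names, regexes, and exact numbers are illustrative assumptions, not prescriptions; the point is that the weight choice becomes an explicit, loggable decision rather than a tuned constant.

```python
# Hypothetical weight presets encoding the heuristics above. The class names,
# regexes, and cut-offs are illustrative assumptions; swap in whatever query
# routing or classifier your stack already has.
import re

PRESETS = {
    "short_factual": {"dense": 0.65, "bm25": 0.35, "hyde": 0.00},
    "long_or_code":  {"dense": 0.40, "bm25": 0.50, "hyde": 0.10},
    "safety_policy": {"dense": 0.35, "bm25": 0.50, "hyde": 0.15},
    "default":       {"dense": 0.55, "bm25": 0.35, "hyde": 0.10},
}

def classify(query: str) -> str:
    """Crude router: long or code-like queries lean on BM25, short factual on dense."""
    looks_like_code = re.search(r"\bdef |\bclass |[{};]", query) is not None
    if looks_like_code or len(query.split()) > 40:
        return "long_or_code"
    if re.search(r"\b(policy|safety|compliance)\b", query, re.IGNORECASE):
        return "safety_policy"
    if len(query.split()) <= 8:
        return "short_factual"
    return "default"

def weights_for(query: str) -> dict:
    return PRESETS[classify(query)]
```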


Observability probes you must log

  • Per retriever: raw score mean and stdev before normalization.
  • After fusion: source mix histogram and duplicate collapse count.
  • ΔS(question, retrieved) and λ states at steps: retrieve, fuse, rerank, answer.
  • A/B against best single retriever and report ΔS improvement or regression.
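A sketch of these probes as structured log records, using only the standard library. The field names and the `lambda_state` string are placeholders for whatever your pipeline already computes.

```python
# Sketch of the probes above as structured log lines, stdlib only. Field
# names and the lambda_state string are placeholders for whatever your
# pipeline already emits.
import logging
import statistics
from collections import Counter

log = logging.getLogger("hybrid_retrieval")

def log_retriever_stats(name, raw_scores):
    """Per retriever: raw score mean and stdev before normalization."""
    log.info("retriever=%s mean=%.4f stdev=%.4f",
             name, statistics.mean(raw_scores), statistics.pstdev(raw_scores))

def log_fusion_stats(fused_items, duplicates_collapsed):
    """After fusion: source mix histogram and duplicate collapse count."""
    mix = Counter(item["source"] for item in fused_items)
    log.info("source_mix=%s duplicates_collapsed=%d", dict(mix), duplicates_collapsed)

def log_step(step, delta_s, lambda_state):
    """ΔS and λ state at each step: retrieve, fuse, rerank, answer."""
    log.info("step=%s deltaS=%.3f lambda=%s", step, delta_s, lambda_state)
```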

Common gotchas

  • Mixing cosine dense scores with BM25 raw scores without normalization.
  • HyDE prompts built with a different tokenizer than the dense model.
  • Reranker trained on passages while you fuse at document level.
  • Language-specific analyzers differ across shards and you fuse their outputs.
  • Latency cutoffs truncate candidate lists unevenly and bias the fusion.

Verification

  • Gold set of 100 queries with 3 paraphrases.
  • Require ΔS ≤ 0.45 and coverage ≥ 0.70 after fusion plus rerank.
  • Jaccard with best single retriever ≥ 0.60.
  • Weight jitter ±10 percent must keep λ convergent and citations stable.
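The jitter requirement can be automated around whatever evaluation runner you already have. In the sketch below, `run_pipeline` is an assumed callable returning ΔS, coverage, and cited snippet ids for a given weight dict; only the ±10 percent perturbation and the thresholds come from the list above.

```python
# Sketch of the ±10 percent weight-jitter check. run_pipeline is an assumed
# callable returning (ΔS, coverage, cited snippet ids) for a weight dict;
# λ convergence itself is watched by your own monitor, this only checks
# that the numeric targets and citations survive the perturbation.
import itertools

def jittered(weights, pct=0.10):
    """Yield weight dicts with every weight pushed up or down by pct."""
    for deltas in itertools.product((-pct, pct), repeat=len(weights)):
        yield {name: w * (1 + d) for (name, w), d in zip(weights.items(), deltas)}

def jitter_stable(run_pipeline, weights, base_citations):
    """True only if every jittered run stays inside the targets above
    and cites the same snippets as the baseline run."""
    for w in jittered(weights):
        delta_s, coverage, citations = run_pipeline(w)
        if delta_s > 0.45 or coverage < 0.70 or set(citations) != set(base_citations):
            return False
    return True
```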

🔗 Quick-Start Downloads (60 sec)

| Tool | Link | 3-Step Setup |
|------|------|--------------|
| WFGY 1.0 PDF | Engine Paper | 1 Download · 2 Upload to your LLM · 3 Ask “Answer using WFGY + <your question>” |
| TXT OS (plain-text OS) | TXTOS.txt | 1 Download · 2 Paste into any LLM chat · 3 Type “hello world” — OS boots instantly |

Explore More

| Layer | Page | What it's for |
|-------|------|---------------|
| Proof | WFGY Recognition Map | External citations, integrations, and ecosystem proof |
| Engine | WFGY 1.0 | Original PDF based tension engine |
| Engine | WFGY 2.0 | Production tension kernel and math engine for RAG and agents |
| Engine | WFGY 3.0 | TXT based Singularity tension engine, 131 S class set |
| Map | Problem Map 1.0 | Flagship 16 problem RAG failure checklist and fix map |
| Map | Problem Map 2.0 | RAG focused recovery pipeline |
| Map | Problem Map 3.0 | Global Debug Card, image as a debug protocol layer |
| Map | Semantic Clinic | Symptom to family to exact fix |
| Map | Grandmas Clinic | Plain language stories mapped to Problem Map 1.0 |
| Onboarding | Starter Village | Guided tour for newcomers |
| App | TXT OS | TXT semantic OS, fast boot |
| App | Blah Blah Blah | Abstract and paradox Q and A built on TXT OS |
| App | Blur Blur Blur | Text to image with semantic control |
| App | Blow Blow Blow | Reasoning game engine and memory demo |

If this repository helped, starring it improves discovery so more builders can find the docs and tools.