WFGY/ProblemMap/GlobalFixMap/RAG_VectorDB/hybrid_retriever_weights.md


Hybrid Retriever Weights — Guardrails and Fix Pattern

🧭 Quick Return to Map

You are in a sub-page of RAG_VectorDB.
To reorient, go back here:

Think of this page as a desk within a ward.
If you need the full triage and all prescriptions, return to the Emergency Room lobby.

Use this page when hybrid retrieval underperforms a single retriever or when results look noisy after fusing BM25, dense vectors, HyDE, or filters. Failures usually come from score scale mismatch, duplicate dominance, or query-type priors not reflected in weights.


Open these first


Core acceptance

  • ΔS(question, retrieved) ≤ 0.45 on 3 paraphrases and 2 seeds.
  • Coverage ≥ 0.70 to the target section after fusion and rerank.
  • λ remains convergent when weights are perturbed within ±10 percent.
  • Jaccard overlap against the best single retriever's top-k ≥ 0.60.
  • No single source type or domain exceeds 40 percent of the final top-k unless configured.
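The first four gates can be wired into a single boolean check. This is a minimal sketch: the function name `meets_acceptance` and its hard-coded thresholds mirror the bullets above and are illustrative, not part of any WFGY API; λ convergence under weight jitter is checked separately.

```python
def meets_acceptance(delta_s: float, coverage: float,
                     jaccard_overlap: float, max_source_share: float) -> bool:
    """Gate a fused result set against the numeric acceptance targets.

    delta_s          -- ΔS(question, retrieved), worst case over paraphrases/seeds
    coverage         -- fraction of the target section covered after fusion + rerank
    jaccard_overlap  -- top-k overlap with the best single retriever
    max_source_share -- largest share of final top-k held by one source type
    """
    return bool(delta_s <= 0.45
                and coverage >= 0.70
                and jaccard_overlap >= 0.60
                and max_source_share <= 0.40)
```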

Symptoms → likely cause → open this


Fix in 60 seconds

  1. Normalize each retriever's scores inside the candidate pool
    Use one of: min-max to [0, 1] per retriever, z-score per retriever, or pure rank-based RRF.

  2. De-duplicate by snippet identity
    Collapse near-duplicates using stable keys: {doc_id, section_id, hash_64}.

  3. Fuse with a simple, auditable rule
    Start with RRF: score = Σ 1 / (rank_i + k) with k ∈ [50, 100].
    Then try weighted sum on normalized scores: S = w_dense·s_dense + w_bm25·s_bm25 + w_hyde·s_hyde.

  4. Rerank with a cross-encoder
    Rerank top 50–100 to top 10–20. Enforce cite-then-explain in the prompt.

  5. Measure ΔS and λ
    If λ flips when weights move by ±10 percent, clamp with BBAM and lock schema headers.
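Steps 1–3 can be sketched in plain Python. The names `zscore` and `rrf_fuse` are illustrative; de-duplication happens implicitly because candidates are keyed by a stable snippet identity, and `zscore` is only needed for the weighted-sum variant since RRF is purely rank-based.

```python
from collections import defaultdict

def zscore(scores):
    """Z-score normalize one retriever's raw scores (step 1)."""
    n = len(scores)
    mean = sum(scores) / n
    std = (sum((s - mean) ** 2 for s in scores) / n) ** 0.5 or 1.0  # guard std == 0
    return [(s - mean) / std for s in scores]

def rrf_fuse(ranked_lists, k=60):
    """Reciprocal Rank Fusion over per-retriever ranked lists (step 3).

    ranked_lists: dict retriever_name -> list of snippet keys, best first.
    Keys shared across retrievers accumulate score, so agreement between
    retrievers is rewarded; duplicates collapse because the key is stable.
    """
    fused = defaultdict(float)
    for keys in ranked_lists.values():
        for rank, key in enumerate(keys, start=1):
            fused[key] += 1.0 / (rank + k)
    return sorted(fused, key=fused.get, reverse=True)

# Toy example: "d2" is ranked high by both retrievers, so it wins the fusion.
dense = ["d1", "d2", "d3"]
bm25  = ["d2", "d4", "d1"]
top = rrf_fuse({"dense": dense, "bm25": bm25}, k=60)
```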


Minimal reference recipe


```yaml
retrievers:
  - name: dense
    k: 60
    norm: z
    weight: 0.55
  - name: bm25
    k: 200
    norm: rank   # convert to ranks 1..k
    weight: 0.35
  - name: hyde
    k: 60
    norm: z
    weight: 0.10

fusion:
  method: RRF
  rrf_k: 60
  dedupe: snippet_id   # or doc_id + section_id + hash64

rerank:
  model: cross-encoder-v2
  take_top: 15

accept:
  deltaS_max: 0.45
  coverage_min: 0.70
  jitter_weight: 0.10   # weights +/- 10 percent must keep λ convergent
```
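If you switch `fusion.method` from RRF to a weighted sum, the scoring rule is a one-line reduction over normalized per-retriever scores. The data shapes below are a sketch under assumed names, not a specific library API.

```python
def weighted_fuse(candidates, weights):
    """Rank candidate keys by weighted sum of normalized scores.

    candidates: key -> {retriever_name: normalized_score}
                (a key absent from a retriever simply contributes nothing)
    weights:    retriever_name -> weight, e.g. from the recipe above
    """
    return sorted(
        candidates,
        key=lambda k: sum(weights[r] * s for r, s in candidates[k].items()),
        reverse=True,
    )

# Toy candidate pool with min-max-normalized scores.
cands = {
    "a": {"dense": 0.9, "bm25": 0.2},
    "b": {"dense": 0.4, "bm25": 0.9},
}
order = weighted_fuse(cands, {"dense": 0.55, "bm25": 0.35, "hyde": 0.10})
```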


Weighting heuristics that actually work

  • Short factual queries
    Increase dense weight to 0.6–0.7. Keep BM25 at 0.3–0.4. HyDE optional.

  • Long verbose queries or code
    Push BM25 to 0.5. Keep dense at 0.4. Use reranker to clean length bias.

  • Multilingual or mixed-script
    Reduce BM25 weight if tokenizer mismatch is suspected. Verify casing and analyzer.

  • Highly structured data
    Use BM25 boost on fielded terms. Keep dense for semantic recall.

  • Safety or policy queries
    HyDE at most 0.15. Prefer deterministic BM25 plus strict reranker.
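These priors can live in a small dispatcher that picks a weight preset per query. The feature tests below (token count, code-like characters) and the exact thresholds are illustrative assumptions; tune them on your gold set rather than copying them verbatim.

```python
def pick_weights(query: str) -> dict:
    """Heuristic weight presets keyed on coarse query features (assumed thresholds)."""
    tokens = query.split()
    if any(c in query for c in "{}();="):
        # Looks like code: lexical match matters, push BM25 up.
        return {"dense": 0.40, "bm25": 0.50, "hyde": 0.10}
    if len(tokens) <= 6:
        # Short factual query: favor dense semantic match.
        return {"dense": 0.65, "bm25": 0.35, "hyde": 0.0}
    # Default balanced preset from the recipe above.
    return {"dense": 0.55, "bm25": 0.35, "hyde": 0.10}
```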


Observability probes you must log

  • Per retriever: raw score mean and stdev before normalization.
  • After fusion: source mix histogram and duplicate collapse count.
  • ΔS(question, retrieved) and λ states at steps: retrieve, fuse, rerank, answer.
  • A/B against best single retriever and report ΔS improvement or regression.
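One way to emit these probes is a structured log record per pipeline stage. The field names below are assumptions, not a fixed schema; the point is that raw per-retriever stats, the source mix, ΔS, and the λ state all land in one queryable record.

```python
import json
import statistics

def probe(stage, scores_by_retriever, source_mix, delta_s, lam):
    """Serialize one observability record for a pipeline stage.

    stage               -- "retrieve" | "fuse" | "rerank" | "answer"
    scores_by_retriever -- retriever_name -> list of raw scores (pre-normalization)
    source_mix          -- source type -> count in current candidate set
    delta_s, lam        -- ΔS(question, retrieved) and λ state at this stage
    """
    record = {
        "stage": stage,
        "raw_stats": {
            name: {"mean": statistics.mean(s), "stdev": statistics.pstdev(s)}
            for name, s in scores_by_retriever.items()
        },
        "source_mix": source_mix,
        "delta_s": delta_s,
        "lambda": lam,
    }
    return json.dumps(record)
```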

Common gotchas

  • Mixing cosine dense scores with BM25 raw scores without normalization.
  • HyDE prompts built with a different tokenizer than the dense model.
  • Reranker trained on passages while you fuse at document level.
  • Language-specific analyzers differ across shards and you fuse their outputs.
  • Latency cutoffs truncate candidate lists unevenly and bias the fusion.

Verification

  • Gold set of 100 queries with 3 paraphrases.
  • Require ΔS ≤ 0.45 and coverage ≥ 0.70 after fusion plus rerank.
  • Jaccard with best single retriever ≥ 0.60.
  • Weight jitter ±10 percent must keep λ convergent and citations stable.
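The weight-jitter gate can be automated by perturbing every weight ±10 percent and requiring the fused top-k to keep a minimum Jaccard overlap with the unperturbed result. This sketch only covers overlap stability; λ convergence and citation stability still need their own checks.

```python
import itertools

def jaccard(a, b):
    """Jaccard overlap between two top-k lists."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def jitter_stable(fuse, weights, jitter=0.10, min_overlap=0.60):
    """Check top-k stability under +/- `jitter` weight perturbation.

    fuse: callable mapping a weights dict to a ranked top-k list of keys.
    Tries every corner of the jitter box (all +/- sign combinations).
    """
    base = fuse(weights)
    for signs in itertools.product((-1, 1), repeat=len(weights)):
        w = {k: v * (1 + s * jitter) for (k, v), s in zip(weights.items(), signs)}
        if jaccard(base, fuse(w)) < min_overlap:
            return False
    return True
```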

🔗 Quick-Start Downloads (60 sec)

| Tool | Link | 3-Step Setup |
| --- | --- | --- |
| WFGY 1.0 PDF | Engine Paper | 1 Download · 2 Upload to your LLM · 3 Ask “Answer using WFGY + <your question>” |
| TXT OS (plain-text OS) | TXTOS.txt | 1 Download · 2 Paste into any LLM chat · 3 Type “hello world” — OS boots instantly |

Explore More

| Layer | Page | What it's for |
| --- | --- | --- |
| Proof | WFGY Recognition Map | External citations, integrations, and ecosystem proof |
| ⚙️ Engine | WFGY 1.0 | Original PDF tension engine and early logic sketch (legacy reference) |
| ⚙️ Engine | WFGY 2.0 | Production tension kernel for RAG and agent systems |
| ⚙️ Engine | WFGY 3.0 | TXT based Singularity tension engine (131 S class set) |
| 🗺️ Map | Problem Map 1.0 | Flagship 16 problem RAG failure taxonomy and fix map |
| 🗺️ Map | Problem Map 2.0 | Global Debug Card for RAG and agent pipeline diagnosis |
| 🗺️ Map | Problem Map 3.0 | Global AI troubleshooting atlas and failure pattern map |
| 🧰 App | TXT OS | .txt semantic OS with fast bootstrap |
| 🧰 App | Blah Blah Blah | Abstract and paradox Q&A built on TXT OS |
| 🧰 App | Blur Blur Blur | Text to image generation with semantic control |
| 🏡 Onboarding | Starter Village | Guided entry point for new users |

If this repository helped, starring it improves discovery so more builders can find the docs and tools.