
Hybrid Retriever Weights — Guardrails and Fix Pattern

Use this page when hybrid retrieval underperforms a single retriever or when results look noisy after fusing BM25, dense vectors, HyDE, or filters. Failures usually come from score scale mismatch, duplicate dominance, or query-type priors not reflected in weights.


Open these first


Core acceptance

  • ΔS(question, retrieved) ≤ 0.45 on 3 paraphrases and 2 seeds.
  • Coverage ≥ 0.70 to the target section after fusion and rerank.
  • λ remains convergent when weights are perturbed within ±10 percent.
  • Jaccard overlap against the best single retriever's top-k ≥ 0.60 (a quick check is sketched after this list).
  • No single source type or domain exceeds 40 percent of the final top-k unless configured.
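
A minimal sketch of the last two gates, assuming each result carries a `source` field; the helper names here are hypothetical, not part of any fixed API.

```python
def jaccard_topk(fused_ids, single_ids):
    """Jaccard overlap between the fused top-k and the best single retriever's top-k."""
    a, b = set(fused_ids), set(single_ids)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def source_cap_ok(results, cap=0.40):
    """True if no single source type exceeds `cap` of the final top-k."""
    counts = {}
    for r in results:
        counts[r["source"]] = counts.get(r["source"], 0) + 1
    return all(c / len(results) <= cap for c in counts.values())
```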

Symptoms → likely cause → open this


Fix in 60 seconds

  1. Normalize each retriever's scores inside the candidate pool
    Use one of: min-max to [0, 1] per retriever, z-score per retriever, or pure rank-based RRF. Steps 1 through 3 are sketched in code after this list.

  2. De-duplicate by snippet identity
    Collapse near-duplicates using stable keys: {doc_id, section_id, hash_64}.

  3. Fuse with a simple, auditable rule
    Start with RRF: score = Σ 1 / (rank_i + k) with k ∈ [50, 100].
    Then try a weighted sum on normalized scores: S = w_dense·s_dense + w_bm25·s_bm25 + w_hyde·s_hyde.

  4. Rerank with a cross-encoder
    Rerank the top 50–100 candidates down to the top 10–20. Enforce cite-then-explain in the prompt.

  5. Measure ΔS and λ
    If λ flips when weights move by ±10 percent, clamp with BBAM and lock schema headers.
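
A minimal Python sketch of steps 1 through 3, assuming hits are dicts carrying the `doc_id`, `section_id`, and `hash_64` keys from step 2; everything else here is an assumption, not a fixed API.

```python
from collections import defaultdict

def zscore(scores):
    """Z-score normalize one retriever's scores inside the shared candidate pool."""
    if not scores:
        return []
    mu = sum(scores) / len(scores)
    sd = (sum((s - mu) ** 2 for s in scores) / len(scores)) ** 0.5 or 1.0
    return [(s - mu) / sd for s in scores]

def dedupe_key(hit):
    # Stable snippet identity; collapsing on raw text misses near-duplicates.
    return (hit["doc_id"], hit["section_id"], hit["hash_64"])

def rrf_fuse(ranked_lists, k=60):
    """Reciprocal rank fusion: score = sum over lists of 1 / (rank_i + k)."""
    fused = defaultdict(float)
    for hits in ranked_lists:
        seen = set()
        for rank, hit in enumerate(hits, start=1):
            key = dedupe_key(hit)
            if key in seen:            # keep only the best rank per retriever
                continue
            seen.add(key)
            fused[key] += 1.0 / (rank + k)
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

def weighted_fuse(normalized, weights):
    """Weighted sum on normalized scores: S = Σ_i w_i · s_i per candidate key."""
    fused = defaultdict(float)
    for name, by_key in normalized.items():
        for key, s in by_key.items():
            fused[key] += weights[name] * s
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

Run `rrf_fuse` first as the auditable baseline, then compare `weighted_fuse` against it before adding the cross-encoder rerank.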


Minimal reference recipe


```yaml
retrievers:
  - name: dense
    k: 60
    norm: z
    weight: 0.55
  - name: bm25
    k: 200
    norm: rank      # convert to ranks 1..k
    weight: 0.35
  - name: hyde
    k: 60
    norm: z
    weight: 0.10

fusion:
  method: RRF
  rrf_k: 60
  dedupe: snippet_id   # or doc_id + section_id + hash_64

rerank:
  model: cross-encoder-v2
  take_top: 15

accept:
  deltaS_max: 0.45
  coverage_min: 0.70
  jitter_weight: 0.10   # weights ±10 percent must keep λ convergent
```


Weighting heuristics that actually work

  • Short factual queries
    Increase dense weight to 0.6–0.7. Keep BM25 at 0.3–0.4. HyDE optional.

  • Long verbose queries or code
    Push BM25 to 0.5 and keep dense at 0.4. Use the reranker to correct length bias.

  • Multilingual or mixed-script
    Reduce BM25 weight if tokenizer mismatch is suspected. Verify casing and analyzer.

  • Highly structured data
    Use BM25 boost on fielded terms. Keep dense for semantic recall.

  • Safety or policy queries
    HyDE at most 0.15. Prefer deterministic BM25 plus a strict reranker. A preset sketch for these heuristics follows this list.
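
One way to encode these heuristics is a preset table plus a crude router; the preset names, thresholds, and routing rules below are hypothetical starting points, not tuned values.

```python
# Hypothetical presets mirroring the heuristics above; tune on your gold set.
WEIGHT_PRESETS = {
    "short_factual": {"dense": 0.65, "bm25": 0.35, "hyde": 0.00},
    "long_or_code":  {"dense": 0.40, "bm25": 0.50, "hyde": 0.10},
    "safety_policy": {"dense": 0.45, "bm25": 0.40, "hyde": 0.15},
    "default":       {"dense": 0.55, "bm25": 0.35, "hyde": 0.10},
}

def pick_weights(query: str) -> dict:
    """Crude query router; swap in your own classifier.

    Safety/policy routing is assumed to happen upstream of this function.
    """
    words = query.split()
    if len(words) <= 6:
        return WEIGHT_PRESETS["short_factual"]
    if "def " in query or "{" in query or len(words) > 40:
        return WEIGHT_PRESETS["long_or_code"]
    return WEIGHT_PRESETS["default"]
```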


Observability probes you must log

  • Per retriever: raw score mean and stdev before normalization.
  • After fusion: source mix histogram and duplicate collapse count.
  • ΔS(question, retrieved) and λ states at steps: retrieve, fuse, rerank, answer.
  • A/B against the best single retriever and report the ΔS improvement or regression. The first two probes are sketched after this list.
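
A minimal logging sketch for the first two probes, assuming each fused result carries a `source` field and the dedupe step reports how many duplicates it collapsed; the logger name is an arbitrary choice.

```python
import logging
import statistics
from collections import Counter

log = logging.getLogger("hybrid_fusion")

def log_retriever_stats(name, raw_scores):
    """Per retriever: raw score mean and stdev before normalization."""
    log.info("retriever=%s mean=%.4f stdev=%.4f n=%d",
             name, statistics.mean(raw_scores),
             statistics.pstdev(raw_scores), len(raw_scores))

def log_fusion_mix(results, duplicates_collapsed):
    """After fusion: source mix histogram plus duplicate collapse count."""
    mix = Counter(r["source"] for r in results)
    log.info("source_mix=%s duplicates_collapsed=%d",
             dict(mix), duplicates_collapsed)
```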

Common gotchas

  • Mixing cosine dense scores with BM25 raw scores without normalization.
  • HyDE prompts built with a different tokenizer than the dense model.
  • Reranker trained on passages while you fuse at document level.
  • Language-specific analyzers differ across shards and you fuse their outputs.
  • Latency cutoffs truncate candidate lists unevenly and bias the fusion.

Verification

  • Gold set of 100 queries with 3 paraphrases each.
  • Require ΔS ≤ 0.45 and coverage ≥ 0.70 after fusion plus rerank.
  • Jaccard with best single retriever ≥ 0.60.
  • Weight jitter of ±10 percent must keep λ convergent and citations stable; a jitter check is sketched below.
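
A minimal jitter check, assuming a hypothetical `fuse_fn(weights, query)` callable that returns ordered result keys; top-k overlap is used here as a stand-in for the full λ convergence and citation-stability checks.

```python
import itertools

def jitter_stable(fuse_fn, weights, queries, jitter=0.10, min_jaccard=0.60):
    """Perturb each weight by ±jitter and require the fused top-k to stay stable."""
    base = {q: set(fuse_fn(weights, q)) for q in queries}
    for name, factor in itertools.product(weights, (1 - jitter, 1 + jitter)):
        perturbed = dict(weights)
        perturbed[name] *= factor          # move one weight by ±10 percent
        for q in queries:
            got = set(fuse_fn(perturbed, q))
            union = got | base[q]
            if union and len(got & base[q]) / len(union) < min_jaccard:
                return False               # top-k drifted: treat λ as divergent
    return True
```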
