Update and Index Skew: Guardrails and Fix Patterns


A repair guide for pipelines where fresh content does not show up, shards disagree after a redeploy, or recall drops right after a routine job. Use this page to localize drift between ingestion, embedding, and index structures, then lock ordering and verify with ΔS, coverage, and λ.

Open these first

Visual map and recovery: rag-architecture-and-recovery.md
End to end retrieval knobs: retrieval-playbook.md
Traceability and cite first: retrieval-traceability.md
Payload schema and ingestion locks: data-contracts.md
Metric or FAISS traps: vectorstore-metrics-and-faiss-pitfalls.md
Boot order and deploy failures: bootstrap-ordering.md · deployment-deadlock.md · predeploy-collapse.md
Live ops and debugging: ops/live_monitoring_rag.md · ops/debug_playbook.md

When to use this page

New docs appear in object store but not in retrieval
Some tenants or shards recall fine while others look stale
After a redeploy, recall falls or top k order flips
Index reports healthy yet coverage to anchors is low
ANN rebuild completes but neighbor order looks random

Acceptance targets

ΔS(question, retrieved) ≤ 0.45
Coverage of the target section ≥ 0.70
λ remains convergent across three paraphrases and two seeds
E_resonance stays flat on long windows

Symptom to likely cause

Fresh content missing for hours. Likely cause: ingestion watermark stuck or write path non-idempotent. Open bootstrap-ordering.md.
One shard good and another flat. Likely cause: mixed analyzer or model rev, or ANN params diverged. Open retrieval-playbook.md.
Recall drops after nightly job. Likely cause: index rebuilt with a different metric or normalization policy. Open vectorstore-metrics-and-faiss-pitfalls.md.
Top k flips after deploy. Likely cause: header reorder or citation schema drift that amplifies store skew. Open retrieval-traceability.md.
Writes succeed but queries truncate or pad vectors. Likely cause: dimension mismatch after model swap. Open dimension_mismatch_and_projection.md.

Fix in 60 seconds

Read the watermarks. For each stage, write a simple count and the last processed id or timestamp. Compare DOC_COUNT, EMB_COUNT, and IDX_COUNT. A gap that persists beyond normal pipeline lag indicates skew.
Pin versions and abort on mismatch. Ingest must refuse a row if any of these fields differs from the contract or the store metadata: embed_model, embed_rev, dim, metric, normalize_l2, analyzer_rev, ann_rev, index_hash. See data-contracts.md.
Rebuild the broken segment. Re-embed and re-index the affected shard or time window. Retrain ANN and PQ on the new vectors. Do not reuse old graphs.
Clamp λ on the prompt side. Use citation first and a fixed header order to avoid prompt variance while you repair the store. See retrieval-traceability.md.
Verify. Run three paraphrases and two seeds. Require coverage ≥ 0.70 and ΔS ≤ 0.45 on the gold anchors.
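
As a rough illustration of the version pin described in the steps above, the sketch below compares each incoming row against a pinned contract and refuses it on any difference. The field names come from the list above. `check_row`, `ContractMismatch`, and the dictionary shapes are hypothetical and are not part of any WFGY tooling.

```python
from typing import Any, Mapping

# Fields the ingest step is expected to pin (taken from the list above).
PINNED_FIELDS = (
    "embed_model", "embed_rev", "dim", "metric",
    "normalize_l2", "analyzer_rev", "ann_rev", "index_hash",
)

_MISSING = object()  # sentinel so an absent field counts as a mismatch


class ContractMismatch(Exception):
    """Raised when a row disagrees with the pinned contract (hypothetical)."""


def check_row(row: Mapping[str, Any], contract: Mapping[str, Any]) -> None:
    """Raise ContractMismatch if any pinned field differs from the contract."""
    mismatches = {}
    for field in PINNED_FIELDS:
        got = row.get(field, _MISSING)
        want = contract[field]  # the contract must define every pinned field
        if got != want:
            mismatches[field] = ("<missing>" if got is _MISSING else got, want)
    if mismatches:
        raise ContractMismatch(
            f"refusing row {row.get('doc_id')!r}: {mismatches}"
        )
```

Raising instead of logging keeps a mismatched row from ever reaching the index, which is the point of aborting on mismatch.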

Root causes checklist

Upserts not idempotent on the key (doc_id, section_id, rev)
Background jobs race with live writers
Mixed embed_model or normalize_l2 across namespaces
ANN params not retrained after rebuild
Analyzer or tokenizer version differs across shards
TTL or retention silently dropped sections
Partial deploy cut over while index still training
Streaming path uses a different preprocessor than batch

Minimal probes

Probe A — watermark audit
For each stage {ingest, embed, index}:
  read COUNT and LAST_TS
Expect ingest ≥ embed ≥ index with small gaps. Any large gap is skew.
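
A minimal sketch of Probe A, assuming each stage can report a row count and a last-processed timestamp. The `Watermark` type, the `audit_watermarks` helper, and both thresholds are illustrative assumptions. What counts as a "small gap" depends on your own SLO.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Watermark:
    count: int          # rows processed by the stage
    last_ts: datetime   # timestamp of the last processed row


def audit_watermarks(
    stages: dict[str, Watermark],
    max_count_gap: int,
    max_lag: timedelta,
) -> list[str]:
    """Return human-readable findings; an empty list means no skew detected."""
    findings = []
    order = ["ingest", "embed", "index"]
    for upstream, downstream in zip(order, order[1:]):
        up, down = stages[upstream], stages[downstream]
        gap = up.count - down.count
        lag = up.last_ts - down.last_ts
        if gap < 0:
            # Downstream should never hold more rows than its source.
            findings.append(f"{downstream} has {-gap} more rows than {upstream}")
        elif gap > max_count_gap:
            findings.append(f"{downstream} is {gap} rows behind {upstream}")
        if lag > max_lag:
            findings.append(f"{downstream} lags {upstream} by {lag}")
    return findings
```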

Probe B — version parity
Sample 1k rows per shard and tabulate:
  embed_model, embed_rev, dim, metric, normalize_l2, analyzer_rev, ann_rev
Any heterogeneity inside one collection is a fail.
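
One way to tabulate Probe B, assuming the sampled rows arrive as dictionaries that carry the fields listed above. `tabulate_versions` and `parity_ok` are hypothetical helper names, and the sampling itself is left to your store's client.

```python
from collections import Counter
from typing import Any, Iterable, Mapping

PARITY_FIELDS = (
    "embed_model", "embed_rev", "dim", "metric",
    "normalize_l2", "analyzer_rev", "ann_rev",
)


def tabulate_versions(rows: Iterable[Mapping[str, Any]]) -> Counter:
    """Count how many sampled rows share each combination of version fields."""
    return Counter(tuple(row.get(f) for f in PARITY_FIELDS) for row in rows)


def parity_ok(rows: Iterable[Mapping[str, Any]]) -> bool:
    """True when every sampled row carries the same version tuple."""
    return len(tabulate_versions(rows)) <= 1
```

If `parity_ok` returns False, the counter shows which combinations coexist and how many rows carry each, which helps size the rebuild.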

Probe C — recall delta
Run the same 50 gold queries before and after shard rebuild.
Require coverage gain ≥ 0.10 if the shard was failing.
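
A sketch of the before and after comparison in Probe C. It uses a simplified stand-in for coverage: the fraction of gold queries whose expected anchor id appears in the retrieved ids. That definition, and the names `anchor_hits` and `coverage_gain`, are assumptions for illustration. Substitute your own coverage measure if it differs.

```python
def anchor_hits(results: dict[str, list[str]], gold: dict[str, str]) -> int:
    """Count gold queries whose expected anchor id was retrieved."""
    return sum(
        1 for query, anchor in gold.items() if anchor in results.get(query, [])
    )


def coverage_gain(
    before: dict[str, list[str]],
    after: dict[str, list[str]],
    gold: dict[str, str],
) -> float:
    """Change in the fraction of gold queries that hit their anchor."""
    if not gold:
        raise ValueError("gold set is empty")
    # Divide the integer difference once to avoid subtracting two floats.
    return (anchor_hits(after, gold) - anchor_hits(before, gold)) / len(gold)
```

With a 50 query gold set, the 0.10 threshold corresponds to at least five additional queries reaching their anchor after the rebuild.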

Probe D — ANN sanity
Toggle reranker on and off at k=20.
If reranker recovers most anchors while base k misses, retrain ANN or rebuild.
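
Probe D can be scored roughly as follows. The sketch assumes you already have, per gold query, the ranked ids returned with the reranker off and with it on. `reranker_recovery` is a hypothetical helper, and reading "most anchors" as more than half is an assumption you may want to tighten.

```python
def reranker_recovery(
    base: dict[str, list[str]],
    reranked: dict[str, list[str]],
    gold: dict[str, str],
    k: int = 20,
) -> float:
    """Fraction of anchors missed by base top-k that the reranked top-k finds."""
    missed = [q for q, anchor in gold.items() if anchor not in base.get(q, [])[:k]]
    if not missed:
        return 0.0  # base retrieval found every anchor; nothing to recover
    recovered = sum(1 for q in missed if gold[q] in reranked.get(q, [])[:k])
    return recovered / len(missed)


def ann_suspect(recovery: float, threshold: float = 0.5) -> bool:
    """Flag the ANN index when the reranker recovers most missed anchors."""
    return recovery > threshold
```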

Contract fields to add

{
  "doc_id": "stable",
  "section_id": "stable",
  "rev": "v2025-08-28",
  "ingest_ts": "2025-08-28T10:42:00Z",
  "embed_model": "exact-id",
  "embed_rev": "hash-or-date",
  "dim": 768,
  "metric": "cosine",
  "normalize_l2": true,
  "analyzer_rev": "text-preproc-v3",
  "ann_index": "hnsw",
  "ann_rev": "hnsw_v5",
  "index_hash": "sha256:...",
  "partition": "tenant_a|shard_03",
  "write_path": "batch|stream",
  "tombstone": false
}
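
A hedged sketch of a validator for the payload above. The required field list mirrors the JSON example. Treating `batch|stream` as "one of batch or stream", requiring a positive `dim`, and requiring an ISO 8601 `ingest_ts` are assumptions about the intended contract, and `validate_payload` is a hypothetical name.

```python
from datetime import datetime
from typing import Any, Mapping

# Expected Python type for each contract field shown above.
REQUIRED = {
    "doc_id": str, "section_id": str, "rev": str, "ingest_ts": str,
    "embed_model": str, "embed_rev": str, "dim": int, "metric": str,
    "normalize_l2": bool, "analyzer_rev": str, "ann_index": str,
    "ann_rev": str, "index_hash": str, "partition": str,
    "write_path": str, "tombstone": bool,
}


def validate_payload(payload: Mapping[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the payload passes."""
    errors = []
    for name, expected in REQUIRED.items():
        if name not in payload:
            errors.append(f"missing field: {name}")
        elif type(payload[name]) is not expected:  # strict: bool is not an int here
            errors.append(f"{name}: expected {expected.__name__}")
    if errors:
        return errors
    if payload["dim"] <= 0:
        errors.append("dim must be positive")
    if payload["write_path"] not in ("batch", "stream"):
        errors.append("write_path must be 'batch' or 'stream'")
    try:
        # Accept a trailing 'Z' on Python versions where fromisoformat rejects it.
        datetime.fromisoformat(payload["ingest_ts"].replace("Z", "+00:00"))
    except ValueError:
        errors.append("ingest_ts is not ISO 8601")
    return errors
```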

Operational guardrails

Single writer per partition and idempotent upsert
Preflight that halts when store.metric != contract.metric or dim mismatches
Blue-green or shadow collection for any rebuild, with a union retriever and deterministic rerank during cutover
Scheduled drift sweep that compares watermarks and ΔS across partitions
Alerts on ΔS ≥ 0.60 or λ flip rate spikes on live traffic
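
To illustrate the idempotent upsert guardrail, the sketch below derives a deterministic point id from (doc_id, section_id, rev), so that replaying the same write overwrites instead of duplicating. The in-memory `InMemoryStore` class is a stand-in for a real vector store client, and `point_id` is a hypothetical helper. Check how your store handles caller-supplied ids before adopting this.

```python
import uuid


def point_id(doc_id: str, section_id: str, rev: str) -> str:
    """Deterministic id: the same (doc_id, section_id, rev) always maps to it."""
    # Unit separator avoids collisions such as ("a", "bc") vs ("ab", "c").
    key = "\x1f".join((doc_id, section_id, rev))
    return str(uuid.uuid5(uuid.NAMESPACE_URL, key))


class InMemoryStore:
    """Toy store keyed by point id, used only to demonstrate replay safety."""

    def __init__(self) -> None:
        self.points: dict[str, dict] = {}

    def upsert(self, doc_id: str, section_id: str, rev: str,
               vector: list[float]) -> str:
        pid = point_id(doc_id, section_id, rev)
        # Assigning by key makes a replayed write a no-op in terms of row count.
        self.points[pid] = {
            "doc_id": doc_id, "section_id": section_id,
            "rev": rev, "vector": vector,
        }
        return pid
```

Because the id is derived from the key and not generated at write time, a retried batch job and a live writer that process the same section converge on one row.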

Verification checklist

Coverage ≥ 0.70 and ΔS ≤ 0.45 on a ten question gold set
λ convergent across two seeds and three paraphrases
Top k overlap across seeds ≥ 0.8 after the fix
Watermarks aligned for ingest, embed, and index within your SLO window
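
The top k overlap item can be checked with a few lines. The sketch defines overlap as the number of ids shared by two top k lists divided by k. That definition is an assumption, since the page does not spell one out, and Jaccard similarity would be a reasonable alternative. `topk_overlap` is a hypothetical helper name.

```python
def topk_overlap(run_a: list[str], run_b: list[str], k: int) -> float:
    """Share of the top-k ids that two runs (for example two seeds) agree on."""
    if k <= 0:
        raise ValueError("k must be positive")
    shared = set(run_a[:k]) & set(run_b[:k])
    return len(shared) / k


def overlap_passes(run_a: list[str], run_b: list[str], k: int,
                   minimum: float = 0.8) -> bool:
    """Apply the checklist threshold to one pair of runs."""
    return topk_overlap(run_a, run_b, k) >= minimum
```

Note that this measure ignores rank order inside the top k, so it will not catch order flips among the same set of ids.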

Copy paste prompt for the LLM step

TXT OS and the WFGY Problem Map are loaded.

My issue: updates not reflected or recall dropped after a job.
Traces:
- watermarks: ingest=..., embed=..., index=...
- versions: embed_model=..., embed_rev=..., metric=..., ann_rev=...
- ΔS(question,retrieved)=..., coverage=..., λ across 3 paraphrases

Tell me:
1) the failing layer and why,
2) the exact WFGY page to open next,
3) the minimal structural fix to remove skew and pass targets,
4) a short verification plan for coverage ≥ 0.70 and ΔS ≤ 0.45.
Use BBMC, BBCR, BBPF, BBAM when relevant.

