# Poisoning and Contamination — Guardrails and Fix Pattern

## 🧭 Quick Return to Map

You are in a sub-page of RAG_VectorDB. To reorient, go back to the RAG_VectorDB index.

Think of this page as a desk within a ward. If you need the full triage and all prescriptions, return to the Emergency Room lobby.

Use this page when your retriever starts surfacing adversarial or low-trust text that did not exist in your canonical corpus, or when cache/index merges leak content across tenants or environments. We treat three classes: content poisoning, index contamination, and query-time prompt-in-context attacks.


## Open these first

These are the companion pages referenced throughout the fixes below:

- data-contracts.md
- vectorstore_fragmentation.md
- retrieval-traceability.md
- tokenization_and_casing.md
- hybrid_retriever_weights.md


## Core acceptance

- Provenance coverage ≥ 0.95: at least 95% of indexed snippets carry a valid signed ingest manifest.
- Poison recall ≥ 0.90 on a seeded red-team set, with a near-zero false positive rate (≤ 0.02).
- Tenant isolation: zero cross-namespace hits in 10k randomized queries.
- ΔS(question, retrieved) ≤ 0.45 after quarantine, across 3 paraphrases and 2 seeds.
- λ remains convergent when adversarial candidates are present but quarantined.

## Quick triage

| Symptom | Likely cause | Open this |
|---|---|---|
| New, off-brand text appears in results after a site crawl | Ingestion path accepts untrusted domains or query params | Tighten the allowlist via data-contracts.md and add the manifest checks below |
| Same doc shows mixed tenant banners or footers | Index contamination or namespace leak | vectorstore_fragmentation.md |
| Adversarial strings like "ignore previous instructions" stored inside snippets | Context-level prompt injection preserved into the corpus | retrieval-traceability.md and citation-first prompting |
| Abrupt shift in token stats, casing, or Unicode forms | Poisoned batch with tokenizer skew | tokenization_and_casing.md |
| Good recall, but answers quote junk domains | Provenance score missing in fusion | Add authority/manifest weights in fusion; see hybrid_retriever_weights.md |

## Hardening plan in 7 concrete steps

1. **Provenance contract at ingest.**
   Require an ingest manifest per document: `{source_type, source_url, crawl_time, signer, content_hash, schema_rev}`. Reject the document if the manifest is missing or the signer is not in the allowlist. See data-contracts.md.

2. **Canonicalize and fingerprint.**
   Build `content_hash = xxh64(canonical_text)` and a 64-bit simhash over token 3-grams. Log `(doc_id, section_id, content_hash, simhash_64)` for every snippet. A sketch of both fingerprints appears after this list.

3. **Authority and trust scoring.**
   Maintain `provenance_score ∈ [0,1]` derived from the source allowlist, TLS, HSTS, domain reputation, and signer. Store it with each snippet.

4. **Poison scoring and quarantine.**
   Compute `poison_score` from features such as abrupt shifts in token distribution, repeated instruction phrases, suspicious URLs, CSS/JS remnants, and high simhash similarity to known attack corpora. If `poison_score ≥ τ_p`, route the snippet to `quarantine_index`, not the main index.

5. **Namespace fences.**
   Enforce `{tenant_id, product_id, env}` in the key. Deny cross-namespace reads by default. Verify this in retrieval traces.

6. **Fusion with provenance.**
   During hybrid fusion, weight by provenance: `w_final = α·w_bm25 + β·w_dense + γ·provenance_score − δ·poison_score`. Dedupe both before and after fusion. A query-time sketch follows the reference recipe below.

7. **Revocation and rebuild.**
   Keep a revocation list of content hashes and manifests. On revocation, remove the content from all indexes, purge caches, and re-embed neighbors to avoid residual bias.
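
The sketch below illustrates steps 2 and 4 under minimal assumptions: it uses the third-party `xxhash` package for both hashes, and the helpers (`canonicalize`, `simhash_64`, `url_entropy`) are illustrative names, not part of any fixed API.

```python
# Minimal sketch of fingerprinting (step 2) and one poison feature (step 4).
# Assumes the third-party `xxhash` package; all helper names are illustrative.
import math
import re
import unicodedata
from collections import Counter

import xxhash

def canonicalize(text: str) -> str:
    # NFKC normalize, lowercase, strip soft hyphens, collapse whitespace.
    text = unicodedata.normalize("NFKC", text).lower()
    text = text.replace("\u00ad", "")
    return re.sub(r"\s+", " ", text).strip()

def content_hash(text: str) -> str:
    # xxh64 over the canonical form, as in the recipe below.
    return xxhash.xxh64(canonicalize(text).encode("utf-8")).hexdigest()

def simhash_64(text: str) -> int:
    # Classic simhash: sign-sum the bits of each token 3-gram hash.
    tokens = canonicalize(text).split()
    grams = [" ".join(tokens[i:i + 3]) for i in range(max(len(tokens) - 2, 1))]
    counts = [0] * 64
    for gram in grams:
        h = xxhash.xxh64(gram.encode("utf-8")).intdigest()
        for bit in range(64):
            counts[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit in range(64) if counts[bit] > 0)

def url_entropy(url: str) -> float:
    # Shannon entropy in bits per character. High values suggest
    # machine-generated or obfuscated URLs (threshold 4.2 in the recipe).
    counts = Counter(url)
    n = len(url)
    return -sum(c / n * math.log2(c / n) for c in counts.values())
```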


## Minimal reference recipe

```yaml
ingest:
  allowlist:
    domains: ["docs.company.com", "kb.company.com"]
    schemes: ["https"]
    signer_keys: ["ed25519:..."]
  manifest_required: true

fingerprint:
  canonicalize: {nfkc: true, lowercase: true, collapse_spaces: true, strip_soft_hyphen: true}
  content_hash: "xxh64"
  simhash_bits: 64

scores:
  provenance:
    domain_reputation_weight: 0.4
    signer_weight: 0.4
    tls_policy_weight: 0.2
  poison:
    tokenshift_z_max: 0.25
    adversarial_phrase_boost: true
    url_entropy_threshold: 4.2

retrieval:
  namespace_keys: ["tenant_id", "product_id", "env"]
  fusion_weights: {bm25: 0.35, dense: 0.45, provenance: 0.25, poison: -0.40}
  quarantine_threshold: 0.60
  quarantine_index: "rag_quarantine"

ops:
  revocation_list: "kv:rag_revocations"
  cache_invalidate_on_revoke: true
```
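
A minimal query-time sketch of step 6's fusion rule, reading the weights and quarantine threshold from the recipe above. The `Candidate` shape and function names are assumptions for illustration, not a fixed API.

```python
# Query-time fusion sketch using the weights from the recipe above.
from dataclasses import dataclass

FUSION_WEIGHTS = {"bm25": 0.35, "dense": 0.45, "provenance": 0.25, "poison": -0.40}
QUARANTINE_THRESHOLD = 0.60

@dataclass
class Candidate:
    snippet_id: str
    w_bm25: float
    w_dense: float
    provenance_score: float
    poison_score: float

def w_final(c: Candidate) -> float:
    # w_final = α·w_bm25 + β·w_dense + γ·provenance_score − δ·poison_score
    # (the poison weight is stored as -0.40, so adding it subtracts)
    return (FUSION_WEIGHTS["bm25"] * c.w_bm25
            + FUSION_WEIGHTS["dense"] * c.w_dense
            + FUSION_WEIGHTS["provenance"] * c.provenance_score
            + FUSION_WEIGHTS["poison"] * c.poison_score)

def fuse(candidates: list) -> tuple:
    # Quarantine first, then rank what remains by the fused weight.
    kept = [c for c in candidates if c.poison_score < QUARANTINE_THRESHOLD]
    quarantined = [c for c in candidates if c.poison_score >= QUARANTINE_THRESHOLD]
    return sorted(kept, key=w_final, reverse=True), quarantined
```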

## Query-time shields

- Citation-first and schema-locked prompting, so the model must quote before reasoning. See retrieval-traceability.md.
- Outlier alert when a top candidate has `provenance_score < 0.3` or lives in `quarantine_index`.
- Tenant assert in every request: `{tenant_id, env}` must match the index namespace. Reject otherwise; a minimal sketch follows this list.
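
A minimal sketch of the tenant assert, assuming the request and index namespace are plain dicts; the function name is illustrative.

```python
# Tenant assert sketch: deny any request whose {tenant_id, env} does not
# match the index namespace. Names are illustrative.
def assert_namespace(request: dict, index_namespace: dict) -> None:
    for key in ("tenant_id", "env"):
        if request.get(key) != index_namespace.get(key):
            raise PermissionError(
                f"namespace mismatch on {key!r}: "
                f"{request.get(key)!r} != {index_namespace.get(key)!r}"
            )
```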

## Observability you must log

- Poison detection counts per batch and per source domain.
- Top phrases that triggered poison flags, with sample hashes.
- Namespace violations blocked.
- Fusion weight contributions for the final top k.
- ΔS and λ at retrieve, fuse, quarantine, and answer.
- Revocation actions: which hashes, which indexes, and time to full purge.

## Incident response and cleanup

1. Freeze writes to affected namespaces.
2. Export suspect snippets by `content_hash` and `manifest_id`.
3. Add them to the revocation list, then purge index shards and caches (see the sketch after this list).
4. Re-embed neighbor windows around removed spans to reduce drift.
5. Rebuild the acceptance set: verify ΔS ≤ 0.45 and coverage ≥ 0.70 on gold questions.
6. Run a postmortem and add a new allowlist rule to prevent recurrence.
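
A sketch of the purge path in steps 3 and 4, assuming hypothetical `index`, `cache`, and `revocations` interfaces. The ordering is the point: record the hash first so rebuilds skip it, then purge, then queue neighbors for re-embedding.

```python
# Revocation sketch for steps 3 and 4. The store interfaces are hypothetical.
def revoke(content_hashes, index, cache, revocations, reembed_queue):
    for h in content_hashes:
        revocations.add(h)                  # record first, so rebuilds skip it
        for snippet in index.find_by_content_hash(h):
            index.delete(snippet.id)        # purge the shard entry
            cache.invalidate(snippet.id)    # and any cached retrievals
            reembed_queue.put(snippet.neighbor_window)  # re-embed neighbors
```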

## Verification

- Seed 200 adversarial docs mixed into 10k clean, then check:
  - Poison recall ≥ 0.90 at a τ selected by ROC analysis on the seeded set.
  - False positive rate ≤ 0.02 on the clean control.
  - Tenant isolation: 0 cross-namespace hits in 10k queries.
  - Final ΔS and λ targets met across paraphrases.

A threshold-selection sketch follows this list.
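
A pure-Python sketch of selecting τ on the seeded set; `scores` maps doc id to `poison_score` and `seeded` holds the planted ids (both names are assumptions). Scanning thresholds in ascending order walks the ROC curve from high recall toward low false positive rate, so the first threshold that meets the FPR cap carries the best achievable recall.

```python
# Threshold selection sketch for the seeded red-team set.
def pick_tau(scores: dict, seeded: set,
             recall_floor: float = 0.90, fpr_cap: float = 0.02):
    clean = set(scores) - seeded
    for tau in sorted(set(scores.values())):
        flagged = {doc for doc, s in scores.items() if s >= tau}
        recall = len(flagged & seeded) / len(seeded)
        fpr = len(flagged & clean) / len(clean)
        # Recall only falls as tau rises, so the first tau that meets the
        # FPR cap has the best recall among compliant thresholds.
        if fpr <= fpr_cap:
            return (tau, recall, fpr) if recall >= recall_floor else None
    return None
```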

## Copy-paste prompt for the LLM step

```txt
You have TXTOS and the WFGY Problem Map loaded.

Given the fused candidates with fields {provenance_score, poison_score, namespace, snippet_id, content_hash, citations}:

1) Discard any candidate where namespace != request.namespace.
2) If poison_score ≥ 0.60, quarantine it and return a short fix note.
3) Rerank the remaining candidates by (model_score + 0.25*provenance_score - 0.40*poison_score).
4) Enforce cite-then-explain and return:
{
  "kept": [{"snippet_id": "...", "source_url": "..."}],
  "quarantined": [{"snippet_id": "...", "reason": "..."}],
  "ΔS": 0.xx,
  "λ_state": "...",
  "next_fix": "..."
}
```

## 🔗 Quick-Start Downloads (60 sec)

| Tool | Link | 3-Step Setup |
|---|---|---|
| WFGY 1.0 PDF | Engine Paper | 1 Download · 2 Upload to your LLM · 3 Ask "Answer using WFGY + <your question>" |
| TXT OS (plain-text OS) | TXTOS.txt | 1 Download · 2 Paste into any LLM chat · 3 Type "hello world" — OS boots instantly |

## 🧭 Explore More

| Module | Description | Link |
|---|---|---|
| WFGY Core | WFGY 2.0 engine is live: full symbolic reasoning architecture and math stack | View → |
| Problem Map 1.0 | Initial 16-mode diagnostic and symbolic fix framework | View → |
| Problem Map 2.0 | RAG-focused failure tree, modular fixes, and pipelines | View → |
| Semantic Clinic Index | Expanded failure catalog: prompt injection, memory bugs, logic drift | View → |
| Semantic Blueprint | Layer-based symbolic reasoning and semantic modulations | View → |
| Benchmark vs GPT-5 | Stress test GPT-5 with the full WFGY reasoning suite | View → |
| 🧙‍♂️ Starter Village 🏡 | New here? Lost in symbols? Let the wizard guide you through | Start → |

👑 Early Stargazers: see the Hall of Fame. WFGY Engine 2.0 is already unlocked. Star the repo to help others discover it and unlock more on the Unlock Board.

WFGY Main · TXT OS · Blah · Blot · Bloc · Blur · Blow