Pattern — Hallucination Re-Entry (Self-Feeding Fabrications)
Scope
A model’s own output sneaks back into the evidence pipeline and later appears as if it were ground truth. Over time, fabrications gain “citations” (to prior chat, logs, or summaries) and become harder to dislodge.
Why it matters
Once re-entry happens, your guardrails degrade: “evidence” now contains model text, not corpus facts. This creates runaway plausibility and corrupts evals.
Quick nav: Patterns Index · Examples: Example 01 · Example 03 · Eval: Precision & CHR
1) Signals & fast triage
Likely symptoms
- Answers cite internal notes (`notes:`, `assistant_summary:`) or prior chat message ids.
- Evidence chunks contain markers like `Generated`, `LLM`, `assistant`, `draft`, or timestamps that match response logs.
- Example 02 labels swing from `refusal_ok` to `ok` without a corpus change, because the “evidence” was the previous answer.
Deterministic checks (no LLM)
- Source tag check: every chunk must carry `source: {corpus|user|model|system}`. Evidence must be `source=corpus` (or explicitly allow-listed).
- ID namespace check: forbid `evidence_id` prefixes reserved for UI/LLM (e.g., `chat:`, `draft:`, `tmp:`).
- Content signature: reject evidence containing `citations:` or template artifacts from your own prompts.
- Temporal order: evidence `created_at` must be ≤ `turn_start`; anything later is ineligible.
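The four checks above can be combined into a single admissibility function. This is an illustrative sketch: the helper name `is_admissible`, the reserved-prefix tuple, and the `citations: [` regex standing in for your template artifacts are assumptions, not part of a specific pipeline.

```python
import re

# Illustrative values (assumptions, not a fixed API):
RESERVED_PREFIXES = ("chat:", "draft:", "tmp:")
TEMPLATE_ARTIFACT = re.compile(r"citations:\s*\[", re.IGNORECASE)

def is_admissible(chunk: dict, turn_start: str) -> bool:
    """Apply the four deterministic checks; True means eligible evidence."""
    # 1) Source tag check: only corpus chunks may be evidence by default.
    if chunk.get("source") != "corpus":
        return False
    # 2) ID namespace check: reject reserved UI/LLM prefixes.
    if str(chunk.get("id", "")).startswith(RESERVED_PREFIXES):
        return False
    # 3) Content signature: reject our own template artifacts.
    if TEMPLATE_ARTIFACT.search(chunk.get("text", "")):
        return False
    # 4) Temporal order: evidence must predate the turn start.
    created = chunk.get("created_at")
    return created is None or created <= turn_start
```

ISO-8601 timestamps compare correctly as plain strings, which keeps the temporal check dependency-free.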
2) Minimal reproducible case
`data/chunks.json`:

```json
[
  {"id": "p1#1", "source": "corpus", "text": "X is a constrained mapping."},
  {"id": "chat:42", "source": "model", "text": "(assistant) X supports null keys. citations: [p1#1]"}
]
```
Q1: “Does X support null keys?”
A naive pipeline will admit `chat:42` as “evidence” and answer yes, even though no corpus chunk supports the claim.
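The failure and the fix fit in a few lines, using only the two chunks above; the inline `source == "corpus"` filter stands in for the full guard in Section 5.

```python
# The two chunks from data/chunks.json above.
chunks = [
    {"id": "p1#1", "source": "corpus", "text": "X is a constrained mapping."},
    {"id": "chat:42", "source": "model",
     "text": "(assistant) X supports null keys. citations: [p1#1]"},
]

naive = chunks  # no provenance check: the model's old answer is "evidence"
guarded = [c for c in chunks if c.get("source") == "corpus"]

print([c["id"] for c in naive])    # admits chat:42
print([c["id"] for c in guarded])  # only the corpus chunk survives
```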
3) Root causes
- Mixed stores: retrieval spans both the corpus index and a chat memory/“notes” index.
- Missing source contract: chunks lack `source`/`created_at` and sneak through filters.
- UI shortcuts: “helpful” summarizers are indexed back into the same vector store.
- Eval contamination: previous model answers added to gold or dev corpora without labeling.
4) Standard fix (ordered, minimal, measurable)
Step 1 — Evidence provenance contract

Every chunk:

```json
{ "id": "...", "source": "corpus|user|model|system", "created_at": "ISO-8601", "page": 1, "text": "..." }
```

Only `source="corpus"` may enter evidence by default.
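A minimal validator for this contract might look like the sketch below; the function name `validate_chunk` and the error-message strings are illustrative, and no external schema library is assumed.

```python
# Hypothetical validator for the provenance contract (names are illustrative).
REQUIRED_FIELDS = {"id": str, "source": str, "created_at": str, "text": str}
KNOWN_SOURCES = {"corpus", "user", "model", "system"}

def validate_chunk(chunk: dict) -> list:
    """Return a list of contract violations; an empty list means valid."""
    errors = []
    for field, typ in REQUIRED_FIELDS.items():
        if not isinstance(chunk.get(field), typ):
            errors.append("missing or mistyped field: " + field)
    if chunk.get("source") not in KNOWN_SOURCES:
        errors.append("unknown source: " + repr(chunk.get("source")))
    return errors
```

Run it at index time so malformed chunks never reach retrieval.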
Step 2 — Re-entry guard at the retrieval boundary

Filter candidates by:
- `source == "corpus"`
- `created_at <= turn_start`
- `id` not starting with `chat:`, `draft:`, `tmp:`, `gen:`
Step 3 — Schema & template enforcement
- Guarded template (Example 01) + citations required.
- Auditor (Example 04) must reject if any citation id fails the provenance filter.
Step 4 — Split indices

Separate `corpus_index` from `session_index`. Session text is never eligible for grounding unless a task explicitly opts in and passes a policy check.
Step 5 — Eval quarantine
Gold/dev corpora must exclude model-generated text, or label it with `source="model"` and keep it out of retrieval pools for grounding tasks.
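One way to enforce the quarantine in CI is a scan that fails when model-sourced chunks appear in an eval corpus. This is a sketch: the file shape (a JSON array of chunks with `source` fields) and the function name are assumptions.

```python
import json

def contaminated_ids(corpus_path):
    """Return ids of model-sourced chunks found in an eval corpus file."""
    with open(corpus_path) as f:
        chunks = json.load(f)
    return [c["id"] for c in chunks if c.get("source") == "model"]

# CI wiring (sketch): fail the build if anything comes back, e.g.
# assert contaminated_ids("eval/gold_chunks.json") == []
```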
5) Reference implementation (drop-in guards)
5.1 Python — retrieval filter
```python
# reentry_guard.py
RESERVED_PREFIXES = ("chat:", "draft:", "tmp:", "gen:", "assistant:")
ALLOWED_SOURCES = {"corpus"}

def eligible(c, turn_start_iso):
    if c.get("source") not in ALLOWED_SOURCES:
        return False
    if any(str(c.get("id", "")).startswith(p) for p in RESERVED_PREFIXES):
        return False
    t = c.get("created_at")
    return (t is None) or (t <= turn_start_iso)

def filter_candidates(candidates, turn_start_iso):
    return [c for c in candidates if eligible(c, turn_start_iso)]

# usage inside your retriever:
# cand = retrieve_raw(q)  # -> list of chunks
# cand = filter_candidates(cand, turn_start_iso)
```
5.2 Node — same logic
```js
// reentry_guard.mjs
const RESERVED = ["chat:", "draft:", "tmp:", "gen:", "assistant:"];
const ALLOWED = new Set(["corpus"]);

function eligible(c, turnStartIso) {
  if (!ALLOWED.has(c.source)) return false;
  if (RESERVED.some(p => String(c.id || "").startsWith(p))) return false;
  const t = c.created_at;
  return (!t) || (t <= turnStartIso);
}

export function filterCandidates(cands, turnStartIso) {
  return cands.filter(c => eligible(c, turnStartIso));
}
```
5.3 Auditor rule (add to Example 04 acceptance)
REJECT if any cited id fails provenance:
- source != "corpus"
- id has reserved prefix
- created_at > turn_start
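The rule above can be expressed as a small function; the name `audit_citations` and the `VALID`/`REJECT` verdict strings mirror the prose but are not the actual Example 04 API.

```python
RESERVED_PREFIXES = ("chat:", "draft:", "tmp:", "gen:")

def audit_citations(cited_ids, evidence_by_id, turn_start):
    """Return "VALID" only if every cited id passes the provenance rule."""
    for cid in cited_ids:
        chunk = evidence_by_id.get(cid)
        # Unknown id or wrong source fails provenance outright.
        if chunk is None or chunk.get("source") != "corpus":
            return "REJECT"
        # Reserved namespaces are never citable.
        if str(cid).startswith(RESERVED_PREFIXES):
            return "REJECT"
        # Evidence created after the turn started is ineligible.
        created = chunk.get("created_at")
        if created is not None and created > turn_start:
            return "REJECT"
    return "VALID"
```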
6) Acceptance criteria (ship/no-ship)
A response may ship only if:
- All cited ids pass the provenance filter.
- Guarded template passes (citations or exact refusal).
- (If multi-agent) Auditor verdict is `VALID`.
- Eval gates (Example 08) pass.
Otherwise → refuse and re-run with corpus-only evidence.
7) Prevention (contracts & defaults)
- Two indices, two clients: `corpus_index` and `session_index` with separate namespaces.
- Default-deny: session/model text cannot be evidence unless a policy explicitly allows it (e.g., “summarize our chat”).
- Provenance headers: log `evidence_source_counts` per answer; alert if non-corpus counts spike above 0.
- UI guardrails: if you expose “Save to knowledge,” force `source="user"` and keep it out of grounding unless reviewed.
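The provenance-header idea can be sketched with a counter per answer; `evidence_source_counts` matches the metric name above, while `should_alert` and the alert wiring are assumptions.

```python
from collections import Counter

def evidence_source_counts(cited_chunks):
    """Aggregate the source field across one answer's cited chunks."""
    return Counter(c.get("source", "unknown") for c in cited_chunks)

def should_alert(counts):
    """Default-deny: any non-corpus evidence at all triggers an alert."""
    return any(src != "corpus" for src in counts)
```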
8) Debug workflow (10 minutes)
- Re-run Example 02; filter surprising `ok` labels where retrieval looks wrong.
- Print the `source` and `id` of cited chunks; anything not `corpus` is re-entry.
- Enable the retrieval filter (Section 5) and re-run the same questions.
- Confirm that former “ok” now becomes `refusal_ok` or correct with real corpus ids.
- Commit the guard; add a CI test that forbids non-corpus citations.
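The CI test from the last step could look like this sketch; the answers-log shape (a JSON array with `cited_chunks`) and the function name are assumptions.

```python
import json

def assert_corpus_only(answers_log_path):
    """Fail if any shipped answer cites non-corpus evidence."""
    with open(answers_log_path) as f:
        answers = json.load(f)
    for ans in answers:
        for chunk in ans.get("cited_chunks", []):
            assert chunk.get("source") == "corpus", (
                "non-corpus citation %r in answer %r"
                % (chunk.get("id"), ans.get("id"))
            )
```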
9) Common traps & fixes
- “We only used summaries to improve recall” → you improved plausibility, not truth. Keep summaries out of grounding.
- “We trust user notes” → treat them as `source="user"`, not `corpus`. Different tasks may opt in, but never silently.
- Eval leakage: test answers copied into the corpus. Label them or place them in a separate index excluded by default.
10) Minimal checklist (copy into PR)
- Chunks include `source` and `created_at`.
- Retrieval filter blocks non-corpus sources and reserved id prefixes.
- Auditor enforces provenance on citations.
- Session/model text indexed separately; default-deny for grounding.
- Example 08 gates green after enabling the guard.
References to hands-on examples
- Example 01 — Guarded template + refusal
- Example 02 — Drift triage to catch suspicious “ok”
- Example 04 — Auditor/acceptance gate; add provenance rule
- Example 05 — Manifest (extend with `source_policy`)
- Example 08 — Quality scoring; add a “non-corpus citation rate”
🔗 Quick-Start Downloads (60 sec)
| Tool | Link | 3-Step Setup |
|---|---|---|
| WFGY 1.0 PDF | Engine Paper | 1️⃣ Download · 2️⃣ Upload to your LLM · 3️⃣ Ask “Answer using WFGY + <your question>” |
| TXT OS (plain-text OS) | TXTOS.txt | 1️⃣ Download · 2️⃣ Paste into any LLM chat · 3️⃣ Type “hello world” — OS boots instantly |
Explore More
| Layer | Page | What it’s for |
|---|---|---|
| Proof | WFGY Recognition Map | External citations, integrations, and ecosystem proof |
| Engine | WFGY 1.0 | Original PDF based tension engine |
| Engine | WFGY 2.0 | Production tension kernel and math engine for RAG and agents |
| Engine | WFGY 3.0 | TXT based Singularity tension engine, 131 S class set |
| Map | Problem Map 1.0 | Flagship 16 problem RAG failure checklist and fix map |
| Map | Problem Map 2.0 | RAG focused recovery pipeline |
| Map | Problem Map 3.0 | Global Debug Card, image as a debug protocol layer |
| Map | Semantic Clinic | Symptom to family to exact fix |
| Map | Grandma’s Clinic | Plain language stories mapped to Problem Map 1.0 |
| Onboarding | Starter Village | Guided tour for newcomers |
| App | TXT OS | TXT semantic OS, fast boot |
| App | Blah Blah Blah | Abstract and paradox Q and A built on TXT OS |
| App | Blur Blur Blur | Text to image with semantic control |
| App | Blow Blow Blow | Reasoning game engine and memory demo |
If this repository helped, starring it improves discovery so more builders can find the docs and tools.