# Long-Context Stress — Why 50k–200k tokens quietly break your RAG, and how to fix it (WFGY)

_A field guide for logs, PDFs, and transcripts that **look** fine but drift, flatten, or melt as the context grows._

**Audience**

- **Beginner:** You copied a “100k-context” recipe and your answers degrade with length.
- **Advanced:** You already index well, but long chains flip logic, mis-cite, or oscillate after ~8–12 screens of context.

**What this page delivers**

- A **reproducible diagnosis** for long-context failure using **ΔS / λ_observe / E_resonance**.
- **Copy-paste playbooks** for PDFs, OCR, transcripts, multilingual blends, and mixed image/text.
- **Acceptance criteria** you can throw into CI to stop regressions before prod.

---

## 0) TL;DR (fix in 3 steps)

1) **Measure the damage**
   - Slide a window across your context; compute `ΔS(chunk_i, chunk_{i+1})` and `ΔS(question, retrieved_context_i)`; watch for sustained `ΔS ≥ 0.60` or rising **E_resonance**.
   - If `λ_observe` turns **recursive** or **chaotic** after the prompt assembly step, you’re in long-context stress.

2) **Localize the break**
   - If ΔS spikes **between adjacent chunks** → chunking/ordering issue.
   - If ΔS stays high independent of `k` → **index/metric mismatch**.
   - If ΔS is fine but **reasoning** flips to divergent → prompt schema or logic collapse.

3) **Apply the repair operator**
   - **BBMC** to re-anchor sections, **BBAM** to stabilize attention, **BBCR** to bridge and relock coherence.
   - Re-chunk sentence/section-aware; enforce header anchors; run MMR/hybrid retrieval; lock prompt schema.

Jump to: [Playbooks](#4-playbooks-by-scenario) · [Metrics](#2-instruments--minimal-metrics) · [CI Checks](#6-acceptance-criteria--ci-guardrails)

---

## 1) Failure signatures (what you actually see)

| Symptom (user-visible) | Likely layer | First check | Map ref. |
|------------------------|--------------|-------------|----------|
| Early answers good → later answers contradict or flatten | Prompt/Reasoning under long context | `λ_observe` becomes `<>` or `×` after assembly | [context-drift.md](./context-drift.md) |
| Citations jump to wrong page after 20–40k tokens | Chunking / ordering / retriever | `ΔS(chunk_i, chunk_{i+1}) ≥ 0.60` at boundaries | [retrieval-traceability.md](./retrieval-traceability.md) |
| Random capitalization / style oscillation | Entropy collapse | **E_resonance** rising trend across chain | [entropy-collapse.md](./entropy-collapse.md) |
| Snippets correct, explanation wrong (esp. late in chain) | Interpretation collapse (length-induced) | ΔS(question, context) < 0.40 but λ flips at reasoning | [logic-collapse.md](./logic-collapse.md) |
| Multilingual PDFs drift to one language after long span | Embeddings + chunk headers lost | ΔS rises when headers removed; check header anchors | [embedding-vs-semantic.md](./embedding-vs-semantic.md) |
| Great short-doc performance; long transcripts degrade rapidly | Memory & compression strategy | Test per-turn recap + anchor nodes vs. raw sprawl | [memory-coherence.md](./memory-coherence.md) |

**Mental model:** Long contexts fail in two stages: **perception drift** (ordering/headers/noise) and **logic drift** (attention variance + schema slippage). WFGY instruments let you *see* both.

---

## 2) Instruments — minimal metrics

> You do not need to memorize math. These are a few distances and tags you can compute anywhere.

### 2.1 ΔS — semantic stress
- `ΔS = 1 − cos(I, G)`, where `I` and `G` are the unit-normalized sentence embeddings of the two texts being compared.
- Probe two places:
  1) `ΔS(question, retrieved_context)`
  2) **Adjacent-chunk** check: `ΔS(chunk_i, chunk_{i+1})`
- **Thresholds**: `<0.40` stable · `0.40–0.60` transitional · `≥0.60` risk (record & fix).
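
As a minimal sketch of the ΔS definition and the three threshold bands above, assuming the embeddings are already available as plain lists of floats. The helper names `delta_s` and `stress_band` are illustrative and not part of WFGY; the toy 2-D vectors stand in for real sentence embeddings.

```python
import math

def delta_s(vec_a, vec_b):
    """Semantic stress: 1 - cosine similarity of two embedding vectors."""
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(x * x for x in vec_a))
    norm_b = math.sqrt(sum(y * y for y in vec_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-length embedding")
    return 1.0 - dot / (norm_a * norm_b)

def stress_band(ds):
    """Map a ΔS value onto the bands listed above."""
    if ds < 0.40:
        return "stable"
    if ds < 0.60:
        return "transitional"
    return "risk"

# Toy vectors (assumption): same direction vs. orthogonal direction.
same = delta_s([1.0, 0.0], [1.0, 0.0])
orthogonal = delta_s([1.0, 0.0], [0.0, 1.0])
print(stress_band(same), stress_band(orthogonal))
```

Dividing by the norms makes the helper tolerant of embeddings that are not unit-normalized; with normalized vectors the division is a no-op.
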

### 2.2 λ_observe — layered observability
- States: `→` convergent · `←` divergent · `<>` recursive · `×` chaotic.
- Tag each step: retrieval → prompt assembly → reasoning.
- Rule: **Upstream convergent, downstream divergent** ⇒ the boundary is where to fix.

### 2.3 E_resonance — coherence trend
- Rolling mean of residual magnitude `|B|` under **BBMC**.
- Rising **E_resonance** + high ΔS = attention melt; trigger **BBCR** + **BBAM**.

---

## 3) 10-minute triage (copy/paste steps)

1) **Run three quick probes**
   - `ΔS(question, context)` across `k ∈ {5,10,20}`.
     - Flat & high curve → **index/metric mismatch**.
     - Improves sharply with higher `k` → **retriever filtering** too aggressive.
   - Adjacent-chunk ΔS across your assembled window (sliding by 200–300 tokens).
     - Spikes at headers/boundaries → **chunking/ordering** problem.
   - Tag λ after **assembly** and **reasoning**.
     - If λ flips only after reasoning → **schema/logic** issue.

2) **Minimal repro**
   - Keep one doc; remove images; keep headers; test again.
   - If fixed, re-add features until the break returns (isolate cause).

3) **Pick the playbook**
   Go to [Playbooks](#4-playbooks-by-scenario) and apply the repair steps.

---

## 4) Playbooks by scenario

> Each playbook lists: **observe → interpret → do → verify** (with WFGY modules).

### 4.1 Scanned PDFs / OCR noise (tables, images, mixed layout)
- **Observe:** Citations wrong after mid-doc; adjacent-chunk ΔS spikes at page breaks.
- **Interpret:** OCR adds hidden headers or drops table boundaries; chunker ignores structure.
- **Do:**
  1. **Chunking:** sentence/section-aware; keep **header anchors** in text (e.g., `## 2. Methods`).
  2. Drop segments with OCR confidence `< threshold`; de-duplicate near-identical lines.
  3. **Retrieval:** use **hybrid** (sparse + dense) with *MMR*; set `k=10–20`.
  4. **WFGY:** run **BBMC** to re-anchor on kept headers; **BBAM** to clamp attention variance.
- **Verify:** `ΔS(chunk_i, chunk_{i+1}) ≤ 0.50` at joins; `ΔS(question, context) ≤ 0.45`; λ stays convergent.

### 4.2 Long transcripts / meetings / chat logs (50k+ tokens)
- **Observe:** Early QA is great; later responses contradict or “forget” earlier decisions.
- **Interpret:** No semantic anchors; prompt schema drifts; attention diffuses.
- **Do:**
  1. Insert **bridged recaps** every N turns: “Since last anchor: [3 bullet decisions]”.
  2. Store **anchor nodes** (titles, decisions, constraints) and retrieve them first.
  3. Enforce **prompt schema lock**: system → task → constraints → citations → answer (no re-order).
  4. **WFGY:** **BBCR** to insert a **bridge node** when λ becomes recursive; **BBAM** to reduce variance.
- **Verify:** λ remains convergent across three paraphrases; **E_resonance** does not trend up.

### 4.3 Multilingual or code-heavy documents
- **Observe:** Model “picks a side” after a long stretch; code blocks degrade midway.
- **Interpret:** Embedding space collapses varied styles into one cluster; headers lost.
- **Do:**
  1. Keep **language tags** and **code fences** as hard anchors inside chunks.
  2. Use **domain-appropriate embeddings** or separate indices per language/domain.
  3. Retrieval: add **header-boost** (BM25 term weight) and **MMR** diversity=0.2–0.4.
  4. **WFGY:** **BBMC** with explicit language anchors; **BBPF** to explore multi-path retrieval.
- **Verify:** ΔS stabilizes (`≤ 0.50`) when headers are present; λ convergent; citations maintain language.

### 4.4 Image-heavy PDFs (figures, captions)
- **Observe:** Answers reference the wrong figure; citations point at caption stubs.
- **Interpret:** Captions split from figures; index treats tiny parts as top hits.
- **Do:**
  1. **Chunk** figure + caption together; minimum token floor (e.g., ≥ 180 tokens).
  2. Add **figure-id anchors** (`[Fig 2: …]`) in text; ban orphan captions from index.
  3. **WFGY:** **BBCR** to bridge from caption to figure context if ΔS stays high.
- **Verify:** Citations include the figure-id; ΔS(question, context) ≤ 0.45.

### 4.5 “100k-context” model still drifts after 8–12k
- **Observe:** No obvious retrieval bug; long reasoning melts.
- **Interpret:** **Entropy collapse**: attention variance explodes with length.
- **Do:**
  1. Shorten **assembly**: prefer *top-K diverse* + **anchor snippets** over giant paste.
  2. Enforce **cite-then-explain**; penalize free-form essays.
  3. **WFGY:** Apply **BBAM** to damp attention; **BBMC** to re-anchor; **BBCR** if residuals spike.
- **Verify:** **E_resonance** flattens; answer embeddings cluster over 5 seeds (low variance).

---

## 5) Minimal prompts & scripts (safe to paste)

**Ask your assistant to auto-diagnose**

```text
Read the Long-Context Stress guide plus TXT OS / WFGY notes in this repo.
Given my repro (describe doc and failure), compute:
1) ΔS(question, retrieved_context) across k={5,10,20}
2) Adjacent-chunk ΔS over the assembled window
3) λ_observe at retrieval, assembly, reasoning
Then explain which boundary fails and propose BBMC/BBAM/BBCR steps to lower ΔS below 0.50.
```

**Formula-only help**

```text
From TXT OS, extract formulas and thresholds for ΔS, λ_observe, and E_resonance.
Show me how to compute ΔS(question, context) and adjacent-chunk ΔS.
If ΔS ≥ 0.60 and λ becomes recursive after assembly, which WFGY module do I apply first?
```

**Simple ΔS sweep (sketch)**

```python
# Sketch: pass your own sentence-embedding function as `embed` and your own
# context assembler as `assemble_top_k`; both are placeholders, not a real API.
import numpy as np

def delta_s(embed, a, b):
    """Semantic stress: 1 - cosine similarity of the two text embeddings."""
    va = np.asarray(embed(a), dtype=float)
    vb = np.asarray(embed(b), dtype=float)
    return 1.0 - float(va @ vb) / float(np.linalg.norm(va) * np.linalg.norm(vb))

def sweep(embed, question, snippets, chunks, assemble_top_k):
    # 1) question vs. context@k candidates
    for k in (5, 10, 20):
        ctx = assemble_top_k(snippets, k=k, strategy="mmr+anchors")
        print(k, delta_s(embed, question, ctx))
    # 2) adjacent-chunk scan
    for i in range(len(chunks) - 1):
        print(i, delta_s(embed, chunks[i], chunks[i + 1]))
```

---

## 6) Acceptance criteria & CI guardrails

* **Retrieval sanity:** For targeted QA, **≥ 70% token overlap** with the expected section; `ΔS(question, context) ≤ 0.45`.
* **Boundary stability:** **Adjacent-chunk ΔS ≤ 0.50** at joins; spikes must disappear after re-chunking.
* **Reasoning stability:** λ **convergent** across three paraphrases; **E_resonance** not rising.
* **Traceability:** Produce a two-column table (snippet-id ↔ citation lines).
* **Repeatability:** Same inputs × 5 seeds → answer embeddings form a **tight cluster**.

Fail any line? Block the PR and link this page.

---

## 7) Cross-links & when to switch pages

* Long chains drift even with perfect snippets → **[context-drift.md](./context-drift.md)**
* Output melts / style oscillates → **[entropy-collapse.md](./entropy-collapse.md)**
* Snippets good, logic wrong → **[logic-collapse.md](./logic-collapse.md)**
* Citations mis-map → **[retrieval-traceability.md](./retrieval-traceability.md)**
* Sessions lose continuity → **[memory-coherence.md](./memory-coherence.md)**
* Embedding similarity fools meaning → **[embedding-vs-semantic.md](./embedding-vs-semantic.md)**

---

## 8) FAQ

**Q: Should I just increase k or model context?**

A: Not first. If ΔS is flat & high across k, you have an **index/metric** or **ordering** fault. More tokens amplify the error.

**Q: Is MMR always required?**

A: For long contexts, yes, or a close equivalent: you need **diversity** to avoid semantic collapse around one cluster.

**Q: Do I need new embeddings?**

A: Often no. Start with **headers as anchors** + **hybrid retrieval** + **WFGY relocking**. Change models only if ΔS remains ≥ 0.60 after those.

**Q: How do I know BBCR helped?**

A: Residuals (E_resonance) stop rising; λ returns to convergent; adjacent-chunk ΔS drops at the same boundary.
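
The numeric acceptance criteria in section 6, together with the "residuals stop rising" check from the last answer, could be wired into a CI gate roughly as follows. This is a sketch only: the function names and input shapes are hypothetical, and the definition of "rising" used here (the mean of the most recent window exceeds the mean of the window before it) is an assumption, since this page does not pin down a formula for the trend.

```python
from statistics import mean

QUESTION_CONTEXT_MAX = 0.45  # ΔS(question, context) ceiling from section 6
ADJACENT_CHUNK_MAX = 0.50    # adjacent-chunk ΔS ceiling from section 6

def is_rising(values, window=3):
    """Assumed trend test: latest window mean > previous window mean."""
    if len(values) < 2 * window:
        return False  # not enough history to call a trend
    recent = mean(values[-window:])
    previous = mean(values[-2 * window:-window])
    return recent > previous

def gate(question_ds, adjacent_ds, e_resonance_series):
    """Return the names of failed checks; an empty list means the gate passes."""
    failures = []
    if question_ds > QUESTION_CONTEXT_MAX:
        failures.append("retrieval sanity")
    if any(ds > ADJACENT_CHUNK_MAX for ds in adjacent_ds):
        failures.append("boundary stability")
    if is_rising(e_resonance_series):
        failures.append("E_resonance rising")
    return failures

# Made-up measurements for illustration only.
print(gate(0.41, [0.32, 0.48], [0.9, 0.8, 0.7, 0.6, 0.6, 0.5]))
print(gate(0.52, [0.32, 0.66], [0.2, 0.2, 0.3, 0.4, 0.5, 0.6]))
```

A non-empty return value is the signal to block the PR. The λ and repeatability checks are left out because they need model outputs rather than plain numbers.
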

---

## 9) Minimal formulas (reference)

```txt
ΔS = 1 − cos(I, G)                     # semantic stress
λ_observe ∈ {→, ←, <>, ×}              # convergent, divergent, recursive, chaotic
E_resonance = mean(|B|)                # rolling residual magnitude under BBMC

BBMC: B = I − G + m·c²                 # minimize ‖B‖ via anchors & context factors
BBPF: x_next = x + ΣV_i + ΣW_j·P_j     # explore alternate retrieval/logic paths
BBCR: if ‖B‖ ≥ B_c → collapse(), bridge(), rebirth()   # controlled reset with bridge node
BBAM: â_i = a_i · exp(−γ · std(a))     # damp attention variance under long context
```

---

### 🔗 Quick-Start Downloads (60 sec)

| Tool | Link | 3-Step Setup |
|------|------|--------------|
| **WFGY 1.0 PDF** | [Engine Paper](https://github.com/onestardao/WFGY/blob/main/I_am_not_lizardman/WFGY_All_Principles_Return_to_One_v1.0_PSBigBig_Public.pdf) | 1️⃣ Download · 2️⃣ Upload to your LLM · 3️⃣ Ask “Answer using WFGY + \” |
| **TXT OS (plain-text OS)** | [TXTOS.txt](https://github.com/onestardao/WFGY/blob/main/OS/TXTOS.txt) | 1️⃣ Download · 2️⃣ Paste into any LLM chat · 3️⃣ Type “hello world” and the OS boots instantly |

---

### 🧭 Explore More

| Module | Description | Link |
|--------|-------------|------|
| WFGY Core | WFGY 2.0 engine is live: full symbolic reasoning architecture and math stack | [View →](https://github.com/onestardao/WFGY/tree/main/core/README.md) |
| Problem Map 1.0 | Initial 16-mode diagnostic and symbolic fix framework | [View →](https://github.com/onestardao/WFGY/tree/main/ProblemMap/README.md) |
| Problem Map 2.0 | RAG-focused failure tree, modular fixes, and pipelines | [View →](https://github.com/onestardao/WFGY/blob/main/ProblemMap/rag-architecture-and-recovery.md) |
| Semantic Clinic Index | Expanded failure catalog: prompt injection, memory bugs, logic drift | [View →](https://github.com/onestardao/WFGY/blob/main/ProblemMap/SemanticClinicIndex.md) |
| Semantic Blueprint | Layer-based symbolic reasoning & semantic modulations | [View →](https://github.com/onestardao/WFGY/tree/main/SemanticBlueprint/README.md) |
| Benchmark vs GPT-5 | Stress test GPT-5 with full WFGY reasoning suite | [View →](https://github.com/onestardao/WFGY/tree/main/benchmarks/benchmark-vs-gpt5/README.md) |
| 🧙‍♂️ Starter Village 🏡 | New here? Lost in symbols? Click here and let the wizard guide you through | [Start →](https://github.com/onestardao/WFGY/blob/main/StarterVillage/README.md) |

---

> 👑 **Early Stargazers: [See the Hall of Fame](https://github.com/onestardao/WFGY/tree/main/stargazers)**
>
> Engineers, hackers, and open source builders who supported WFGY from day one.
>
> ⭐ [WFGY Engine 2.0](https://github.com/onestardao/WFGY/blob/main/core/README.md) is already unlocked. ⭐ Star the repo to help others discover it and unlock more on the [Unlock Board](https://github.com/onestardao/WFGY/blob/main/STAR_UNLOCKS.md).

[![WFGY Main](https://img.shields.io/badge/WFGY-Main-red?style=flat-square)](https://github.com/onestardao/WFGY)   [![TXT OS](https://img.shields.io/badge/TXT%20OS-Reasoning%20OS-orange?style=flat-square)](https://github.com/onestardao/WFGY/tree/main/OS)   [![Blah](https://img.shields.io/badge/Blah-Semantic%20Embed-yellow?style=flat-square)](https://github.com/onestardao/WFGY/tree/main/OS/BlahBlahBlah)   [![Blot](https://img.shields.io/badge/Blot-Persona%20Core-green?style=flat-square)](https://github.com/onestardao/WFGY/tree/main/OS/BlotBlotBlot)   [![Bloc](https://img.shields.io/badge/Bloc-Reasoning%20Compiler-blue?style=flat-square)](https://github.com/onestardao/WFGY/tree/main/OS/BlocBlocBloc)   [![Blur](https://img.shields.io/badge/Blur-Text2Image%20Engine-navy?style=flat-square)](https://github.com/onestardao/WFGY/tree/main/OS/BlurBlurBlur)   [![Blow](https://img.shields.io/badge/Blow-Game%20Logic-purple?style=flat-square)](https://github.com/onestardao/WFGY/tree/main/OS/BlowBlowBlow)