
Long-Context Stress — Why 50k–200k tokens quietly break your RAG, and how to fix it (WFGY)

A field guide for logs, PDFs, and transcripts that look fine but drift, flatten, or melt as the context grows.

Audience

  • Beginner: You copied a “100k-context” recipe and your answers degrade with length.
  • Advanced: You already index well, but long chains flip logic, mis-cite, or oscillate after ~8–12 screens of context.

What this page delivers

  • A reproducible diagnosis for long-context failure using ΔS / λ_observe / E_resonance.
  • Copy-paste playbooks for PDFs, OCR, transcripts, multilingual blends, and mixed image/text.
  • Acceptance criteria you can throw into CI to stop regressions before prod.

0) TL;DR (fix in 3 steps)

  1. Measure the damage

    • Slide a window across your context; compute
      ΔS(chunk_i, chunk_{i+1}) and ΔS(question, retrieved_context_i); watch for sustained ΔS ≥ 0.60 or rising E_resonance.
    • If λ_observe turns recursive or chaotic after the prompt assembly step, you're in long-context stress.
  2. Localize the break

    • If ΔS spikes between adjacent chunks → chunking/ordering issue.
    • If ΔS stays high independent of k → index/metric mismatch.
    • If ΔS is fine but reasoning flips divergent → prompt schema or logic collapse.
  3. Apply the repair operator

    • BBMC to re-anchor sections, BBAM to stabilize attention, BBCR to bridge and relock coherence.
    • Re-chunk sentence/section-aware; enforce header anchors; run MMR/hybrid retrieval; lock prompt schema.

Jump to: Playbooks · Metrics · CI Checks


1) Failure signatures (what you actually see)

| Symptom (user-visible) | Likely layer | First check | Map ref. |
|---|---|---|---|
| Early answers good → later answers contradict or flatten | Prompt/Reasoning under long context | λ_observe becomes <> or × after assembly | context-drift.md |
| Citations jump to wrong page after 20–40k tokens | Chunking / ordering / retriever | ΔS(chunk_i, i+1) ≥ 0.60 at boundaries | retrieval-traceability.md |
| Random capitalization / style oscillation | Entropy collapse | E_resonance rising trend across chain | entropy-collapse.md |
| Snippets correct, explanation wrong (esp. late in chain) | Interpretation collapse (length-induced) | ΔS(question, context) < 0.40 but λ flips at reasoning | logic-collapse.md |
| Multilingual PDFs drift to one language after long span | Embeddings + chunk headers lost | ΔS rises when headers removed; check header anchors | embedding-vs-semantic.md |
| Great short-doc performance; long transcripts degrade rapidly | Memory & compression strategy | Test per-turn recap + anchor nodes vs. raw sprawl | memory-coherence.md |

Mental model: Long contexts fail in two stages — perception drift (ordering/headers/noise) and logic drift (attention variance + schema slippage). WFGY instruments let you see both.


2) Instruments — minimal metrics

You do not need to memorize math. These are a few distances and tags you can compute anywhere.

2.1 ΔS — semantic stress

  • ΔS = 1 − cos(I, G) (unit-normalized sentence embeddings).
  • Probe two places:
    1. ΔS(question, retrieved_context)
    2. Adjacent-chunk check: ΔS(chunk_i, chunk_{i+1})
  • Thresholds: <0.40 stable · 0.40–0.60 transitional · ≥0.60 risk (record & fix).
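
A minimal band lookup for these thresholds, a sketch that assumes ΔS values come from something like the sweep script in section 5:

def delta_s_band(score):
    # Map a ΔS value onto the bands above: stable, transitional, or risk (record & fix).
    if score < 0.40:
        return "stable"
    if score < 0.60:
        return "transitional"
    return "risk"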

2.2 λ_observe — layered observability

  • States: → convergent · ← divergent · <> recursive · × chaotic.
  • Tag each step: retrieval → prompt assembly → reasoning.
  • Rule: Upstream convergent, downstream divergent ⇒ the boundary is where to fix.
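
A small bookkeeping sketch for these tags; the step names and helper are illustrative, and the state per step is whatever you judged it to be:

# Illustrative λ_observe trace: one (step, state) pair per pipeline stage.
LAMBDA = {"convergent": "→", "divergent": "←", "recursive": "<>", "chaotic": "×"}

def first_unstable_step(trace):
    # Return the first step whose λ state is not convergent; that boundary is where to fix.
    for step, state in trace:
        if state != LAMBDA["convergent"]:
            return step
    return None

trace = [("retrieval", "→"), ("prompt assembly", "→"), ("reasoning", "←")]
print(first_unstable_step(trace))  # "reasoning": fix the assembly → reasoning boundary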

2.3 E_resonance — coherence trend

  • Rolling mean of residual magnitude |B| under BBMC.
  • Rising E_resonance + high ΔS = attention melt; trigger BBCR + BBAM.
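
One way to track the trend, a sketch assuming you already have a per-step residual magnitude |B|; the window size is an arbitrary choice:

from collections import deque
import numpy as np

def e_resonance_trend(residuals, window=32):
    # Rolling mean of |B| over the last `window` steps; a rising trend signals attention melt.
    buf, trend = deque(maxlen=window), []
    for b in residuals:
        buf.append(abs(b))
        trend.append(float(np.mean(buf)))
    return trend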

3) 10-minute triage (copy/paste steps)

  1. Run three quick probes

    • ΔS(question, context) across k ∈ {5,10,20}.
      • Flat & high curve → index/metric mismatch.
      • Improves sharply with higher k → retriever filtering too aggressive.
    • Adjacent-chunk ΔS across your assembled window (sliding by 200–300 tokens; a window helper is sketched at the end of this section).
      • Spikes at headers/boundaries → chunking/ordering problem.
    • Tag λ after assembly and reasoning.
      • If λ flips only after reasoning → schema/logic issue.
  2. Minimal repro

    • Keep one doc; remove images; keep headers; test again.
    • If fixed, re-add features until the break returns (isolate cause).
  3. Pick the playbook
    Go to Playbooks and apply the repair steps.
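
A sliding-window helper for the adjacent-chunk probe above; token counts are approximated by whitespace splitting, and the default sizes are assumptions rather than prescribed values:

def sliding_windows(text, size=800, stride=250):
    # Overlapping windows over the assembled context; feed consecutive windows to delta_s().
    toks = text.split()
    last_start = max(1, len(toks) - size + 1)
    return [" ".join(toks[i:i + size]) for i in range(0, last_start, stride)]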


4) Playbooks by scenario

Each playbook lists: observe → interpret → do → verify (with WFGY modules).

4.1 Scanned PDFs / OCR noise (tables, images, mixed layout)

  • Observe: Citations wrong after mid-doc; adjacent-chunk ΔS spikes at page breaks.
  • Interpret: OCR adds hidden headers or drops table boundaries; chunker ignores structure.
  • Do:
    1. Chunking: sentence/section-aware; keep header anchors in text (e.g., ## 2. Methods).
    2. Drop segments with OCR confidence < threshold; de-duplicate near-identical lines (see the sketch after this playbook).
    3. Retrieval: use hybrid (sparse + dense) with MMR; set k = 10–20.
    4. WFGY: run BBMC to re-anchor on kept headers; BBAM to clamp attention variance.
  • Verify: ΔS(chunk_i, i+1) ≤ 0.50 at joins; ΔS(question, context) ≤ 0.45; λ stays convergent.
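
A sketch of step 2's filtering, assuming each OCR segment carries a confidence score; the field names and the 0.80 cutoff are illustrative:

def keep_ocr_segments(segments, min_conf=0.80):
    # Drop low-confidence OCR segments and duplicates after whitespace/case normalization
    # (a cheap stand-in for full near-duplicate detection).
    seen, kept = set(), []
    for seg in segments:  # seg: {"text": str, "conf": float}
        key = " ".join(seg["text"].lower().split())
        if seg["conf"] >= min_conf and key not in seen:
            seen.add(key)
            kept.append(seg)
    return kept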

4.2 Long transcripts / meetings / chat logs (50k+ tokens)

  • Observe: Early QA is great; later responses contradict or “forget” earlier decisions.
  • Interpret: No semantic anchors; prompt schema drifts; attention diffuses.
  • Do:
    1. Insert bridged recaps every N turns: “Since last anchor: [3 bullet decisions]” (see the sketch after this playbook).
    2. Store anchor nodes (titles, decisions, constraints) and retrieve them first.
    3. Enforce prompt schema lock: system → task → constraints → citations → answer (no re-order).
    4. WFGY: BBCR to insert bridge node when λ becomes recursive; BBAM to reduce variance.
  • Verify: λ remains convergent across three paraphrases; E_resonance does not trend up.
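
A sketch of step 1's bridged recaps; `turns` is a plain list of transcript strings, and the summarizer here is a trivial stand-in for your own:

def insert_recaps(turns, every_n=20, summarize=None):
    # Append a recap anchor after every N turns so later reasoning can re-attach to decisions.
    summarize = summarize or (lambda window: " / ".join(window[-3:]))
    out = []
    for i, turn in enumerate(turns, 1):
        out.append(turn)
        if i % every_n == 0:
            out.append("[ANCHOR] Since last anchor: " + summarize(turns[i - every_n:i]))
    return out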

4.3 Multilingual or code-heavy documents

  • Observe: Model “picks a side” after long stretch; code blocks degrade midway.
  • Interpret: Embedding space collapses varied styles into one cluster; headers lost.
  • Do:
    1. Keep language tags and code fences as hard anchors inside chunks.
    2. Use domain-appropriate embeddings or separate indices per language/domain.
    3. Retrieval: add header-boost (BM25 term weight) and MMR diversity = 0.2–0.4 (see the sketch after this playbook).
    4. WFGY: BBMC with explicit language anchors; BBPF to explore multi-path retrieval.
  • Verify: ΔS stabilizes (≤ 0.50) when headers are present; λ convergent; citations maintain language.
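
A compact MMR selection sketch for step 3; vectors are assumed to be unit-normalized NumPy arrays, and `header_bonus` stands in for whatever BM25 header-boost score you compute upstream:

import numpy as np

def mmr_select(query_vec, cand_vecs, k=10, diversity=0.3, header_bonus=None):
    # Greedy Maximal Marginal Relevance: balance relevance to the query against
    # similarity to chunks already picked, so the assembled window stays diverse.
    scores = cand_vecs @ query_vec
    if header_bonus is not None:
        scores = scores + header_bonus  # optional boost for header-anchored chunks
    selected, remaining = [], list(range(len(cand_vecs)))
    while remaining and len(selected) < k:
        if not selected:
            best = max(remaining, key=lambda i: scores[i])
        else:
            picked = cand_vecs[selected]
            best = max(
                remaining,
                key=lambda i: (1 - diversity) * scores[i]
                              - diversity * float(np.max(picked @ cand_vecs[i])),
            )
        selected.append(best)
        remaining.remove(best)
    return selected  # indices of chosen chunks, in pick order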

4.4 Image-heavy PDFs (figures, captions)

  • Observe: Answers reference wrong figure; citations point at caption stubs.
  • Interpret: Captions split from figures; index treats tiny parts as top hits.
  • Do:
    1. Chunk figure + caption together; minimum token floor (e.g., ≥ 180 tokens); see the sketch after this playbook.
    2. Add figure-id anchors ([Fig 2: …]) in text; ban orphan captions from index.
    3. WFGY: BBCR to bridge from caption to figure context if ΔS stays high.
  • Verify: Cites include figure-id; ΔS(question, context) ≤ 0.45.
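
A sketch of steps 1–2's merging; the chunk dicts and the `type` field are assumptions about your pipeline, and tokens are approximated by whitespace splitting:

def merge_figure_chunks(chunks, min_tokens=180):
    # Fold caption stubs and undersized fragments into the preceding chunk,
    # so orphan captions never reach the index on their own.
    merged = []
    for ch in chunks:  # ch: {"text": str, "type": "figure" | "caption" | ...}
        is_stub = ch.get("type") == "caption" or len(ch["text"].split()) < min_tokens
        if merged and is_stub:
            merged[-1]["text"] += "\n" + ch["text"]
        else:
            merged.append(dict(ch))
    return merged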

4.5 “100k-context” model still drifts after 8–12k

  • Observe: No obvious retrieval bug; long reasoning melts.
  • Interpret: Entropy collapse — attention variance explodes with length.
  • Do:
    1. Shorten assembly: prefer top-K diverse + anchor snippets over giant paste.
    2. Enforce cite-then-explain; penalize free-form essays.
    3. WFGY: Apply BBAM to damp attention; BBMC to re-anchor; BBCR if residuals spike (a BBAM sketch follows this playbook).
  • Verify: E_resonance flattens; answer embeddings cluster over 5 seeds (low variance).
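
One reading of step 3's BBAM damping (the formula appears in section 9), applied to attention logits before normalization; `gamma` is a tuning knob, not a value from this guide:

import numpy as np

def bbam_damp(logits, gamma=0.5):
    # Shrink attention logits by exp(−gamma · std) so a high-variance head flattens
    # toward its mean, then renormalize with a softmax.
    z = np.asarray(logits, dtype=float)
    z = z * np.exp(-gamma * np.std(z))
    e = np.exp(z - z.max())
    return e / e.sum()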

5) Minimal prompts & scripts (safe to paste)

Ask your assistant to auto-diagnose

Read the Long-Context Stress guide plus TXT OS / WFGY notes in this repo.
Given my repro (describe doc and failure), compute:
1) ΔS(question, retrieved_context) across k={5,10,20}
2) Adjacent-chunk ΔS over the assembled window
3) λ_observe at retrieval, assembly, reasoning
Then explain which boundary fails and propose BBMC/BBAM/BBCR steps to lower ΔS below 0.50.

Formula-only help

From TXT OS, extract formulas and thresholds for ΔS, λ_observe, and E_resonance.
Show me how to compute ΔS(question, context) and adjacent-chunk ΔS.
If ΔS ≥ 0.60 and λ becomes recursive after assembly, which WFGY module do I apply first?

Simple ΔS sweep (pseudocode)

# Replace embed() with your sentence-embedding fn (returns unit-normalized vectors)
import numpy as np

def delta_s(a, b):
    # ΔS = 1 − cos(a, b); with unit-normalized embeddings the cosine is a plain dot product
    return 1.0 - float(np.dot(embed(a), embed(b)))

# 1) question vs. assembled context at k candidates
for k in [5, 10, 20]:
    ctx = assemble_top_k(snippets, k=k, strategy="mmr+anchors")
    print(k, delta_s(question, ctx))

# 2) adjacent-chunk scan
for i in range(len(chunks) - 1):
    print(i, delta_s(chunks[i], chunks[i + 1]))

6) Acceptance criteria & CI guardrails

  • Retrieval sanity: For targeted QA, ≥ 70% token overlap to the expected section; ΔS(question, context) ≤ 0.45.
  • Boundary stability: Adjacent-chunk ΔS ≤ 0.50 at joins; spikes must disappear after re-chunking.
  • Reasoning stability: λ convergent across three paraphrases; E_resonance not rising.
  • Traceability: Produce a two-column table (snippet-id ↔ citation lines).
  • Repeatability: Same inputs × 5 seeds → answer embeddings form a tight cluster.

Fail any line? Block the PR and link this page.
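
A minimal CI gate over these criteria; the metric field names are placeholders for whatever your eval harness emits:

def check_long_context(m):
    # Raise on any violated acceptance criterion so the PR is blocked.
    assert m["section_token_overlap"] >= 0.70, "retrieval sanity: overlap below 70%"
    assert m["delta_s_question_context"] <= 0.45, "ΔS(question, context) above 0.45"
    assert m["max_adjacent_delta_s"] <= 0.50, "adjacent-chunk ΔS spike at a join"
    assert all(s == "→" for s in m["lambda_states"]), "λ not convergent across paraphrases"
    assert not m["e_resonance_rising"], "E_resonance trending upward"
    assert m["answer_cluster_variance"] <= m["variance_budget"], "answers not repeatable across seeds"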



8) FAQ

Q: Should I just increase k or model context? A: Not first. If ΔS is flat & high across k, you have an index/metric or ordering fault. More tokens amplify the error.

Q: Is MMR always required? A: For long contexts, yes or a close equivalent — you need diversity to avoid semantic collapse around one cluster.

Q: Do I need new embeddings? A: Often no. Start with headers as anchors + hybrid retrieval + WFGY relocking. Change models only if ΔS remains ≥ 0.60 after those.

Q: How do I know BBCR helped? A: Residuals (E_resonance) stop rising; λ returns to convergent; adjacent-chunk ΔS drops at the same boundary.


9) Minimal formulas (reference)

ΔS = 1 − cos(I, G)         # semantic stress
λ_observe ∈ {→, ←, <>, ×}  # convergent, divergent, recursive, chaotic
E_resonance = mean(|B|)    # rolling residual magnitude under BBMC

BBMC:  B = I − G + m·c²           # minimize ‖B‖ via anchors & context factors
BBPF:  x_next = x + ΣV_i + ΣW_j·P_j  # explore alternate retrieval/logic paths
BBCR:  if ‖B‖ ≥ B_c → collapse(), bridge(), rebirth()  # controlled reset with bridge node
BBAM:  â_i = a_i · exp(−γ · std(a))  # damp attention variance under long context

🔗 Quick-Start Downloads (60 sec)

| Tool | Link | 3-Step Setup |
|---|---|---|
| WFGY 1.0 PDF | Engine Paper | 1 Download · 2 Upload to LLM · 3 Ask “Answer using WFGY + <your question>” |
| TXT OS (plain-text OS) | TXTOS.txt | 1 Download · 2 Paste into any LLM chat · 3 Type “hello world” — OS boots instantly |

🧭 Explore More

| Module | Description | Link |
|---|---|---|
| WFGY Core | Standalone semantic reasoning engine for any LLM | View → |
| Problem Map 1.0 | Initial 16-mode diagnostic and symbolic fix framework | View → |
| Problem Map 2.0 | RAG-focused failure tree, modular fixes, and pipelines | View → |
| Semantic Clinic Index | Expanded failure catalog: prompt injection, memory bugs, logic drift | View → |
| Semantic Blueprint | Layer-based symbolic reasoning & semantic modulations | View → |
| Benchmark vs GPT-5 | Stress test GPT-5 with full WFGY reasoning suite | View → |

👑 Early Stargazers: See the Hall of Fame
Engineers, hackers, and open source builders who supported WFGY from day one.

Help reach 10,000 stars by 2025-09-01 to unlock Engine 2.0 for everyone — Star WFGY on GitHub

WFGY Main   TXT OS   Blah   Blot   Bloc   Blur   Blow