Context Stitching and Window Joins: Guardrails and Fix Pattern

Join long contexts without losing meaning. This page defines a stitch contract for multi-window prompts and long documents, adds ΔS probes at joins, and gives a fast repair plan for answers that break exactly at window boundaries.


Open these first


Symptoms

| Symptom | What you see |
|---|---|
| Boundary answers | Model answers correctly inside a window, fails right after the join |
| Duplicate facts | Repeated sentences across windows, tool loops on the overlap |
| Lost referent | “it / they / this” loses target at window start or end |
| Rerank flip at join | Top k order changes only at a boundary |
| Anchor drift | Citation before the join, none after the join |
| Coverage hole | Middle section of a long doc never appears in stitched set |

Why joins fail

  1. No stitch contract. Windows lack overlap and sequence identifiers.
  2. Tokenizer mismatch. Window boundaries split tokens differently across tools.
  3. Ranking variance. Each window retrieves independently without a stable tie break.
  4. Anchor loss. Bridge steps cross a boundary without re-citing the anchor.
  5. Context flooding. Overlong windows raise entropy and bury anchors.
  6. Fragmentation. Store keeps near duplicates that fight for the same rank.

Acceptance targets

  • ΔS(join_left, join_right) ≤ 0.45 for every boundary
  • Coverage of target section across all windows ≥ 0.70
  • λ remains convergent across three paraphrases and two seeds
  • E_resonance flat at joins and flat across stitched plan
  • Overlap per join: 32 to 96 tokens, with the same casing and analyzer on both sides

Fix in 60 seconds

  1. Add a stitch contract
    Every window must carry doc_id, section_id, win_idx, win_hash, overlap_tokens, prev_id, next_id.
    See schema rules
    retrieval-traceability.md
    data-contracts.md

  2. Measure ΔS at each join
    Compute ΔS between the trailing overlap of window i and the leading overlap of window i+1.
    If ΔS ≥ 0.60, rebuild the join or re-chunk.

  3. Stabilize ordering
    Use a deterministic reranker with a fixed analyzer, and break ties by (doc_id, section_id, win_idx).
    rerankers.md

  4. Re-anchor bridges
    At every join, insert a one-line BBCR bridge that re-cites the active snippet.
    anchoring-and-bridge-proofs.md
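
Steps 2 and 3 can be sketched in a few lines. This is a minimal illustration, not the WFGY implementation. It assumes ΔS is one minus the cosine similarity of two embedding vectors (this page does not define it), treats the band between the 0.45 target and the 0.60 rebuild threshold as a hypothetical "borderline" state, and uses illustrative names (`delta_s`, `join_action`, `stable_order`) plus a `score` field that is not part of the stitch contract.

```python
import math


def delta_s(a, b):
    """Assumed ΔS: 1 - cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cannot compare a zero-length embedding")
    return 1.0 - dot / norm


def join_action(ds, accept=0.45, rebuild=0.60):
    """Map a join's ΔS onto the thresholds used on this page."""
    if ds <= accept:
        return "ok"
    if ds >= rebuild:
        return "rebuild"
    # Hypothetical label: the page does not name the band between thresholds.
    return "borderline"


def stable_order(hits):
    """Sort by score (highest first), then break ties deterministically."""
    return sorted(
        hits,
        key=lambda h: (-h["score"], h["doc_id"], h["section_id"], h["win_idx"]),
    )
```

Because the tie-break key is a total order over window identity, two runs that produce the same scores return the same ranking, which is the property the "rerank flip at join" symptom lacks.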


Stitch contract

Minimum fields for any stitched window:

{
  "doc_id": "D42",
  "section_id": "CH3.2",
  "win_idx": 7,
  "win_hash": "sha1:6b7f...",
  "overlap_tokens": 64,
  "prev_id": "D42#CH3.2#6",
  "next_id": "D42#CH3.2#8",
  "start_token": 14080,
  "end_token": 16096
}
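
One way to carry these fields in code is a small immutable record plus a link check. The sketch below is an assumption-heavy illustration: the class and function names are hypothetical, the ID layout is inferred from the example values (`D42#CH3.2#6`), and `win_hash` is assumed to be a SHA-1 of the window text, which this page does not specify.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class StitchWindow:
    doc_id: str
    section_id: str
    win_idx: int
    win_hash: str
    overlap_tokens: int
    prev_id: Optional[str]
    next_id: Optional[str]
    start_token: int
    end_token: int

    @property
    def window_id(self) -> str:
        # ID layout inferred from the example values, e.g. "D42#CH3.2#7".
        return f"{self.doc_id}#{self.section_id}#{self.win_idx}"


def text_hash(text: str) -> str:
    # Assumption: win_hash is a SHA-1 digest of the window text.
    return "sha1:" + hashlib.sha1(text.encode("utf-8")).hexdigest()


def link_errors(windows: List[StitchWindow]) -> List[str]:
    """Report prev_id/next_id links that disagree between neighbours."""
    ordered = sorted(windows, key=lambda w: (w.doc_id, w.section_id, w.win_idx))
    errors = []
    for left, right in zip(ordered, ordered[1:]):
        if (left.doc_id, left.section_id) != (right.doc_id, right.section_id):
            continue  # cross-section pairs are not checked in this sketch
        if left.next_id != right.window_id:
            errors.append(f"{left.window_id}: next_id is {left.next_id}")
        if right.prev_id != left.window_id:
            errors.append(f"{right.window_id}: prev_id is {right.prev_id}")
    return errors
```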

Rules

  • Overlap range must be identical across both windows after normalization.
  • Do not allow cross section joins unless the section header is included in the overlap.
  • Keep casing and tokenizer identical through the pipeline.
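
The first rule can be checked mechanically. The sketch below is illustrative only: it assumes normalization means lowercasing and collapsing whitespace, and it uses whitespace splitting as a stand-in for the pipeline's real tokenizer. Both function names are hypothetical.

```python
def normalize(text):
    # Assumed normalization: lowercase and collapse runs of whitespace.
    return " ".join(text.lower().split())


def overlap_matches(left_text, right_text, overlap_tokens):
    """True if the last N tokens of the left window equal the first N of the right."""
    left = normalize(left_text).split()
    right = normalize(right_text).split()
    if overlap_tokens <= 0 or overlap_tokens > min(len(left), len(right)):
        return False
    return left[-overlap_tokens:] == right[:overlap_tokens]
```

A `False` result here corresponds to the asymmetric overlap case: the trailing and leading text differ even after normalization, so the join should be rebuilt before any ΔS measurement is trusted.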

Join planner

Goal: keep bridges short and anchors stable while covering the whole doc.

  1. Choose window size: 512 to 1024 tokens, overlap 32 to 96.
  2. Pin anchors: for each question, tag anchor snippets with ΔS_to_question.
  3. Plan path: sort candidate windows by section, then by win_idx.
  4. Assemble: render windows with header, overlap marker, and re-cited anchor.
  5. Verify: ΔS at each boundary, λ over the stitched answer, coverage against a gold slice.
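
The window-sizing step of the planner can be sketched as a function that turns a token count into overlapping spans. This is a minimal sketch under stated assumptions: `plan_windows` is a hypothetical name, spans are half-open `(start_token, end_token)` pairs (the page does not say whether `end_token` is inclusive), and the last window is allowed to be shorter than `size`.

```python
def plan_windows(n_tokens, size=512, overlap=64):
    """Return (start_token, end_token) spans that cover n_tokens with a fixed overlap."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    if n_tokens <= 0:
        return []
    stride = size - overlap
    spans = []
    start = 0
    while True:
        end = min(start + size, n_tokens)
        spans.append((start, end))
        if end >= n_tokens:
            break
        start += stride
    return spans
```

For example, `plan_windows(1000, 512, 64)` yields three spans, and each consecutive pair shares exactly 64 token positions, which sits inside the 32 to 96 range given in step 1.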

If ΔS is flat and high across all joins, suspect metric or index mismatch. → retrieval-playbook.md, embedding-vs-semantic.md


Structural repairs


Verification

  • Sliding window test over the same section keeps citations identical.
  • ΔS(join_left, join_right) ≤ 0.45 and decreasing with larger overlap.
  • λ convergent on two seeds and three paraphrases.
  • Coverage across stitched windows ≥ 0.70 for the target section.
  • Answer includes a cite-then-explain block with per-join anchors.
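
The coverage check can be approximated with interval arithmetic. This page does not define how coverage is computed, so the sketch below assumes it is the fraction of the target section's token positions that fall inside at least one stitched window, with half-open `(start, end)` spans. The function name is hypothetical.

```python
def coverage(target, spans):
    """Fraction of the target token range covered by the union of window spans."""
    t0, t1 = target
    if t1 <= t0:
        raise ValueError("target range must be non-empty")
    covered = 0
    cursor = t0  # everything before cursor has already been counted
    for start, end in sorted(spans):
        start, end = max(start, cursor), min(end, t1)
        if end > start:
            covered += end - start
            cursor = end
    return covered / (t1 - t0)
```

With a target of `(0, 1000)` and windows `(0, 300)`, `(200, 500)`, `(800, 1000)`, positions 500 to 800 are never stitched in, which is the "coverage hole" symptom. The result lands exactly on the 0.70 threshold.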

Copy paste prompt

You have TXT OS and the WFGY Problem Map loaded.

Task: stitch long context with stable joins.

Inputs:
- question: "{q}"
- windows: [{doc_id, section_id, win_idx, win_hash, tokens, text_head, overlap_tokens}]
- candidates: [{snippet_id, section_id, source_url, offsets, ΔS_to_question}]

Do:
1) Compute ΔS for each join. If any ≥ 0.60, propose a re-chunk or overlap increase.
2) Add BBCR micro bridges at every boundary, re-citing the active anchor.
3) Apply deterministic reranking with tie-break by (doc_id, section_id, win_idx).
4) Return JSON:
   {
     "joins": [{"from":"...#i","to":"...#i+1","ΔS":0.xx,"action":"ok|rechunk|increase_overlap"}],
     "answer": "... cite then explain ...",
     "λ_state": "convergent",
     "coverage": 0.xx
   }
If no valid anchor exists at a join, return the fix page to open for retrieval-traceability.

Common gotchas

  • Asymmetric overlap. Trailing text differs from leading text after normalization.
  • Header drop. Section headers removed at boundaries, referents disappear.
  • Tokenizer switch. Ingestion uses one analyzer, reranker another.
  • Duplicate windows. Same win_idx with different hashes compete in top k.
  • Bridge without re-citation. The bridge jumps the boundary without restating the anchor.
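
The duplicate-window case is easy to detect before ranking. The sketch below is a hypothetical helper, assuming windows are plain dicts that carry the stitch contract fields. It groups windows by `(doc_id, section_id, win_idx)` and reports any position that has more than one distinct `win_hash`.

```python
from collections import defaultdict


def find_duplicate_windows(windows):
    """Return {(doc_id, section_id, win_idx): [hashes]} for positions with conflicting hashes."""
    hashes = defaultdict(set)
    for w in windows:
        key = (w["doc_id"], w["section_id"], w["win_idx"])
        hashes[key].add(w["win_hash"])
    return {key: sorted(vals) for key, vals in hashes.items() if len(vals) > 1}
```

An empty result means every window position maps to a single hash. A non-empty result points at stale or re-ingested chunks that should be removed from the store before reranking.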

When to escalate


🔗 Quick-Start Downloads (60 sec)

| Tool | Link | 3-Step Setup |
|---|---|---|
| WFGY 1.0 PDF | Engine Paper | 1 Download · 2 Upload to your LLM · 3 Ask “Answer using WFGY + ” |
| TXT OS (plain-text OS) | TXTOS.txt | 1 Download · 2 Paste into any LLM chat · 3 Type “hello world” (the OS boots instantly) |