WFGY/ProblemMap/GlobalFixMap/Multimodal_LongContext/README.md
2025-09-03 23:51:48 +08:00

133 lines
9 KiB
Markdown
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Multimodal & Long-Context — Global Fix Map
<details>
<summary><strong>🏥 Quick Return to Emergency Room</strong></summary>
<br>
> You are in a specialist desk.
> For full triage and doctors on duty, return here:
>
> - [**WFGY Global Fix Map** — main Emergency Room, 300+ structured fixes](https://github.com/onestardao/WFGY/blob/main/ProblemMap/GlobalFixMap/README.md)
> - [**WFGY Problem Map 1.0** — 16 reproducible failure modes](https://github.com/onestardao/WFGY/blob/main/ProblemMap/README.md)
>
> Think of this page as a sub-room.
> If you want full consultation and prescriptions, go back to the Emergency Room lobby.
</details>
A friendly hub to keep **text, vision, audio, and structured signals** stable inside **long context windows**.
Use this folder when models collapse, drift, or desync under multimodal fusion or cross-sequence reasoning.
---
## What this page is
- A compact map of **failure patterns unique to multimodal + long-context**.
- Each page gives you **symptoms → root cause → WFGY guardrails**.
- Works with **schema-level fixes only** (no infra changes required).
- Every fix is **measurable and reproducible** using ΔS, λ, and E_resonance.
---
## When to use
- Text and vision anchors misalign beyond 50k100k tokens.
- Captions collapse or disappear when windows grow.
- Visual snippets appear but point to the wrong text.
- Multi-hop reasoning flips answers across modalities.
- Cross-sequence fusion drops or swaps semantic anchors.
---
## Common failure patterns
| Page | Symptom (what you see) | Likely root cause | Fix route |
|------|------------------------|------------------|-----------|
| [alignment-drift.md](./alignment-drift.md) | Text and image pairs gradually diverge across long windows | Context length weakens positional anchors | Re-anchor at checkpoints, enforce ΔS probe |
| [anchor-misalignment.md](./anchor-misalignment.md) | Citations point to wrong caption/image | Inconsistent `anchor_id` across modalities | Add schema guardrail to enforce anchor IDs |
| [boundary-fade.md](./boundary-fade.md) | Signals near context edge disappear | Context window cutoff, padding ignored | Boundary probes, chunk anchors at joins |
| [caption-collapse.md](./caption-collapse.md) | Captions vanish or repeat when context grows | Fusion loses reference alignment | Use caption schema, enforce cite-first |
| [cross-modal-bootstrap.md](./cross-modal-bootstrap.md) | Model never uses one modality | Missing initialization anchors | Add bootstrap token + schema lock |
| [cross-modal-trace.md](./cross-modal-trace.md) | Hard to verify which modality answer came from | No traceability field | Require `modality_id` and `source_url` in snippet |
| [desync-amplification.md](./desync-amplification.md) | Small anchor misalignments grow into collapse | Weak λ convergence across modalities | Run multi-seed probes, lock λ variance |
| [desync-anchor.md](./desync-anchor.md) | Anchors for vision vs text drift apart silently | Schema mismatch at join | Enforce alignment with ΔS ≤ 0.50 |
| [echo-loop.md](./echo-loop.md) | Answer repeats cross-modality content | Fusion loopback between modalities | Add dedupe guardrail, enforce λ drop |
| [fusion-blindspot.md](./fusion-blindspot.md) | One modality is ignored entirely | Fusion weights collapse | Hybrid retriever weighting, enforce balance |
| [fusion-latency.md](./fusion-latency.md) | Delay in syncing vision vs text streams | Async fusion queue | Add latency probe, resync alignment |
| [modal-bridge-failure.md](./modal-bridge-failure.md) | Text → Image reasoning chain breaks mid-hop | Bridge tokens dropped | Schema lock for bridge anchors |
| [modality-dropout.md](./modality-dropout.md) | Whole modality disappears mid-sequence | Token truncation or stream loss | Re-chunk, enforce modality coverage |
| [modality-swap.md](./modality-swap.md) | Image and text roles flip silently | Anchor IDs reused wrongly | Explicit `modality_role` field required |
| [multi-hop-collapse.md](./multi-hop-collapse.md) | Multi-hop reasoning stops using one modality | Missing cross-hop anchors | Add cross-hop continuity guardrail |
| [multi-seed-consistency.md](./multi-seed-consistency.md) | Different seeds give different modalities | λ non-convergent | Probe across seeds, enforce stability |
| [multimodal-fusion-break.md](./multimodal-fusion-break.md) | Fusion fails when 3+ modalities | Overload in join schema | Use staged fusion, test ΔS at each join |
| [phantom-visuals.md](./phantom-visuals.md) | Model hallucinates new images | Weak anchor trace | Enforce trace schema, drop hallucinated spans |
| [reference-bleed.md](./reference-bleed.md) | Answer pulls from wrong modality reference | No modality fence | Add fence keys (`modality_id`) |
| [semantic-anchor-shift.md](./semantic-anchor-shift.md) | Anchors shift mid-context | Anchor ID reused | Audit schema, reset anchor IDs |
| [signal-drop.md](./signal-drop.md) | Structured data missing mid-run | Serialization loss | Add schema field for `signal_id` |
| [spatial-fusion-error.md](./spatial-fusion-error.md) | Wrong layout in multimodal outputs | Spatial anchors lost | Enforce bounding-box schema |
| [sync-loop.md](./sync-loop.md) | Model stuck repeating stale multimodal state | Old anchors not cleared | Add state reset guardrail |
| [time-sync-failure.md](./time-sync-failure.md) | Audio/text/video out of sync | Missing time index alignment | Require `time_index` schema |
| [visual-anchor-shift.md](./visual-anchor-shift.md) | Visual anchors move between runs | Vision embeddings unstable | Lock anchor IDs + ΔS probes |
---
## Acceptance targets
- ΔS(question, retrieved) ≤ 0.45
- ΔS across modality joins ≤ 0.50
- Coverage ≥ 0.70 for intended anchors
- λ convergent across 3 paraphrases and 2 modality-seeds
- E_resonance stable across textvisionaudio triads
---
## Fix in 60 seconds
1. **Pick one failing case**
(e.g. caption does not match paragraph). Keep a reference screenshot.
2. **Measure ΔS and λ**
Run 3 paraphrases × 2 modality seeds. Look for flips.
3. **Check anchors**
Verify `snippet_id`, `modality_id`, `section_id` across textvision.
4. **Patch minimally**
Re-align anchors, enforce schema, drop hallucinated spans, re-run with guardrails.
---
### 🔗 Quick-Start Downloads (60 sec)
| Tool | Link | 3-Step Setup |
|------|------|--------------|
| **WFGY 1.0 PDF** | [Engine Paper](https://github.com/onestardao/WFGY/blob/main/I_am_not_lizardman/WFGY_All_Principles_Return_to_One_v1.0_PSBigBig_Public.pdf) | 1⃣ Download · 2⃣ Upload · 3⃣ Ask “Answer using WFGY + <your question>” |
| **TXT OS** | [TXTOS.txt](https://github.com/onestardao/WFGY/blob/main/OS/TXTOS.txt) | 1⃣ Download · 2⃣ Paste into LLM · 3⃣ Type “hello world” — OS boots instantly |
---
### 🧭 Explore More
| Module | Description | Link |
|--------|-------------|------|
| WFGY Core | WFGY 2.0 engine, full symbolic reasoning | [View →](https://github.com/onestardao/WFGY/tree/main/core/README.md) |
| Problem Map 1.0 | Initial 16-mode diagnostic | [View →](https://github.com/onestardao/WFGY/tree/main/ProblemMap/README.md) |
| Problem Map 2.0 | RAG failure tree and modular fixes | [View →](https://github.com/onestardao/WFGY/blob/main/ProblemMap/rag-architecture-and-recovery.md) |
| Semantic Clinic | Expanded failure catalog | [View →](https://github.com/onestardao/WFGY/blob/main/ProblemMap/SemanticClinicIndex.md) |
| Semantic Blueprint | Layer-based symbolic reasoning | [View →](https://github.com/onestardao/WFGY/tree/main/SemanticBlueprint/README.md) |
---
> 👑 **Early Stargazers: [See the Hall of Fame](https://github.com/onestardao/WFGY/tree/main/stargazers)** —
> Builders who supported WFGY from day one.
> <img src="https://img.shields.io/github/stars/onestardao/WFGY?style=social" alt="GitHub stars"> ⭐ Star the repo to help others discover it and unlock more on the [Unlock Board](https://github.com/onestardao/WFGY/blob/main/STAR_UNLOCKS.md).
<div align="center">
[![WFGY Main](https://img.shields.io/badge/WFGY-Main-red?style=flat-square)](https://github.com/onestardao/WFGY)
[![TXT OS](https://img.shields.io/badge/TXT%20OS-Reasoning%20OS-orange?style=flat-square)](https://github.com/onestardao/WFGY/tree/main/OS)
[![Blah](https://img.shields.io/badge/Blah-Semantic%20Embed-yellow?style=flat-square)](https://github.com/onestardao/WFGY/tree/main/OS/BlahBlahBlah)
[![Blot](https://img.shields.io/badge/Blot-Persona%20Core-green?style=flat-square)](https://github.com/onestardao/WFGY/tree/main/OS/BlotBlotBlot)
[![Bloc](https://img.shields.io/badge/Bloc-Reasoning%20Compiler-blue?style=flat-square)](https://github.com/onestardao/WFGY/tree/main/OS/BlocBlocBloc)
[![Blur](https://img.shields.io/badge/Blur-Text2Image%20Engine-navy?style=flat-square)](https://github.com/onestardao/WFGY/tree/main/OS/BlurBlurBlur)
[![Blow](https://img.shields.io/badge/Blow-Game%20Logic-purple?style=flat-square)](https://github.com/onestardao/WFGY/tree/main/OS/BlowBlowBlow)
</div>