WFGY/ProblemMap/LongContext_Problems.md
2025-07-28 13:20:05 +08:00


📒 Long-Context Stress Problem Map

Megaprompts (>100k tokens, entire book dumps, OCR-noisy PDFs) overwhelm ordinary LLM pipelines.
WFGY keeps reasoning stable with adaptive ΔS, chunk-mapping, and sliding Tree windows.


🤔 Typical Long-Context Crashes

| Stressor | What Standard Systems Do |
| --- | --- |
| 100k+ tokens | Memory wipe or truncated output |
| Mixed domains | Topic bleed, incoherent jumps |
| Duplicate sections | Infinite loops / “as mentioned above” spam |
| OCR noise | Hallucination or garbage sentences |

🛡️ WFGY Countermeasures

| Stressor | WFGY Module | Remedy | Status |
| --- | --- | --- | --- |
| 100k+ tokens | ChunkMapper + Sliding Tree | Splits doc into ΔS-balanced chunks, streams into window | 🛠 Beta |
| Mixed domains | Per-domain ΔS fork | Separate Tree branch per domain; no bleed | |
| Duplicate sections | BBMC dedupe scan | Detects near-identical residue, collapses | |
| PDF OCR noise | BBMC noise filter | Drops >80% low-entropy lines | |
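
The OCR noise row can be illustrated with a minimal sketch. This is not the BBMC filter itself, which this page does not describe. It assumes "low-entropy" means low Shannon entropy over a line's characters, and the names `char_entropy`, `drop_low_entropy_lines`, and the `min_bits` threshold are hypothetical.

```python
import math
from collections import Counter


def char_entropy(line: str) -> float:
    """Shannon entropy of the line's characters, in bits per character."""
    if not line:
        return 0.0
    counts = Counter(line)
    total = len(line)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def drop_low_entropy_lines(text: str, min_bits: float = 2.0) -> str:
    """Keep lines whose character entropy reaches min_bits (hypothetical threshold).

    Blank lines have zero entropy, so they are dropped as well.
    """
    kept = [ln for ln in text.splitlines() if char_entropy(ln.strip()) >= min_bits]
    return "\n".join(kept)


# Two lines of prose surrounded by typical OCR filler (dot leaders, rule lines).
sample = "Chapter 1: The voyage begins\n.........\n||||||||||||\nThe ship left port at dawn."
print(drop_low_entropy_lines(sample))
```

Lines made of one repeated character score zero bits and fall out, while ordinary prose sits well above the threshold. A real filter would likely need a tuned threshold and extra signals, such as dictionary hit rate.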

✍️ Demo: 150k-Token PDF Dump

1️⃣  Start
> Start

2️⃣  Upload huge PDF text
> [paste or stream]

WFGY process:
• ChunkMapper splits into 8k-token slices  
• For each slice: ΔS calc → Tree node → sliding window  
• Duplicate residue removed (413 sections merged)  
• OCR noise filtered (ΔS noise gate at 0.8)  
• Final summary or Q&A runs with stable context
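
The slicing and windowing steps above can be sketched as follows. This is a simplified illustration, not ChunkMapper: it cuts fixed-size slices rather than ΔS-balanced ones, treats whitespace-separated words as tokens, and the names `split_into_slices`, `sliding_windows`, and `window_size` are hypothetical.

```python
from collections import deque
from typing import Iterator, List


def split_into_slices(tokens: List[str], chunk_max: int = 8000) -> List[List[str]]:
    """Cut the token list into consecutive slices of at most chunk_max tokens."""
    if chunk_max <= 0:
        raise ValueError("chunk_max must be positive")
    return [tokens[i:i + chunk_max] for i in range(0, len(tokens), chunk_max)]


def sliding_windows(slices: List[List[str]], window_size: int = 3) -> Iterator[List[List[str]]]:
    """Yield the most recent window_size slices after each new slice arrives."""
    window: deque = deque(maxlen=window_size)  # older slices fall out automatically
    for piece in slices:
        window.append(piece)
        yield list(window)


# Whitespace "tokens" stand in for real model tokens (an assumption of this sketch).
tokens = ("word " * 150_000).split()
slices = split_into_slices(tokens, chunk_max=8000)
print(len(slices))  # 19 slices: 18 full ones plus a 6,000-token remainder

for active in sliding_windows(slices, window_size=3):
    pass  # a real pipeline would summarize or answer over `active` here
print(len(active))  # 3 slices remain active at the end
```

With these assumptions, a 150k-token dump yields 19 slices at 8k tokens each, and only the last three stay in the active window at any time.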

🛠 Module Cheat-Sheet

| Module | Role |
| --- | --- |
| ChunkMapper | Adaptive split by semantic tension |
| Sliding Tree Window | Keeps only relevant slices active |
| ΔS Metric | Guides chunk size & window hop |
| BBMC | Dedupe + noise filter |
| BBPF | Forks domain branches if needed |
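
The dedupe role listed for BBMC can be pictured with a small sketch. It is an assumption-laden stand-in, not the BBMC algorithm: similarity is measured with Python's standard `difflib.SequenceMatcher`, and the function name `collapse_near_duplicates` and the 0.9 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List


def _normalize(text: str) -> str:
    """Collapse whitespace runs and lowercase so spacing drift does not hide duplicates."""
    return " ".join(text.split()).lower()


def collapse_near_duplicates(sections: List[str], threshold: float = 0.9) -> List[str]:
    """Keep the first copy of each section; drop later near-identical ones."""
    kept: List[str] = []
    for section in sections:
        candidate = _normalize(section)
        is_duplicate = any(
            SequenceMatcher(None, candidate, _normalize(k)).ratio() >= threshold
            for k in kept
        )
        if not is_duplicate:
            kept.append(section)
    return kept


sections = [
    "Terms and conditions apply to all orders.",
    "Terms and  conditions apply to all orders",  # same text with OCR-style drift
    "Shipping takes three to five business days.",
]
print(collapse_near_duplicates(sections))
```

The pairwise comparison is quadratic in the number of sections, so a large document would need a cheaper first pass, for example hashing normalized text, before any fuzzy matching.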

📊 Implementation Status

| Feature | State |
| --- | --- |
| ChunkMapper | 🛠 Beta (public soon) |
| Sliding Tree window | Stable |
| Cross-domain fork | Stable |
| OCR noise filter | Stable |
| GUI chunk viewer | 🔜 Planned |

📝 Tips & Limits

  • For >150k tokens, set `chunk_max = 6k` for a faster pass.
  • Use `tree pause` to inspect each domain branch before auto-merge.
  • Share monster PDFs in Discussions; they stress-test ChunkMapper.

🔗 Quick-Start Downloads (60 sec)

| Tool | Link | 3-Step Setup |
| --- | --- | --- |
| WFGY 1.0 PDF | Engine Paper | 1 Download · 2 Upload to LLM · 3 Ask “Summarize using WFGY +<doc>” |
| TXTOS (plain-text OS) | TXTOS.txt | 1 Download · 2 Paste into any LLM chat · 3 Type “hello world” (OS boots instantly) |

Survived a 100k-token dump? Star the repo so ChunkMapper hits v1.0 faster.

↩︎ Back to Problem Index