
RTL & BiDi Control — Guardrails and Fix Pattern

🧭 Quick Return to Map

You are in a sub-page of LanguageLocale.
To reorient, go back to the LanguageLocale index.

Think of this page as a desk within a ward.
If you need the full triage and all prescriptions, return to the Emergency Room lobby.

Stabilize retrieval and reasoning when left-to-right content mixes with right-to-left scripts or invisible BiDi marks. No infrastructure change is required. All fixes map back to WFGY pages with measurable targets.

What this page is

  • A compact repair guide for directionality bugs that flip tokens, citations, or numbers.
  • Steps to normalize control characters, lock direction metadata, and keep offsets verifiable.
  • Store-agnostic checks you can run in minutes.

When to use

  • Citations look correct to the eye but snippet offsets do not match.
  • Punctuation or brackets render on the wrong side in answers.
  • Arabic or Hebrew lines invert number order or collapse after parsing.
  • JSON fields with mixed direction break validation or flip keys.
  • Search returns near hits but ΔS stays high on RTL content.
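A quick way to confirm these symptoms before touching the index is to scan snippets for invisible marks. The sketch below assumes Python 3.9+; `find_bidi_controls` and `BIDI_CONTROLS` are hypothetical names, and the table holds the twelve characters Unicode classes as Bidi_Control, ALM (U+061C) included.

```python
# Hypothetical helper table: the twelve Unicode Bidi_Control characters, by short name.
BIDI_CONTROLS = {
    "\u061c": "ALM", "\u200e": "LRM", "\u200f": "RLM",
    "\u202a": "LRE", "\u202b": "RLE", "\u202c": "PDF",
    "\u202d": "LRO", "\u202e": "RLO",
    "\u2066": "LRI", "\u2067": "RLI", "\u2068": "FSI", "\u2069": "PDI",
}


def find_bidi_controls(text: str) -> list[tuple[int, str]]:
    """Return (code point offset, short name) for each BiDi control in text."""
    return [(i, BIDI_CONTROLS[ch]) for i, ch in enumerate(text) if ch in BIDI_CONTROLS]


if __name__ == "__main__":
    # An Arabic line wrapped in an RLE ... PDF embedding.
    snippet = "\u202bمرحبا (123)\u202c"
    print(find_bidi_controls(snippet))  # [(0, 'RLE'), (12, 'PDF')]
```

The offsets are Python code-point indices. A consumer that counts UTF-16 units (Java, JavaScript) will disagree for characters outside the BMP, which is one more reason to record in the trace exactly which text an offset refers to.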


Acceptance targets

  • ΔS(question, retrieved) ≤ 0.45 on three paraphrases.
  • Coverage of target section ≥ 0.70.
  • λ remains convergent across two seeds.
  • Offsets verified after normalization on both query and snippet.
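The last target can be checked mechanically. A minimal sketch, assuming offsets are stored as half-open [start, end) code-point ranges on the normalized source; `normalize` and `offsets_verified` are hypothetical names, and this normalizer only strips Bidi_Control marks and then applies NFC. Stripping first keeps the result stable if it is normalized again, because NFC never reintroduces a control character.

```python
import re
import unicodedata

# Unicode Bidi_Control characters: ALM, LRM, RLM, LRE..RLO, LRI..PDI.
_BIDI_RE = re.compile("[\u061c\u200e\u200f\u202a-\u202e\u2066-\u2069]")


def normalize(text: str) -> str:
    """Hypothetical normalizer: drop BiDi marks first, then apply NFC."""
    return unicodedata.normalize("NFC", _BIDI_RE.sub("", text))


def offsets_verified(source: str, snippet: str, start: int, end: int) -> bool:
    """True if [start, end) on the normalized source equals the normalized snippet."""
    norm_source = normalize(source)
    if not 0 <= start <= end <= len(norm_source):
        return False
    return norm_source[start:end] == normalize(snippet)


if __name__ == "__main__":
    source = "قال \u200fالتقرير: 42%"
    print(offsets_verified(source, "\u200fالتقرير", 4, 11))  # True: offsets on normalized text
    print(offsets_verified(source, "التقرير", 5, 12))         # False: offsets taken on raw text
```

The second call reproduces the citation failure from the breakpoints list: offsets taken on the raw text drift by one position for every mark stripped before the snippet. Running the same `normalize` on the query side keeps both ends comparable.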

Typical breakpoints → exact fix

  • Invisible BiDi marks inside snippets cause reversed punctuation or bracket order.
    Fix: strip control code points during indexing and during query pre-normalization. Persist a dir flag on the clean text.
    Open: Data Contracts, Retrieval Traceability

  • Rendered order vs stored order mismatch makes citations fail.
    Fix: compute character offsets on the normalized text only. Log the normalization pipeline in the trace.
    Open: Retrieval Traceability

  • Numbers flip in Arabic or Hebrew lines when Eastern Arabic digits mix with Latin punctuation.
    Fix: normalize digits to a single system for retrieval. Keep the original form for display.
    Open: Digits • Width • Punctuation

  • JSON payloads break or tool calls mis-route because keys include RTL marks.
    Fix: forbid control characters in keys through the schema; allow them in values only after normalization.
    Open: Prompt Injection, Data Contracts
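For the digit breakpoint, one way to normalize digits for retrieval while keeping the original for display is a translation table. A minimal sketch, assuming Python 3.9+, that folds Arabic-Indic (U+0660 to U+0669) and Extended Arabic-Indic (U+06F0 to U+06F9) digits to ASCII; `fold_digits`, `index_pair`, and the `display`/`retrieval` field names are hypothetical, and folding to ASCII rather than another system is a store-policy choice.

```python
# Map Arabic-Indic (U+0660..U+0669) and Extended Arabic-Indic (U+06F0..U+06F9)
# digits to ASCII 0-9. Both ranges are contiguous and ordered 0..9.
_DIGIT_FOLD = str.maketrans(
    {**{chr(0x0660 + d): str(d) for d in range(10)},
     **{chr(0x06F0 + d): str(d) for d in range(10)}}
)


def fold_digits(text: str) -> str:
    """Return text with the digits above folded to ASCII; other characters unchanged."""
    return text.translate(_DIGIT_FOLD)


def index_pair(text: str) -> dict[str, str]:
    """Hypothetical record: match and embed on 'retrieval', render 'display'."""
    return {"display": text, "retrieval": fold_digits(text)}


if __name__ == "__main__":
    print(index_pair("صفحة \u0663\u0664 من \u06f1\u06f2"))
```

Passing both the query and the stored text through the same fold puts "٣٤" and "34" on one key, while the display field preserves what the author wrote.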


60-second fix checklist

  1. Strip BiDi controls during ingest and query. Remove these if present:
    ALM U+061C, LRM U+200E, RLM U+200F, LRE U+202A, RLE U+202B, PDF U+202C, LRO U+202D,
    RLO U+202E, LRI U+2066, RLI U+2067, FSI U+2068, PDI U+2069.
    Also normalize NBSP U+00A0 and ZWJ U+200D when they change tokenization.

  2. Persist direction metadata. Add dir = "rtl" | "ltr" | "auto" at snippet and paragraph levels. Store it in the trace envelope.

  3. Index on normalized text only

    • Normalize to NFC.
    • Strip BiDi marks.
    • Fold digits per store policy.
    • Keep the original text for rendering.

  4. Contract the payload. Require fields: snippet_id, dir, norm_hash, offsets_on_norm, source_url.
    Reject the payload if dir is missing on an RTL source.
    Open: Data Contracts

  5. Probe λ_observe. Vary k = 5, 10, 20. If ΔS stays flat and high, rebuild the index after normalization and re-verify offsets.
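Steps 1 to 4 can be strung together into one ingest function. A minimal sketch under stated assumptions: Python 3.9+, SHA-256 as the norm_hash, offsets as half-open code-point ranges, ASCII as the digit policy, and the original snippet text stored elsewhere under the same snippet_id for rendering. Every function name here is hypothetical; `guess_dir` is a simplified first-strong-character rule, not the full Unicode Bidirectional Algorithm, and ZWJ is left untouched because whether to normalize it depends on your tokenizer.

```python
import hashlib
import re
import unicodedata
from typing import Optional

# Unicode Bidi_Control characters: ALM, LRM, RLM, LRE..RLO, LRI..PDI.
_BIDI_RE = re.compile("[\u061c\u200e\u200f\u202a-\u202e\u2066-\u2069]")
# Arabic-Indic (U+0660..0669) and Extended Arabic-Indic (U+06F0..06F9) digits -> ASCII.
_DIGIT_FOLD = str.maketrans({**{chr(0x0660 + d): str(d) for d in range(10)},
                             **{chr(0x06F0 + d): str(d) for d in range(10)}})
REQUIRED = ("snippet_id", "dir", "norm_hash", "offsets_on_norm", "source_url")
ALLOWED_DIRS = {"rtl", "ltr", "auto"}


def normalize_for_index(text: str) -> str:
    """Strip BiDi marks, turn NBSP into a plain space, apply NFC, fold digits."""
    text = _BIDI_RE.sub("", text).replace("\u00a0", " ")
    return unicodedata.normalize("NFC", text).translate(_DIGIT_FOLD)


def guess_dir(text: str) -> str:
    """Simplified first-strong-character rule; 'auto' if no strong character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return "auto"


def build_record(snippet_id: str, raw_snippet: str, norm_source: str,
                 source_url: str, dir_: Optional[str] = None) -> dict:
    """Build a trace record whose hash and offsets refer to normalized text only."""
    norm = normalize_for_index(raw_snippet)
    start = norm_source.find(norm)
    if not norm or start < 0:
        raise ValueError(f"{snippet_id}: snippet not found in normalized source")
    return {
        "snippet_id": snippet_id,
        "dir": dir_ or guess_dir(norm),
        "norm_hash": hashlib.sha256(norm.encode("utf-8")).hexdigest(),
        "offsets_on_norm": [start, start + len(norm)],
        "source_url": source_url,
    }


def contract_errors(record: dict) -> list:
    """Return contract violations; an empty list means the payload is accepted."""
    errors = [f"missing field: {f}" for f in REQUIRED if record.get(f) in (None, "")]
    dir_value = record.get("dir")
    if dir_value and dir_value not in ALLOWED_DIRS:
        errors.append(f"invalid dir: {dir_value!r}")
    # Keys must never carry BiDi controls (they mis-route JSON and tool calls).
    errors += [f"control char in key: {k!r}" for k in record if _BIDI_RE.search(k)]
    return errors
```

This `contract_errors` treats dir as required on every record, which is stricter than the RTL-only rule in step 4. Reject any record that returns errors before it reaches the index or a tool call, and run the same `normalize_for_index` on the query side, or hashes and offsets will not line up.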


Copy-paste prompt


You have TXT OS and the WFGY Problem Map loaded.

My multilingual issue:

* symptoms: punctuation flips or offsets fail on RTL lines
* traces: ΔS(question,retrieved)=..., λ across 3 paraphrases, direction flags

Tell me:

1. the failing layer and why,
2. the exact WFGY page to open,
3. the minimal steps to push ΔS ≤ 0.45 and keep λ convergent,
4. a reproducible check that verifies offsets after normalization.


🔗 Quick-Start Downloads (60 sec)

  • WFGY 1.0 PDF (Engine Paper): 1 Download · 2 Upload to your LLM · 3 Ask “Answer using WFGY + ”
  • TXT OS (plain-text OS), TXTOS.txt: 1 Download · 2 Paste into any LLM chat · 3 Type “hello world”; the OS boots instantly

Explore More

Layer · Page: what it's for

  • Proof · WFGY Recognition Map: External citations, integrations, and ecosystem proof
  • ⚙️ Engine · WFGY 1.0: Original PDF tension engine and early logic sketch (legacy reference)
  • ⚙️ Engine · WFGY 2.0: Production tension kernel for RAG and agent systems
  • ⚙️ Engine · WFGY 3.0: TXT-based Singularity tension engine (131 S class set)
  • 🗺️ Map · Problem Map 1.0: Flagship 16-problem RAG failure taxonomy and fix map
  • 🗺️ Map · Problem Map 2.0: Global Debug Card for RAG and agent pipeline diagnosis
  • 🗺️ Map · Problem Map 3.0: Global AI troubleshooting atlas and failure pattern map
  • 🧰 App · TXT OS: .txt semantic OS with fast bootstrap
  • 🧰 App · Blah Blah Blah: Abstract and paradox Q&A built on TXT OS
  • 🧰 App · Blur Blur Blur: Text-to-image generation with semantic control
  • 🏡 Onboarding · Starter Village: Guided entry point for new users
