Language & Multilingual · Global Fix Map

A compact hub to stabilize cross-lingual retrieval and reasoning.
Use this folder when your corpus or queries include CJK, RTL, Indic, Cyrillic, accented Latin, or frequent code-switching.

Quick routes to per-page guides

Tokenizer mismatch → tokenizer_mismatch.md
Script mixing inside one query → script_mixing.md
Locale normalization and variants → locale_drift.md
End-to-end overview and recipes → multilingual_guide.md
Proper nouns and alias shield → proper_noun_aliases.md
Romanization and transliteration rules → romanization_transliteration.md
Query language detection contract → query_language_detection.md
Analyzer routing per language → query_routing_and_analyzers.md
Multilingual hybrid ranking → hybrid_ranking_multilingual.md
Stopwords and morphology locks → stopword_and_morphology_controls.md
Fallback translation with glossary → fallback_translation_and_glossary_bridge.md
Bilingual and code-switch eval sets → code_switching_eval.md

When to use this folder

High similarity yet wrong meaning on bilingual or mixed-script corpora
Citations point to the wrong section after translating the question
Hybrid retrievers underperform a single retriever across languages
Index looks healthy while coverage stays low for non-Latin scripts
Names flip between native, transliteration, and English aliases
zh-Hans and zh-Hant never co-retrieve
Thai recall drops with no clear cause

Acceptance targets

ΔS(question, retrieved) ≤ 0.45 across language variants
Coverage of the target section ≥ 0.70 after repair
λ remains convergent across three paraphrases and two seeds
E_resonance stays flat on long windows that mix scripts
Citation fields are complete, and alias noise does not leak into evidence
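
The two numeric targets above can be checked mechanically once you have measurements per language variant. The following is a minimal sketch, not part of the WFGY tooling: `VariantResult` and `failing_variants` are hypothetical names, the ΔS and coverage values are assumed to come from your own evaluation pipeline, and the sample numbers are invented for illustration.

```python
from dataclasses import dataclass

# Thresholds copied from the acceptance targets above.
MAX_DELTA_S = 0.45
MIN_COVERAGE = 0.70


@dataclass
class VariantResult:
    """Measurements for one language variant of a question (hypothetical record)."""
    language: str
    delta_s: float   # ΔS(question, retrieved), computed by your own tooling
    coverage: float  # fraction of the target section that was retrieved


def failing_variants(results):
    """Return the variants that miss either acceptance threshold."""
    return [
        r for r in results
        if r.delta_s > MAX_DELTA_S or r.coverage < MIN_COVERAGE
    ]


# Illustrative numbers only, not real measurements.
results = [
    VariantResult("en", 0.31, 0.82),
    VariantResult("zh-Hant", 0.52, 0.74),  # ΔS too high
    VariantResult("th", 0.40, 0.61),       # coverage too low
]

for r in failing_variants(results):
    print(f"{r.language}: ΔS={r.delta_s:.2f} coverage={r.coverage:.2f}")
```

The λ and E_resonance targets are not covered here because they depend on how your pipeline records those signals.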

Open these first

Visual map and recovery → rag-architecture-and-recovery.md
End-to-end retrieval knobs → retrieval-playbook.md
Why this snippet and how to cite → retrieval-traceability.md
Snippet schema fence → data-contracts.md
Embedding vs meaning → embedding-vs-semantic.md
Chunk boundary sanity → chunking-checklist.md

Map symptoms → structural fixes

Symptom	Likely cause	Open this
High similarity yet wrong meaning	Embedding not multilingual or pre-normalization mismatch	embedding-vs-semantic.md
Citations jump sections after translation	Snippet schema too loose	data-contracts.md · retrieval-traceability.md
zh-Hans and zh-Hant never co-retrieve	Variant mapping and width rules missing	locale_drift.md
Thai or CJK recall collapses	Tokenizer mismatch or missing segmenter	tokenizer_mismatch.md
Mixed Latin plus CJK query under-recalls	Analyzer split across scripts	script_mixing.md
Hybrid retriever worse than single	Query parsing split or mis-weighted rerank	patterns/pattern_query_parsing_split.md · rerankers.md
Proper nouns oscillate across spellings	Missing alias fields and entity shield	proper_noun_aliases.md
Inconsistent transliteration causes misses	Romanization rules and aliases not aligned	romanization_transliteration.md
Language detection drifts	Detection contract unlocked or weak samples	query_language_detection.md
Search vs index behave differently	Analyzer routing error	query_routing_and_analyzers.md
Ranking unstable across languages	Monolingual reranker or unaligned features	hybrid_ranking_multilingual.md
Negations or particles vanish	Stopword or morphology rules too aggressive	stopword_and_morphology_controls.md
Persistent high ΔS on local language path	Need controlled translation bridge with glossary	fallback_translation_and_glossary_bridge.md

Fix in 60 seconds

1. Detect language
   Emit a stable language code and confidence score per the detection contract. If detection is unstable, stop and fix it first.
   Open → query_language_detection.md
2. Lock normalization and analyzers
   Keep the same locale, width, accent, and segmentation rules for both index and search.
   Open → locale_drift.md · query_routing_and_analyzers.md
3. Protect entities and syntax
   Add alias fields and romanization pairs. Clamp stopword and morphology rules so that scope words such as negations or units survive.
   Open → proper_noun_aliases.md · romanization_transliteration.md · stopword_and_morphology_controls.md
4. Stabilize ranking and hybrid flows
   Use a multilingual reranker or dual-track lexical plus vector retrieval, and keep ordering deterministic.
   Open → hybrid_ranking_multilingual.md
5. Use a translation bridge only as a last resort
   Enable it only when the native path keeps a high ΔS. Always pair it with a glossary.
   Open → fallback_translation_and_glossary_bridge.md
6. Verify
   With bilingual and code-switch test sets, confirm ΔS ≤ 0.45, coverage ≥ 0.70, and convergent λ.
   Open → code_switching_eval.md
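
The "lock normalization" step comes down to running one shared function over both the corpus and the queries. The sketch below uses only Python's standard `unicodedata` module. The function name `normalize_text` and the `strip_accents` flag are hypothetical, and the policy shown (NFKC plus case folding) is one reasonable assumption rather than the rule set defined in locale_drift.md.

```python
import unicodedata


def normalize_text(text: str, strip_accents: bool = False) -> str:
    """Apply one normalization policy; call it on both corpus and queries."""
    # NFKC folds full-width and half-width forms and other compatibility variants.
    text = unicodedata.normalize("NFKC", text)
    # casefold() is a more aggressive, locale-independent lowercase.
    text = text.casefold()
    if strip_accents:
        # Decompose, drop combining marks, then recompose.
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
        text = unicodedata.normalize("NFC", text)
    return text


# The same call is used at index time and at query time.
indexed = normalize_text("ＧＰＵ　ｶﾀｶﾅ")
query = normalize_text("gpu カタカナ")
print(indexed == query)
```

Accent stripping is off by default because removing combining marks can damage scripts such as Thai or Devanagari, where some of those marks carry meaning. Mapping between zh-Hans and zh-Hant is also out of scope here, since Unicode normalization does not convert between Simplified and Traditional characters and a separate variant table is needed.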

Store-agnostic quick recipes

Normalize corpus and queries the same way before any vector store, and keep the tokenizer consistent on both sides.
CJK and Thai need segmentation or bigrams. Keep critical fields as keyword fields to protect entities.
If you cannot use multilingual embeddings, add a lexical sidecar, then align features in a deterministic rerank.

🔗 Quick-Start Downloads (60 sec)

Tool	Link	3-Step Setup
WFGY 1.0 PDF	Engine Paper	1️⃣ Download · 2️⃣ Upload to your LLM · 3️⃣ Ask “Answer using WFGY + <your question>”
TXT OS (plain-text OS)	TXTOS.txt	1️⃣ Download · 2️⃣ Paste into any LLM chat · 3️⃣ Type “hello world” — OS boots instantly

🧭 Explore More

Module	Description	Link
WFGY Core	WFGY 2.0 engine is live: full symbolic reasoning architecture and math stack	View →
Problem Map 1.0	Initial 16-mode diagnostic and symbolic fix framework	View →
Problem Map 2.0	RAG-focused failure tree, modular fixes, and pipelines	View →
Semantic Clinic Index	Expanded failure catalog: prompt injection, memory bugs, logic drift	View →
Semantic Blueprint	Layer-based symbolic reasoning & semantic modulations	View →
Benchmark vs GPT-5	Stress test GPT-5 with full WFGY reasoning suite	View →
🧙‍♂️ Starter Village 🏡	New here? Lost in symbols? Click here and let the wizard guide you through	Start →

👑 Early Stargazers: See the Hall of Fame —
Engineers, hackers, and open source builders who supported WFGY from day one.

⭐ WFGY Engine 2.0 is already unlocked. ⭐ Star the repo to help others discover it and unlock more on the Unlock Board.
