
Language & Multilingual · Global Fix Map

A compact hub to stabilize cross-lingual retrieval and reasoning.
Use this folder when your corpus or queries include CJK, RTL, Indic, Cyrillic, accented Latin, or frequent code-switching.


Quick routes to per-page guides


When to use this folder

  • High similarity yet wrong meaning on bilingual or mixed-script corpora
  • Citations point to the wrong section after translating the question
  • Hybrid retrievers underperform a single retriever across languages
  • Index looks healthy while coverage stays low for non-Latin scripts
  • Names flip between native, transliteration, and English aliases
  • zh-Hans and zh-Hant never co-retrieve, or Thai recall drops with no clear cause

Acceptance targets

  • ΔS(question, retrieved) ≤ 0.45 across language variants
  • Coverage of the target section ≥ 0.70 after repair
  • λ remains convergent across three paraphrases and two seeds
  • E_resonance stays flat on long windows that mix scripts
  • Citation fields are complete, and alias noise does not leak into evidence
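
As a rough illustration, the first two targets could be gated in a test harness along these lines. This is a minimal sketch that assumes ΔS and coverage are already measured per language variant by your own tooling. `VariantResult`, `failing_variants`, and the sample numbers are hypothetical, and the λ and E_resonance checks are not modeled here.

```python
from dataclasses import dataclass
from typing import Sequence

# Thresholds taken from the acceptance targets above.
DELTA_S_MAX = 0.45
COVERAGE_MIN = 0.70


@dataclass
class VariantResult:
    """One measurement for a single language variant (hypothetical shape)."""
    language: str
    delta_s: float   # ΔS(question, retrieved), measured elsewhere
    coverage: float  # coverage of the target section, measured elsewhere


def failing_variants(results: Sequence[VariantResult]) -> list[str]:
    """Return the language tags that miss either target, in input order."""
    return [
        r.language
        for r in results
        if r.delta_s > DELTA_S_MAX or r.coverage < COVERAGE_MIN
    ]


# Made-up numbers, for illustration only.
sample = [
    VariantResult("en", 0.30, 0.82),
    VariantResult("zh-Hant", 0.52, 0.75),  # ΔS too high
    VariantResult("th", 0.40, 0.61),       # coverage too low
]
print(failing_variants(sample))  # ['zh-Hant', 'th']
```

Reporting the failing variants by tag, rather than a single pass/fail flag, makes it easier to pick the matching row in the symptom map.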

Open these first


Map symptoms → structural fixes

| Symptom | Likely cause | Open this |
|---|---|---|
| High similarity yet wrong meaning | Embedding not multilingual or pre-normalization mismatch | embedding-vs-semantic.md |
| Citations jump sections after translation | Snippet schema too loose | data-contracts.md · retrieval-traceability.md |
| zh-Hans and zh-Hant never co-retrieve | Variant mapping and width rules missing | locale_drift.md |
| Thai or CJK recall collapses | Tokenizer mismatch or missing segmenter | tokenizer_mismatch.md |
| Mixed Latin plus CJK query under-recalls | Analyzer split across scripts | script_mixing.md |
| Hybrid retriever worse than single | Query parsing split or mis-weighted rerank | patterns/pattern_query_parsing_split.md · rerankers.md |
| Proper nouns oscillate across spellings | Missing alias fields and entity shield | proper_noun_aliases.md |
| Inconsistent transliteration causes misses | Romanization rules and aliases not aligned | romanization_transliteration.md |
| Language detection drifts | Detection contract unlocked or weak samples | query_language_detection.md |
| Search vs index behave differently | Analyzer routing error | query_routing_and_analyzers.md |
| Ranking unstable across languages | Monolingual reranker or unaligned features | hybrid_ranking_multilingual.md |
| Negations or particles vanish | Stopword or morphology rules too aggressive | stopword_and_morphology_controls.md |
| Persistent high ΔS on local language path | Need controlled translation bridge with glossary | fallback_translation_and_glossary_bridge.md |

Fix in 60 seconds

  1. Detect language
    Emit stable language and confidence per the detection contract. If unstable, stop and fix detection first.
    Open → query_language_detection.md

  2. Lock normalization and analyzers
    Keep the same locale, width, accents, and segmentation for both index and search.
    Open → locale_drift.md · query_routing_and_analyzers.md

  3. Protect entities and syntax
    Add alias fields and romanization pairs. Clamp stopwords and morphological rules for scope words like negations or units.
    Open → proper_noun_aliases.md · romanization_transliteration.md · stopword_and_morphology_controls.md

  4. Stabilize ranking and hybrid flows
    Use a multilingual reranker or a dual-track lexical-plus-vector flow, and keep ordering deterministic.
    Open → hybrid_ranking_multilingual.md

  5. Use a translation bridge only as a last resort
    Enable it only when ΔS stays high on the native path. Always pair it with a glossary.
    Open → fallback_translation_and_glossary_bridge.md

  6. Verify
    With bilingual and code-switch test sets, confirm ΔS ≤ 0.45, Coverage ≥ 0.70, and a convergent λ.
    Open → code_switching_eval.md
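
Step 2 is easy to break by accident, so a sketch may help. The idea is to route both indexing and querying through one shared function so the two sides cannot drift apart. `normalize_text` and its `strip_accents` flag are illustrative names, not part of WFGY. The policy shown (NFKC, case folding, optional accent stripping) is an assumption to adapt to your corpus.

```python
import unicodedata


def normalize_text(text: str, strip_accents: bool = False) -> str:
    """Apply one normalization policy; call it at index time AND query time."""
    # NFKC folds full-width/half-width forms and other compatibility variants.
    text = unicodedata.normalize("NFKC", text)
    # casefold() is a more aggressive lower() intended for caseless matching.
    text = text.casefold()
    if strip_accents:
        # Assumption: only enable this for Latin-heavy fields. Dropping
        # combining marks damages scripts such as Devanagari.
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
        text = unicodedata.normalize("NFC", text)
    # Collapse whitespace runs, including the former ideographic space.
    return " ".join(text.split())


# The same call on both sides keeps width and case consistent.
indexed = normalize_text("ＷＦＧＹ　Ｃａｆé")
queried = normalize_text("wfgy café")
print(indexed == queried)  # True
```

Accent stripping is off by default because whether to keep accents is a locale decision. Whichever choice you make, it has to be the same on both sides.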


Store-agnostic quick recipes

  • Normalize corpus and queries the same way before they reach any vector store, and keep the tokenizer consistent on both sides
  • CJK and Thai need segmentation or bigrams. Keep critical fields as keyword fields to protect entities
  • If you cannot use multilingual embeddings, add a lexical sidecar, then align features in a deterministic rerank
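
The bigram fallback in the second recipe can be sketched as follows. This is a minimal illustration, not a replacement for a real segmenter: the Unicode ranges cover only kana, common CJK ideographs, and Thai, and `script_aware_tokens` is a hypothetical helper name.

```python
import re

# Assumption: a partial set of ranges (kana, CJK unified ideographs, Thai).
# Extend it (for example with Hangul) to match your corpus.
_CJK_THAI = r"\u3040-\u30ff\u4e00-\u9fff\u0e00-\u0e7f"
_RUNS = re.compile(
    rf"(?P<dense>[{_CJK_THAI}]+)|(?P<word>[^\W_{_CJK_THAI}]+)"
)


def script_aware_tokens(text: str) -> list[str]:
    """Emit overlapping bigrams for CJK/Thai runs and whole words otherwise."""
    tokens: list[str] = []
    for match in _RUNS.finditer(text):
        run = match.group()
        if match.lastgroup == "word":
            # Space-delimited scripts: keep the word, case-folded.
            tokens.append(run.casefold())
        elif len(run) == 1:
            # A lone character cannot form a bigram; keep it as is.
            tokens.append(run)
        else:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


print(script_aware_tokens("東京タワー tower"))
# ['東京', '京タ', 'タワ', 'ワー', 'tower']
```

Use the same function for corpus and queries. Character bigrams are a crude fallback for Thai in particular, so prefer a dictionary-based segmenter when one is available.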

🔗 Quick-Start Downloads (60 sec)

| Tool | Link | 3-Step Setup |
|---|---|---|
| WFGY 1.0 | PDF Engine Paper | 1 Download · 2 Upload to your LLM · 3 Ask “Answer using WFGY + <your question>” |
| TXT OS (plain-text OS) | TXTOS.txt | 1 Download · 2 Paste into any LLM chat · 3 Type “hello world” and the OS boots instantly |

