# Fallback Translation and Glossary Bridge · Global Fix Map

## 🧭 Quick Return to Map
You are in a sub-page of Language.
To reorient, go back here:
- Language — multilingual processing and semantic alignment
- WFGY Global Fix Map — main Emergency Room, 300+ structured fixes
- WFGY Problem Map 1.0 — 16 reproducible failure modes
Think of this page as a desk within a ward.
If you need the full triage and all prescriptions, return to the Emergency Room lobby.
When native-language recall keeps missing the right snippet, switch to a controlled translation bridge with a domain glossary and alias shield. Translate only where needed, protect entities and negations, and verify improvement with ΔS, λ, and coverage.
## Open these first
- Visual map and recovery → rag-architecture-and-recovery.md
- End to end retrieval knobs → retrieval-playbook.md
- Traceability schema → retrieval-traceability.md
- Contract the payload → data-contracts.md
- Language overview → multilingual_guide.md
- Tokenizer variance → tokenizer_mismatch.md
- Mixed scripts → script_mixing.md
- Locale normalization → locale_drift.md
- Romanization rules → romanization_transliteration.md
- Proper nouns and aliases → proper_noun_aliases.md
- Language detection → query_language_detection.md
- Analyzer routing → query_routing_and_analyzers.md
- Multilingual ranking → hybrid_ranking_multilingual.md
- Bilingual eval sets → code_switching_eval.md
- Stopwords and morphology → stopword_and_morphology_controls.md
## Core acceptance targets
- ΔS(question, retrieved) ≤ 0.45 on three paraphrases and two seeds
- Coverage of target section ≥ 0.70
- λ convergent after the bridge, across native vs pivot language
- No entity corruption or negation loss in the final citation set
- Rank@k improves or remains flat after the bridge is enabled (a gate-checking sketch follows this list)
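These gates can be checked mechanically. The sketch below is a minimal version, assuming ΔS is computed as 1 minus cosine similarity between question and retrieved-context embeddings and that λ states are logged as arrows; both are stand-ins for your actual WFGY instrumentation, not its implementation.

```python
import numpy as np

def delta_s(q_vec: np.ndarray, ctx_vec: np.ndarray) -> float:
    """ΔS as 1 - cosine similarity; lower means tighter semantic alignment."""
    cos = float(np.dot(q_vec, ctx_vec) /
                (np.linalg.norm(q_vec) * np.linalg.norm(ctx_vec)))
    return 1.0 - cos

def passes_targets(ds_values: list[float], coverage: float,
                   lambda_states: list[str]) -> bool:
    """Apply the acceptance targets across all paraphrases and seeds."""
    return (max(ds_values) <= 0.45                      # ΔS gate on every run
            and coverage >= 0.70                        # target-section coverage
            and all(s == "→" for s in lambda_states))   # λ convergent everywhere
```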
## When to enable the bridge
Enable only if all three hold:
- Native path shows flat-high ΔS across k settings.
- Query language and corpus language differ or the corpus is mixed locale.
- Entity recall improves during a quick pivot test without harming citations.
If any native pipeline item is obviously wrong, fix that first. See tokenizer, analyzer, or morphology pages above.
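Written out as an explicit gate, the three conditions look like the sketch below. Every argument name here is hypothetical; wire in whatever your harness already measures.

```python
def should_enable_bridge(native_ds_by_k: dict[int, float],
                         query_lang: str,
                         corpus_langs: set[str],
                         pivot_entity_recall: float,
                         native_entity_recall: float) -> bool:
    """Enable the bridge only if all three enabling conditions hold."""
    flat_high = all(ds > 0.45 for ds in native_ds_by_k.values())        # condition 1
    lang_gap = query_lang not in corpus_langs or len(corpus_langs) > 1  # condition 2
    pivot_helps = pivot_entity_recall > native_entity_recall            # condition 3
    return flat_high and lang_gap and pivot_helps
```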
## What usually breaks
| Symptom | Likely cause | Open this |
|---|---|---|
| Correct doc exists yet never ranks in top k | analyzer or tokenizer mismatch between query and store | tokenizer_mismatch.md · query_routing_and_analyzers.md |
| Names translate or transliterate inconsistently | missing alias shield or mixed romanization | proper_noun_aliases.md · romanization_transliteration.md |
| Negations flip meaning after MT | no do-not-translate list for negation tokens | stopword_and_morphology_controls.md |
| CJK queries degrade when pivoting via English | script segmentation and width rules differ by stage | script_mixing.md |
| Turkish/Greek accent fold changes matches | locale normalization not pinned per stage | locale_drift.md |
| Good recall but order is noisy across languages | reranker trained on a single language, or ranking features not aligned across languages | hybrid_ranking_multilingual.md |
## Design: glossary bridge in two modes
**Mode A — Query-side pivot.** Translate the query into the corpus language with a glossary and alias shield. Run retrieval natively against the store, then reason in the user language.

**Mode B — Corpus-side pivot.** Keep the query in the user language and retrieve natively, but translate candidate snippets into the user language for reranking and reasoning. Never re-index on the pivot.
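The two modes reduce to a small amount of glue. This sketch assumes hypothetical `translate`, `retrieve`, and `rerank` helpers; the one hard rule it encodes is that Mode B touches candidates, never the index.

```python
def bridge_mode_a(query: str, glossary: dict, corpus_lang: str,
                  translate, retrieve) -> list:
    """Mode A: pivot the query into the corpus language, retrieve natively."""
    pivot_query = translate(query, target=corpus_lang, glossary=glossary)
    return retrieve(pivot_query)  # reasoning happens later, in the user language

def bridge_mode_b(query: str, glossary: dict, user_lang: str,
                  retrieve, translate, rerank) -> list:
    """Mode B: retrieve natively, translate only the candidates for reranking."""
    candidates = retrieve(query)
    pivoted = [translate(c, target=user_lang, glossary=glossary)
               for c in candidates]
    return rerank(query, pivoted)  # the index itself is never rewritten
```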
## Glossary components
- do_not_translate: names, products, codes, unit strings, legal terms.
- preferred_terms: enforce a deterministic mapping for domain words.
- romanization_map: stable transliteration table with 1-to-N aliases.
- negation_and_modality: tokens that must survive intact.
- protected_char_classes: width, diacritics, punctuation class locks (a do-not-translate masking sketch follows this list).
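One common way to enforce do_not_translate is to mask protected spans with opaque placeholders before MT and restore them afterward. This is a minimal sketch of that alias shield; a production version would also handle overlapping terms and the protected character classes.

```python
def shield(text: str, do_not_translate: list[str]) -> tuple[str, dict[str, str]]:
    """Swap protected terms for placeholders the MT engine will pass through."""
    slots: dict[str, str] = {}
    # Longest terms first, so "Value at Risk" is masked before "Risk" could be.
    for i, term in enumerate(sorted(do_not_translate, key=len, reverse=True)):
        if term in text:
            token = f"__DNT{i}__"
            text = text.replace(term, token)
            slots[token] = term
    return text, slots

def unshield(text: str, slots: dict[str, str]) -> str:
    """Restore protected terms after translation."""
    for token, term in slots.items():
        text = text.replace(token, term)
    return text
```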
## Trace fields to log
```json
{
  "bridge_mode": "A|B",
  "pivot_lang": "en|zh|..",
  "glossary_hash": "sha256:...",
  "alias_set_hash": "sha256:...",
  "ΔS_before": 0.xx,
  "ΔS_after": 0.yy,
  "coverage_before": 0.xx,
  "coverage_after": 0.yy
}
```
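The two hash fields stay stable across runs only if they are computed over a canonical serialization. A minimal sketch:

```python
import hashlib
import json

def stable_hash(obj) -> str:
    """sha256 over a sorted-key, compact JSON dump, prefixed as in the trace spec."""
    blob = json.dumps(obj, sort_keys=True, ensure_ascii=False,
                      separators=(",", ":"))
    return "sha256:" + hashlib.sha256(blob.encode("utf-8")).hexdigest()

# Example: pin the glossary actually used in this run into the trace record.
trace = {
    "bridge_mode": "A",
    "pivot_lang": "en",
    "glossary_hash": stable_hash({"do_not_translate": ["CAGR", "ROE"]}),  # toy glossary
}
```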
## Minimal implementation steps

1. **Detect language.** Use the contract from query_language_detection.md. Refuse fallback if detection is unstable.
2. **Assemble glossary.**
   - Pull domain terms.
   - Add aliases from proper_noun_aliases.md.
   - Add the romanization table from romanization_transliteration.md.
   - Add negation and unit strings to do-not-translate.
3. **Choose mode.**
   - Mode A if the store is single-locale and analyzers are correct.
   - Mode B if the store is mixed or analyzers cannot be changed.
4. **Run retrieval.** Route analyzers per query_routing_and_analyzers.md. For Mode B, translate only candidates for reranking.
5. **Verify.** Compute ΔS and coverage. Require λ convergent across two seeds and three paraphrases. Log the trace fields.
6. **Publish.** Keep the glossary versioned and pinned in eval reports. Guard with retrieval-traceability.md and data-contracts.md. An end-to-end sketch of these steps follows this list.
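Glued together, the steps look roughly like this. Every helper is a stand-in for the components referenced above, and the `store` attributes are hypothetical.

```python
def run_bridge(question: str, store, glossary: dict,
               detect_lang, translate, retrieve, score_delta_s, coverage_of):
    """Steps 1-6 in one pass: detect, choose mode, retrieve, verify, emit trace."""
    q_lang = detect_lang(question)
    if q_lang is None:                            # step 1: unstable detection
        return {"next_fix": "stabilize language detection before any fallback"}
    mode = "A" if store.single_locale else "B"    # step 3: mode choice
    if mode == "A":
        hits = retrieve(translate(question, target=store.lang, glossary=glossary))
    else:
        hits = [translate(h, target=q_lang, glossary=glossary)
                for h in retrieve(question)]      # step 4: candidates only
    return {                                      # step 5: verify before publishing
        "bridge_mode": mode,
        "pivot_lang": store.lang if mode == "A" else q_lang,
        "ΔS_after": score_delta_s(question, hits),
        "coverage_after": coverage_of(hits),
        "hits": hits,
    }
```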
## Spec: glossary JSON
```json
{
  "version": "glossary_acme_finance_2025_08_30",
  "pivot_lang": "en",
  "do_not_translate": ["Value at Risk", "CAGR", "ROE", "İstanbul", "北京市", "§"],
  "preferred_terms": {
    "账面价值": "book value",
    "净现值": "net present value"
  },
  "romanization_map": {
    "北京市": ["Beijing Shi", "Beijing City"],
    "İstanbul": ["Istanbul", "Stamboul"]
  },
  "negation_and_modality": ["not", "never", "must", "should"],
  "protected_char_classes": ["fullwidth_digit", "narrow_no_break_space"]
}
```
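A loader that fails fast when a glossary file deviates from this spec keeps bad versions out of eval reports. A minimal sketch, with the required-key set mirroring the fields above:

```python
import json

REQUIRED_KEYS = {"version", "pivot_lang", "do_not_translate", "preferred_terms",
                 "romanization_map", "negation_and_modality",
                 "protected_char_classes"}

def load_glossary(path: str) -> dict:
    """Parse a glossary file and reject it if any spec section is missing."""
    with open(path, encoding="utf-8") as f:
        glossary = json.load(f)
    missing = REQUIRED_KEYS - glossary.keys()
    if missing:
        raise ValueError(f"glossary {glossary.get('version', '?')} "
                         f"is missing sections: {sorted(missing)}")
    return glossary
```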
## Copy-paste prompt for the LLM step

```txt
You have TXTOS and the WFGY Problem Map loaded.

My multilingual issue:
- native_lang: {xx}
- user_lang: {yy}
- mode: {A|B}
- glossary: {do_not_translate, preferred_terms, romanization_map, negation_and_modality}
- question: "{user_question}"
- candidates: [{snippet_id, text, source_url}...]

Do:
1) Apply the glossary strictly. Protect names, units, negations.
2) Perform cite-then-explain. If citations are weak, return the minimal fix and do not fabricate.
3) Return JSON:
   { "bridge_mode": "A|B", "pivot_lang": "en|...", "citations": [...],
     "answer": "...", "ΔS": 0.xx, "coverage": 0.xx, "λ_state": "→|←|<>|×",
     "next_fix": "..." }

Keep it auditable and short.
```
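Treat the model's reply as untrusted output and gate it before anything is published. A sketch, with field names matching the prompt above:

```python
import json

def accept_bridge_answer(raw: str) -> dict | None:
    """Parse the LLM reply and enforce the acceptance gates; None means reject."""
    try:
        out = json.loads(raw)
    except json.JSONDecodeError:
        return None                                  # malformed reply: retry upstream
    ok = (out.get("ΔS", 1.0) <= 0.45
          and out.get("coverage", 0.0) >= 0.70
          and out.get("λ_state") == "→"
          and bool(out.get("citations")))            # cite-then-explain or nothing
    return out if ok else None
```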
## Eval protocol
- Use bilingual and code-switching sets from code_switching_eval.md.
- Compare native vs bridge on the same questions and seeds.
- Accept only if ΔS ≤ 0.45, coverage ≥ 0.70, λ convergent, and entity recall does not regress.
- Report deltas for Rank@k and citation accuracy (a paired comparison sketch follows this list).
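The comparison itself is a plain paired loop over the shared question set. A sketch, assuming `run_native` and `run_bridge` return the trace fields defined earlier:

```python
def compare(questions: list[str], seeds: list[int],
            run_native, run_bridge) -> list[dict]:
    """Paired native-vs-bridge eval on identical questions and seeds."""
    rows = []
    for q in questions:
        for seed in seeds:
            native = run_native(q, seed=seed)
            bridge = run_bridge(q, seed=seed)
            rows.append({
                "question": q,
                "seed": seed,
                "ΔS_delta": bridge["ΔS_after"] - native["ΔS_after"],  # want ≤ 0
                "coverage_delta": (bridge["coverage_after"]
                                   - native["coverage_after"]),        # want ≥ 0
            })
    return rows
```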
## Common gotchas
- Translating the index. Never translate and re-index as a “quick fix”; pivot only at the query or candidate stage.
- Letting the MT rewrite units or numbers. Add them to do-not-translate.
- Dropping diacritics or width during translation. Pin normalization from locale_drift.md and script_mixing.md; a pinning sketch follows this list.
- Reranking with a mono-lingual model. If scores are noisy across languages, follow hybrid_ranking_multilingual.md.
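For the diacritics and width gotcha, pin one Unicode normalization form per stage and assert it at every stage boundary, so drift fails loudly instead of silently. A sketch using only the standard library; the stage names and the NFC choice are assumptions to adapt to your pipeline:

```python
import unicodedata

STAGE_FORM = {"ingest": "NFC", "query": "NFC", "rerank": "NFC"}  # one pinned form per stage

def normalize(text: str, stage: str) -> str:
    """Apply the pinned normalization form for this stage."""
    return unicodedata.normalize(STAGE_FORM[stage], text)

def assert_pinned(text: str, stage: str) -> None:
    """Fail loudly if text crosses a stage boundary in the wrong form."""
    if not unicodedata.is_normalized(STAGE_FORM[stage], text):
        raise ValueError(f"text is not {STAGE_FORM[stage]} at stage {stage!r}")
```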
## 🔗 Quick-Start Downloads (60 sec)
| Tool | Link | 3-Step Setup |
|---|---|---|
| WFGY 1.0 PDF | Engine Paper | 1️⃣ Download · 2️⃣ Upload to your LLM · 3️⃣ Ask “Answer using WFGY + <your question>” |
| TXT OS (plain-text OS) | TXTOS.txt | 1️⃣ Download · 2️⃣ Paste into any LLM chat · 3️⃣ Type “hello world” — OS boots instantly |
## Explore More
| Layer | Page | What it’s for |
|---|---|---|
| ⭐ Proof | WFGY Recognition Map | External citations, integrations, and ecosystem proof |
| ⚙️ Engine | WFGY 1.0 | Original PDF tension engine and early logic sketch (legacy reference) |
| ⚙️ Engine | WFGY 2.0 | Production tension kernel for RAG and agent systems |
| ⚙️ Engine | WFGY 3.0 | TXT based Singularity tension engine (131 S class set) |
| 🗺️ Map | Problem Map 1.0 | Flagship 16 problem RAG failure taxonomy and fix map |
| 🗺️ Map | Problem Map 2.0 | Global Debug Card for RAG and agent pipeline diagnosis |
| 🗺️ Map | Problem Map 3.0 | Global AI troubleshooting atlas and failure pattern map |
| 🧰 App | TXT OS | .txt semantic OS with fast bootstrap |
| 🧰 App | Blah Blah Blah | Abstract and paradox Q&A built on TXT OS |
| 🧰 App | Blur Blur Blur | Text to image generation with semantic control |
| 🏡 Onboarding | Starter Village | Guided entry point for new users |
If this repository helped, starring it improves discovery so more builders can find the docs and tools.