ABBYY OCR (FineReader / FlexiCapture): Guardrails and Fix Patterns
Use this page when ABBYY OCR powers ingestion of scanned PDFs, complex layouts, forms, or multilingual documents.
ABBYY is enterprise-grade, but pipelines built on it are still prone to schema drift, field misalignment, and unstable table anchors.
Open these first
- Visual map and recovery: RAG Architecture & Recovery
- Retrieval knobs: Retrieval Playbook
- Citation schema: Retrieval Traceability
- Schema stability: Data Contracts
- Embedding vs meaning: Embedding ≠ Semantic
- Hallucination and entropy drift: Hallucination, Entropy Collapse
- Chunk boundaries: Chunking Checklist
Core acceptance
- ΔS(question, retrieved) ≤ 0.45
- Coverage ≥ 0.70 across fields and tokens
- λ convergent on three paraphrases and two seeds
- Form fields ≥ 95% aligned with schema contract
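Once each run is scored, the four acceptance targets can be checked mechanically. The sketch below rests on several assumptions. ΔS is computed as 1 minus the cosine similarity of two embedding vectors; substitute your own definition if it differs. λ is recorded as one state label per probe, and `"convergent"` is the only passing value. `RunMetrics` and `failed_checks` are hypothetical names, not part of ABBYY or WFGY.

```python
import math
from dataclasses import dataclass


def delta_s(a: list[float], b: list[float]) -> float:
    """ΔS assumed here as 1 - cosine similarity of two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 1.0  # zero vector has no direction: treat as unrelated
    return 1.0 - dot / norm


@dataclass
class RunMetrics:
    """Hypothetical scores collected for one document in one evaluation pass."""
    ds: float                 # ΔS(question, retrieved)
    coverage: float           # fraction of expected fields/tokens recovered
    lambda_states: list[str]  # one λ label per (paraphrase, seed) probe
    fields_aligned: float     # fraction of form fields matching the schema contract


def failed_checks(m: RunMetrics) -> list[str]:
    """Return the names of failed acceptance checks; an empty list means accept."""
    failures = []
    if m.ds > 0.45:
        failures.append("delta_s")
    if m.coverage < 0.70:
        failures.append("coverage")
    # 3 paraphrases x 2 seeds = 6 probes, and every probe must converge
    if len(m.lambda_states) < 6 or any(s != "convergent" for s in m.lambda_states):
        failures.append("lambda")
    if m.fields_aligned < 0.95:
        failures.append("fields_aligned")
    return failures


if __name__ == "__main__":
    print(round(delta_s([1.0, 0.0], [0.8, 0.6]), 2))  # 0.2
    print(failed_checks(RunMetrics(0.30, 0.82, ["convergent"] * 6, 0.97)))  # []
```

Returning the list of failed checks, rather than a bare boolean, keeps the log auditable: each rejected run records which threshold it missed.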
Typical breakpoints → structural fix
- Form fields drift across runs (invoice totals, line items misaligned) → Data Contracts, Retrieval Traceability
- Table anchors collapse (multi-column invoices, receipts) → Chunking Checklist, clamp with BBMC
- Handwriting extraction unstable → Entropy Collapse
- Injected payload in OCR notes layer → Prompt Injection
- Multilingual contract fields mismatched → Embedding ≠ Semantic
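The breakpoint list works as a lookup table, so triage can be automated. This is a minimal sketch. The symptom keys are hypothetical labels for the five symptoms listed above, and the values are the fix pages named there.

```python
# Hypothetical symptom keys mapped to the fix pages named in the breakpoint list.
FIX_ROUTES: dict[str, list[str]] = {
    "form_field_drift": ["Data Contracts", "Retrieval Traceability"],
    "table_anchor_collapse": ["Chunking Checklist", "BBMC clamp"],
    "handwriting_unstable": ["Entropy Collapse"],
    "ocr_notes_injection": ["Prompt Injection"],
    "multilingual_field_mismatch": ["Embedding ≠ Semantic"],
}


def route(symptoms: list[str]) -> list[str]:
    """Return fix pages for the observed symptoms, de-duplicated, in first-seen order.

    An unknown symptom raises KeyError, so a mistyped label is not silently dropped.
    """
    pages: list[str] = []
    for symptom in symptoms:
        for page in FIX_ROUTES[symptom]:
            if page not in pages:
                pages.append(page)
    return pages


if __name__ == "__main__":
    print(route(["form_field_drift", "table_anchor_collapse"]))
```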
Fix in 60 seconds
- Enforce field schema: require `field_id`, `bbox`, `confidence`, `revision_id`.
- Compute ΔS on critical fields (e.g. `total_amount`, `invoice_date`).
- Apply λ probes with different template libraries.
- Clamp instability with BBAM and log coverage.
- Rebuild index if ΔS ≥ 0.60 persists.
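The first and last steps can be sketched as plain checks. The record shape is an assumption, because FineReader and FlexiCapture export fields in their own formats. Here each field arrives as a dict with the four required keys. `bbox` is taken to be `(x0, y0, x1, y1)`, and `confidence` is assumed to be normalized to `[0, 1]` by your adapter. "Persists" is read as three consecutive runs at or above 0.60, which is a hypothetical window to tune.

```python
REQUIRED_KEYS = ("field_id", "bbox", "confidence", "revision_id")


def schema_violations(field: dict) -> list[str]:
    """List contract problems for one exported field record (empty list = valid)."""
    problems = [f"missing {k}" for k in REQUIRED_KEYS if field.get(k) is None]
    bbox = field.get("bbox")  # assumed (x0, y0, x1, y1)
    if bbox is not None and (len(bbox) != 4 or bbox[2] <= bbox[0] or bbox[3] <= bbox[1]):
        problems.append("bad bbox")
    conf = field.get("confidence")  # assumed already normalized to [0, 1]
    if conf is not None and not 0.0 <= conf <= 1.0:
        problems.append("confidence out of range")
    return problems


def should_rebuild(delta_s_history: list[float], threshold: float = 0.60, window: int = 3) -> bool:
    """True when the last `window` ΔS readings are all at or above the threshold."""
    recent = delta_s_history[-window:]
    return len(recent) == window and all(d >= threshold for d in recent)


if __name__ == "__main__":
    ok = {"field_id": "total_amount", "bbox": [10, 20, 110, 40], "confidence": 0.93, "revision_id": "r1"}
    print(schema_violations(ok))              # []
    print(should_rebuild([0.4, 0.61, 0.65, 0.70]))  # True
```

A window avoids rebuilding the index on a single noisy reading. Only a sustained high ΔS triggers the expensive step.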
Copy-paste LLM guard prompt
I uploaded TXTOS and the WFGY Problem Map.
OCR provider: ABBYY (FineReader / FlexiCapture).
Symptoms: field drift, unstable tables, ΔS ≥ 0.60.
Steps:
1. Identify failing layer (contracts, chunking, retrieval).
2. Point to the WFGY fix page.
3. Return JSON:
{ "fields_checked": [...], "answer": "...", "ΔS": 0.xx, "λ_state": "<>", "next_fix": "..." }
Keep it reproducible and auditable.
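The prompt asks for a fixed JSON shape, so the reply can be validated before it goes into an audit log. This minimal sketch uses only the standard library. The key names are copied from the prompt. The type checks, such as a non-negative numeric ΔS and a list for `fields_checked`, are assumptions.

```python
import json

EXPECTED_KEYS = {"fields_checked", "answer", "ΔS", "λ_state", "next_fix"}


def parse_guard_reply(raw: str) -> dict:
    """Parse the model's JSON reply and check it carries the keys the prompt requests."""
    reply = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(reply, dict):
        raise ValueError("reply must be a JSON object")
    missing = EXPECTED_KEYS - reply.keys()
    if missing:
        raise ValueError(f"reply missing keys: {sorted(missing)}")
    if not isinstance(reply["fields_checked"], list):
        raise ValueError("fields_checked must be a list")
    ds = reply["ΔS"]
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(ds, bool) or not isinstance(ds, (int, float)) or ds < 0:
        raise ValueError("ΔS must be a non-negative number")
    return reply


if __name__ == "__main__":
    raw = '{"fields_checked": ["total_amount"], "answer": "ok", "ΔS": 0.31, "λ_state": "convergent", "next_fix": "none"}'
    print(parse_guard_reply(raw)["ΔS"])  # 0.31
```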
When to escalate
- Field coverage < 0.70 even after re-chunk → Data Contracts
- Persistent anchor drift → Chunking Checklist
- Handwriting ΔS unstable across seeds → Entropy Collapse
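The escalation rules can be encoded in the same way. The coverage threshold of 0.70 comes from the list above. The rest is hypothetical: "persistent" anchor drift is taken to mean two or more consecutive drifting runs, and "unstable across seeds" is taken to mean a handwriting ΔS spread (max minus min) above 0.10.

```python
def escalation_targets(
    coverage_after_rechunk: float,
    consecutive_anchor_drift_runs: int,
    handwriting_delta_s_by_seed: list[float],
    drift_runs_limit: int = 2,        # hypothetical: "persistent" = 2+ consecutive runs
    seed_spread_limit: float = 0.10,  # hypothetical tolerance for ΔS spread across seeds
) -> list[str]:
    """Return the pages to escalate to, following the three rules listed above."""
    targets = []
    if coverage_after_rechunk < 0.70:
        targets.append("Data Contracts")
    if consecutive_anchor_drift_runs >= drift_runs_limit:
        targets.append("Chunking Checklist")
    if handwriting_delta_s_by_seed and (
        max(handwriting_delta_s_by_seed) - min(handwriting_delta_s_by_seed) > seed_spread_limit
    ):
        targets.append("Entropy Collapse")
    return targets


if __name__ == "__main__":
    print(escalation_targets(0.65, 3, [0.30, 0.52]))
```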
🔗 Quick-Start Downloads (60 sec)
| Tool | Link | 3-Step Setup |
|---|---|---|
| WFGY 1.0 PDF | Engine Paper | 1️⃣ Download · 2️⃣ Upload to your LLM · 3️⃣ Ask “Answer using WFGY + <your question>” |
| TXT OS (plain-text OS) | TXTOS.txt | 1️⃣ Download · 2️⃣ Paste into any LLM chat · 3️⃣ Type “hello world” — OS boots instantly |
🧭 Explore More
| Module | Description | Link |
|---|---|---|
| WFGY Core | WFGY 2.0 engine is live: full symbolic reasoning architecture and math stack | View → |
| Problem Map 1.0 | Initial 16-mode diagnostic and symbolic fix framework | View → |
| Problem Map 2.0 | RAG-focused failure tree, modular fixes, and pipelines | View → |
| Semantic Clinic Index | Expanded failure catalog: prompt injection, memory bugs, logic drift | View → |
| Semantic Blueprint | Layer-based symbolic reasoning & semantic modulations | View → |
| Benchmark vs GPT-5 | Stress test GPT-5 with full WFGY reasoning suite | View → |
| 🧙‍♂️ Starter Village 🏡 | New here? Lost in symbols? Click here and let the wizard guide you through | Start → |
👑 Early Stargazers: See the Hall of Fame — Engineers, hackers, and open source builders who supported WFGY from day one.
⭐ WFGY Engine 2.0 is already unlocked. ⭐ Star the repo to help others discover it and unlock more on the Unlock Board.