Update ocr-parsing-checklist.md

PSBigBig 2025-09-05 11:22:29 +08:00 committed by GitHub
parent 98ba9fc845
commit 89954840a7

@@ -1,5 +1,22 @@
# OCR Parsing Checklist — Input Integrity
<details>
<summary><strong>🧭 Quick Return to Map</strong></summary>
<br>

> You are in a sub-page of **MemoryLongContext**.
> To reorient, go back here:
>
> - [**MemoryLongContext** — extended context windows and memory retention](./README.md)
> - [**WFGY Global Fix Map** — main Emergency Room, 300+ structured fixes](../README.md)
> - [**WFGY Problem Map 1.0** — 16 reproducible failure modes](../../README.md)
>
> Think of this page as a desk within a ward.
> If you need the full triage and all prescriptions, return to the Emergency Room lobby.

</details>

OCR and parsing errors are among the most common silent killers of retrieval pipelines.
The text looks fine to a human reader, but models drift because tokenization, whitespace, or casing has silently changed.
This checklist verifies **integrity at the source layer** before embedding or retrieval begins.
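Below is a minimal Python sketch of what "looks fine but has changed" can mean in practice. It assumes the OCR output is available as a plain `str`. The `INVISIBLE` table, `integrity_report`, and `normalize_for_index` are hypothetical helpers written for illustration and are not part of WFGY. NFKC plus whitespace collapsing is only one possible normalization policy; only the standard-library `unicodedata` module is used.

```python
import unicodedata

# Hypothetical table of invisible characters that OCR/PDF extraction often
# leaves behind. Extend it to match what your own sources actually produce.
INVISIBLE = {
    "\u200b": "ZERO WIDTH SPACE",
    "\u200c": "ZERO WIDTH NON-JOINER",
    "\u200d": "ZERO WIDTH JOINER",
    "\u00ad": "SOFT HYPHEN",
    "\ufeff": "ZERO WIDTH NO-BREAK SPACE (BOM)",
}


def integrity_report(text: str) -> list[str]:
    """List characters that are invisible or that NFKC would rewrite
    (ligatures, no-break spaces, full-width forms), with their positions."""
    findings = []
    for i, ch in enumerate(text):
        if ch in INVISIBLE:
            findings.append(f"{i}: invisible {INVISIBLE[ch]}")
            continue
        folded = unicodedata.normalize("NFKC", ch)
        if folded != ch:
            findings.append(f"{i}: {unicodedata.name(ch, '?')} -> {folded!r}")
    return findings


def normalize_for_index(text: str, fold_case: bool = False) -> str:
    """One possible pre-embedding policy (an assumption, not a standard):
    NFKC, drop invisible characters, collapse whitespace, optionally casefold."""
    text = unicodedata.normalize("NFKC", text)
    text = "".join(ch for ch in text if ch not in INVISIBLE)
    text = " ".join(text.split())
    return text.casefold() if fold_case else text


# OCR output that renders as "config file" but is a different string.
ocr = "con\ufb01g\u00a0file\u200b"
print(ocr == "config file")  # False
for finding in integrity_report(ocr):
    print(finding)
# 3: LATIN SMALL LIGATURE FI -> 'fi'
# 5: NO-BREAK SPACE -> ' '
# 10: invisible ZERO WIDTH SPACE
print(normalize_for_index(ocr) == "config file")  # True
```

Case folding is left off by default. It helps when the embedding model treats case as noise, but it also erases distinctions such as an acronym versus an ordinary word. Treat it as a per-corpus decision rather than a fixed rule.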