Update eval_harness.md

PSBigBig × MiniPS 2026-02-26 15:32:45 +08:00 committed by GitHub
parent bcc48b4572
commit 9fc01c3de9

@ -16,6 +16,13 @@
> If you need the full triage and all prescriptions, return to the Emergency Room lobby.
</details>
> **Evaluation disclaimer (eval harness)**
> This page sketches a harness for running structured evaluations on AI pipelines.
> Any metrics or labels that pass through such a harness remain heuristic outputs of models, scripts and annotators.
> They do not become scientific proof just because they flow through this structure.
> Use the harness to compare variants inside a controlled scenario, and avoid presenting those numbers as universal claims about model quality beyond that scenario.
---
A minimal yet strict harness for running repeatable evaluations on RAG and agent pipelines. It fixes the two usual failure modes: non-reproducible runs, and noisy metrics that cannot explain drift. Everything here maps to WFGY pages with measurable targets.
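To make those two failure modes concrete, here is a minimal sketch of what such a harness could look like: the seed is pinned so runs are reproducible, and the config is hashed into every run record so metric drift can be traced to a concrete config change. The function name `run_eval`, the record fields, and the accuracy metric are illustrative assumptions, not part of the WFGY pages.

```python
import hashlib
import json
import random

def run_eval(cases, predict, config, seed=0):
    """Run a repeatable evaluation over a fixed list of cases.

    cases:   list of {"id", "question", "expected"} dicts (illustrative schema)
    predict: callable mapping a question string to an answer string
    config:  JSON-serializable pipeline config, hashed into the run record
    """
    random.seed(seed)  # pin any sampling the pipeline does during the run
    # Hash a canonical serialization of the config so two runs are
    # comparable only when their config hashes match.
    config_hash = hashlib.sha256(
        json.dumps(config, sort_keys=True).encode()
    ).hexdigest()[:12]
    records = []
    for case in cases:
        answer = predict(case["question"])
        records.append({"id": case["id"],
                        "correct": answer == case["expected"]})
    accuracy = sum(r["correct"] for r in records) / len(records)
    return {"config_hash": config_hash, "seed": seed,
            "accuracy": accuracy, "records": records}
```

With per-case records plus the seed and config hash stored together, a metric shift between two runs is either attributable (different hash or seed) or a genuine pipeline regression, which is the property the paragraph above asks for.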