diff --git a/ProblemMap/GlobalFixMap/Eval/eval_semantic_stability.md b/ProblemMap/GlobalFixMap/Eval/eval_semantic_stability.md
index 4ed94a8b..2b105966 100644
--- a/ProblemMap/GlobalFixMap/Eval/eval_semantic_stability.md
+++ b/ProblemMap/GlobalFixMap/Eval/eval_semantic_stability.md
@@ -16,6 +16,11 @@
 > If you need the full triage and all prescriptions, return to the Emergency Room lobby.
+> **Evaluation disclaimer (semantic stability)**
+> Stability scores in this page are heuristic signals about how outputs move under small changes.
+> They do not prove global robustness or safety and should be combined with other checks.
+
+---
 **Goal** Quantify how **stable** your pipeline is under small, *non-semantic* perturbations: different seeds, low temperature noise, and benign **prompt jitters** (punctuation/whitespace/synonym swaps). A robust system should keep claims, citations, refusals, and constraint echos **invariant** (or nearly so).