</details>
> **Evaluation disclaimer (benchmarking)**
> All scores and examples on this page are scenario-specific debug signals.
> They are not an official leaderboard or scientific proof, and they do not show that one model is always better.
> Use them as local guidance for your own stack, and re-run the setup whenever you change models, data, or prompts.
---