A benchmark page should explain what the benchmark measures, why it matters, what it misses, and how the data was collected.
| Section | Required fields |
|---|---|
| What it measures | Task family, modality, input/output shape, scoring method, and official source. |
| Why it matters | Practical interpretation for developers, researchers, and evaluators. |
| Known limitations | Contamination, narrow scope, metric flaws, leaderboard comparability, and update cadence. |
| Leaderboard data | Model, score, date, source, notes, and reviewer status. |
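The leaderboard fields above can be modeled as a small record type so that every row carries the same metadata. This is a minimal sketch in Python; the class and field names (`LeaderboardEntry`, `score_date`, `reviewer_status`, and the example values) are illustrative assumptions, not part of any benchmark's official schema.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical record mirroring the "Leaderboard data" fields above.
# All names and defaults are illustrative, not a standard.
@dataclass
class LeaderboardEntry:
    model: str             # model name exactly as the source reports it
    score: float           # score in the benchmark's own metric
    score_date: date       # date the score was reported or collected
    source: str            # official leaderboard, paper, or dataset link
    notes: str = ""        # caveats: prompt, tool use, evaluation setting
    reviewer_status: str = "unreviewed"  # e.g. "unreviewed" or "verified"

entry = LeaderboardEntry(
    model="example-model",
    score=71.3,
    score_date=date(2024, 5, 1),
    source="https://example.com/leaderboard",
    notes="copied verbatim from the official leaderboard",
)
```

Keeping `notes` and `reviewer_status` as required-but-defaulted fields makes it easy to spot rows that were published without review.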
## Publish checklist
- Link the official benchmark, dataset, paper, or leaderboard source near the top.
- State the exact score date and whether results are copied, interpreted, or summarized.
- Explain what a high score does not prove, especially for broad model-selection claims.
- Document contamination, prompt, tool-use, and evaluation-setting caveats when available.
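The checklist above can be enforced mechanically before publishing. The sketch below checks a page's metadata dictionary for the checklist items; the key names (`official_source`, `result_provenance`, and the rest) are assumptions chosen for this example, not an established convention.

```python
# Hypothetical publish-checklist validator. The required key names are
# illustrative assumptions mapping one-to-one onto the checklist above.
REQUIRED_KEYS = {
    "official_source",    # link to benchmark, dataset, paper, or leaderboard
    "score_date",         # exact date of the scores shown
    "result_provenance",  # "copied", "interpreted", or "summarized"
    "limitations",        # what a high score does not prove
    "caveats",            # contamination, prompt, tool-use, eval-setting notes
}

def missing_fields(page: dict) -> list[str]:
    """Return checklist fields the page metadata leaves empty or absent."""
    return sorted(k for k in REQUIRED_KEYS if not page.get(k))

page = {
    "official_source": "https://example.com/benchmark",
    "score_date": "2024-05-01",
    "result_provenance": "copied",
}
print(missing_fields(page))  # → ['caveats', 'limitations']
```

Running the check in CI keeps a benchmark page from shipping with an undated score or an unlabeled provenance.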