Benchmark Page Template

A benchmark page should explain what the benchmark measures, why it matters, what it misses, and how its leaderboard data was collected.

Section	Required fields
What it measures	Task family, modality, input/output shape, scoring method, and official source.
Why it matters	Practical interpretation for developers, researchers, and evaluators.
Known limitations	Contamination, narrow scope, metric flaws, leaderboard comparability, and update cadence.
Leaderboard data	Model, score, date, source, notes, and reviewer status.
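One way to keep pages consistent with this table is to encode the required fields and flag gaps before review. The following Python sketch assumes each page is stored as a mapping from section name to field values. `REQUIRED_FIELDS`, `missing_fields`, and the page layout are hypothetical, not part of any existing tool.

```python
# Hypothetical sketch: section names and required fields mirror the template
# table; the page layout (section -> field -> value) is an assumption.
REQUIRED_FIELDS: dict[str, list[str]] = {
    "What it measures": [
        "task family", "modality", "input/output shape",
        "scoring method", "official source",
    ],
    "Why it matters": ["practical interpretation"],
    "Known limitations": [
        "contamination", "narrow scope", "metric flaws",
        "leaderboard comparability", "update cadence",
    ],
    "Leaderboard data": [
        "model", "score", "date", "source", "notes", "reviewer status",
    ],
}


def missing_fields(page: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Return, per section, the required fields that are absent or blank."""
    report: dict[str, list[str]] = {}
    for section, fields in REQUIRED_FIELDS.items():
        content = page.get(section, {})
        # Treat whitespace-only values as missing, not just absent keys.
        gaps = [f for f in fields if not str(content.get(f, "")).strip()]
        if gaps:
            report[section] = gaps
    return report
```

A draft that fills in only "Why it matters" would be reported as missing every field in the other three sections. Checking for blank strings as well as absent keys catches placeholders that were left empty.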

Publish checklist

Link the official benchmark, dataset, paper, or leaderboard source near the top.
State the exact score date and whether results are copied, interpreted, or summarized.
Explain what a high score does not prove, especially for broad model-selection claims.
Document contamination, prompt, tool-use, and evaluation-setting caveats when available.
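The checklist's source, date, and provenance requirements can also be enforced on individual leaderboard rows. This is a minimal Python sketch under stated assumptions: `LeaderboardEntry`, its field names, and the `"unreviewed"` default are hypothetical. Only the three provenance labels come from the checklist above.

```python
from dataclasses import dataclass
from datetime import date

# Provenance labels taken from the checklist; everything else is assumed.
PROVENANCE = {"copied", "interpreted", "summarized"}


@dataclass(frozen=True)
class LeaderboardEntry:
    model: str
    score: float
    score_date: date      # exact date the score was reported
    source: str           # link to the official leaderboard, paper, or dataset
    provenance: str       # "copied", "interpreted", or "summarized"
    notes: str = ""       # contamination, prompt, tool-use, or setting caveats
    reviewer_status: str = "unreviewed"  # hypothetical default value

    def __post_init__(self) -> None:
        # Reject rows that do not say how the score was obtained.
        if self.provenance not in PROVENANCE:
            raise ValueError(f"provenance must be one of {sorted(PROVENANCE)}")
        # Reject rows with no traceable source.
        if not self.source.strip():
            raise ValueError("source is required")
```

Validating at construction time means a row cannot be published without a source and without saying whether its score was copied, interpreted, or summarized.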