LlmWikis knowledge page

Benchmark Page Template

A benchmark page should explain what the benchmark measures, why it matters, what it misses, and how the data was collected.

Section | Required fields
What it measures | Task family, modality, input/output shape, scoring method, and official source.
Why it matters | Practical interpretation for developers, researchers, and evaluators.
Known limitations | Contamination, narrow scope, metric flaws, leaderboard comparability, and update cadence.
Leaderboard data | Model, score, date, source, notes, and reviewer status.
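
As a rough illustration, the "Leaderboard data" fields could be captured in a small record type so that every row carries the same six fields. This is only a sketch: the class name `LeaderboardRow`, the field names, and the set of reviewer states are assumptions made for the example, since the template names the fields but does not prescribe a schema.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical reviewer states; the template only says "reviewer status".
REVIEWER_STATES = {"unreviewed", "in_review", "verified"}


@dataclass(frozen=True)
class LeaderboardRow:
    """One leaderboard entry with the six fields the template lists."""

    model: str          # model name as reported by the source
    score: float        # score in the benchmark's own metric
    score_date: date    # exact date the score was reported
    source: str         # link or citation for the official source
    notes: str = ""     # caveats such as prompt or tool-use settings
    reviewer_status: str = "unreviewed"

    def __post_init__(self) -> None:
        # Reject rows that omit the fields a reader needs to trace a score.
        if not self.model.strip():
            raise ValueError("model is required")
        if not self.source.strip():
            raise ValueError("source is required")
        if self.reviewer_status not in REVIEWER_STATES:
            raise ValueError(f"unknown reviewer status: {self.reviewer_status!r}")


# Placeholder values for illustration only, not real results.
row = LeaderboardRow(
    model="example-model",
    score=0.5,
    score_date=date(2024, 1, 1),
    source="https://example.org/leaderboard",
)
```

Making the record immutable (`frozen=True`) is one possible design choice: a correction then requires creating a new row, which keeps any later edit to a published score explicit.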

Publish checklist

  • Link the official benchmark, dataset, paper, or leaderboard source near the top.
  • State the exact score date and whether results are copied, interpreted, or summarized.
  • Explain what a high score does not prove, especially for broad model-selection claims.
  • Document contamination, prompt, tool-use, and evaluation-setting caveats when available.
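
The checklist above could also be checked mechanically before a page is published. The sketch below assumes the page draft is available as a plain dictionary; the key names (`official_source_link`, `score_date`, `result_provenance`, `high_score_caveat`) and the function `missing_checklist_items` are hypothetical and are not part of any existing LlmWikis tooling.

```python
# Provenance labels taken from the checklist wording:
# results are "copied, interpreted, or summarized".
PROVENANCE = {"copied", "interpreted", "summarized"}

# Hypothetical keys for the checklist items that are always required.
# The caveats item applies only "when available", so it is not enforced here.
REQUIRED_KEYS = [
    "official_source_link",
    "score_date",
    "result_provenance",
    "high_score_caveat",
]


def missing_checklist_items(page: dict) -> list[str]:
    """Return the required checklist keys that are absent, empty, or invalid."""
    missing = []
    for key in REQUIRED_KEYS:
        value = str(page.get(key) or "").strip()
        if not value:
            missing.append(key)
        elif key == "result_provenance" and value not in PROVENANCE:
            missing.append(key)
    return missing


# Placeholder draft for illustration only.
draft = {
    "official_source_link": "https://example.org/benchmark",
    "score_date": "2024-01-01",
    "result_provenance": "copied",
}
print(missing_checklist_items(draft))
```

For this draft the function reports `high_score_caveat`, because the page does not yet say what a high score fails to prove.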