Compare Models is, for now, a manual decision workflow. It helps a reader compare candidate models without pretending that LlmWikis has live model telemetry, private benchmark access, or universal rankings.
| Fit dimension | What to evaluate |
|---|---|
| Task fit | Reasoning, coding, multimodal, retrieval, writing, extraction |
| Operating fit | API, local, edge, latency, cost, privacy, data residency |
| Evidence fit | Official docs, papers, benchmark sources, evaluation date, caveats |
| Governance fit | License, logging, reviewability, tool permissions, rollback path |
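The four fit dimensions can be turned into a per-model scorecard before filling in the worksheet. A minimal sketch follows; the dimension and item names come from the table above, while the `FIT_DIMENSIONS` structure and the `blank_scorecard` helper are illustrative assumptions, not an LlmWikis API.

```python
# Hypothetical checklist template built from the fit-dimensions table.
# The dict layout and function name are assumptions for illustration.
FIT_DIMENSIONS = {
    "task_fit": ["reasoning", "coding", "multimodal", "retrieval",
                 "writing", "extraction"],
    "operating_fit": ["api", "local", "edge", "latency", "cost",
                      "privacy", "data_residency"],
    "evidence_fit": ["official_docs", "papers", "benchmark_sources",
                     "evaluation_date", "caveats"],
    "governance_fit": ["license", "logging", "reviewability",
                       "tool_permissions", "rollback_path"],
}

def blank_scorecard(model_name: str) -> dict:
    """Return an empty scorecard for one candidate, keyed by fit dimension.

    Each item starts as None so that unassessed criteria are visible,
    rather than silently defaulting to a pass or fail.
    """
    return {
        "model": model_name,
        "scores": {dim: {item: None for item in items}
                   for dim, items in FIT_DIMENSIONS.items()},
    }
```

One scorecard per candidate makes gaps explicit: any remaining `None` is an unassessed criterion, which feeds directly into step 4 of the worksheet below.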
## Comparison worksheet
| Step | Question | Record in the wiki |
|---|---|---|
| 1 | What task will the model actually perform? | Task family, expected input/output, success criteria, and failure tolerance. |
| 2 | Which constraints are non-negotiable? | Privacy, cost ceiling, latency, deployment boundary, license, and human-review rule. |
| 3 | What sources support each capability claim? | Provider docs, model cards, papers, benchmark pages, and date checked. |
| 4 | What evidence is missing? | Unknowns, stale results, unverified community claims, or absent deployment data. |
| 5 | What was decided? | Selected model, rejected alternatives, reason, reviewer, and next review date. |
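The five worksheet steps map naturally onto a single record per comparison. Below is a minimal sketch of such a record as a Python dataclass; the field names (`constraints`, `gaps`, `next_review`, and so on) are an illustrative assumption, not a schema LlmWikis defines.

```python
from dataclasses import dataclass, field

@dataclass
class ComparisonRecord:
    """One completed worksheet; field names are hypothetical."""
    task: str                     # step 1: task family, I/O, success criteria
    constraints: list[str]        # step 2: non-negotiable constraints
    # step 3: capability claim -> source and date checked
    sources: dict[str, str] = field(default_factory=dict)
    # step 4: unknowns, stale results, unverified claims
    gaps: list[str] = field(default_factory=list)
    decision: str = ""            # step 5: selected model and reason
    rejected: list[str] = field(default_factory=list)
    reviewer: str = ""
    next_review: str = ""         # ISO date for the scheduled re-check
```

Keeping `gaps` and `next_review` as first-class fields makes the record self-expiring: a comparison with open gaps or a past-due review date should not be treated as current.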