Fastalytics

AI coding leaderboard and metric inventory

This page shows the current coding leaderboard derived from Artificial Analysis and a separate inventory of other developer-relevant metrics that could be added to the site.

Leaderboard

Current coding leaderboard

Rankings use the Artificial Analysis coding index. Supporting fields are included only when present in the current dataset.
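As a rough illustration of the rule above, the sketch below sorts rows by coding index and attaches supporting fields only when they exist. The `LeaderboardRow` shape, the field names, and the sample values are assumptions made for this example; they are not taken from the Fastalytics API or from Artificial Analysis data.

```typescript
// Hypothetical row shape; field names are assumptions, not the real API model.
interface LeaderboardRow {
  name: string;
  codingIndex: number;
  liveCodeBench?: number;
  sciCode?: number;
}

interface RankedRow {
  rank: number;
  name: string;
  codingIndex: number;
  supporting: Record<string, number>;
}

function rankByCodingIndex(rows: LeaderboardRow[]): RankedRow[] {
  // Copy before sorting so the caller's array is not mutated.
  const sorted = [...rows].sort((a, b) => b.codingIndex - a.codingIndex);
  return sorted.map((row, i) => {
    // Supporting fields are attached only when the dataset provides them.
    const supporting: Record<string, number> = {};
    if (row.liveCodeBench !== undefined) supporting.liveCodeBench = row.liveCodeBench;
    if (row.sciCode !== undefined) supporting.sciCode = row.sciCode;
    return { rank: i + 1, name: row.name, codingIndex: row.codingIndex, supporting };
  });
}

// Placeholder rows, not real scores.
const sampleRanking = rankByCodingIndex([
  { name: "model-a", codingIndex: 40, liveCodeBench: 55 },
  { name: "model-b", codingIndex: 62 },
]);
```

Keeping absent fields out of the result, rather than filling them with zero, lets the UI distinguish "not reported" from "scored zero".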

Inventory

Additional developer-relevant metrics

This inventory is based on public Artificial Analysis evaluation pages, the Artificial Analysis performance methodology, and the current Fastalytics API model. Each item is labeled by priority, likely implementation effort, and current availability in this codebase.

Currently exposed
8 live fields

Coding, intelligence, LiveCodeBench, SciCode, blended price, output speed, time to first token, and context window.

Available in the current type model
5 additional fields

Input price, output price, time to first answer token, max output tokens, and release date are already typed locally but not shown in the UI.
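The split between live fields and typed-but-hidden fields could be modeled as follows. This is a sketch only: the property names are guesses derived from the field descriptions above, and the actual Fastalytics type model may name or group them differently.

```typescript
// Illustrative type; property names are assumptions, not the real codebase.
interface ModelMetrics {
  // Shown in the UI today (8 fields).
  codingIndex?: number;
  intelligenceIndex?: number;
  liveCodeBench?: number;
  sciCode?: number;
  blendedPrice?: number;
  outputSpeed?: number;
  timeToFirstToken?: number;
  contextWindow?: number;
  // Typed locally but not shown in the UI (5 fields).
  inputPrice?: number;
  outputPrice?: number;
  timeToFirstAnswerToken?: number;
  maxOutputTokens?: number;
  releaseDate?: string;
}

const LIVE_FIELDS: (keyof ModelMetrics)[] = [
  "codingIndex", "intelligenceIndex", "liveCodeBench", "sciCode",
  "blendedPrice", "outputSpeed", "timeToFirstToken", "contextWindow",
];

const HIDDEN_FIELDS: (keyof ModelMetrics)[] = [
  "inputPrice", "outputPrice", "timeToFirstAnswerToken",
  "maxOutputTokens", "releaseDate",
];

// Lists the hidden fields that actually carry a value for a given model,
// i.e. the ones that could be surfaced without any new data source.
function hiddenFieldsWithData(metrics: ModelMetrics): (keyof ModelMetrics)[] {
  return HIDDEN_FIELDS.filter((field) => metrics[field] !== undefined);
}
```

Running `hiddenFieldsWithData` over the current payload would show which of the five typed fields are populated often enough to be worth displaying.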

Likely requires broader sourcing
24 additional items

Agent benchmarks, long-context evaluations, and workload-specific performance cuts are not represented in the current leaderboard payload.

Recommended additions
First implementation slice

These items have the best ratio of developer relevance to implementation cost based on the current codebase and the public Artificial Analysis materials reviewed for this inventory.

Terminal-Bench Hard
Input price per 1M tokens
Output price per 1M tokens
Time to first answer token
Max output tokens
Release date

Metric group
Coding and agent benchmarks

The most directly useful evaluations for developers choosing models for coding, tool use, long-context work, and agentic execution.

| Metric | What it measures | Why developers care | Priority | Effort | Status |
| --- | --- | --- | --- | --- | --- |
| Artificial Analysis Coding Index | Artificial Analysis' aggregate coding ranking. | Best single headline score for code-heavy model selection. | High | Low | Live now |
| LiveCodeBench | Fresh competitive-programming code generation and repair tasks. | Strong proxy for solving unseen coding problems under execution-based checks. | High | Low | Live now |
| SciCode | Scientist-curated coding tasks from real lab workflows. | Useful for research, data, and numerics-heavy coding workflows. | High | Low | Live now |
| Terminal-Bench Hard | Agentic work in terminal environments across engineering and ops tasks. | Best public signal here for tool use, shell execution, and multi-step agent work. | High | Medium | New source likely |
| GDPval-AA | Agentic task completion on real-world occupation workflows with tools. | Useful for ranking full agents, especially browsing and shell-enabled systems. | High | High | New source likely |
| IFBench | Instruction-following under diverse, verifiable constraints. | Good signal for prompt adherence, formatting fidelity, and tool-output compliance. | High | High | New source likely |
| Artificial Analysis Long Context Reasoning | Reasoning over long documents from 10k to 100k tokens. | Important for repo chat, large specs, logs, and multi-file code review. | High | High | New source likely |
| tau2-bench Telecom | Dual-control conversational task execution in support workflows. | More support-oriented, but still relevant for agent reliability in constrained workflows. | Medium | High | New source likely |
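
The Priority and Effort labels in this table can be turned into a rough ordering of what to build next. The sketch below is one possible way to do that, not the method used to produce the recommendations on this page: the numeric weights (High = 3, Medium = 2, Low = 1) and the `relevancePerCost` helper are assumptions introduced for illustration. The rows are copied from the table.

```typescript
type Level = "High" | "Medium" | "Low";

interface InventoryItem {
  metric: string;
  priority: Level;
  effort: Level;
  status: "Live now" | "New source likely";
}

// Assumed weights; the page does not define a numeric scale for these labels.
const LEVEL_WEIGHT: Record<Level, number> = { High: 3, Medium: 2, Low: 1 };

const inventory: InventoryItem[] = [
  { metric: "Artificial Analysis Coding Index", priority: "High", effort: "Low", status: "Live now" },
  { metric: "LiveCodeBench", priority: "High", effort: "Low", status: "Live now" },
  { metric: "SciCode", priority: "High", effort: "Low", status: "Live now" },
  { metric: "Terminal-Bench Hard", priority: "High", effort: "Medium", status: "New source likely" },
  { metric: "GDPval-AA", priority: "High", effort: "High", status: "New source likely" },
  { metric: "IFBench", priority: "High", effort: "High", status: "New source likely" },
  { metric: "Artificial Analysis Long Context Reasoning", priority: "High", effort: "High", status: "New source likely" },
  { metric: "tau2-bench Telecom", priority: "Medium", effort: "High", status: "New source likely" },
];

// Higher is better: more developer relevance per unit of implementation effort.
function relevancePerCost(item: InventoryItem): number {
  return LEVEL_WEIGHT[item.priority] / LEVEL_WEIGHT[item.effort];
}

// Metrics that are not live yet, best ratio first.
// filter() returns a new array, so sorting it leaves `inventory` untouched.
const nextCandidates = inventory
  .filter((item) => item.status !== "Live now")
  .sort((a, b) => relevancePerCost(b) - relevancePerCost(a));
```

Under these assumed weights, Terminal-Bench Hard is the first entry in `nextCandidates` and tau2-bench Telecom is the last, which matches Terminal-Bench Hard being the only benchmark in the first implementation slice.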