Fastalytics

AI coding model leaderboard

Creator: Fastalytics

Compare leading AI coding models using Artificial Analysis benchmark data, alongside the details developers care about: coding scores, speed, latency, pricing, and context window.

Current Leaderboard Metric Inventory

Current coding leaderboard

Rankings use the Artificial Analysis coding index. Supporting fields are included only when present in the current dataset.
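
As a rough illustration of that rule, the TypeScript sketch below ranks entries by coding index and collects only the supporting fields that are actually present. The `ModelEntry` shape, its field names, and the display labels are assumptions made for this example, not the real dataset schema or this codebase's types.

```typescript
// Hypothetical entry shape: field names are illustrative assumptions,
// not the real Artificial Analysis schema or this codebase's types.
interface ModelEntry {
  name: string;
  codingIndex?: number; // Artificial Analysis coding index, when available
  outputTokensPerSecond?: number;
  contextWindowTokens?: number;
}

// Keep only models that have a coding index, highest score first.
function rankByCodingIndex(models: ModelEntry[]): ModelEntry[] {
  return models
    .filter(
      (m): m is ModelEntry & { codingIndex: number } =>
        typeof m.codingIndex === "number",
    )
    .sort((a, b) => b.codingIndex - a.codingIndex);
}

// Return [label, value] pairs only for supporting fields present on the entry.
function supportingFields(model: ModelEntry): [string, number][] {
  const candidates: [string, number | undefined][] = [
    ["Speed (tokens/s)", model.outputTokensPerSecond],
    ["Context window (tokens)", model.contextWindowTokens],
  ];
  return candidates.filter(
    (pair): pair is [string, number] => pair[1] !== undefined,
  );
}
```

Models without a coding index are dropped from the ranking rather than sorted to the bottom. That is one possible reading of the rule above, chosen here for simplicity.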

Additional developer-relevant metrics

Public Artificial Analysis evaluations and methodology, labeled by priority, effort, and current availability in this codebase.

Live fields in the leaderboard

Coding, intelligence, benchmarks, pricing, speed, latency, and context.

Additional typed fields

Typed locally but not yet shown in the UI.

Items needing broader sourcing

Agent, long-context, and workload-specific metrics that still need a data source.

First implementation slice

Highest-impact additions based on developer relevance and implementation cost.

Terminal-Bench Hard

Input price per 1M tokens

Output price per 1M tokens

Time to first answer token

Max output tokens

Release date
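
The six fields above could be typed roughly as follows. This is a minimal sketch: the interface name, property names, and units (seconds for latency, an ISO string for the date) are assumptions, not the codebase's actual types. The helper shows how the two per-1M-token prices combine into a per-request cost estimate.

```typescript
// Hypothetical typing of the first-slice fields. Every field is optional
// because a value may be missing for a given model.
interface FirstSliceFields {
  terminalBenchHard?: number;
  inputPricePerMillionTokens?: number; // price per 1M input tokens
  outputPricePerMillionTokens?: number; // price per 1M output tokens
  timeToFirstAnswerTokenSeconds?: number; // unit is an assumption
  maxOutputTokens?: number;
  releaseDate?: string; // assumed ISO 8601 date string
}

// Estimate the cost of one request, in the same currency as the prices.
// Returns undefined when either price is missing, rather than guessing.
function estimateRequestCost(
  fields: FirstSliceFields,
  inputTokens: number,
  outputTokens: number,
): number | undefined {
  const inPrice = fields.inputPricePerMillionTokens;
  const outPrice = fields.outputPricePerMillionTokens;
  if (inPrice === undefined || outPrice === undefined) {
    return undefined;
  }
  return (inputTokens / 1e6) * inPrice + (outputTokens / 1e6) * outPrice;
}
```

For example, with made-up prices of 3 per 1M input tokens and 15 per 1M output tokens, a request with 200,000 input tokens and 50,000 output tokens comes to approximately 1.35.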

Coding and agent benchmarks

The most directly useful evaluations for developers choosing models for coding, tool use, long-context work, and agentic execution.

Metric	What it measures	Why developers care	Priority	Effort	Status
Artificial Analysis Coding Index	Artificial Analysis' aggregate coding ranking.	Best single headline score for code-heavy model selection.	High	Low	Live now
LiveCodeBench	Fresh competitive-programming code generation and repair tasks.	Strong proxy for solving unseen coding problems under execution-based checks.	High	Low	Live now
SciCode	Scientist-curated coding tasks from real lab workflows.	Useful for research, data, and numerics-heavy coding workflows.	High	Low	Live now
Terminal-Bench Hard	Agentic work in terminal environments across engineering and ops tasks.	Best public signal here for tool use, shell execution, and multi-step agent work.	High	Medium	New source likely
GDPval-AA	Agentic task completion on real-world occupation workflows with tools.	Useful for ranking full agents, especially browsing and shell-enabled systems.	High	High	New source likely
IFBench	Instruction-following under diverse, verifiable constraints.	Good signal for prompt adherence, formatting fidelity, and tool-output compliance.	High	High	New source likely
Artificial Analysis Long Context Reasoning	Reasoning over long documents from 10k to 100k tokens.	Important for repo chat, large specs, logs, and multi-file code review.	High	High	New source likely
tau2-bench Telecom	Dual-control conversational task execution in support workflows.	More support-oriented, but still relevant for agent reliability in constrained workflows.	Medium	High	New source likely
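
One way to turn the Priority, Effort, and Status columns into an ordered backlog is sketched below in TypeScript. The row values are copied from the table. The ordering rule (higher priority first, then lower effort, skipping anything already live) and all type and function names are assumptions for illustration, not a documented method.

```typescript
type Level = "High" | "Medium" | "Low";

interface MetricRow {
  metric: string;
  priority: Level;
  effort: Level;
  status: string;
}

// Priority, effort, and status values taken from the table above.
const tableRows: MetricRow[] = [
  { metric: "Artificial Analysis Coding Index", priority: "High", effort: "Low", status: "Live now" },
  { metric: "LiveCodeBench", priority: "High", effort: "Low", status: "Live now" },
  { metric: "SciCode", priority: "High", effort: "Low", status: "Live now" },
  { metric: "Terminal-Bench Hard", priority: "High", effort: "Medium", status: "New source likely" },
  { metric: "GDPval-AA", priority: "High", effort: "High", status: "New source likely" },
  { metric: "IFBench", priority: "High", effort: "High", status: "New source likely" },
  { metric: "Artificial Analysis Long Context Reasoning", priority: "High", effort: "High", status: "New source likely" },
  { metric: "tau2-bench Telecom", priority: "Medium", effort: "High", status: "New source likely" },
];

// Lower rank sorts earlier: high priority first, low effort first.
const PRIORITY_RANK: Record<Level, number> = { High: 0, Medium: 1, Low: 2 };
const EFFORT_RANK: Record<Level, number> = { Low: 0, Medium: 1, High: 2 };

// Assumed rule: skip metrics that are already live, then order the rest
// by priority and break ties by effort. Ties keep their table order.
function nextCandidates(rows: MetricRow[]): MetricRow[] {
  return rows
    .filter((r) => r.status !== "Live now")
    .sort(
      (a, b) =>
        PRIORITY_RANK[a.priority] - PRIORITY_RANK[b.priority] ||
        EFFORT_RANK[a.effort] - EFFORT_RANK[b.effort],
    );
}
```

Under this rule, Terminal-Bench Hard comes first among the metrics that are not yet live, and tau2-bench Telecom comes last.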