
Rankings

Independent model, benchmark, and agent rankings for AI coding, showing what actually leads right now. Refreshed regularly.

Updated June 2026

Best models for your agent

Best models for agents

#   Model              Provider   Open weights  CursorBench  Context  Input price
1   Claude Fable 5     Anthropic  No            72.9%        1M       $10/M
2   Claude Opus 4.7    Anthropic  No            64.8%        1M       $5/M
3   GPT-5.5            OpenAI     No            64.3%        1.05M    $5/M
4   Claude Opus 4.8    Anthropic  No            63.8%        1M       $5/M
5   Composer 2.5       Cursor     No            63.2%        200K     $0.50/M
6   Composer 2         Cursor     No            52.2%        200K     $0.50/M
7   Gemini 3.5 Flash   Google     No            49.8%        1.049M   $1.50/M
8   Claude Sonnet 4.6  Anthropic  No            49%          1M       $3/M
9   Kimi K2.6          Moonshot   Yes           47.6%        262K     $0.68/M
10  Kimi K2.5          Moonshot   Yes           31.9%        262K     $0.35/M

Context is the maximum context window in tokens. Input price is in US dollars per million input tokens.
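To show what the per-million input prices mean for a single request, here is a minimal Python sketch that converts a prompt size into a dollar cost. It assumes cost scales linearly with input tokens and ignores output-token pricing, caching discounts, and tiered rates, none of which this page lists. The 200K-token prompt is a hypothetical workload chosen because it matches the Composer context window.

```python
# Input prices in US dollars per 1M input tokens, copied from the table above.
# Assumption: cost is linear in input tokens; output pricing is not modeled.
INPUT_PRICE_PER_M = {
    "Claude Fable 5": 10.0,
    "Claude Opus 4.7": 5.0,
    "GPT-5.5": 5.0,
    "Composer 2.5": 0.5,
    "Kimi K2.5": 0.35,
}


def input_cost(model: str, input_tokens: int) -> float:
    """Return the dollar cost of sending `input_tokens` prompt tokens to `model`."""
    return INPUT_PRICE_PER_M[model] * input_tokens / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: one 200K-token prompt sent to each model.
    for name in INPUT_PRICE_PER_M:
        print(f"{name:16} ${input_cost(name, 200_000):.2f}")
```

Under these assumptions, filling a 200K-token prompt costs $2.00 on Claude Fable 5 versus $0.10 on Composer 2.5, which is the kind of trade-off to weigh against the CursorBench gap between them.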

Benchmark leaderboards

Artificial Analysis Intelligence Index

The most-cited composite intelligence score: a 0–100 index combining knowledge, reasoning, math, coding, and agentic evaluations (GPQA Diamond, HLE, IFBench, SciCode, Terminal-Bench Hard, τ²-Bench, and more). Higher is better.

Leader: Claude Fable 5 (64.9)

CursorBench 3.1

Ambiguous, multi-file tasks drawn from real Cursor sessions, testing codebase understanding, bug finding, planning, and code review. Higher is better.

Leader: Claude Fable 5 (72.9%)

FrontierCode Diamond

The 50 hardest FrontierCode tasks: the toughest production-code problems, graded on whether maintainers would merge the patch. Scores stay low by design. Higher is better.

Leader: Claude Opus 4.8 (13.4%)

FrontierCode Main

Cognition's test of whether a model writes code that maintainers would actually merge, not just code that passes tests. Main consists of the 100 hardest of its 150 tasks. Higher is better.

Leader: Claude Fable 5 (46.3%)

Top AI coding agents

Hermes: Skills, integrations, and self-hosting for Hermes
Claude Code: Skills, plugins, and MCP servers for Claude Code
Codex: Skills and MCP servers for OpenAI Codex
OpenClaw: Skills and automation for OpenClaw
