FrontierCode Main

Name: FrontierCode Main leaderboard
Creator: Cognition

Cognition's test of whether a model writes code maintainers would actually merge, not just code that passes tests. Main is the 100 hardest of 150 tasks. Higher is better.

FrontierCode's 150 tasks were hand-selected by 20+ open-source maintainers from 36 flagship repositories. Solutions are graded on behavioral correctness, regression safety, scope discipline, test quality, and codebase conventions. A trial scores its weighted rubric value if it clears every blocking criterion and 0 otherwise; the reported score is the average over the tasks.
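The scoring rule described above can be sketched roughly as follows. This is an illustration only, not Cognition's implementation: the `Trial` class, the function names, the equal weights, the 0-to-1 rubric scale, and the choice to average multiple trials within a task before averaging across tasks are all assumptions made for the example. Only the five rubric dimensions and the "0 unless every blocking criterion passes" gate come from the description.

```python
from dataclasses import dataclass
from statistics import mean

# Rubric dimensions named in the description. The equal weights are hypothetical.
WEIGHTS = {
    "behavioral_correctness": 1.0,
    "regression_safety": 1.0,
    "scope_discipline": 1.0,
    "test_quality": 1.0,
    "codebase_conventions": 1.0,
}


@dataclass
class Trial:
    """One attempt at one task (hypothetical structure)."""
    rubric: dict[str, float]      # per-dimension score, assumed to lie in [0, 1]
    blocking_passed: list[bool]   # one flag per blocking criterion


def trial_score(trial: Trial, weights: dict[str, float]) -> float:
    """Weighted rubric value, or 0 if any blocking criterion failed."""
    if not all(trial.blocking_passed):
        return 0.0
    total_weight = sum(weights.values())
    weighted = sum(w * trial.rubric[name] for name, w in weights.items())
    return weighted / total_weight


def benchmark_score(trials_by_task: dict[str, list[Trial]],
                    weights: dict[str, float]) -> float:
    """Average over tasks; trials within a task are averaged first (an assumption)."""
    per_task = [
        mean(trial_score(t, weights) for t in trials)
        for trials in trials_by_task.values()
    ]
    return mean(per_task)


if __name__ == "__main__":
    perfect = {name: 1.0 for name in WEIGHTS}
    example = {
        "task-a": [Trial(perfect, [True, True])],   # clears every gate
        "task-b": [Trial(perfect, [True, False])],  # fails one gate, so scores 0
    }
    print(benchmark_score(example, WEIGHTS))  # prints 0.5
```

The gate is what separates this from a plain weighted average: a trial with a perfect rubric still contributes nothing if it fails a single blocking criterion.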

Leaderboard

#    Model                    Score    Provider
1    Claude Fable 5           46.3%    Anthropic
2    Claude Opus 4.8          34.3%    Anthropic
3    GPT-5.5                  25.5%    OpenAI
4    Claude Opus 4.7          23%      Anthropic
5    GPT-5.4 Mini             17.8%    OpenAI
6    Gemini 3.1 Pro Preview   16.7%    Google DeepMind
7    Kimi K2.6                16%      Moonshot AI
8    Claude Sonnet 4.6        15.1%    Anthropic
9    Kimi K2.5                6.9%     Moonshot AI
10   MiniMax M2.7             6%       MiniMax
11   MiniMax M2.5             5.3%     MiniMax
12   Gemini 3.1 Flash Lite    4.8%     Google DeepMind

Details:

Category: Coding
Created by: Cognition
Models tested: 12
Leader: Claude Fable 5
Top score: 46.3%

Updated June 2026