FrontierCode Main
Cognition's test of whether a model writes code that maintainers would actually merge, not just code that passes tests. Main is the 100 hardest of 150 tasks. Higher is better.
FrontierCode's 150 tasks were hand-selected by 20+ open-source maintainers from 36 flagship repositories. Solutions are graded on behavioral correctness, regression safety, scope discipline, test quality, and codebase conventions. Main is the 100 hardest tasks. A trial scores its weighted rubric value if it clears every blocking criterion and 0 otherwise; the reported score is the average over the tasks.
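The scoring rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Cognition's grader: the `Criterion` type, the function names, and the example weights are all hypothetical. It also assumes that the weighted rubric value is the passed weight divided by the total weight, and that multiple trials on one task are averaged before averaging over tasks. The source does not spell out either detail.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Criterion:
    """One rubric line. Field names are hypothetical, for illustration only."""
    weight: float
    passed: bool
    blocking: bool = False


def trial_score(criteria: list[Criterion]) -> float:
    """Weighted rubric value, or 0 if any blocking criterion failed."""
    if any(c.blocking and not c.passed for c in criteria):
        return 0.0
    total = sum(c.weight for c in criteria)
    if total == 0:
        return 0.0
    # Assumption: rubric value = passed weight / total weight.
    return sum(c.weight for c in criteria if c.passed) / total


def benchmark_score(trials_by_task: dict[str, list[list[Criterion]]]) -> float:
    """Average trial scores within each task, then average over tasks."""
    return mean(
        mean(trial_score(trial) for trial in trials)
        for trials in trials_by_task.values()
    )


# Made-up example: two tasks, one trial each.
example = {
    "task_a": [[
        Criterion(weight=2, passed=True, blocking=True),
        Criterion(weight=1, passed=True),
        Criterion(weight=1, passed=False),
    ]],
    "task_b": [[
        Criterion(weight=2, passed=False, blocking=True),  # blocking failure
        Criterion(weight=1, passed=True),
    ]],
}

print(benchmark_score(example))  # 0.375
```

In the example, task_a clears its blocking criterion and earns 3 of 4 weight points (0.75), while task_b fails a blocking criterion and scores 0 despite passing another check, so the average is 0.375. The blocking gate is what separates this metric from a plain weighted average.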
Leaderboard
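# | Model | Score | Provider
- 1 | Claude Fable 5 | 46.3% | Anthropic
- 2 | | 34.3% | Anthropic
- 3 | | 25.5% | OpenAI
- 4 | | 23% | Anthropic
- 5 | | 17.8% | OpenAI
- 6 | | 16.7% | Google DeepMind
- 7 | | 16% | Moonshot AI
- 8 | | 15.1% | Anthropic
- 9 | | 6.9% | Moonshot AI
- 10 | | 6% | MiniMax
- 11 | | 5.3% | MiniMax
- 12 | | 4.8% | Google DeepMind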
Details:
Category: Coding
Created by: Cognition
Models tested: 12
Leader: Claude Fable 5
Top score: 46.3%
Updated June 2026