FrontierCode Main

Coding

Cognition's test of whether a model writes code maintainers would actually merge, not just code that passes tests. Main is the 100 hardest of 150 tasks. Higher is better.

FrontierCode's 150 tasks were hand-selected by 20+ open-source maintainers from 36 flagship repositories. Solutions are graded on behavioral correctness, regression safety, scope discipline, test quality, and codebase conventions. Main is the 100 hardest tasks. A trial scores its weighted rubric value if it clears every blocking criterion and 0 otherwise; the reported score is the average over the tasks.
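The scoring rule can be illustrated with a minimal Python sketch. This is not Cognition's grader. The function names, the equal weights, the normalization by total weight, and the choice to average a task's trials before averaging across tasks are all assumptions made for illustration. Only the five rubric dimensions and the "0 unless every blocking criterion passes" gate come from the description above.

```python
from statistics import mean

# Hypothetical weights: the real per-dimension weights are not published here.
WEIGHTS = {
    "behavioral_correctness": 1.0,
    "regression_safety": 1.0,
    "scope_discipline": 1.0,
    "test_quality": 1.0,
    "codebase_conventions": 1.0,
}


def trial_score(rubric, blocking_passed, weights=WEIGHTS):
    """Weighted rubric value in [0, 1], gated to 0 if any blocking criterion fails."""
    if not all(blocking_passed):
        return 0.0
    total_weight = sum(weights.values())
    # Assumption: the weighted value is normalized by the total weight.
    return sum(weights[name] * rubric[name] for name in weights) / total_weight


def benchmark_score(trials_by_task):
    """Assumption: average each task's trials first, then average over tasks."""
    return mean(mean(scores) for scores in trials_by_task.values())


# A made-up trial: strong on correctness, weaker on scope and tests.
example_rubric = {
    "behavioral_correctness": 1.0,
    "regression_safety": 1.0,
    "scope_discipline": 0.5,
    "test_quality": 0.5,
    "codebase_conventions": 1.0,
}

passed = trial_score(example_rubric, blocking_passed=[True, True])
blocked = trial_score(example_rubric, blocking_passed=[True, False])
overall = benchmark_score({"task_a": [passed, blocked], "task_b": [1.0]})
```

The gate is what separates this metric from a plain weighted average: a trial with a high rubric value still contributes 0 if it fails a single blocking criterion, so partial credit is only available to solutions a maintainer could plausibly merge.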

Leaderboard

Updated June 2026