FrontierCode Diamond

Name: FrontierCode Diamond leaderboard
Creator: Cognition

Category: Coding

The 50 hardest FrontierCode tasks: the toughest production-code problems, graded on whether maintainers would merge the patch. Scores stay low by design. Higher is better.

Claude Fable 5's Diamond score is still pending from Cognition, so it does not yet appear on this board.

Diamond comprises the 50 most difficult FrontierCode tasks, the hardest production-code problems drawn from real open-source repositories. They are graded with the same maintainer-merge rubric as the rest of FrontierCode, which includes hard blocking criteria (correctness, regression safety, scope). A trial's score is its weighted rubric value, counted only if the trial clears every blocker (otherwise it scores 0), and the leaderboard score is this gated value averaged over the tasks. Because Diamond is the toughest agentic-coding measure on the board, scores stay low.
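The gated scoring rule can be sketched in Python. This is a minimal illustration under stated assumptions, not Cognition's grader: the rubric item names, the weights, normalizing by total weight, and the assumption of one trial per task are all hypothetical, since the description above does not publish them. Only the blocker names (correctness, regression safety, scope) come from the text.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    # Blocking criteria: name -> passed? (names follow the description; structure is assumed)
    blockers: dict[str, bool]
    # Hypothetical weighted rubric items: name -> (weight, score in [0, 1])
    rubric: dict[str, tuple[float, float]]


def gated_score(trial: Trial) -> float:
    """Weighted rubric value in [0, 1], or 0.0 if any blocking criterion fails."""
    if not all(trial.blockers.values()):
        return 0.0  # the gate: one failed blocker zeroes the whole trial
    total_weight = sum(w for w, _ in trial.rubric.values())
    if total_weight == 0:
        return 0.0
    # Assumption: the weighted value is normalized by the total weight.
    return sum(w * s for w, s in trial.rubric.values()) / total_weight


def diamond_score(trials: list[Trial]) -> float:
    """Average gated score over tasks as a percentage (one trial per task assumed)."""
    return 100.0 * mean(gated_score(t) for t in trials)


if __name__ == "__main__":
    passing = Trial(
        blockers={"correctness": True, "regression_safety": True, "scope": True},
        rubric={"code_quality": (2.0, 0.5), "tests": (1.0, 1.0)},
    )
    failing = Trial(
        blockers={"correctness": True, "regression_safety": False, "scope": True},
        rubric={"code_quality": (2.0, 1.0), "tests": (1.0, 1.0)},
    )
    # The passing trial scores 2/3; the failing one is gated to 0 despite perfect rubric marks.
    print(round(diamond_score([passing, failing]), 1))  # → 33.3
```

The example shows why scores stay low: a patch that would otherwise earn full rubric marks contributes nothing to the average if it trips a single blocker.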

Leaderboard

#   Model                    Score   Provider
1   Claude Opus 4.8          13.4%   Anthropic
2   GPT-5.5                  6.3%    OpenAI
3   Claude Opus 4.7          5.2%    Anthropic
4   Gemini 3.1 Pro Preview   4.7%    Google DeepMind
5   GPT-5.4 Mini             4.6%    OpenAI
6   Kimi K2.6                3.8%    Moonshot AI
7   Claude Sonnet 4.6        3.5%    Anthropic
8   MiniMax M2.7             2.4%    MiniMax
9   MiniMax M2.5             1.1%    MiniMax
10  Kimi K2.5                1.0%    Moonshot AI
11  Gemini 3.1 Flash Lite    0.7%    Google DeepMind

Details:

Category: Coding
Created by: Cognition
Models tested: 11
Leader: Claude Opus 4.8
Top score: 13.4%

Updated June 2026