DeepSWE

Coding

Datacurve's agentic coding benchmark: each model runs as an autonomous agent on real software engineering tasks and is scored on whether its final patch resolves the issue. Higher is better.

Details:
  • Category: Coding
  • Created by: Datacurve
  • Models tested: 12
  • Configs tested: 20
  • Leader: OpenAI GPT-5.5
  • Top score: 70%

Updated June 2026