BrowseComp

Name: BrowseComp leaderboard
Creator: OpenAI

Web

OpenAI's hard web-browsing benchmark: 1,266 questions whose answers are hard to find but easy to verify, requiring persistent multi-step browsing. Higher is better.

BrowseComp questions are built with an inverted design: trainers start from a known fact and write a question whose answer does not appear in first-page search results and that another person cannot solve within ten minutes. Answers are short and are graded by a model that checks semantic equivalence against the reference answer. The benchmark is very hard even with basic tools: at launch, GPT-4o scored 1.9% with browsing, and human trainers solved only 29.2% of questions within a two-hour limit. All rows here therefore report runs with browsing and tools enabled, in single-agent configurations only. Multi-agent harnesses score higher and are excluded to keep rows comparable. There is no single official leaderboard, so scores come from vendor model cards compiled by aggregators.
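To make the scoring concrete, here is a minimal sketch of how a percentage like those below could be computed from graded answers. It is an illustration only: `Item`, `stand_in_grader`, and `accuracy` are hypothetical names, and the string comparison is a crude placeholder for the model-based semantic-equivalence check the benchmark actually uses.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Item:
    question: str
    reference: str  # short reference answer
    response: str   # the model's final answer


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace before comparing."""
    return " ".join(text.lower().split())


def stand_in_grader(item: Item) -> bool:
    # ASSUMPTION: the real benchmark asks a model whether the response is
    # semantically equivalent to the reference. This normalized string match
    # only marks the place where that judgment plugs in.
    return normalize(item.response) == normalize(item.reference)


def accuracy(items: list[Item],
             grader: Callable[[Item], bool] = stand_in_grader) -> float:
    """Return the percentage of items the grader marks correct."""
    if not items:
        raise ValueError("no items to grade")
    correct = sum(1 for item in items if grader(item))
    return 100.0 * correct / len(items)
```

Passing the grader as a parameter keeps the accuracy calculation independent of how equivalence is judged, so a model-backed grader could be swapped in without changing the rest of the sketch.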

Leaderboard

| # | Model | Configuration | Score | Provider |
|---|-------|---------------|-------|----------|
| 1 | GPT-5.5 Pro | Browsing, parallel compute | 90.1% | OpenAI |
| 2 | GPT-5.4 Pro | Browsing | 89.3% | OpenAI |
| 3 | Claude Fable 5 | Single agent, web search | 86.9% | Anthropic |
| 4 | Gemini 3.1 Pro Preview | Search + Python + Browse | 85.9% | Google DeepMind |
| 5 | GPT-5.5 | Browsing | 84.4% | OpenAI |
| 6 | Claude Opus 4.8 | Single agent, web search | 84.3% | Anthropic |
| 7 | Claude Opus 4.6 | Max thinking, tools | 84.0% | Anthropic |
| 8 | MiniMax-M3 | Browsing | 83.5% | MiniMax |
| 9 | DeepSeek V4 | Pro, max thinking, browsing | 83.4% | DeepSeek |
| 10 | Kimi K2.6 | Single agent, tools | 83.2% | Moonshot AI |
| 11 | GPT-5.4 | Browsing | 82.7% | OpenAI |
| 12 | Claude Opus 4.7 | Adaptive thinking, web search | 79.3% | Anthropic |
| 13 | GLM 5.1 | Browsing | 79.3% | Z.AI |
| 14 | GPT-5.2 Pro | Browsing | 77.9% | OpenAI |
| 15 | GLM 5 | Browsing | 75.9% | Z.AI |
| 16 | Claude Sonnet 4.6 | Max thinking, tools | 74.7% | Anthropic |
| 17 | GPT-5.2 | xHigh, tools | 65.8% | OpenAI |
| 18 | Kimi K2 Thinking | Tools | 60.2% | Moonshot AI |
| 19 | Gemini 3 Pro | Search + Python + Browse | 59.2% | Google DeepMind |
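
For a sense of scale, the sketch below converts a reported percentage into an approximate count of correct answers. It assumes each score was measured on the full 1,266-question set, which the vendor sources may not guarantee; `approx_correct` and `points_per_question` are hypothetical helper names.

```python
TOTAL_QUESTIONS = 1266  # size of the full BrowseComp set


def approx_correct(score_percent: float, total: int = TOTAL_QUESTIONS) -> int:
    """Approximate number of questions answered correctly.

    ASSUMPTION: the score was computed over all `total` questions.
    """
    return round(score_percent / 100 * total)


def points_per_question(total: int = TOTAL_QUESTIONS) -> float:
    """Percentage points that a single question is worth."""
    return 100 / total


if __name__ == "__main__":
    for score in (90.1, 59.2):
        print(f"{score}% is roughly {approx_correct(score)} of {TOTAL_QUESTIONS}")
    print(f"one question is worth about {points_per_question():.3f} points")
```

Under that assumption a single question is worth a little under 0.1 percentage points, so gaps of a tenth of a point between adjacent rows correspond to only one or two questions.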

Sources:

OpenAI: BrowseComp announcement
BrowseComp paper (arXiv 2504.12516)
openai/simple-evals (dataset + grader)

Details:

Category: Web
Created by: OpenAI
Models tested: 19
Leader: GPT-5.5 Pro
Top score: 90.1%

Updated June 2026