How Claude Fable 5 ranks on benchmarks
Anthropic's new Mythos-class model tops CursorBench 3.1 and posts the strongest agentic-coding scores reported so far. The numbers, with the one caveat that matters.
Written by Agents Directory (@agentsdir)
Anthropic released Claude Fable 5 on June 9. It's the company's first Mythos-class model, priced at $10 in / $50 out per million tokens with a 1M-token context window and built for long-running autonomous work. Here is where it lands, sourced from Anthropic's announcement and the independent CursorBench leaderboard.
CursorBench 3.1
CursorBench evaluates models on ambiguous, multi-file tasks taken from real Cursor sessions. It's the closest thing we have to a production agentic-coding benchmark.
- Fable 5 high (default): 70.6% at $10.81 per task, more than 7 points clear of every other default configuration.
- Fable 5 Max: 72.9%, the top score on the whole leaderboard.
- Next-best defaults: Cursor's Composer 2.5 at 63.2% ($0.55 per task, the value outlier), GPT-5.5 high at 62.6%, Claude Opus 4.8 high at 58.4%.
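To put the per-task cost next to the score gap, here is a minimal sketch of the arithmetic. It uses only the two configurations whose cost is quoted above. The variable names are illustrative, and "extra cost per point" is a rough framing of our own, not a metric CursorBench publishes.

```python
# CursorBench 3.1 figures quoted in the list above.
# Per-task cost is stated in this article only for these two entries.
entries = {
    "Fable 5 high": {"score": 70.6, "cost_per_task": 10.81},
    "Composer 2.5": {"score": 63.2, "cost_per_task": 0.55},
}

fable = entries["Fable 5 high"]
composer = entries["Composer 2.5"]

# Gap in percentage points between the two default configurations.
score_gap = fable["score"] - composer["score"]

# How many times more a Fable 5 task costs than a Composer 2.5 task.
cost_ratio = fable["cost_per_task"] / composer["cost_per_task"]

# Additional dollars per task paid for each extra percentage point.
extra_cost_per_point = (
    fable["cost_per_task"] - composer["cost_per_task"]
) / score_gap

print(f"Score gap: {score_gap:.1f} points")
print(f"Cost ratio: {cost_ratio:.1f}x")
print(f"Extra cost per additional point: ${extra_cost_per_point:.2f} per task")
```

On the quoted figures this works out to a 7.4-point gap at roughly 19.7 times the per-task cost, or about $1.39 extra per task for each additional point.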
Artificial Analysis Intelligence Index
Artificial Analysis publishes a composite 0-100 intelligence score that blends knowledge, reasoning, math, coding, and agentic evaluations. It is the most widely cited all-up benchmark outside vendor tables.
- Fable 5 (default): 64.9, the top score in our catalog, about 7.6 points above Claude Opus 4.7 (57.3) and 7.7 above Gemini 3.1 Pro Preview (57.2).
- Next in this cut: Qwen3.7 Max (56.6), Gemini 3.5 Flash (55.3), MiniMax-M3 (54.7), Grok 4.3 high (53.2).
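The size of the lead over each model in this cut can be checked with a short sketch. The scores are the ones quoted above; the dictionary layout and variable names are just illustrative.

```python
# Artificial Analysis Intelligence Index scores as quoted above (0-100 scale).
scores = {
    "Fable 5 (default)": 64.9,
    "Claude Opus 4.7": 57.3,
    "Gemini 3.1 Pro Preview": 57.2,
    "Qwen3.7 Max": 56.6,
    "Gemini 3.5 Flash": 55.3,
    "MiniMax-M3": 54.7,
    "Grok 4.3 high": 53.2,
}

# The model with the highest index score.
leader = max(scores, key=scores.get)

# Lead of the top model over every other entry, rounded to one decimal.
margins = {
    name: round(scores[leader] - value, 1)
    for name, value in scores.items()
    if name != leader
}

# Print from the smallest lead to the largest.
for name, margin in sorted(margins.items(), key=lambda item: item[1]):
    print(f"{leader} leads {name} by {margin} points")
```

The leads run from 7.6 points over Claude Opus 4.7 to 11.7 points over Grok 4.3 high.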
Anthropic's reported numbers
From the announcement, against Claude Mythos Preview, Claude Opus 4.8, GPT-5.5, and Gemini 3.1 Pro, with the best score per row highlighted:
| Category | Benchmark | Claude Mythos 5 / Fable 5 | Claude Mythos Preview | Claude Opus 4.8 | GPT-5.5 | Gemini 3.1 Pro |
|---|---|---|---|---|---|---|
| Agentic coding | SWE-Bench Pro | **80.3%** | 77.8% | 69.2% | 58.6% | 54.2% |
| Agentic coding | FrontierCode (Diamond), xhigh | **29.3%** | — | 13.4% | 5.7% | — |
| Agentic coding | Terminal-Bench 2.1 | **88.0%*** | — | 82.7% | 83.4% (Codex CLI) | 70.7% (Gemini CLI) |
| Knowledge work | GDPval-AA | **1932** | — | 1890 | 1769 | 1314 |
| Knowledge work vision | GDP.pdf, no tools | **29.8%** | — | 22.5% | 24.9% | 16.7% |
| Spatial reasoning | Blueprint-Bench 2 | **38.6%** | — | 14.5% | 36.2% | 26.5% |
| Tool use | AutomationBench | **17.4%** | — | 15.5% | 12.9% | 9.6% |
| Computer use | OSWorld-Verified | 85.0% | **85.4%** | 83.4% | 78.7% | 76.2% |
| Legal | Legal Agent Benchmark | **13.3%** | — | 10.4% | 2.1% | 0.0% |
| Multidisciplinary reasoning | Humanity's Last Exam, no tools | **59.0%*** | 56.8% | 49.8% | 41.4% | 44.4% |
| Multidisciplinary reasoning | Humanity's Last Exam, with tools | 64.5%* | **64.7%** | 57.9% | 52.2% | 51.4% |
| Biology | BioMysteryBench, hard | **46.1%*** | 29.6% | 40.0% | — | — |
| Biology | BioMysteryBench, human solved | **83.9%*** | 82.6% | 80.4% | — | — |
| Cybersecurity | ExploitBench (Cap) | **78.0%*** | 69.0% | 40.0% | 34.0% | — |
| Health | HealthBench Professional | **66.0%*** | 64.7% | 56.9% | 51.8% | — |
Anthropic reports the higher score of Claude Mythos 5 and Claude Fable 5; the two land within 1-3 points of each other except on starred (*) benchmarks. See the Mythos caveat below.
The Mythos caveat
Anthropic reports the higher score of two models: Claude Mythos 5 (the identical model with safeguards lifted, restricted to vetted researchers) and the generally available Fable 5. The two land within 1-3 points of each other except on starred (*) benchmarks, where Fable 5's safeguards redirect cybersecurity and biology queries to Opus 4.8 (under 5% of sessions). On those, Fable 5's effective score sits closer to Opus 4.8.
Bottom line
At twice Opus 4.8's price, Fable 5 is not the default for everything. But on long-horizon agentic coding it is currently the strongest model available, and the CursorBench cost curve shows the premium buying capability, not just tokens. Pricing, host availability, and sources are on the Claude Fable 5 model page.
Sources
- Anthropic, "Introducing Claude Fable 5 and Claude Mythos 5": the announcement, pricing, and the reported benchmark table
- CursorBench leaderboard: independent agentic-coding scores and per-task cost
- CursorBench 3.1 on Agents Directory: the live leaderboard we keep updated
- Artificial Analysis Intelligence Index: the composite intelligence methodology
- Intelligence Index on Agents Directory: the live leaderboard we keep updated
- Claude Fable 5 model page: pricing, context window, and host availability