What is OpenRouter Fusion? How the compound model works

OpenRouter Fusion fans your prompt out to a panel of models, has a judge analyze their answers, and then has a synthesizer write the final one. Here is how it works, what it costs, and when to use it.

5 min read. Written by Agents Directory

OpenRouter announced Fusion on June 14, calling it "the smartest compound model in the market" and pitching it as Fable-level intelligence at roughly half the price. Instead of routing your prompt to a single model, Fusion runs several at once and merges their answers into one. You call it like any other model, through the openrouter/fusion slug, and everything else happens server-side.

The timing was not subtle. Fusion arrived the same week the US government directed Anthropic to suspend Claude Fable 5 and Mythos 5 for all foreign nationals, which pulled the frontier model it benchmarks against out of reach for much of the world. This post is the neutral explainer: what Fusion is, how to call it, what the numbers say, and what it actually costs.

How Fusion works

A Fusion request runs in three stages, all on OpenRouter's servers:

  1. Fan-out. Your prompt goes to a panel of models in parallel (1 to 8 of them), each with web search and web fetch tools enabled, so every panelist can ground its own answer.
  2. Judge. A judge model reads every response and produces a structured analysis: consensus points, contradictions, partial coverage, unique insights, and blind spots.
  3. Synthesis. A final model writes the answer grounded in that analysis, rather than just averaging the panel or picking a winner.
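
The three stages can be sketched locally in Python. This is a toy illustration, not OpenRouter's implementation: the `fuse` function, the `Model` type, and the prompt wording are all assumptions, and the "models" are plain functions so the sketch runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# A "model" here is any callable from prompt text to answer text (hypothetical stand-in).
Model = Callable[[str], str]


def fuse(prompt: str, panel: Dict[str, Model], judge: Model, synthesizer: Model) -> str:
    """Fan out to the panel, have a judge analyze the answers, then synthesize."""
    if not 1 <= len(panel) <= 8:
        raise ValueError("panel must contain 1 to 8 models")

    # Stage 1: fan-out. Every panelist answers the same prompt in parallel.
    with ThreadPoolExecutor(max_workers=len(panel)) as pool:
        futures = {name: pool.submit(model, prompt) for name, model in panel.items()}
        answers = {name: future.result() for name, future in futures.items()}

    # Stage 2: judge. One model reads every answer and writes a structured analysis.
    transcript = "\n".join(f"[{name}] {text}" for name, text in answers.items())
    analysis = judge(
        "List consensus points, contradictions, partial coverage, "
        "unique insights, and blind spots in these answers:\n" + transcript
    )

    # Stage 3: synthesis. The final answer is grounded in the analysis.
    return synthesizer(f"Question: {prompt}\nAnalysis:\n{analysis}\nWrite the final answer.")


# Toy stand-ins so the sketch runs offline.
panel = {
    "model_a": lambda p: "Paris",
    "model_b": lambda p: "Paris, on the Seine",
}
judge = lambda p: "consensus: Paris"
synthesizer = lambda p: "Final: " + p.split("Analysis:\n")[1].split("\n")[0]

result = fuse("What is the capital of France?", panel, judge, synthesizer)
print(result)
```

The point of the shape is that the synthesizer sees the judge's analysis rather than the raw answers, which is what separates this from averaging or picking a winner.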

The default "Quality" preset uses a panel of Claude Opus 4.8, the latest GPT, and the latest Gemini Pro. There is also a "Budget" preset that swaps in cheaper models. You can replace either one entirely (more on that below).

OpenRouter's own framing of the result is worth quoting: "Fusion is neurodiversity, but for models." The bet is that a structured disagreement between different model families produces a better answer than any one of them alone.

How to call it

There are three ways in, from simplest to most controlled.

As a model slug. Treat it like any model:

{
  "model": "openrouter/fusion",
  "messages": [{ "role": "user", "content": "..." }]
}
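
In Python, that body could be sent with nothing but the standard library. This is a minimal sketch: the endpoint URL, the `OPENROUTER_API_KEY` variable name, and the OpenAI-style response shape in the commented-out lines are assumptions not stated in this post, and `build_fusion_request` is a hypothetical helper. It builds the request without sending it.

```python
import json
import os
import urllib.request

# Assumption: OpenRouter's OpenAI-compatible chat completions endpoint.
API_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_fusion_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a chat request that targets the Fusion slug."""
    payload = {
        "model": "openrouter/fusion",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_fusion_request(
    "Compare two index funds.",
    os.environ.get("OPENROUTER_API_KEY", "sk-placeholder"),
)
print(request.full_url, request.get_method())

# To actually send it (needs network access and a real key):
# with urllib.request.urlopen(request) as response:
#     body = json.load(response)
#     print(body["choices"][0]["message"]["content"])  # assumed response shape
```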

As a server tool. Add it to your tools array and let your outer model decide when a question is worth a panel:

{
  "tools": [{ "type": "openrouter:fusion" }]
}

Set tool_choice: "required" to force Fusion on every request.

With a custom panel. Pass your own panelists and judge through a parameters object:

{
  "tools": [
    {
      "type": "openrouter:fusion",
      "parameters": {
        "analysis_models": [
          "~google/gemini-flash-latest",
          "deepseek/deepseek-v3.2",
          "~moonshotai/kimi-latest"
        ],
        "model": "~anthropic/claude-opus-latest"
      }
    }
  ]
}

analysis_models is the panel (1 to 8 models), and model is the judge that produces the structured analysis. Other knobs include max_tool_calls (default 8, range 1 to 16), max_completion_tokens, reasoning, and temperature, all forwarded to the panel and judge calls. Two guardrails to know: panel and judge models cannot recursively call Fusion, and server tools are in beta, so the API and behavior may still change.
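
Because the ranges are fixed, it can be worth checking them client-side before a request goes out. The helper below is a hypothetical sketch, not part of any SDK: `build_fusion_tool` is an invented name, and placing `max_tool_calls` inside `parameters` is an assumption based on the description above.

```python
import json
from typing import Any, Dict, List, Optional


def build_fusion_tool(
    analysis_models: List[str],
    judge_model: str,
    max_tool_calls: Optional[int] = None,
) -> Dict[str, Any]:
    """Return a Fusion server-tool entry, checking the documented ranges first."""
    if not 1 <= len(analysis_models) <= 8:
        raise ValueError("analysis_models must list 1 to 8 models")
    parameters: Dict[str, Any] = {
        "analysis_models": list(analysis_models),
        "model": judge_model,
    }
    if max_tool_calls is not None:
        if not 1 <= max_tool_calls <= 16:
            raise ValueError("max_tool_calls must be between 1 and 16")
        # Assumption: this knob sits next to the panel and judge settings.
        parameters["max_tool_calls"] = max_tool_calls
    return {"type": "openrouter:fusion", "parameters": parameters}


tool = build_fusion_tool(
    ["~google/gemini-flash-latest", "deepseek/deepseek-v3.2", "~moonshotai/kimi-latest"],
    judge_model="~anthropic/claude-opus-latest",
    max_tool_calls=4,
)
print(json.dumps({"tools": [tool]}, indent=2))
```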

What the benchmarks show

OpenRouter benchmarked Fusion on DRACO, a deep-research benchmark from Perplexity: 100 tasks across 10 domains (law, medicine, finance, product comparison, and more), each graded against about 39 weighted criteria where wrong answers carry negative weight, so verbosity cannot game the score. The dataset is public on Hugging Face.

Here are OpenRouter's reported results across every panel and solo run it tested. The panelists are Claude Fable 5, Claude Opus 4.8, GPT-5.5, Gemini 3.1 Pro, Gemini 3 Flash, Kimi K2.6, and DeepSeek V4 Pro.

Every fusion panel (synthesized by Opus 4.8) beat the strongest solo model it contained. Lining each panel up against its best solo member is the cleanest way to see the synthesis effect:

Panel          Members                                       Fusion panel  Best solo member  Lift
Top pair       Fable 5 + GPT-5.5                             69.0%         65.3%             +3.7
Frontier trio  Opus 4.8 + GPT-5.5 + Gemini 3.1 Pro           68.3%         60.0%             +8.3
Self-panel     Opus 4.8 run twice                            65.5%         58.8%             +6.7
Budget panel   Gemini 3 Flash + Kimi K2.6 + DeepSeek V4 Pro  64.7%         60.3%             +4.4

Lift is the panel score minus its strongest member's solo score. The two Fable 5 rows in the chart were scored on only 93 of 100 tasks, because Fable 5's content filters blocked 7, so their numbers sit slightly high.
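
The lift arithmetic is easy to reproduce. This short sketch recomputes it from the four reported score pairs; the variable names are illustrative.

```python
# (panel label, panel score %, best solo member score %) as reported by OpenRouter
results = [
    ("Top pair", 69.0, 65.3),
    ("Frontier trio", 68.3, 60.0),
    ("Self-panel", 65.5, 58.8),
    ("Budget panel", 64.7, 60.3),
]

# Lift = panel score minus the strongest member's solo score, in percentage points.
lifts = {label: round(panel - solo, 1) for label, panel, solo in results}

for label, lift in lifts.items():
    print(f"{label}: +{lift} points")
```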

The budget panel is the headline. Three cheap models fused together scored 64.7%, beating solo GPT-5.5 and solo Opus 4.8, and landing within 1 point of solo Fable 5 (the model most of the world had just lost access to) at roughly half the cost. The self-panel result is just as telling: two Opus 4.8 runs fused together still gained 6.7 points over a single Opus call, so much of the benefit comes from the structured judge-and-synthesize step, not only from mixing model families. OpenRouter estimates roughly three quarters of the lift comes from synthesis and about one quarter from panel diversity.

Read these as a strong signal, not a final verdict. They are OpenRouter's own run on a third-party benchmark, DRACO is text-only and English-only, and OpenRouter notes that absolute scores can swing 10 to 25 points depending on the judge, though the relative ranking holds. DRACO also leaves out long-horizon agentic tasks, and OpenRouter is upfront that Fusion is not a drop-in replacement for a dedicated coding model.

What Fusion costs

The headline "half the price" means half the price of Fable 5, not half the price of a single small model. Fusion itself has no separate fee. You pay for the underlying models, summed: every panelist's completion plus the judge call, at OpenRouter's normal pass-through prices. Running three frontier models and a judge on one prompt costs several times as much as picking one model well, so the savings only show up against an expensive frontier baseline like Fable 5. To see exactly which models ran on a given request, check the Activity tab.
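
A rough way to budget for this is to add up the per-call costs yourself. In the sketch below every price and token count is hypothetical, chosen only to show the arithmetic, and `Call` and `fusion_cost` are illustrative names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Call:
    """One underlying model call; prices are USD per million tokens."""
    name: str
    input_tokens: int
    output_tokens: int
    input_price: float
    output_price: float

    def cost(self) -> float:
        return (self.input_tokens * self.input_price
                + self.output_tokens * self.output_price) / 1_000_000


def fusion_cost(calls: List[Call]) -> float:
    """Fusion adds no fee of its own, so the bill is the sum of its calls."""
    return sum(call.cost() for call in calls)


# Hypothetical prices and token counts, for illustration only.
calls = [
    Call("panelist-a", 2_000, 1_500, 5.0, 25.0),
    Call("panelist-b", 2_000, 1_500, 2.0, 10.0),
    Call("panelist-c", 2_000, 1_500, 1.0, 4.0),
    Call("judge", 6_500, 1_000, 5.0, 25.0),  # the judge reads every panel answer
]
total = fusion_cost(calls)
single = calls[0].cost()
print(f"fusion: ${total:.4f}  single call: ${single:.4f}  ratio: {total / single:.1f}x")
```

The judge's input grows with the panel, since it reads every response, so adding panelists raises cost in two places.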

Latency follows the same logic. A Fusion call is often 2 to 3 times slower than a single call, because it waits on the slowest panelist before the judge and synthesizer can run.
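
As a back-of-the-envelope model, assuming the stages run strictly in sequence and using made-up timings (the function name and all numbers are hypothetical):

```python
from typing import List


def fusion_latency(panel_seconds: List[float], judge_seconds: float, synth_seconds: float) -> float:
    """Panelists run in parallel, so the fan-out stage lasts as long as the slowest one."""
    return max(panel_seconds) + judge_seconds + synth_seconds


# Hypothetical timings in seconds.
panel_seconds = [12.0, 20.0, 15.0]
total = fusion_latency(panel_seconds, judge_seconds=10.0, synth_seconds=10.0)
print(f"{total:.0f}s vs {max(panel_seconds):.0f}s for the slowest single call")
```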

When to use it (and when not)

Fusion fits hard, open-ended work where a single frontier call is your fallback anyway: deep research, multi-source analysis, anything where being wrong is expensive and you would otherwise reach for the most capable (and most restricted) model available. The compound approach also routes around single-vendor dependence, which matters more in a week when a frontier model can disappear by directive.

It is a poor fit for sub-second chat, high-volume batch jobs where paying for several models per request adds up fast, and coding-specific tasks where a specialized model already wins. If you want to see how the individual panel models stack up before you assemble your own, our model rankings and the best free models on OpenRouter are a good starting point.
