What is OpenRouter Fusion and how does it beat frontier AI models

What is OpenRouter Fusion and how does it beat frontier AI models

It seems that AI companies’ tendency to design ever-larger, more intelligent models has just been challenged by reality. OpenRouter has introduced Fusion, an API that sends your query to several models, compares all the answers via the judge model, and synthesizes the best answer. At least theoretically, this approach outperforms some of the costliest frontier models available today using models significantly cheaper than those. This is how it works and why the benchmarks are so interesting.

Digit.in Survey
✅ Thank you for completing the survey!

Also read: Best Wi-Fi 6 routers under Rs 5000 in India

Fusion leverages a concept called panel of models. It means that when making a request, the system sends your query in parallel to several models, giving each of them the ability to perform web search and fetch operations. After receiving all the answers, the judge model compares all of them, creating a structured list of agreement areas, contradictions, gaps in information, and valuable ideas each model has contributed. Then the final model creates the answer based on this structured comparison.

It is all hosted on the server side and can be accessed via one API request with the model slug openrouter/fusion.

OpenRouter used the DRACO framework for benchmarking, which is a deep research evaluation suite created by Perplexity AI. It involves conducting 100 research assignments in ten different areas such as law, medicine, finance, and comparisons of products. Each assignment is evaluated based on approximately 39 weighted factors. Importantly, negative marks will be awarded for wrong answers; therefore, no model can achieve a high score simply by bluffing or providing a long-winded response.

The results are impressive. The fusion of Fable 5 and GPT-5.5 into one panel, using Claude Opus 4.8, yielded 69.0 percent in DRACO, outperforming any individual model that was tested. Fable 5 achieved 65.3 percent on its own merits. More importantly, even the budget panel did very well in this test – the fusion of three cheaper models, namely Gemini 3 Flash, Kimi K2.6, and DeepSeek V4 Pro, gave 64.7 percent in DRACO. This performance was better than that of both GPT-5.5, which achieved 60.0 percent, and Claude Opus 4.8, which gave 58.8 percent in separate assessments.

Also read: I visited LG’s innovation gallery and these gadgets feel like the future

OpenRouter also ran an experiment where a single model, Claude Opus 4.8, was fused with itself as a two-model panel. That combination scored 65.5 percent, compared to 58.8 percent for Opus running alone. The 6.7-point jump from running the same model twice suggests the synthesis step itself is doing meaningful work, not just the diversity of different model architectures.

As far as technical details go, it’s worth noting that OpenRouter found during their tests that panels that had access to web search were inadvertently able to locate the grading rubric for DRACO. They fixed this issue by blocking the domains in question from web searches prior to the final benchmarking process, thanks to the ability of their servers to make a one-line change to the configuration.

It’s now available via the OpenRouter API, and also in no-code form via a chatroom at openrouter.ai/fusion. As it turns out, using Fusion requires two or three times as much time as a simple single-model request, which means that it works best for in-depth research and analysis rather than conversation.

What OpenRouter is essentially trying to say with this is that it’s better to use a diverse set of models for better AI performance than it is to increase the size of any given model – a lesson similar to what studies have shown about diverse teams dealing with complex issues. How true this will turn out to be on other kinds of tasks remains to be seen, but DRACO does add some weight to their claim.

Also read: Why LG is making its own robot joints: What that actually means

Vyom Ramani

Vyom Ramani

A journalist with a soft spot for tech, games, and things that go beep. While waiting for a delayed metro or rebooting his brain, you’ll find him solving Rubik’s Cubes, bingeing F1, or hunting for the next great snack. View Full Profile