Sakana AI’s TreeQuest unites large language models into an AI dream team

Updated on 08-Jul-2025

In a move that could reshape how enterprises deploy AI, Japanese research lab Sakana AI has introduced TreeQuest, an open-source framework built on a novel technique that allows multiple large language models (LLMs) to collaborate on a single task. Rather than relying on one monolithic model, TreeQuest uses a method called Multi-LLM Adaptive Branching Monte Carlo Tree Search (Multi-LLM AB-MCTS), enabling models with different strengths to work together dynamically. The result is a powerful ensemble that can tackle complex tasks far more effectively than any individual model alone.

For businesses and developers alike, this could signal the beginning of a more modular and flexible AI era, one where the best model for the job is chosen in real time and where the strengths of one model can actively compensate for the shortcomings of another.

Why multiple models might be better than one

The AI arms race over the past few years has mostly focused on creating bigger and better foundation models. Each of these models, whether from OpenAI, Google, Anthropic, or others, has its own unique set of capabilities, often shaped by proprietary training data, model architecture, and optimisation strategies. For instance, one model may excel in logical reasoning but falter in creative writing; another might produce more grounded responses but struggle with code generation.

Sakana AI’s central insight is that these differences aren’t liabilities, but assets. Their research posits that much like human teams benefit from cognitive diversity, AI systems can too. With TreeQuest, instead of trying to build a single model that does everything, developers can orchestrate a team of models that each handle the part of a problem they’re best suited for.

The researchers believe that by pooling the intelligence of models with these varied aptitudes, AI systems can solve problems that remain insurmountable for any single model.

What is TreeQuest, and how does AB-MCTS work?

At the heart of TreeQuest is an algorithm called Adaptive Branching Monte Carlo Tree Search (AB-MCTS), an evolution of the decision-making strategy originally made famous by DeepMind’s AlphaGo. AB-MCTS lets an AI system balance two opposing strategies: searching deeper (refining a promising idea) and searching wider (exploring fresh alternatives).

TreeQuest takes this further with Multi-LLM AB-MCTS, which doesn’t just decide whether to deepen or widen the search. It also dynamically chooses which LLM to assign to each step. Early on, the system experiments with a variety of models. Over time, it learns which models perform better for particular types of subtasks and reallocates resources accordingly.

The outcome is a system that performs test-time scaling (also known as inference-time scaling), allowing models to spend more compute at inference rather than relying solely on larger training-time investments. It’s a technique that is gaining traction across the AI research community as a way to unlock more performance without needing to train a bigger model from scratch.
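To make the idea concrete, here is a minimal toy sketch of the deepen-versus-widen decision described above. It is not TreeQuest's actual API: the "models" are stand-in functions guessing a hidden number, and a simple UCB-style rule stands in for AB-MCTS's adaptive branching policy.

```python
import math
import random

# Toy stand-ins for LLM calls: each "model" proposes or refines a guess at a
# hidden target; higher scores mean a better answer. Names are illustrative.
TARGET = 42

def wide_model(_):          # explorer: proposes a fresh, independent guess
    return random.randint(0, 100)

def deep_model(prev):       # refiner: nudges an existing answer
    return prev + random.choice([-3, -1, 1, 3])

def score(answer):
    return -abs(answer - TARGET)   # closer to TARGET is better

def ab_mcts(iterations=200, seed=0):
    random.seed(seed)
    candidates = [wide_model(None)]            # start with one proposal
    stats = {"widen": [1, 2], "deepen": [1, 2]}  # (wins, trials) per action

    for _ in range(iterations):
        # Pick the action with the best empirical success rate plus an
        # exploration bonus -- a UCB rule standing in for the adaptive
        # branching decision between "go wider" and "go deeper".
        total = sum(t for _, t in stats.values())
        def ucb(action):
            wins, trials = stats[action]
            return wins / trials + math.sqrt(2 * math.log(total) / trials)

        action = max(stats, key=ucb)
        best = max(candidates, key=score)
        new = wide_model(None) if action == "widen" else deep_model(best)
        candidates.append(new)

        stats[action][1] += 1                  # record the trial...
        if score(new) > score(best):
            stats[action][0] += 1              # ...and whether it improved

    return max(candidates, key=score)
```

In the multi-LLM version, the same bookkeeping would also track which model wins most often on each kind of step, so the search gradually routes work to the model that has been performing best.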

Outperforming solo models on a tough benchmark

To demonstrate its effectiveness, Sakana AI tested the TreeQuest framework on the ARC-AGI-2 benchmark, a suite of challenging visual reasoning problems designed to emulate human-like abstraction. It’s a benchmark where most models struggle, and progress has been slow.

Using TreeQuest, Sakana combined three advanced LLMs: o4-mini, Gemini 2.5 Pro, and DeepSeek-R1. Together, the ensemble managed to correctly solve over 30% of the 120 test problems, far ahead of any of the individual models on their own.

More strikingly, the researchers recorded cases where a task unsolvable by one model alone was completed through cooperation. In one example, o4-mini produced an incorrect answer, which TreeQuest then passed to Gemini 2.5 Pro and DeepSeek-R1. These models analysed the failed output, corrected the logic, and generated a working solution.

This kind of iterative error correction and collaborative reasoning marks a significant step towards practical AI ensembles: systems that aren’t just running in parallel but are strategically building on each other’s work.

Isn’t this the same as Mixture of Experts?

Sakana AI’s TreeQuest takes a fundamentally different approach from traditional Mixture of Experts (MoE) architectures. While MoE operates within a single, unified model by using learned routing to selectively activate internal “expert” subnetworks, TreeQuest works at inference time across completely separate, independently trained LLMs.

Rather than relying on shared weights or training-time integration, Sakana’s method uses Adaptive Branching Monte Carlo Tree Search (AB-MCTS) to dynamically choose which model to use at each step of a task, based on ongoing performance. This makes TreeQuest more flexible and modular, allowing it to orchestrate multiple frontier models like GPT, Gemini, or DeepSeek in real time. Unlike MoE, which is designed for scaling efficiency, TreeQuest aims to harness diverse model strengths through strategic collaboration, making it particularly well-suited for complex, multi-step reasoning and enterprise use cases where robustness and adaptability are key.

Potential business use cases

While the original research was academic in nature, Sakana AI has open-sourced TreeQuest under the Apache 2.0 licence, making it commercially viable for enterprises to adopt and adapt. Although all of this is still early-stage, Sakana AI’s researchers believe it could be useful in a range of practical tasks.

So far, the team has explored using TreeQuest for algorithmic coding tasks, improving machine learning model accuracy, and even software performance tuning. In one use case, the framework helped optimise web service latency by using multiple LLMs to iteratively propose and refine solutions, effectively an automated trial-and-error loop run at scale.
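The propose-and-refine loop can be sketched as follows. This is a hypothetical illustration, not Sakana's code: a simulated latency curve stands in for the real web service, and two proposer functions stand in for LLMs suggesting different styles of change.

```python
import random

# Toy setup: tune one "cache size" knob to minimise a simulated latency.
# The proposers stand in for different LLMs; neither comes from TreeQuest.

def latency_ms(cache_mb):
    # Simulated latency curve with a sweet spot at 64 MB.
    return (cache_mb - 64) ** 2 / 10 + 20

def conservative_proposer(cfg):
    return cfg + random.choice([-4, 4])            # small, safe adjustments

def aggressive_proposer(cfg):
    return max(1, cfg + random.choice([-32, 32]))  # big exploratory jumps

def tune(start=8, rounds=50, seed=1):
    random.seed(seed)
    best = start
    for _ in range(rounds):
        for propose in (conservative_proposer, aggressive_proposer):
            candidate = propose(best)
            if latency_ms(candidate) < latency_ms(best):
                best = candidate   # keep only improvements: trial and error
    return best, latency_ms(best)
```

In a real deployment the "latency curve" would be an actual measurement, and the framework's search policy (rather than a fixed round-robin) would decide which proposer to consult at each step.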

There’s also potential in sectors where hallucination is a serious problem. Since different LLMs exhibit different tendencies to fabricate or exaggerate, TreeQuest could assign more grounded models to fact-sensitive tasks while letting more creative ones focus on ideation, yielding a hybrid approach that balances accuracy and fluency.
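A crude version of that routing idea might look like the sketch below. Everything here is a stand-in: the model functions, the keyword heuristic, and the labels are illustrative, not part of TreeQuest, which would learn such routing from observed performance rather than from a fixed keyword list.

```python
# Toy router: send fact-sensitive prompts to a "grounded" model and
# open-ended prompts to a "creative" one. All names are hypothetical.

FACT_KEYWORDS = {"when", "who", "date", "cite", "statistic", "figure"}

def grounded_model(prompt):
    return f"[grounded] {prompt}"   # stand-in for a low-hallucination LLM

def creative_model(prompt):
    return f"[creative] {prompt}"   # stand-in for a more inventive LLM

def route(prompt):
    words = set(prompt.lower().split())
    model = grounded_model if words & FACT_KEYWORDS else creative_model
    return model(prompt)
```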

Why TreeQuest matters in the current AI ecosystem

The idea of using multiple LLMs in tandem is not entirely new, but Sakana AI’s approach formalises and operationalises it in a way that’s accessible and efficient. It’s a recognition that “bigger” isn’t always “better” and that smart orchestration can sometimes outperform brute force.

This could have major implications for AI deployment in enterprises. Today, many companies are locked into single-model ecosystems, limited by API costs, hallucination issues, or domain-specific shortcomings. A TreeQuest-style framework offers an alternative: create a vendor-agnostic AI system that picks the best tool for each part of the job and adapts in real time.

In doing so, it also shifts the conversation from model supremacy to model cooperation, and from one-size-fits-all to plug-and-play intelligence.

Is this the new AI frontier?

TreeQuest is still a research-led project, and it’s too early to tell how widely it will be adopted. But it presents a compelling vision for the future of AI systems, one that values diversity, collaboration, and adaptive problem-solving over monolithic dominance.

With the framework now freely available, the next phase will likely be shaped by how developers and businesses take it forward. Whether it’s used to build smarter assistants, automate multi-step technical workflows, or serve as the backbone for more reliable enterprise systems, TreeQuest could be the beginning of a new approach to AI architecture: less about creating a single genius, and more about building a great team.

Mithun Mohandas

Mithun Mohandas is an Indian technology journalist with 14 years of experience covering consumer technology. He is currently employed at Digit in the capacity of a Managing Editor. Mithun has a background in Computer Engineering and was an active member of the IEEE during his college days. He has a penchant for digging deep into unravelling what makes a device tick. If there's a transistor in it, Mithun's probably going to rip it apart till he finds it. At Digit, he covers processors, graphics cards, storage media, displays and networking devices aside from anything developer related. As an avid PC gamer, he prefers RTS and FPS titles, and can be quite competitive in a race to the finish line. He only gets consoles for the exclusives. He can be seen playing Valorant, World of Tanks, HITMAN and the occasional Age of Empires or being the voice behind hundreds of Digit videos.