From Minecraft to multi-world intelligence: Google’s ambition behind SIMA 2

Updated on 17-Nov-2025
HIGHLIGHTS

Google’s SIMA 2 pushes AI toward general intelligence using 3D games

DeepMind’s multi-world agent learns, reasons, and improves autonomously across environments

Gemini-powered SIMA 2 showcases transfer learning and real-world robotics potential

Google DeepMind is not building a better gamer; it is building a better brain. The Scalable Instructable Multiworld Agent (SIMA) project, now in its second generation, is using the complex, open-ended universe of 3D video games, the very same worlds where players might be building a castle in Valheim or exploring a galaxy in No Man’s Sky, as the proving ground for nothing less than Artificial General Intelligence (AGI).

With SIMA 2, powered by the company’s flagship Gemini model, Google has moved its AI from being a passive instruction-follower to an active, reasoning collaborator, marking a pivotal step in the quest for truly generalist AI.

Moving beyond the single game

For years, AI breakthroughs in gaming were defined by specialization. DeepMind’s AlphaGo mastered the ancient game of Go, and AlphaStar conquered the real-time strategy of StarCraft II. But their brilliance was confined to a single, structured environment.

The reality of the world, whether physical or virtual, is chaos. That’s where SIMA comes in. SIMA 2 is designed to act like a human player: it observes the game screen, processes natural language commands, and uses a virtual keyboard and mouse to operate, all without access to the game’s hidden internal code.

The major breakthrough of SIMA 2 lies in its transfer learning capability. Having been trained across a diverse portfolio of commercial games, including survival, building, and exploration titles like Goat Simulator 3, Valheim, and Satisfactory, the agent learns abstract concepts. If it learns what “mining” means in one game, it can apply that understanding to “harvesting” in a completely different one.
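One way to picture this transfer is as a mapping from game-specific commands to shared abstract skills. The sketch below is purely illustrative (the skill names, verb mappings, and games' command vocabularies are invented; DeepMind has not published how SIMA 2 represents skills internally):

```python
# Toy illustration of cross-game transfer via shared abstract skills.
# All verb-to-skill mappings here are invented for illustration.

# Each game exposes a different surface verb for the same underlying skill.
GAME_VERBS = {
    "Valheim": {"chop wood": "gather_resource", "mine ore": "gather_resource"},
    "No Man's Sky": {"harvest plants": "gather_resource", "mine minerals": "gather_resource"},
}

def abstract_skill(game: str, verb: str) -> str:
    """Map a game-specific command to the shared abstract skill it exercises."""
    return GAME_VERBS[game][verb]

# A skill learned as "mining" in one title carries over to "harvesting"
# in another, because both resolve to the same abstraction.
assert abstract_skill("Valheim", "mine ore") == abstract_skill("No Man's Sky", "harvest plants")
```

The point of the abstraction layer is that experience accumulates against the shared skill, not the surface verb, which is what lets competence in one game show up in another.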

“SIMA 2 is a step change and improvement in capabilities over SIMA 1. It’s a more general agent that can complete complex tasks in previously unseen environments.”

– Joe Marino, Senior Research Scientist at DeepMind

A shift to reasoning collaboration

The engine behind SIMA 2’s dramatic improvement is the Gemini 2.5 Flash-Lite model. This integration elevates the agent from a simple command executor to a conversational partner:

  • Understanding Abstract Goals: Where SIMA 1 might fail on a vague command like “find a campfire,” the Gemini-powered SIMA 2 reasons internally. It can decompose a high-level goal like “We need to build a safe shelter” into actionable steps, interpreting “safe” as meaning “requiring high walls and a door.”
  • Multimodal Communication: The agent can interpret and act on more than just text. Users can communicate through emojis, sketches drawn on the screen, or even in different languages, allowing for a much more natural, human-like interaction.
  • Explaining Intent: A key feature of SIMA 2 is its ability to talk through its plans. It can describe its intentions and detail the intermediate steps it’s taking to accomplish a goal, turning a sequence of actions into a transparent, collaborative dialogue.
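The goal-decomposition step in the first bullet can be sketched as a toy planner. In the real agent this reasoning is delegated to Gemini; the rules and task names below are invented stand-ins:

```python
# Toy goal decomposition (invented rules; SIMA 2 uses Gemini for this step).
def decompose(goal: str) -> list[str]:
    """Break a vague high-level goal into concrete, actionable steps."""
    plans = {
        "build a safe shelter": [
            "gather wood",            # "safe" interpreted as sturdy construction
            "build high walls",
            "add a door",
        ],
        "find a campfire": [
            "scan surroundings for light sources",
            "walk toward the nearest flame",
        ],
    }
    # Unknown goals fall back to exploration rather than failing outright.
    return plans.get(goal.lower(), ["explore and reassess"])

print(decompose("Build a safe shelter"))
# ['gather wood', 'build high walls', 'add a door']
```

The "Explaining Intent" feature corresponds to surfacing this step list to the user before and during execution, rather than acting silently.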

AI that teaches itself

Perhaps the most groundbreaking advancement in SIMA 2 is its autonomous self-improvement cycle.

The original SIMA relied heavily on recorded human gameplay data. SIMA 2 uses this as a base, but then shifts into a self-directed learning mode. Utilizing a separate Gemini model, the agent:

  1. Generates New Tasks: It creates novel, increasingly complex goals for itself within the game.
  2. Attempts and Scores: It autonomously attempts the tasks and uses an internal reward model to score its own performance, logging both successes and failures.
  3. Refines the Model: This self-generated “experience data” is then used to train the next version of SIMA 2, allowing it to improve through trial and error, effectively learning from its own mistakes without human intervention.
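The three steps above amount to a self-play training loop. The following is a schematic sketch only; every function body is a stand-in stub (DeepMind has not published this code), but the control flow mirrors the generate-attempt-score-refine cycle the article describes:

```python
import random

# Schematic of the self-improvement cycle; all function bodies are stubs.

def generate_task(difficulty: int) -> str:
    """Step 1: a separate Gemini model proposes a novel task for the agent."""
    return f"task-{difficulty}-{random.randint(0, 999)}"

def attempt(task: str) -> dict:
    """The agent tries the task in-game and records its trajectory."""
    return {"task": task, "actions": ["look", "move", "interact"]}

def reward_model(trajectory: dict) -> float:
    """Step 2: an internal reward model scores the attempt."""
    return random.random()  # placeholder score in [0, 1)

experience = []
for difficulty in range(1, 4):          # tasks grow more complex over time
    task = generate_task(difficulty)
    trajectory = attempt(task)
    score = reward_model(trajectory)    # successes AND failures are logged
    experience.append((trajectory, score))

# Step 3: the self-generated experience trains the next agent version.
assert len(experience) == 3
```

The key design point is that failures are kept alongside successes: the reward model's scores, not human labels, supply the training signal for the next iteration.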

This ability to rapidly adapt and learn in a new game, or even in entirely AI-generated 3D worlds created by Google’s Genie model, demonstrates a level of generalization that far surpasses previous benchmarks. The success rate for complex tasks has doubled compared to SIMA 1’s initial performance.

Beyond the screen

For DeepMind, SIMA is not a gaming tool; it is a virtual robot. The skills it is mastering in simulated 3D environments (navigation, tool use, understanding complex instructions, and collaborative task execution) are the exact building blocks required for developing advanced, general-purpose physical robots.

Video games offer a safe, scalable, and boundless sandbox to stress-test these agents. Every time SIMA 2 successfully chops down a tree in Valheim or repairs a spaceship in No Man’s Sky, it is practicing a core capability needed by a future AI assistant in the real world.

While SIMA 2 remains a research-preview agent, currently unavailable to the public and limited to a few academic and developer partners, Google DeepMind’s ambition is clear. By harnessing Gemini’s reasoning power and training its agent across a multi-world intelligence landscape, they are not just aiming to win a game; they are laying the groundwork for the general-purpose, helpful AI agents that will eventually operate in all the complex, dynamic, and unpredictable environments of the real world.

Vyom Ramani

A journalist with a soft spot for tech, games, and things that go beep. While waiting for a delayed metro or rebooting his brain, you’ll find him solving Rubik’s Cubes, bingeing F1, or hunting for the next great snack.
