Beyond ChatGPT: The ‘Godmother of AI’ makes a bold bet on spatial intelligence with World Labs
‘Godmother of AI’ Dr Fei-Fei Li pushes AI beyond text toward real-world spatial intelligence
World Labs builds world models to simulate, predict and coordinate physical environments
Spatial intelligence aims to transform industries and ensure AI serves humanity
On most days, the AI revolution feels like something that happens behind our device screens – tucked inside laptops, coaxed through chat windows, flickering across GPUs in distant data centers. Since ChatGPT went mainstream in 2022, generative AI has rewritten how we communicate, code, research, and create. But step outside your screen and the physical world looks… largely unchanged. Hospitals still scramble in winter surges. Power grids still buckle under pressure. Robots still struggle with stairs.
Dr Fei-Fei Li thinks that gap – between what AI can say and what it can do – is the defining problem of our age. And she’s betting her next decade on fixing it, with her new startup World Labs.
Bringing spatial intelligence to AI
Dr Li, the Stanford professor whose work on ImageNet helped launch modern AI, calls this next phase “spatial intelligence” – AI that perceives, reasons and acts inside the real world, not just inside text boxes.
Her new venture, World Labs, is building the foundational Large World Models that might finally push AI off the screen and into the messy, physical environments where value is created and decisions truly matter.
Imagine a utility operator bracing for wildfire-level winds. A spatially intelligent AI doesn’t wait for a prompt: it predicts how the storm will shift, reroutes capacity, dispatches a drone to inspect a transformer likely to fail, and alerts responders before the first spark. Or imagine a hospital in peak monsoon season – beds filling up, hallways jammed. Instead of relying on spreadsheets and intuition, a world model simulates bottlenecks, reshapes staffing patterns, and choreographs autonomous helpers. This isn’t passive analysis, according to Dr Li. It’s active, embodied coordination.

Li’s co-founders – Justin Johnson, Christoph Lassner, Ben Mildenhall – are heavyweights in computer vision and graphics, and together they’re building models that understand objects, physics, motion and cause-and-effect. In short: the ingredients of reality.
Enter Marble: 3D spaces generated from a prompt
World Labs’ first product, Marble, offers an early glimpse. Give it a text prompt or a single photo and it generates an explorable 3D environment. Not a scene. A world. You don’t talk to Marble so much as step into whatever it conjures. It’s a toy box for creators today, but its implications spill far beyond VFX studios.
Because the physical economy is the real prize. Most global GDP isn’t generated in apps or documents – it’s born in factories, supply chains, farms, ports, hospitals, grids, and construction sites. These domains run on physics, uncertainty and timing. Humans learn them through embodied experience – millions of micro-interactions we take for granted. LLMs don’t have that grounding. World models might.
Li frames it sharply: “A language model reads a book and predicts the next sentence. A world model watches a movie, predicts the plot twist, and lets you rewrite the ending on the fly.” There’s a cinematic quality to the analogy, but the truth is industrial. With spatial intelligence, companies can model decisions before acting – stress-testing logistics networks, rehearsing safety scenarios, redesigning manufacturing lines digitally, long before anyone moves a robot arm.

It’s the difference between reacting and anticipating.
World models are the missing piece of robotics-driven AGI
And in robotics – the perennial “almost there” frontier – world models may be the missing piece. Robots learn painfully slowly in the real world because every mistake is costly. But in simulation? You can run millions of lifetimes before lunch. Marble is the amuse-bouche for the embodied AI era: a hint of how machines might one day rehearse reality before entering it.
Yet Dr Fei-Fei Li always circles back to humanity. From ImageNet to spatial intelligence, her work is anchored in one simple principle: that intelligence must ultimately serve people. As AI steps into the physical world, that principle becomes a guardrail, not a tagline. The winners of the next era, she believes, won’t be those building the biggest models – they’ll be the ones making the most responsible choices.
If spatial intelligence succeeds, AI won’t just understand our world. It will help us reimagine it, well beyond the prompts and outputs on our phone and laptop screens.
Jayesh Shinde
Executive Editor at Digit. Technology journalist since Jan 2008, with stints at Indiatimes.com and PCWorld.in. Enthusiastic dad, reluctant traveler, weekend gamer, LOTR nerd, pseudo bon vivant.