As we get dazzled by Claude Cowork’s ability to orchestrate and automate the boring, mundane parts of our work, in a year when agentic AI is supposed to outgrow what mere AI chatbots can accomplish, one paper by a familiar Indian-American techie is asking us to pause and reassess our enthusiasm.
A quietly circulated but mathematically grounded paper – recently spotlighted in Wired – threatens to derail the AI agent hype train we all seem to be boarding, coldly reminding us that marketing buzzwords don’t necessarily translate into actual capability.
The core argument of the paper, titled Hallucination Stations: On Some Basic Limitations of Transformer-Based Language Models, rests entirely on computational theory and the mathematical limits of what these systems can compute.
The authors claim, with actual math rather than Silicon Valley marketing talk, that large language models (LLMs) – the very nuts and bolts powering ChatGPT, Gemini, Claude, and today’s growing list of AI agents – are mathematically incapable of reliably executing computational or agentic tasks once those tasks cross a modest complexity threshold.
From what I can gather, the study claims there are hard limits to what AI models can actually do, no matter how many fancy tricks or giant GPUs you throw at them. And who’s behind this paper? Former Infosys CEO Vishal Sikka.
Vishal Sikka is also the former Chief Technology Officer of SAP and a bona fide computer science heavyweight who once studied under John McCarthy, the godfather of artificial intelligence and the man who coined the term “AI”. Nor is Sikka some fringe commentator on the topic: he has walked the halls of enterprise tech leadership, serving on the boards of directors of Oracle and BMW Group, with an ongoing advisory role at the Stanford Institute for Human-Centered AI. In short, Sikka has the credibility to back his words.
In their paper, Vishal Sikka and his teenage son and co-author (who’s studying at Stanford) aren’t just lamenting AI hype; they’re offering a structural critique of the transformer-based architectures at the heart of LLMs. The crux of their argument is that current transformer-based agentic AI simply can’t carry out computations or verify results reliably beyond a limited degree of complexity. That’s not a bug, they argue; it’s baked into the very definition of how these models compute.
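To get a feel for the flavour of that argument – and this is a back-of-the-envelope illustration of the general compute-budget reasoning, not the paper’s own derivation – consider how much work a transformer can actually do in a single forward pass:

```latex
% Illustrative cost of one transformer forward pass over a prompt of n tokens,
% for a model with L layers and hidden width d (an assumption-laden sketch,
% not taken from the Sikka paper):
\[
  C_{\text{forward}}(n) \;=\; O\!\left( L \left( n^{2} d + n d^{2} \right) \right)
\]
% Self-attention contributes the n^2 d term; the feed-forward blocks contribute
% the n d^2 term. Each generated token therefore gets at most this fixed,
% polynomial compute budget. A task whose exact answer requires more computation
% than that budget allows cannot be computed within the forward pass; the model
% can only emit a plausible-looking guess.
```

The point of that style of accounting is that the budget is fixed in advance by the architecture, while the computational demands of an arbitrary task are not.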
Of course, this study is at odds with an industry that has spent the past year heralding “the year of the AI agent”, from 2025 into 2026, painting visions of autonomous digital workers replacing humans in workflows, kitchens, and possibly therapists’ chairs.
But real-world deployments have shown what many engineers already knew: hallucinations aren’t just occasional hiccups, they’re a persistent part of the AI fabric. Even internal research from major labs admits that perfect accuracy is a pipe dream – OpenAI itself is on record saying as much.
So what does this mean for AI agents? Well, if the foundational model can’t mathematically guarantee correctness on tasks above a basic level of complexity, then an agent built on it can’t either – not without bolting on external systems to catch errors, enforce rules, and provide oversight. And that’s precisely the nuance Sikka’s paper emphasizes: a pure LLM on its own is capped, but systems that wrap it in external verification layers might still do useful work.
In other words, the raw engine isn’t going to drive your autonomous factory anytime soon. But with enough engineering around it – think of it as scaffolding, additional verification – you might still get practical value.
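As a rough sketch of what that scaffolding can look like – purely illustrative, with a made-up `llm_propose` stand-in and a toy arithmetic task, not anything described in Sikka’s paper – the generate-then-verify pattern works roughly like this:

```python
# Generate-then-verify scaffolding around an unreliable language model.
# `llm_propose` is a hypothetical stand-in for whatever model API you use;
# the verifier is ordinary deterministic code that does not trust the model.

import random
from typing import Callable, Optional


def llm_propose(question: str) -> str:
    """Stand-in for an LLM call: returns a plausible but possibly wrong answer."""
    a, b = 48_193, 77_201
    guess = a * b + random.choice([0, 0, 1, -3])  # occasionally "hallucinates"
    return str(guess)


def verify_product(question: str, answer: str) -> bool:
    """External checker: recomputes the product exactly and compares."""
    a, b = 48_193, 77_201
    try:
        return int(answer) == a * b
    except ValueError:
        return False


def ask_with_verification(
    question: str,
    propose: Callable[[str], str],
    verify: Callable[[str, str], bool],
    max_attempts: int = 5,
) -> Optional[str]:
    """Accept a model answer only if the deterministic verifier signs off."""
    for _ in range(max_attempts):
        candidate = propose(question)
        if verify(question, candidate):
            return candidate
    return None  # the scaffold refuses to pass on an unverified answer


if __name__ == "__main__":
    result = ask_with_verification("What is 48193 * 77201?", llm_propose, verify_product)
    print(result or "no verified answer")
```

The design choice that matters here is that the verifier is plain deterministic code: it doesn’t trust the model, it recomputes. Whatever correctness guarantee exists comes from the scaffold, not from the LLM inside it.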