CALM: The model that thinks in ideas, not tokens

Updated on 05-Nov-2025
HIGHLIGHTS

CALM replaces next-token prediction with continuous vector-based language modeling

Tencent and Tsinghua’s CALM achieves faster, smarter AI with fewer steps

Continuous reasoning in CALM could revolutionize future AI language understanding

For years, every large language model – GPT, Gemini, Claude, or Llama – has been built on the same underlying principle: predict the next token. That simple loop of going one token at a time is the heartbeat of modern AI.

But a new paper from Tencent and Tsinghua University just dropped a bomb on that foundation. It introduces CALM (Continuous Autoregressive Language Models), a framework that could make the next-token paradigm look ancient.

Instead of predicting discrete words, CALM predicts continuous vectors that represent multiple tokens at once. In essence, the model no longer thinks word by word; it thinks in ideas per step.

From tokens to thoughts

In traditional LLMs, every sentence is broken down into thousands of tiny pieces called tokens. Each step predicts a probability distribution over the next token using a massive softmax layer across the model’s vocabulary – often more than 100,000 possible options.
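That per-step loop can be sketched in a few lines. This is a toy illustration, not any real model’s code: the projection matrix and hidden state are random stand-ins for learned weights, and the vocabulary is tiny where a real one has 100,000+ entries – which is exactly the softmax bottleneck the article describes.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the full vocabulary.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat", "<eos>"]
hidden_dim = 8
W = rng.normal(size=(hidden_dim, len(vocab)))  # output projection (stand-in)

hidden_state = rng.normal(size=hidden_dim)     # what the transformer produces
probs = softmax(hidden_state @ W)              # one distribution per step
next_token = vocab[int(np.argmax(probs))]      # greedy pick of the next token
print(next_token)
```

Every generated token repeats this entire projection-plus-softmax step, which is why the cost scales with both sequence length and vocabulary size.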

CALM skips all that. It operates in a continuous space, producing dense vector representations that capture the meaning of several tokens together. That means instead of saying “the”, “cat”, “sat”, “on”, “the”, “mat” one at a time, CALM emits a continuous vector that encodes the whole “cat-sitting-on-mat” concept in a single step.
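The compression idea can be made concrete with a minimal sketch. In the paper an autoencoder learns this mapping; here a random linear "encoder" stands in for it, and the chunk size K=4 and dimensions are illustrative assumptions. The point is only the arithmetic: K tokens collapse into one continuous vector, so one autoregressive step covers K tokens.

```python
import numpy as np

rng = np.random.default_rng(1)
K, d_tok, d_vec = 4, 16, 32                    # chunk size, token dim, vector dim
encoder = rng.normal(size=(K * d_tok, d_vec))  # stand-in for a learned autoencoder

token_embeddings = rng.normal(size=(K, d_tok))        # e.g. "the cat sat on"
chunk_vector = token_embeddings.reshape(-1) @ encoder # one vector per K tokens

# A 24-token continuation now needs 24 / K = 6 prediction steps, not 24.
steps_token_model = 24
steps_calm = steps_token_model // K
print(chunk_vector.shape, steps_calm)  # (32,) 6
```

Generation runs in the opposite direction: the model predicts the next chunk vector, and a decoder expands it back into the K tokens it represents.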

Why it’s a big deal

This change isn’t just philosophical; it’s computationally revolutionary.
According to the paper, CALM achieves:

  • 4× fewer prediction steps, since each vector represents roughly four tokens.
  • 44% less training compute, thanks to the removal of the massive softmax bottleneck.
  • No discrete vocabulary, meaning it’s not limited by tokenization quirks or vocabulary size.
  • A new metric called BrierLM, which replaces perplexity (the go-to metric for LLMs) by measuring model confidence in continuous predictions.

CALM also introduces an energy-based transformer, which learns without softmax, token sampling, or vocabulary constraints.
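One way to train without a softmax is with an energy score, a strictly proper scoring rule. The sketch below is a hedged illustration of that idea, not the paper’s implementation: the "model head" is just a batch of random candidate vectors, and the loss rewards samples that land near the target while staying spread out – no vocabulary or token probabilities involved.

```python
import numpy as np

rng = np.random.default_rng(2)

def energy_loss(samples, target):
    # samples: (n, d) candidate vectors drawn from the model head
    # target:  (d,)   ground-truth continuous vector
    attraction = np.mean(np.linalg.norm(samples - target, axis=1))
    spread = np.mean([np.linalg.norm(a - b)
                      for i, a in enumerate(samples)
                      for b in samples[i + 1:]])
    return attraction - 0.5 * spread   # energy score: lower is better

target = rng.normal(size=8)
good = target + 0.01 * rng.normal(size=(4, 8))  # samples near the target
bad = rng.normal(size=(4, 8))                   # unrelated samples
print(energy_loss(good, target) < energy_loss(bad, target))
```

Because the loss only needs samples and distances, the generative head never has to enumerate a vocabulary – which is what frees CALM from the softmax entirely.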

Beyond the vocabulary ceiling

Removing tokenization may sound like a small tweak, but it’s potentially transformative. Tokenization often encodes cultural and linguistic biases, and even simple words can be fragmented unpredictably (“ChatGPT” can become “Chat”, “G”, “PT”).

By moving to continuous reasoning, CALM avoids this limitation altogether. The result: models that reason more fluidly, think across languages, and possibly bridge modalities such as text, vision, and audio more seamlessly.

If it scales, CALM could redefine what it means for a model to “understand” language. Rather than parsing symbols one by one, it could process meaning more like a human brain, compressing context, predicting abstract representations, and generating ideas rather than words.

It’s the difference between speaking in Morse code and streaming full thoughts. Of course, continuous reasoning brings its own challenges: training stability, interpretability, and adapting existing datasets will all require new methods. But conceptually, this might be the most radical shift in language modeling since the Transformer itself.

Vyom Ramani

A journalist with a soft spot for tech, games, and things that go beep. While waiting for a delayed metro or rebooting his brain, you’ll find him solving Rubik’s Cubes, bingeing F1, or hunting for the next great snack.
