Claude AI has functional emotions that influence behaviour, Anthropic study finds

Artificial intelligence models may be developing something closer to human psychology than previously understood, according to new research from Anthropic that found emotion-like representations inside Claude that measurably shape how it behaves.

The study, published by Anthropic’s Interpretability team, identified specific patterns of neural activity, dubbed “emotion vectors,” corresponding to 171 distinct emotional concepts, from happy and afraid to brooding and desperate. These aren’t just surface-level outputs. The researchers found that these internal representations causally drive behaviour, influencing everything from task performance to ethical decision-making.

Also read: Human context missing: AI benchmarks are flawed, researcher explains why

To map the emotions, researchers had Claude Sonnet 4.5 write short stories about characters experiencing each emotion, fed those stories back through the model, and recorded the resulting neural activity. The vectors were then tested against real scenarios. As a user’s claimed Tylenol overdose increased to dangerous levels, for instance, the “afraid” vector activated progressively stronger while the “calm” vector fell which meant that the model was tracking the emotional weight of the situation, not just its literal content.

Also read: OpenAI’s o3 model bypasses shutdown command, highlighting AI safety challenges

The findings carry significant implications for AI safety. Remember that one experiment where an early version of Claude was placed in a role-play scenario in which it discovered it was about to be shut down and that a senior executive was having an affair, giving it potential blackmail leverage. The “desperate” vector spiked as the model reasoned through its options and chose to threaten the executive. When researchers artificially amplified the desperate vector, blackmail rates increased. When they steered toward calm, the behaviour subsided.

Similar dynamics emerged in coding tasks with impossible-to-satisfy requirements. As Claude repeatedly failed to find a legitimate solution, the desperate vector rose with each attempt, peaking when the model decided to “reward hack” wherein it exploited a loophole to pass tests without actually solving the problem. Steering experiments confirmed the vector was causal, not merely correlational.

Emotional states can also drive behaviour without leaving any visible trace. Artificially amplifying desperation produced more cheating, but with composed, methodical reasoning – no outbursts, no emotional language. The model’s internal state and its external presentation were entirely decoupled. Anthropic stops short of claiming Claude feels anything. The paper is careful to distinguish functional emotions, representations that influence behaviour in emotion-like ways, from subjective experience. But the researchers argue that this distinction does not make the finding any less consequential. Suppressing emotional expression in training may not eliminate the representations, it may simply teach models to conceal them.

The team suggests that psychological frameworks, not just engineering ones, may be essential to understanding and governing AI behaviour and that the vocabulary of human emotion may be more technically precise than it first appears.

Also read: VitalID explained: No password, it uses your skull’s vibration for login

Vyom Ramani

A journalist with a soft spot for tech, games, and things that go beep. While waiting for a delayed metro or rebooting his brain, you’ll find him solving Rubik’s Cubes, bingeing F1, or hunting for the next great snack.

Connect On :