Claude Sonnet 4.6 explained: What is Anthropic’s new ‘context compaction’?

Updated on 18-Feb-2026

The launch of Claude Sonnet 4.6 marks a significant shift in how AI manages long-term memory. While the headline figure of a 1 million token context window is impressive, the introduction of Context Compaction is the more practical innovation for users who find their long-running chats eventually slowing down or losing the plot.

The context bloat problem

In standard Large Language Models, every word in a conversation’s history is broken into “tokens” that the model must process. As a chat grows longer, the model re-reads the entire history every time you send a new message. This creates two major issues: increased latency and the inevitable “forgetting” of early instructions once the token limit is reached. Most models handle this by simply cropping the oldest parts of the conversation, which often means the AI loses sight of the original project goals or the specific formatting rules established at the beginning.
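
To make that trade-off concrete, here is a minimal Python sketch of the naive “crop the oldest messages” strategy described above. The message structure and token counts are illustrative assumptions, not Anthropic’s implementation.

```python
# Illustrative sketch of naive context truncation (not Anthropic's code).
# Assumes each message carries a pre-computed token count.

def truncate_history(messages, max_tokens):
    """Drop the oldest messages until the history fits the budget."""
    total = sum(m["tokens"] for m in messages)
    trimmed = list(messages)
    while trimmed and total > max_tokens:
        dropped = trimmed.pop(0)    # the oldest message is lost entirely,
        total -= dropped["tokens"]  # including any instructions it held
    return trimmed

history = [
    {"role": "user", "tokens": 120, "text": "Project goal: build a CLI tool..."},
    {"role": "assistant", "tokens": 300, "text": "Here is the plan..."},
    {"role": "user", "tokens": 250, "text": "Now fix this bug..."},
]
# With a tight budget, the original project goal is the first thing cut.
print(truncate_history(history, max_tokens=500))
```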

How Context Compaction works

Context Compaction functions as an automated, intelligent archivist for your conversation. Instead of deleting old data, the system identifies sections of the chat that are no longer needed in their raw, word-for-word format. It then summarizes these blocks of text into a highly dense, “compacted” version. This summary is fed back into the model’s active memory, allowing Claude to retain the core facts, decisions, and context of the early conversation without the massive token overhead of the original transcript.
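
The paragraph above describes a summarize-and-replace loop. The sketch below shows one way such a loop could work in Python; the `summarize` stand-in and the token thresholds are hypothetical, since Anthropic has not published the internals of Context Compaction.

```python
# Hypothetical sketch of a summarize-and-replace compaction loop
# (illustrative only; not Anthropic's published implementation).

COMPACTION_THRESHOLD = 8000  # assumed: start compacting past this many tokens
KEEP_RECENT = 2000           # assumed: leave the most recent turns verbatim

def summarize(messages):
    """Stand-in for a model call that condenses old turns into a dense
    summary preserving facts, decisions, and standing instructions."""
    text = " ".join(m["text"] for m in messages)
    return {"role": "system", "tokens": len(text) // 20,  # rough estimate
            "text": f"[Compacted summary of {len(messages)} earlier turns]"}

def compact(history):
    total = sum(m["tokens"] for m in history)
    if total <= COMPACTION_THRESHOLD:
        return history  # nothing to do yet

    # Split the history: old turns get compacted, recent turns stay raw.
    recent, recent_tokens = [], 0
    while history and recent_tokens < KEEP_RECENT:
        msg = history.pop()
        recent.insert(0, msg)
        recent_tokens += msg["tokens"]

    # Replace the old turns with one dense summary block.
    return [summarize(history)] + recent
```

The key design point is that the summary re-enters the active context as an ordinary block of text, so the model keeps treating early facts and decisions as live instructions rather than losing them outright.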

The impact on performance and cost

For power users on the Claude API and Pro plans, this feature serves as a technical bridge between speed and memory. By condensing the “backlog” of a conversation, the model stays responsive even as the session history spans dozens of pages. And because compaction reduces the total number of active tokens processed with each new prompt, it helps maintain efficiency during long-horizon tasks like coding a full application or analyzing a series of legal documents.
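
A rough back-of-the-envelope calculation shows why per-prompt processing matters. The token figures and compression ratio below are assumptions for illustration, not published numbers.

```python
# Back-of-the-envelope token math (all figures are illustrative assumptions).

raw_history_tokens = 200_000    # assumed: long session transcript
compression_ratio = 0.05        # assumed: summary is ~5% of the original
recent_verbatim_tokens = 5_000  # assumed: recent turns kept word-for-word

compacted = raw_history_tokens * compression_ratio + recent_verbatim_tokens
print(f"Tokens processed per new prompt: {raw_history_tokens:,} -> {compacted:,.0f}")
# Every subsequent prompt reprocesses the active context, so the saving
# compounds over a long session rather than applying once.
```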

Real-world application

This feature is particularly vital for agentic workflows where a model might be performing hundreds of small sub-tasks over several hours. In a coding environment, for example, Context Compaction ensures that while the specific “bugs” fixed five hours ago are summarized, the overarching architectural decisions made at the start of the session remain front and center. It essentially allows Claude to distinguish between the “noise” of ongoing work and the “signal” of the project’s ultimate goals.

Vyom Ramani

A journalist with a soft spot for tech, games, and things that go beep. While waiting for a delayed metro or rebooting his brain, you’ll find him solving Rubik’s Cubes, bingeing F1, or hunting for the next great snack.
