Inside ChatGPT: OpenAI’s new LLM reveals the secrets of AI’s inner workings

HIGHLIGHTS

OpenAI’s sparse LLM exposes hidden circuits driving ChatGPT’s internal reasoning

New interpretability-focused model reveals how modern AI systems actually work

Researchers uncover clearer neural pathways inside OpenAI’s groundbreaking transparent LLM

For years, large language models have dazzled users with fluent conversation, code generation, and creative output, while remaining largely inscrutable to the people building them. OpenAI’s new experimental model aims to break that barrier. Unlike previous generations of ChatGPT, which were designed primarily for capability and scale, this new LLM has been engineered with a different goal: exposing how AI really works on the inside.


The model uses a weight-sparse transformer architecture, which forces most of its internal connections to be zero. In the dense networks used by GPT-4, GPT-5, or Claude, each neuron connects to thousands of others, creating a tangled web of interactions. Sparsity changes that. By pruning away unnecessary connections, the model isolates more distinct, interpretable “circuits” that correspond to real behaviours.
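To make the idea concrete, here is a minimal sketch, in PyTorch, of how a weight-sparse layer could be enforced: a mask keeps only a small fraction of the largest-magnitude weights, and every other connection is forced to zero. This is an illustration of the general technique, not OpenAI’s actual training recipe, and the `density` value is an arbitrary example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseLinear(nn.Module):
    """Linear layer whose weights are masked so only a small fraction are non-zero.

    Illustrative sketch of weight sparsity: keep the largest-magnitude weights
    and zero out the rest. Not OpenAI's published implementation.
    """
    def __init__(self, in_features, out_features, density=0.05):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.density = density  # fraction of connections allowed to be non-zero
        self.register_buffer("mask", torch.ones_like(self.linear.weight))

    def update_mask(self):
        # Keep only the top-`density` fraction of weights by magnitude.
        w = self.linear.weight.detach().abs()
        k = max(1, int(self.density * w.numel()))
        threshold = torch.topk(w.flatten(), k).values.min()
        self.mask = (w >= threshold).float()

    def forward(self, x):
        # Pruned connections contribute nothing to the output.
        return F.linear(x, self.linear.weight * self.mask, self.linear.bias)

# Example: a layer where only 5% of connections survive.
layer = SparseLinear(512, 512, density=0.05)
layer.update_mask()
```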

Researchers can now identify which connections are sufficient to cause a behaviour – say, answering questions in Spanish – and which connections are necessary, meaning the behaviour collapses when they’re removed. It is a level of mechanistic clarity that earlier LLMs, with their billions of tightly intertwined weights, simply couldn’t offer.
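In code terms, those two tests look roughly like the toy check below, where `weight`, `circuit_mask`, and `run_task` are hypothetical stand-ins rather than anything OpenAI has published, and the 0.9 / 0.1 thresholds are arbitrary: sufficiency keeps only the candidate circuit’s connections, necessity removes only them.

```python
import torch

def sufficiency_and_necessity(weight, circuit_mask, run_task):
    """Toy version of the sufficiency/necessity tests described above.

    weight:       one layer's weight matrix (torch.Tensor)
    circuit_mask: boolean tensor marking the candidate circuit's connections
    run_task:     hypothetical function returning a task score for a weight matrix
    """
    baseline = run_task(weight)

    # Sufficiency: keep only the circuit's connections.
    # Does the behaviour (e.g. answering in Spanish) survive?
    circuit_only = weight * circuit_mask
    sufficient = run_task(circuit_only) >= 0.9 * baseline  # illustrative threshold

    # Necessity: remove only the circuit's connections.
    # Does the behaviour collapse?
    circuit_removed = weight * (~circuit_mask)
    necessary = run_task(circuit_removed) <= 0.1 * baseline  # illustrative threshold

    return sufficient, necessary
```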


Cracking open the black box

AI critics have long warned that LLMs operate as opaque black boxes. Even when they work well, no one really knows why. And when they hallucinate, misclassify, or generate harmful content, explanations are even murkier.

This new model attempts to shift that dynamic. By reducing complexity and enforcing cleaner internal structure, OpenAI is giving researchers a microscope through which they can watch behaviours emerge in real time. Early investigations have already traced pathways linked to translation, pattern recognition, summarization, and even reasoning-like behaviours.

Though the model is intentionally weaker than ChatGPT’s flagship versions, what it reveals could have a much larger impact. Interpretability researchers have compared it to early mapping of neural circuits in biology: the model is small, but the principles learned may scale to larger systems. The hope is that once you understand how simple circuits work, similar patterns in giant models can be spotted and controlled.


Transparency meets safety

The implications stretch far beyond academic curiosity. Regulators have been demanding explainability from AI companies for years, especially in decisions that affect credit scores, hiring, exams, medical analysis, or security systems. An interpretable LLM, even if experimental, hints at a future where auditing AI behaviour is not only possible but routine.

For safety researchers, the model is a proving ground. If a specific harmful behaviour—like generating instructions for malware—can be traced to a circuit, it may also be neutralised at the root. Instead of relying solely on post-training guardrails, developers could precisely ablate or reshape problematic pathways.
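As a hypothetical illustration of what “neutralised at the root” could mean in practice: once a harmful behaviour has been localised to specific connections, removing it could be as direct as zeroing those weights, rather than filtering outputs after the fact. The layer and coordinates below are invented for the example.

```python
import torch
import torch.nn as nn

# Suppose interpretability work has traced a harmful behaviour to a handful of
# connections inside one layer. These coordinates are made up for illustration.
layer = nn.Linear(512, 512)
harmful_coords = [(12, 340), (12, 341), (87, 5)]  # (output unit, input unit)

with torch.no_grad():
    for row, col in harmful_coords:
        layer.weight[row, col] = 0.0  # sever the pathway at the source
```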

And for the public, this could shift the way AI is understood. Rather than seeing LLMs as mysterious machines producing magic-like text, the narrative moves toward understanding them as structured systems with identifiable mechanics.

A new approach to understanding machine behaviour

The biggest open question is whether these insights scale. Large models often behave in more entangled, emergent ways than smaller counterparts. What looks clean and interpretable in a sparse model might dissolve into complexity in a trillion-parameter network.

Yet this experiment marks a turning point: AI companies are beginning not just to build powerful models, but to open them up. In a field long dominated by secrecy and scale races, this sparse LLM represents a rare push toward transparency. And for the first time, the inner workings of ChatGPT, and AI more broadly, are beginning to come into view.


Vyom Ramani

A journalist with a soft spot for tech, games, and things that go beep. While waiting for a delayed metro or rebooting his brain, you’ll find him solving Rubik’s Cubes, bingeing F1, or hunting for the next great snack.
