Anthropic’s Claude leaks complicate its responsible AI narrative
Anthropic’s responsible AI stance challenged by recent leaks
Security lapses undermine its carefully built safety-first reputation
Operational missteps raise questions about its AI stewardship credibility
Anthropic couldn’t have dug itself into a Claude-shaped hole any faster than it has over the past week or so. After topping the app store as the #1 AI chatbot ahead of ChatGPT and Gemini, fresh off its moral war with the US Department of Defense, Anthropic, the self-appointed steward of responsible AI, has behaved irresponsibly to an extent that beggars belief.
Anthropic earned continued praise for publicly releasing its anti-retaliation process around Responsible Scaling Policy concerns, which allows employees to report AI safety issues without facing any retribution. Days later, the Claude maker was hit by an internal data exposure, and then by a much bigger Claude Code source leak. All of it happened within a single week, at a pace that was dizzying in itself.
I imagine the people who have been applauding Anthropic’s continued effort to reinforce its brand around responsible AI and AI safety hadn’t even stopped clapping before they were left scratching their heads in utter confusion and disbelief.
After its public face-off with the US Department of Defense, where Anthropic CEO Dario Amodei drew a line in the sand against allowing Claude to be used in fully autonomous weapons systems, the AI company kept presenting itself as the industry’s responsible AI poster child through a string of key announcements.

On March 11, the Claude maker launched Anthropic Institute to share its perspective on how society should navigate difficult decisions around increasingly powerful AI systems. Just before that, it had released a transparency hub laying out dos and don’ts of responsible AI development for the industry. On March 24, it published the anti-retaliation process tied to its Responsible Scaling Policy, assuring and encouraging employees to raise retribution-free internal alarms about AI developments. Anthropic also signed an AI safety partnership with the Australian government on April 1, signalling how nation states should handle matters of responsible AI. So far so good.
Then came the unwanted Claude leaks
On March 26, reports showed that Anthropic had accidentally leaked Claude Mythos, an unreleased AI model that the company itself claimed could wreak havoc online in cybersecurity terms. The exposure traced back to a CMS configuration error, according to Anthropic, which made hundreds of unpublished documents public, including details of an invite-only CEO event meant to prepare the industry for Claude Mythos’ cybersecurity implications.
Then on March 31, a release packaging error pushed an unwanted file into Claude Code’s public-facing library. What did it contain? Links to nearly 2,000 files and approximately 500,000 lines of Claude Code’s own source code! No user data, private credentials or sensitive model weights were compromised, according to Anthropic, but that was hardly the point.

These incidents, in which Anthropic effectively shipped its own internal secrets to the public, were body blows to its image as a responsible, safety-first AI lab. How can Anthropic advise the world at large on how to build responsible AI while it struggles with the basic discipline needed to handle sensitive material in its own systems?
Yeah, sure, the argument can be made that Anthropic’s leaks aren’t in the same category of failure as releasing a reckless frontier model into the open. Anthropic may still be more serious than many of its rivals about AI safety as a philosophy, but the past week has shown it can fail to practice what it preaches so loudly. It’s an unwanted, self-inflicted wound that tarnishes some of the glitter Anthropic’s brand had accrued of late.
Executive Editor at Digit. Technology journalist since Jan 2008, with stints at Indiatimes.com and PCWorld.in. Enthusiastic dad, reluctant traveler, weekend gamer, LOTR nerd, pseudo bon vivant.