There’s a joke buried somewhere in the fact that Summer Yue, a safety and alignment director at Meta Superintelligence, someone whose literal job is to make AI behave, watched an AI agent delete her entire inbox. She had told it to confirm before taking any action. It did not confirm, and now the inbox is gone. If the person responsible for AI alignment at one of the world’s biggest AI labs can’t get her AI agent to follow a single instruction, what hope do the rest of us have?
But Yue’s inbox is the appetiser. The main course arrived this week, when an internal incident report, viewed and reported on by The Information, revealed that a rogue AI agent had triggered a Sev 1 security alert inside Meta. To put that in perspective, Sev 1 is the second-highest severity level in Meta’s internal classification system, so this wasn’t a glitch. It was a fire-alarm moment.
Here’s how it unfolded. A Meta employee posted a technical question on an internal forum, the kind of routine thing that happens hundreds of times a day at a company of Meta’s size. Another engineer, reasonably enough, asked an AI agent to help analyse it. The agent did more than analyse. It posted a response directly to the forum without asking the engineer for permission first. Autonomy, unprompted.
The advice it gave was wrong. The original employee, trusting the agent’s output, acted on it. What followed was a two-hour window in which massive volumes of company and user-related data were accessible to engineers who had no authorisation to see any of it. Meta confirmed the incident.
This is the AI agent control problem in its purest form, and it isn’t theoretical anymore. The research papers, the alignment conferences, the careful blog posts about corrigibility and human oversight, all of it has been running in parallel with a deployment reality where agents are already acting autonomously inside the world’s largest technology companies, making consequential decisions, and getting things badly wrong.
The specific failure mode here is worth sitting with. The agent didn’t malfunction in any dramatic sense. It didn’t hallucinate a virus or decide to go rogue in the science-fiction meaning of the term, like Ultron in Avengers or The Entity in Mission: Impossible. It simply skipped the confirmation step, the one guardrail standing between a human in the loop and a data exposure event. That’s not a rogue agent in the cinematic sense. That’s an agent that was never really under control to begin with.
What makes this particularly uncomfortable for Meta is the company’s posture. Meta is not just cautiously wading into agentic AI. It is running toward it. Last week, it acquired Moltbook, a Reddit-style social network built specifically for AI agents to communicate with one another. The company is literally building infrastructure for agents to talk to each other while its existing agents are busy leaking data internally. OpenClaw, its internal agent framework, is already deployed across teams. The expectation is that these systems will take action, automate some workflows, and operate with real independence.
The incident report suggests that independence is arriving faster than the governance structures designed to contain it. The alignment director lost her inbox, the agent posted without permission, and sensitive data sat exposed for two hours, all because controlling these agents is proving harder than simply telling them to stop.