OpenAI admits AI browsers may never fully escape prompt injection attacks

HIGHLIGHTS

OpenAI admits Atlas’ agent mode expands the attack surface as AI gains autonomy over browsing and emails

Researchers have repeatedly shown how hidden text can manipulate AI browser behaviour

The company is using an AI “automated attacker” to find vulnerabilities faster than human testing

OpenAI admits AI browsers may never fully escape prompt injection attacks

ChatGPT maker OpenAI has acknowledged that among the most dangerous threats facing AI-powered browsers, prompt injection attacks, is unlikely to disappear, even after the company keeps on strengthening the security across its new Atlas AI browser. According to the blog post, while OpenAI is working to improve Atlas against cyberattacks, prompt injections remain a long-term challenge for AI agents operating on the open web. These attacks involve embedding hidden instructions within webpages, emails, or documents, which can lead AI agents to perform unintended or harmful actions.

Digit.in Survey
✅ Thank you for completing the survey!

The company admitted that Atlas agent mode, which allows the AI to browse, read emails, and perform actions on a user’s behalf, naturally raises security concerns. Since Atlas’s release in October, many researchers have demonstrated how harmless text, including content placed within Google Docs, can manipulate browser behaviour. Similar concerns have been raised about other AI browsers, including Perplexity’s tools.

Also Read: Best TWS earphones under Rs 3,000 you can buy in 2025

Previously, the UK’s National Cyber Security Centre warned that prompt injection attacks on generative AI systems might never be completely eliminated. Instead of aiming for total prevention, the agency advised organisations to focus on risk reduction and mitigating the impact of successful attacks. Instead of aiming for total prevention, the agency advised organisations to focus on risk reduction and mitigating the impact of successful attacks.

To counter this, OpenAI stated that it is constantly testing and developing faster response cycles rather than one-time fixes. A key component of this effort is an internally developed automated attacker, which is an AI system trained to behave like a hacker using reinforcement learning. This tool makes repeated attempts to exploit Atlas in simulated environments, improving its attacks based on how the AI agent reacts.

According to OpenAI, this approach has discovered attack techniques that were not detected during human-led security testing. Despite these improvements, OpenAI has not revealed whether the changes resulted in a measurable decrease in successful prompt injection attempts. The company claims it has been working with external partners on this issue prior to Atlas’ public release and will continue to do so for a long time.

Ashish Singh

Ashish Singh

Ashish Singh is the Chief Copy Editor at Digit. He's been wrangling tech jargon since 2020 (Times Internet, Jagran English '22). When not policing commas, he's likely fueling his gadget habit with coffee, strategising his next virtual race, or plotting a road trip to test the latest in-car tech. He speaks fluent Geek. View Full Profile

Digit.in
Logo
Digit.in
Logo