There’s a 15-year-old bug hiding in Firefox’s <legend> element – one of the most boring tags in HTML. It survived over a decade of fuzzing, manual audits, and security research. Claude Mythos found it in days.
That’s the headline example from Mozilla’s thorough post-mortem, which details how it used Claude Mythos Preview from Anthropic to discover and patch 271 security flaws in Firefox, making Firefox 150 the most heavily patched version in the browser’s history. Firefox 150 was released with all 271 security flaws patched; some extra patches were later rolled into Firefox 149.0.2 and 150.0.1. In April alone, Mozilla fixed 423 security flaws in total, a feat that would have been inconceivable just six months ago.
The <legend> bug is a good example of how AI is solving problems that fuzzing can’t. Three entirely independent Firefox behaviors collide to cause a use-after-free vulnerability, in which the browser frees memory that is still in use. Each behavior, on its own, appears innocent enough, but when all three come together, something nasty happens. Automated fuzzers never stumbled on that combination; Claude Mythos found it because it could reason about how the three behaviors interact.
That combinatorial reasoning is the core difference. Earlier attempts at AI-assisted security audits using models like GPT-4 and Claude 3.5 Sonnet produced too many false positives to be useful at scale: it’s cheap to prompt a model to find a “problem” in code, but slow and expensive for engineers to chase down dead ends. What changed is the introduction of agentic harnesses, systems that don’t just flag suspicious code but actually build and run reproducible test cases to confirm whether a bug is real. If a reported bug can’t be reproduced, it gets dismissed automatically. That filter is what makes the pipeline scalable.
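To make the "reproduce it or drop it" filter concrete, here is a minimal Python sketch. It is not Mozilla's harness: the `Finding` class, the `reproduce` callback, and the `attempts` parameter are all hypothetical names invented for illustration. The only idea taken from the article is that a model-reported bug survives triage only if a test case actually triggers it.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Finding:
    """A candidate bug reported by a model (hypothetical structure)."""
    title: str
    # Assumption: runs the generated test case and returns True if it
    # triggers the bug (for example, a sanitizer crash in a test build).
    reproduce: Callable[[], bool]


def triggers(finding: Finding) -> bool:
    """Run the test case once; a broken test case counts as 'not reproduced'."""
    try:
        return bool(finding.reproduce())
    except Exception:
        return False


def confirm(findings: Iterable[Finding], attempts: int = 3) -> List[Finding]:
    """Keep only findings whose test case triggers within `attempts` runs.

    Several attempts are allowed because some bugs (such as race
    conditions) may not trigger on every run.
    """
    return [f for f in findings if any(triggers(f) for _ in range(attempts))]


if __name__ == "__main__":
    reported = [
        Finding("use-after-free in layout code", lambda: True),
        Finding("speculative overflow, no working test case", lambda: False),
    ]
    for finding in confirm(reported):
        print("confirmed:", finding.title)
```

In this sketch, everything that fails to reproduce is discarded before an engineer sees it, which is the property the article credits with making the pipeline scale.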
Mozilla took a novel approach, building its own harness on top of an existing fuzzing framework and distributing it across numerous ephemeral VMs, each targeting specific files in the codebase. The bugs it discovered are anything but trivial. In particular, the harness found several sandbox escapes: exploits that assume a content process has already been compromised and then attack the more privileged parent process. Bugs like these have proven notoriously difficult to detect with conventional techniques. Other examples include a 20-year-old vulnerability in the XSLT implementation, a NaN value that could function as a pointer to a JavaScript object in cross-process communication, and a race condition in the WebTransport protocol that could be exploited by flooding it with certificate hashes. These are just some of the vulnerabilities that require multi-step reasoning to detect.
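For readers wondering how a NaN can act as a pointer at all: many JavaScript engines use a technique generally known as NaN-boxing, in which non-number values are stored in the unused payload bits of a 64-bit floating-point NaN. The Python sketch below illustrates the general idea only. The tag value, the bit layout, and every function name are assumptions made for illustration; they are not Firefox's actual encoding and do not reproduce the reported bug. It shows why a receiver that accepts raw doubles from an untrusted sender would want to collapse every NaN to one canonical pattern.

```python
import math
import struct

EXP_MASK = 0x7FF0_0000_0000_0000       # all 11 exponent bits set
MANTISSA_MASK = 0x000F_FFFF_FFFF_FFFF  # the 52 mantissa bits
CANONICAL_NAN = 0x7FF8_0000_0000_0000  # the one NaN pattern a careful receiver keeps
OBJECT_TAG = 0xFFFC                    # hypothetical "object pointer" tag (top 16 bits)
PAYLOAD_MASK = 0x0000_FFFF_FFFF_FFFF   # low 48 bits carry the address


def bits_to_float(bits: int) -> float:
    """Reinterpret a 64-bit pattern as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def is_nan_bits(bits: int) -> bool:
    """Any NaN: exponent bits all ones and a non-zero mantissa."""
    return (bits & EXP_MASK) == EXP_MASK and (bits & MANTISSA_MASK) != 0


def box_pointer(addr: int) -> int:
    """Hide a 48-bit address inside a NaN payload (illustrative layout)."""
    return (OBJECT_TAG << 48) | (addr & PAYLOAD_MASK)


def unbox_pointer(bits: int):
    """Return the hidden address if the tag matches, otherwise None."""
    return bits & PAYLOAD_MASK if bits >> 48 == OBJECT_TAG else None


def canonicalize(bits: int) -> int:
    """Replace every NaN with the canonical one; leave other doubles alone."""
    return CANONICAL_NAN if is_nan_bits(bits) else bits


if __name__ == "__main__":
    crafted = box_pointer(0x7F00_DEAD_BEEF)
    print(math.isnan(bits_to_float(crafted)))        # to arithmetic, it is just NaN
    print(hex(unbox_pointer(crafted)))               # to a boxing scheme, it is an address
    print(unbox_pointer(canonicalize(crafted)))      # after canonicalizing, no address survives
```

The design point is that the same 64 bits read as "not a number" to floating-point code and as a tagged address to the value representation, so the boundary that receives untrusted doubles is where the payload has to be discarded.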
Of the 271 bugs found by Claude Mythos for Firefox 150, a staggering 180 have been flagged as sec-high, and 80 have received sec-moderate labels. Mozilla also plans to integrate this kind of automated scanning into its CI pipeline so that every change is immediately analyzed for potential vulnerabilities. The takeaway for the entire software industry? The bugs have already been found; the time to act is now.