Claude Mythos and GPT-5.5 have confirmed what researchers feared most about AI and cybersecurity

According to a report recently released by the UK's AI Security Institute (AISI), the autonomous cyber capability of frontier AI models is progressing fast enough to outpace the benchmarks designed to measure it.


For its part, AISI has been assessing how quickly AI models can accomplish cyber-related tasks on their own, measured against how long a human expert would take to do the same. By its estimates, the cyber capabilities of frontier models were doubling every eight months as of November 2025, a figure it revised to every 4.7 months in February 2026. Then came Claude Mythos Preview and GPT-5.5, and even that trend line went up in smoke.

As you can probably tell, the cyber capabilities displayed by the two models were too impressive for AISI's existing trend analysis to keep up with. Whether this was an anomaly or something more remains to be determined.

AISI uses a narrow cyber suite in which models perform tasks that test their ability to find and exploit cybersecurity vulnerabilities, including web exploitation and reverse engineering. Each task is assigned an average human completion time, and the suite measures how reliably AI models complete tasks of a given duration. At an 80 percent success threshold, recent models have practically hit the ceiling of what the test suite can measure.
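AISI does not publish its exact fitting procedure in the report described here, but the idea behind a "time horizon at 80 percent success" metric can be sketched roughly. The data points and the logit-regression approach below are my own illustrative assumptions, not AISI's method or numbers:

```python
import math

# Hypothetical (average human minutes, observed model success rate) pairs.
# In a real evaluation each task's human time comes from expert baselines
# and the success rate from repeated model runs.
tasks = [(5, 0.95), (15, 0.90), (60, 0.80), (240, 0.55), (960, 0.20)]

# Fit logit(success) as a linear function of log2(task duration)
# using ordinary least squares.
xs = [math.log2(minutes) for minutes, _ in tasks]
ys = [math.log(p / (1 - p)) for _, p in tasks]  # logit of success rate

n = len(tasks)
mx, my = sum(xs) / n, sum(ys) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx

def horizon(target=0.8):
    """Longest human-task duration (minutes) the model completes at
    `target` reliability, under the fitted curve."""
    return 2 ** ((math.log(target / (1 - target)) - intercept) / slope)

print(f"80% time horizon: {horizon():.0f} human-minutes")
```

A "doubling every 4.7 months" trend, in this framing, means the duration returned by `horizon()` doubles every 4.7 months across successive model releases; the ceiling AISI describes is hit when success rates stay high even on the longest tasks in the suite, so the fit no longer pins down a horizon.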


Claude Mythos Preview solved both of AISI's simulated cyber ranges, in which models emulate cyberattacks on small, undefended enterprise networks. It completed The Last Ones range in six of ten attempts and became the first model to crack the Cooling Tower range, solving it three times out of ten. GPT-5.5 solved The Last Ones range in three of ten attempts.

These are not just benchmark numbers. AISI is clear on this point: even though evaluations are imperfect estimates of real-world impact, the current pace of progress points toward AI cyber tools materializing into practical threats.

It is also worth noting that AISI admits its testing was conducted under conservative conditions, which constrain the evaluations. Specifically, models were limited to 2.5 million tokens per task. Without this cap, success rates rose so high that time horizons could no longer be determined, which means the figures AISI published significantly underestimate what these models can do.

AISI says it plans to introduce more demanding tests, including new cyber ranges and even active cyber defences. That harder tests are needed at all says plenty about where this problem is headed.


Vyom Ramani

A journalist with a soft spot for tech, games, and things that go beep. While waiting for a delayed metro or rebooting his brain, you’ll find him solving Rubik’s Cubes, bingeing F1, or hunting for the next great snack.
