This Could Be the Setup for an AI Arms Race


Recently, Secretary of the Treasury Scott Bessent gathered bankers from America’s top banks to discuss Anthropic’s Claude Mythos AI. The concern: Mythos could allow hackers to find holes in the cybersecurity of even the most protected systems. The Wall Street Journal’s Robert McMillan and Chip Cutter report that AI tools have cut the time it takes to find vulnerabilities in code from “countless hours” to about two days. They write:

For humans to find and exploit a bug like this would typically require countless hours of research. Most hackers wouldn’t have even looked at Provos’s old code, assuming that it had been picked over for bugs, Provos said.

“Previously there were only a handful of people that could do this,” he said. “Now, with these tools, the skill that you need to develop really sophisticated exploits has gone way down.”

Mythos found the bug—along with several dozen other issues—while burning about $20,000 of computing power over a two-day period, Anthropic said.

Anthropic announced Claude Mythos Preview’s cybersecurity capabilities on April 7, writing:

Earlier today we announced Claude Mythos Preview, a new general-purpose language model. This model performs strongly across the board, but it is strikingly capable at computer security tasks. In response, we have launched Project Glasswing, an effort to use Mythos Preview to help secure the world’s most critical software, and to prepare the industry for the practices we all will need to adopt to keep ahead of cyberattackers.

This blog post provides technical details for researchers and practitioners who want to understand exactly how we have been testing this model, and what we have found over the past month. We hope this will show why we view this as a watershed moment for security, and why we have chosen to begin a coordinated effort to reinforce the world’s cyber defenses.

We begin with our overall impressions of Mythos Preview’s capabilities, and how we expect that this model, and future ones like it, will affect the security industry. Then, we discuss how we evaluated this model in more detail, and what it achieved during our testing. We then look at Mythos Preview’s ability to find and exploit zero-day (that is, undiscovered) vulnerabilities in real open source codebases. After that we discuss how Mythos Preview has proven capable of reverse-engineering exploits on closed-source software, and turning N-day (that is, known but not yet widely patched) vulnerabilities into exploits.

As we discuss below, we’re limited in what we can report here. Over 99% of the vulnerabilities we’ve found have not yet been patched, so it would be irresponsible for us to disclose details about them (per our coordinated vulnerability disclosure process). Yet even the 1% of bugs we are able to discuss give a clear picture of a substantial leap in what we believe to be the next generation of models’ cybersecurity capabilities—one that warrants substantial coordinated defensive action across the industry. We conclude our post with advice for cyber defenders today, and a call for the industry to begin taking urgent action in response.

According to Anthropic, Claude Mythos has found thousands of vulnerabilities. The company wrote:

We have identified thousands of additional high- and critical-severity vulnerabilities that we are working on responsibly disclosing to open source maintainers and closed source vendors. We have contracted a number of professional security contractors to assist in our disclosure process by manually validating every bug report before we send it out to ensure that we send only high-quality reports to maintainers.

While we are unable to state with certainty that these vulnerabilities are definitely high- or critical-severity, in practice we have found that our human validators overwhelmingly agree with the original severity assigned by the model: in 89% of the 198 manually reviewed vulnerability reports, our expert contractors agreed with Claude’s severity assessment exactly, and 98% of the assessments were within one severity level. If these results hold consistently for our remaining findings, we would have over a thousand more critical severity vulnerabilities and thousands more high severity vulnerabilities. Eventually it may become necessary to relax our stringent human-review requirements. In any such case, we commit to publicly stating any changes we will make to our processes in advance of doing so.
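Anthropic's agreement figures imply concrete counts. As a rough sketch (the exact tallies are not published; only the 198-report sample and the two percentages are given), the implied numbers work out as follows:

```python
# Rough back-of-the-envelope check of Anthropic's stated validation stats.
# Assumes the percentages apply exactly to the 198 manually reviewed reports;
# the true underlying counts are not published.

reviewed = 198            # manually reviewed vulnerability reports
exact_rate = 0.89         # validators agreed with Claude's severity exactly
within_one_rate = 0.98    # validators agreed within one severity level

exact_agreements = round(exact_rate * reviewed)        # ~176 reports
within_one = round(within_one_rate * reviewed)         # ~194 reports
disagreements = reviewed - within_one                  # ~4 off by more than one level

print(exact_agreements, within_one, disagreements)
```

At that agreement rate, only a handful of the 198 sampled reports were off by more than one severity level, which is why Anthropic projects its unreviewed findings would still yield over a thousand more critical-severity vulnerabilities.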

It appears that rapidly advancing AI applied to cybersecurity could trigger an arms race between AIs that find and patch vulnerabilities and AIs that find and exploit them.