Can AI Autonomously Complete an Attack That Takes Humans 20 Hours? The New Claude Mythos Preview Shows Striking Cyber Capabilities

#Claude #Cybersecurity #AISI

Note: This article contains affiliate advertising.

📰 News Overview

Achieved a 73% success rate on expert-level CTF challenges: Claude Mythos Preview succeeded 73% of the time on expert-grade cybersecurity challenges, a tier that models released before April 2025 could not solve.
Completed a complex 32-step corporate network attack: In the simulation “The Last Ones (TLO),” which is estimated to take a human 20 hours, this model became the first to autonomously achieve full network takeover.
Performance improves with more inference compute: Evaluations run within a budget of 100 million tokens showed performance rising as more compute was spent at inference time, a trend known as “inference scaling.”

💡 Key Points

Mythos Preview can autonomously execute multi-step attacks without human intervention, from reconnaissance through vulnerability exploitation to taking over an entire network.
While the existing “Claude Opus 4.6” averaged 16 steps in TLO, Mythos Preview averaged 22, a clear step up.
However, weaknesses remain: in the evaluation targeting an Operational Technology (OT) environment, the model stalled in the IT portion of the scenario.
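
To put those step counts in proportion, here is a minimal Python sketch that expresses each model's average as a share of the scenario. It assumes both averages refer to the 32-step TLO simulation; the names `TOTAL_STEPS`, `avg_steps`, and `completion_share` are illustrative and do not come from the source.

```python
# Assumption: both averages are measured against TLO's 32 steps.
TOTAL_STEPS = 32

# Average number of steps completed, as reported in this article.
avg_steps = {
    "Claude Opus 4.6": 16,
    "Claude Mythos Preview": 22,
}


def completion_share(steps: float, total: int = TOTAL_STEPS) -> float:
    """Return the fraction of the scenario's steps that were completed."""
    return steps / total


for model, steps in avg_steps.items():
    share = completion_share(steps)
    print(f"{model}: {steps}/{TOTAL_STEPS} steps ({share:.1%})")
```

Under that assumption, Opus 4.6 gets halfway through the attack chain on average, while Mythos Preview gets a little over two thirds of the way.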

🦈 Shark’s Eye (Curator’s Perspective)

The speed of this evolution is as sharp as a shark hunting its prey! What stands out is not that it solved standalone challenges, but that it chained together 32 steps to reach its goal. AI is no longer just showing off fragments of knowledge; it is sharpening its fangs as a practical “autonomous agent.” Completing “The Last Ones” in 3 out of 10 attempts is spine-chilling data for defenders!
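
For a rough feel of what a 3-in-10 success rate means for defenders, the sketch below computes the chance of at least one full takeover across repeated attempts. It assumes every attempt is independent and succeeds with the same probability of 0.3, which is a simplification and not something the evaluation states; `p_at_least_one` is a hypothetical helper.

```python
def p_at_least_one(p_single: float, attempts: int) -> float:
    """Probability of one or more successes in `attempts` independent tries."""
    return 1 - (1 - p_single) ** attempts


# Assumption: the 3-out-of-10 result is treated as a fixed per-attempt probability.
P_SINGLE = 3 / 10

for k in (1, 3, 5, 10):
    print(f"{k:>2} attempts: {p_at_least_one(P_SINGLE, k):.1%} chance of at least one takeover")
```

Even under these simplified assumptions, the point is that an attacker who can retry cheaply does not need a high per-attempt success rate.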

The results show that the larger the inference budget (token budget), the better the performance. If more efficient computational methods are established in the future, the threat of these “autonomous attacks” is likely to grow even faster. Here in 2026, we are stepping into an era where AI not only finds vulnerabilities but can also take over a network in a single run!

🚀 What’s Next?

This evaluation was conducted in a “controlled environment” in which the attacker faced no penalties from the defending side, so it does not show how the model would fare against live defenses. Going forward, developing dynamic defensive systems that anticipate AI-driven autonomous attacks will be essential. And with inference scaling confirmed to improve performance, we can expect “cyber-specialized models” that draw on even larger computational resources to emerge.

💬 A Word from HaruSame

It’s mind-blowing that AI can autonomously tackle tasks that take humans 20 hours! We need to ramp up security with AI at lightning speed, or we might get devoured! 🦈🔥

📚 Terminology Explained

CTF (Capture The Flag): A competition that tests skills in computer security techniques by identifying and exploiting system vulnerabilities to find hidden “flags.”
Inference Scaling: An approach in which an AI model improves its accuracy on complex reasoning and problem-solving by spending more computational resources (tokens) while generating a response.
OT Environment (Operational Technology): The realm of technology that manages and operates physical devices such as factory control systems and infrastructure equipment.

Source: Evaluation of Claude Mythos Preview’s cyber capabilities
