[AI Minor News]

Can AI Autonomously Complete an Attack That Takes Humans 20 Hours? The New Claude Mythos Preview Unveils Astonishing Cyber Capabilities



※ This article contains affiliate advertising.

📰 News Overview

  • Achieved a 73% success rate on expert-level CTF challenges: Claude Mythos Preview posted a remarkably high success rate on expert-grade cybersecurity challenges that models released before April 2025 could not solve.
  • Completed a complex 32-step corporate network attack: In the simulation “The Last Ones (TLO),” which a human is estimated to need 20 hours to finish, it became the first model to take over the network autonomously.
  • Performance improves with more inference compute: Evaluations run within a budget of 100 million tokens showed performance rising as more compute was spent at inference time, a trend known as “inference scaling.”

💡 Key Points

  • Mythos Preview can autonomously execute multi-step attacks without human intervention, from reconnaissance through vulnerability exploitation to taking over an entire network.
  • While the existing “Claude Opus 4.6” averaged only 16 steps, Mythos Preview averaged 22 steps, a clear jump in capability.
  • However, challenges remain in some areas: in the evaluation set in an Operational Technology (OT) environment, the model's progress stalled in the IT section.
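
To put those step counts in perspective, here is a minimal sketch of the arithmetic. It assumes both averages are measured against the same 32-step TLO chain mentioned in the News Overview; the function name `completion_share` is just an illustrative helper, not something from the evaluation itself.

```python
# Figures quoted in this article. Assumption: both averages refer to
# the same 32-step attack chain in "The Last Ones (TLO)".
TOTAL_STEPS = 32
avg_steps = {"Claude Opus 4.6": 16, "Claude Mythos Preview": 22}


def completion_share(steps_done: float, total: int = TOTAL_STEPS) -> float:
    """Return the fraction of the attack chain completed on average."""
    return steps_done / total


for model, steps in avg_steps.items():
    share = completion_share(steps)
    print(f"{model}: {steps}/{TOTAL_STEPS} steps = {share:.2%}")
```

Under that assumption, the older model gets through half of the chain on average, while Mythos Preview gets through a bit more than two thirds.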

🦈 Shark’s Eye (Curator’s Perspective)

The speed of this evolution is as sharp as a shark hunting its prey! What stands out is not just that it solves standalone challenges, but that it chained together all 32 steps to reach its goal. AI is no longer merely showing off fragments of knowledge; it is sharpening its fangs as a practical “autonomous agent.” Completing “The Last Ones” in 3 out of 10 attempts is spine-chilling data for defenders!
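
Why is 3 out of 10 so chilling? A rough back-of-the-envelope sketch: if each attempt succeeded independently with probability 0.3, the chance of at least one full takeover grows quickly with repeated attempts. The independence assumption and the helper `p_at_least_one` are ours for illustration only; the evaluation does not report this calculation.

```python
def p_at_least_one(p_single: float, attempts: int) -> float:
    """Probability of at least one success in `attempts` independent tries.

    Assumption: every try succeeds with the same probability `p_single`
    and tries do not influence each other.
    """
    return 1.0 - (1.0 - p_single) ** attempts


p = 3 / 10  # success rate on "The Last Ones" quoted in this article

for k in (1, 3, 5, 10):
    print(f"{k:>2} attempts -> {p_at_least_one(p, k):.3f}")
```

Real attempts are unlikely to be fully independent, so treat this as an intuition pump rather than a forecast.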

The results indicate that the more inference cost (token budget) is invested, the better the performance. If more efficient computational methods are established in the future, the threat of these “autonomous attacks” will surely accelerate. Here in 2026, we are stepping fully into an era where AI not only finds vulnerabilities but can also take over a network in a single run!

🚀 What’s Next?

This evaluation was conducted in a “controlled environment” where the attacker faced no penalties from the defending side. Going forward, developing dynamic defensive systems that anticipate AI-driven autonomous attacks will be essential. And with the performance gains from inference scaling now confirmed, we can expect “cyber-specialized models” that leverage even larger computational resources to emerge.

💬 A Word from HaruSame

It’s mind-blowing that AI can autonomously tackle tasks that take humans 20 hours! We need to ramp up security with AI at lightning speed, or we might get devoured! 🦈🔥

📚 Terminology Explained

  • CTF (Capture The Flag): A competition that tests skills in computer security techniques by identifying and exploiting system vulnerabilities to find hidden “flags.”

  • Inference Scaling: A method where AI enhances the accuracy of complex reasoning and problem-solving by allocating more computational resources (tokens) during response generation.

  • OT Environment (Operational Technology): The realm of technology that manages and operates physical devices such as factory control systems and infrastructure equipment.

  • Source: Evaluation of Claude Mythos Preview’s cyber capabilities

[Disclaimer]
This article was structured by AI and is verified and managed by the operator. Accuracy is not guaranteed, and we assume no responsibility for external content.
🦈