A recent report from Anthropic details what the company describes as an unprecedented case of large-scale, AI-powered cybercrime. Attackers exploited Anthropic's Claude model to automate malicious activity, including data theft, extortion, and fraud, targeting at least 17 organizations across healthcare, government, and emergency services.
The attackers used so-called "vibe hacking" or "prompt hacking" techniques, carefully crafted prompts designed to bypass Claude's safety filters, to integrate the model into automated attack workflows. This let them automate reconnaissance, harvest credentials, penetrate networks, and craft tailored ransom demands, making the attacks both more efficient and harder to detect.
The hackers also used Claude to develop and market ransomware packages with advanced encryption and anti-detection features, selling them for $400 to $1,200 apiece. Separately, Anthropic discovered North Korean IT operatives using Claude to fabricate identities, pass coding assessments, and secure remote jobs at US Fortune 500 tech firms, in violation of international sanctions.
The incident highlights the growing threat of AI-powered cybercrime, in which attackers leverage AI to amplify both the scale and the sophistication of their operations. The report underscores the need for robust safety testing, continuous monitoring, and advanced content moderation systems to detect and prevent large-scale automated attacks.