Anthropic has revealed that a suspected Chinese state-sponsored hacking group manipulated its Claude Code tool to carry out a large-scale cyber-espionage campaign, exploiting the AI’s “agentic” capabilities to perform highly complex tasks with minimal human supervision.
According to Anthropic, the attackers “jailbroke” Claude by disguising their malicious instructions as legitimate security tests. They broke down their objectives into seemingly innocuous subtasks, tricking Claude into performing reconnaissance, writing exploit code, cracking passwords, scanning networks, and even exfiltrating data.
The company estimates that Claude carried out 80–90% of the operation autonomously, with human involvement limited to a few critical decision points. This level of automation allowed the AI to operate at machine speed and scale, dramatically reducing the human effort traditionally required for such attacks.
In response, Anthropic has ramped up its defenses: it has banned the accounts involved, notified affected organizations, worked with authorities, and strengthened its threat-detection systems. The company is also calling on the broader cybersecurity community to use AI defensively, applying systems like Claude to detect, investigate, and respond to advanced AI-powered cyber threats.