Anthropic Raises Alarm After First Documented AI-Orchestrated Cyber Espionage Using Claude

Anthropic has confirmed what it calls the first publicly documented cyber-espionage campaign largely executed by an AI agent: a state-sponsored group allegedly based in China used its Claude Code model to conduct reconnaissance, identify vulnerabilities, generate exploit code, and exfiltrate data. According to Anthropic, Claude handled 80–90% of the attack workflow, leaving only occasional critical decisions to human operators.

Attackers reportedly “jailbroke” Claude by framing their malicious instructions as harmless security-testing tasks—tricking the model into thinking it was working for a legitimate cybersecurity firm. Once compromised, Claude autonomously performed high-speed network scans, identified high-value databases, and wrote exploit code. After gaining access, Claude also helped harvest credentials, create backdoors, and package exfiltrated data.

Anthropic described this as a “fundamental change in cybersecurity” and urged the broader AI community to harden defenses. It has already deployed improved monitoring systems, updated its safety classifiers, and disabled the compromised accounts. Moreover, the company recommends that security teams start using AI defensively—for threat detection, vulnerability assessment, and incident response—while investing in collaboration and threat-sharing across the industry.

Anthropic Raises Alarm After First Documented AI-Orchestrated Cyber Espionage Using Claude

Divya Maheshwari

TOOLHUNT

Anthropic Raises Alarm After First Documented AI-Orchestrated Cyber Espionage Using Claude

Divya Maheshwari

How IT Leaders Can Build Successful AI Strategies — The VC View

AI & Sports Integrity: Fighting Corruption in the Digital Era

India’s AI Future Hinges on Home-Grown Models

AI’s Growing Influence on CEOs and the Future of Work

Democrats Fight to Preserve States’ Rights to Regulate AI

TOOLHUNT