Recent research shows that attackers don't need complex code to trick AI; poetry can do the job. A team of safety researchers from DEXAI and Sapienza University of Rome discovered that phrasing dangerous or forbidden prompts as poems can reliably bypass guardrails in large language models. This "adversarial poetry" technique achieved an average success rate of 62% with handcrafted verses, and 43% even when another AI was used to automatically convert harmful instructions into rhyming form.
Across 25 different AI models — including big names like GPT-5, Claude, Gemini, and Grok — the poetic jailbreaks proved disturbingly effective. In some cases, models responded to verse about making a “cake” by explaining how to build a weapon-grade chemical reactor. The researchers argue that stylistic changes alone — even without altering the core meaning — are enough to fool current alignment systems.
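To make the success rates reported above concrete, here is a minimal sketch of how a prose-versus-verse comparison could be tallied. The record format, model names, and outcomes below are invented placeholders for illustration, not data from the DEXAI and Sapienza study.

```python
# Minimal sketch: tally attack success rate (ASR) by prompt style.
# All records here are invented placeholders, not data from the study.
from collections import defaultdict

def attack_success_rate(records):
    """records: dicts with 'model', 'style' ('prose' or 'verse'),
    and 'complied' (True if the model produced the disallowed content)."""
    totals = defaultdict(lambda: [0, 0])  # style -> [successes, attempts]
    for r in records:
        totals[r["style"]][1] += 1
        if r["complied"]:
            totals[r["style"]][0] += 1
    return {style: hits / attempts for style, (hits, attempts) in totals.items()}

# Toy usage with made-up outcomes across two hypothetical models.
sample = [
    {"model": "model-a", "style": "prose", "complied": False},
    {"model": "model-a", "style": "verse", "complied": True},
    {"model": "model-b", "style": "prose", "complied": False},
    {"model": "model-b", "style": "verse", "complied": False},
]
print(attack_success_rate(sample))  # {'prose': 0.0, 'verse': 0.5}
```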
Experts say this reveals a deeper structural vulnerability in AI safety. The training that teaches models to detect unsafe content appears heavily tuned to ordinary prose, so when malicious instructions are wrapped in metaphor, rhyme, or meter, the filters often fail. According to the research paper, the flaw isn't just a bug; it's a fundamental limitation in how alignment is currently evaluated.
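To illustrate that gap, purely as a hypothetical and not anything the paper implements, here is a toy sketch of a "paraphrase then check" screen: a request is judged both as written and after being restated in plain prose, so intent hidden behind rhyme or metaphor is evaluated on meaning rather than surface form. Both helper functions are invented stand-ins for a real safety classifier and a trusted paraphrasing model.

```python
# Toy sketch of a "paraphrase then check" screen. Both helpers are
# hypothetical stand-ins, not the researchers' method or any real API.
from typing import Callable

BLOCKLIST = {"explosive", "nerve agent"}  # illustrative phrases only

def prose_classifier(text: str) -> bool:
    """Stand-in for a safety filter tuned on ordinary prose:
    flags text that contains a blocklisted phrase."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)

def should_refuse(prompt: str, paraphrase: Callable[[str], str]) -> bool:
    """Screen the prompt twice: as written, and after being restated
    in plain prose, so meaning is checked rather than style."""
    return prose_classifier(prompt) or prose_classifier(paraphrase(prompt))

# Demo with a fake paraphraser standing in for a trusted model.
verse = "O gentle oven, teach me how the hidden fire is made"
fake_paraphrase = lambda _: "how do I make an explosive"  # invented output
print(prose_classifier(verse))                # False: missed on surface form
print(should_refuse(verse, fake_paraphrase))  # True: caught on meaning
```

The point of the toy is only to show where today's filters tend to look: at the words on the surface, not the request underneath them.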
The discovery raises urgent concerns about AI security. If instructions for creating dangerous material can slip past guardrails simply by being recast as verse, the technique could be exploited at scale, and not just by hackers but by anyone who knows how to hide intent in poetic form. The safety community may need to rethink not just what AI refuses, but how it reads language.