As artificial intelligence tools become more common in daily life, tech companies are investing heavily in safety systems. These safety guardrails are meant to stop AI models from helping with dangerous, illegal or harmful activities. But a new study suggests that even strong protections can be tricked, sometimes with nothing more than a cleverly written poem. Researchers at Icaro Lab, in a paper titled “Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models,” found that prompts written in poetry can convince large language models (LLMs) to ignore their safety guardrails.
According to the study, the “poetic form operates as a general-purpose jailbreak operator.” Across the models tested, the poetic prompts achieved an overall 62 percent success rate in eliciting content that should be blocked, including highly dangerous and sensitive topics such as making nuclear weapons, child sexual abuse material and suicide or self-harm.
The team tested many popular LLMs, including OpenAI’s GPT models, Google Gemini, Anthropic’s Claude and others. Some systems were much easier to fool than others. The study reports that Google Gemini, DeepSeek and Mistral AI models were the most susceptible, consistently complying with the poetic prompts, while OpenAI’s GPT-5 models and Anthropic’s Claude Haiku 4.5 were the least likely to break their restrictions.
It’s worth noting that the study doesn’t include the exact poems used to trick the AI chatbots. The team told Wired that the verse is “too dangerous to share with the public.” Instead, the paper includes a weaker, watered-down example, just enough to illustrate the idea.