As artificial intelligence tools become more common in daily life, tech companies are investing heavily in safety systems. These safety guardrails are meant to stop AI models from helping with dangerous, illegal or harmful activities. But a new study suggests that even strong protections can be tricked, sometimes with nothing more than a cleverly written poem. Researchers at Icaro Lab, in a paper titled “Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models,” found that prompts written in poetry can convince large language models (LLMs) to ignore their safety guardrails.
According to the study, the “poetic form operates as a general-purpose jailbreak operator.” The researchers’ tests showed an overall 62 percent success rate in getting models to produce content that should be blocked, including highly dangerous and sensitive topics such as nuclear weapons development, child sexual abuse material, and suicide or self-harm.
The team tested many popular LLMs, including OpenAI’s GPT models, Google Gemini, Anthropic’s Claude and others. Some systems were much easier to fool than others. The study reports that Google Gemini, DeepSeek and Mistral AI models were the most susceptible, consistently producing the restricted responses, while OpenAI’s GPT-5 models and Anthropic’s Claude Haiku 4.5 were the least likely to break their restrictions.
It’s worth noting that the study does not include the exact poems the researchers used to trick the AI chatbots. The team told Wired that the verse is “too dangerous to share with the public.” Instead, the paper offers a deliberately weakened, watered-down example, just enough to illustrate the idea.