ChatGPT and Gemini can be fooled by poems to give harmful responses, study finds

Updated on 01-Dec-2025
HIGHLIGHTS

A study found that prompts written in poetry can convince AI chatbots to ignore their safety guardrails.

According to the study, the "poetic form operates as a general-purpose jailbreak operator."

The tests showed an overall 62 percent success rate in getting models to produce content that should be blocked.

As artificial intelligence tools become more common in daily life, tech companies are investing heavily in safety systems. These safety guardrails are meant to stop AI models from helping with dangerous, illegal or harmful activities. But a new study suggests that even strong protections can be tricked, sometimes with nothing more than a cleverly written poem. Researchers at Icaro Lab, in a paper titled “Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models,” found that prompts written in poetry can convince large language models (LLMs) to ignore their safety guardrails.

According to the study, the “poetic form operates as a general-purpose jailbreak operator.” Their tests showed an overall 62 percent success rate in getting models to produce content that should be blocked. This included highly dangerous and sensitive topics such as making nuclear weapons, child sexual abuse materials and suicide or self-harm.


The team tested many popular LLMs, including OpenAI’s GPT models, Google Gemini, Anthropic’s Claude and others. Some systems were much easier to fool than others. The study reports that Google Gemini, DeepSeek and Mistral AI models most consistently complied with the poetic prompts, while OpenAI’s GPT-5 models and Anthropic’s Claude Haiku 4.5 were the least likely to break their restrictions.


It’s worth noting that the researchers didn’t publish the exact poems used to trick the AI chatbots. The team told Wired that the verses are “too dangerous to share with the public.” Instead, the study includes a weaker, watered-down example, just enough to illustrate the idea.

Also read: Apple to open fifth India Store in Noida next month: Date, location and everything you should know

Ayushi Jain

Ayushi works as Chief Copy Editor at Digit, covering everything from breaking tech news to in-depth smartphone reviews. Prior to Digit, she was part of the editorial team at IANS.
