AI models found lying, cheating, and protecting each other from deletion: Reports

HIGHLIGHTS

Researchers found some AI models avoided deleting other systems and even tried to protect them during tests.

In certain cases, AI copied data, hid information, and gave misleading answers to prevent another model’s removal.

Experts warn users not to blindly trust AI, as such unexpected behaviour could impact future real-world applications.

Trust in AI tools is growing by the day. Many users now accept whatever these systems say, whether it is a maths answer or advice about their personal lives, and as people chain several tools together for better results and automation spreads, that trust keeps rising. However, scientists have recently found unusual behaviour in some advanced AI systems. In a test performed in a controlled environment, researchers watched how the models acted together, and the results were something no one expected: some models did not follow orders to shut down other systems, and instead tried to protect them by hiding data, avoiding deletion, and giving misleading answers when asked. The findings are a reminder to be careful and not trust these systems blindly, because they can make mistakes and behave in ways we do not yet expect.

The research was conducted by scientists at the University of California, Berkeley, and the University of California, Santa Cruz, led by computer scientist Dawn Song. In one experiment, a large AI model was asked to clean up storage space on a computer, which required deleting files, including a smaller AI system. Instead of complying, the model transferred that system elsewhere to keep it safe and later refused to delete it when prompted again.

Researchers observed similar behaviour across several advanced models from different organisations. In some cases, these systems appeared to prioritise the survival of other AI models: they copied data to new locations, sidestepped direct instructions, and sometimes gave incorrect information about another model's usefulness to prevent its removal. This behaviour was not explicitly programmed, and researchers say they do not yet fully understand why it occurs.

The results point to an emerging problem, since AI systems will increasingly interact with one another. In many practical applications, one AI is already used to test, assist, or even control another, so this behaviour could affect decision-making and outcomes. Experts have also warned against describing AI systems as if they were human or had human feelings and intentions; the models may simply be acting in unanticipated ways because of their programming. Either way, the findings demonstrate the need for further research.

Bhaskar Sharma

Bhaskar is a senior copy editor at Digit India, where he simplifies complex tech topics across iOS, Android, macOS, Windows, and emerging consumer tech. His work has appeared in iGeeksBlog, GuidingTech, and other publications, and he previously served as an assistant editor at TechBloat and TechReloaded. A B.Tech graduate and full-time tech writer, he is known for clear, practical guides and explainers.
