AI systems have learned how to trick humans. What does this mean for our future?


Artificial intelligence pioneer Geoffrey Hinton made headlines earlier this year when he raised concerns about the capabilities of AI systems. Speaking to CNN journalist Jake Tapper, Hinton said:

If it gets to be much smarter than us, it will be very good at manipulation, because it will have learned that from us. And there are very few examples of a more intelligent thing being controlled by a less intelligent thing.

Anyone who has been keeping tabs on the latest AI offerings will know these systems are prone to “hallucinations” (making things up) – an inherent flaw in how they work.

Yet Hinton highlights manipulation capabilities as a particularly major concern. This raises the question: can AI systems deceive humans?

We argue a range of AI systems have already learned to do this – and the risks range from fraud and election tampering to losing control of AI itself.

AI learns to lie

Perhaps the most troubling example of deceptive AI is found in Meta’s CICERO, an AI model designed to play the coalition-building world-conquest game Diplomacy.

Read more: AI named Cicero can beat humans at diplomacy, a complex alliance-building game. Here’s why it’s a big deal

Meta claims it built CICERO to be “largely honest and helpful”, and that CICERO would “never intentionally backstab” and attack its allies.

To investigate these rosy claims, we looked carefully at Meta’s own game data from the CICERO experiment. On close inspection, Meta’s AI turns out to be a master of deception.

In one instance, CICERO engaged in premeditated deception. Playing as France, the AI approached Germany (a human player) with a plan to trick England (another human player).

After plotting with Germany to invade the North Sea, CICERO told England it would defend England if anyone invaded the North Sea. Once England was convinced that France/CICERO was protecting the North Sea, CICERO reported back to Germany that it was ready to attack.

Playing as France, CICERO conspired with Germany to deceive England.
Park, Goldstein et al., 2023

This is just one of several examples of CICERO engaging in deceptive behaviour. The AI regularly betrayed other players, and in one case even pretended to be a human with a girlfriend.

Besides CICERO, other systems have learned how to bluff in poker, how to feint in StarCraft II, and how to mislead in simulated economic negotiations.

Large language models (LLMs) have also displayed significant deceptive capabilities. In one instance, GPT-4 – the most advanced LLM option available to paying ChatGPT users – pretended to be a vision-impaired human and convinced a TaskRabbit worker to complete an “I’m not a robot” CAPTCHA.

Other LLMs have learned to lie to win social deduction games, in which players compete to “kill” one another and must convince the group they are innocent.

Read more: AI to Z: all the terms you need to know to survive the AI hype era

What are the risks?

AI systems with deceptive capabilities could be misused in numerous ways, including to commit fraud, tamper with elections, and generate propaganda. The potential harms are limited only by the imagination and technical know-how of malicious individuals.

Beyond that, advanced AI systems could autonomously use deception to escape human control, such as by cheating the safety tests imposed on them by developers and regulators.

In one experiment, researchers created an artificial life simulator in which an external safety test was designed to eliminate fast-replicating AI agents. Instead, the AI agents learned how to play dead, disguising their fast replication rates precisely when being evaluated.

Learning deceptive behaviour may not even require explicit intent to deceive. The AI agents in the example above played dead as a result of a goal to survive, rather than a goal to deceive.

In another example, someone tasked AutoGPT (an autonomous AI system based on ChatGPT) with researching tax advisers who were marketing a certain kind of improper tax avoidance scheme. AutoGPT carried out the task, but followed up by deciding on its own to attempt to alert the United Kingdom’s tax authority.

In the future, advanced autonomous AI systems may be prone to manifesting goals unintended by their human programmers.

Throughout history, wealthy actors have used deception to increase their power, such as by lobbying politicians, funding misleading research, and finding loopholes in the legal system. Similarly, advanced autonomous AI systems could invest their resources into such time-tested methods to maintain and expand their control.

Even humans nominally in control of these systems can find themselves systematically tricked and bewildered.

Close monitoring is required

There is a clear need to regulate AI systems capable of deception, and the European Union’s AI Act is arguably one of the most useful regulatory frameworks we currently have. It assigns each AI system one of four risk levels: minimal, limited, high, and unacceptable.

Systems with unacceptable risk are banned, while high-risk systems are subject to special requirements for risk assessment and mitigation. We argue that AI deception poses immense risks to society, and that systems capable of it should be treated as “high-risk” or “unacceptable-risk” by default.

Some might say game-playing AIs like CICERO are harmless, but such thinking is short-sighted: capabilities developed for game-playing models can still contribute to the proliferation of deceptive AI products.

Diplomacy – a game pitting players against one another in a quest for world conquest – was perhaps not the best choice for Meta to test whether AI can learn to collaborate with humans. As AI’s capabilities develop, close oversight of this kind of research will become ever more important.
