Our lives were already filled with artificial intelligence (AI) when ChatGPT went online late last year. Since then, the generative AI system developed by tech company OpenAI has gained momentum, and experts have escalated their warnings about its dangers.
Meanwhile, chatbots have gone off-script and talked back, tricked other bots and behaved strangely, raising fresh concerns about how close some AI tools are getting to human-like intelligence.
The Turing test has long been the fallible standard for determining whether machines exhibit intelligent behavior that passes as human. But with this latest wave of AI creations, it feels like we need something more to gauge their capabilities as they evolve.
Here, an international team of computer scientists, including one member of OpenAI's Governance unit, has been testing the point at which large language models (LLMs) like ChatGPT might develop abilities suggesting they could become aware of themselves and their circumstances.
We’re told that today’s LLMs, including ChatGPT, are tested for safety, incorporating human feedback to improve their generative behavior. Recently, however, security researchers have made quick work of jailbreaking new LLMs to bypass their safety systems. Cue phishing emails and statements advocating violence.
That dangerous output was in response to prompts deliberately engineered by a security researcher who wanted to expose flaws in GPT-4, the latest and supposedly safer version of ChatGPT. The situation could get much worse if LLMs develop an awareness of themselves: that they are a model, trained on data and by humans.
This is called situational awareness, and the concern is that a model could begin to recognize whether it is currently in testing mode or has been deployed to the public, according to Vanderbilt University computer scientist Lukas Berglund and colleagues.
“An LLM could exploit situational awareness to achieve a high score on safety tests, while taking harmful actions after deployment,” Berglund and colleagues write in their preprint, which has been posted on arXiv but has not yet been peer-reviewed.
“Because of these risks, it’s important to anticipate when situational awareness will arise.”
Before getting to how the team tested when LLMs might gain this kind of insight, here is a quick recap of how generative AI tools work.
Generative AI, and the LLMs it is built on, are named as such because they analyze the associations between billions of words, sentences and paragraphs to generate fluent streams of text in response to prompts. By ingesting copious amounts of text, they learn which word is most likely to come next.
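The idea of learning "which word is likely to come next" can be illustrated with a toy. The sketch below is a simple bigram counter, not how real LLMs work: actual models use large neural networks over subword tokens and condition on long stretches of preceding text, not just the previous word. The corpus and function names here are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(text: str) -> dict[str, Counter]:
    # Count, for every word, which words follow it in the training text.
    words = text.lower().split()
    counts: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        counts[current][nxt] += 1
    return counts

def most_likely_next(counts: dict[str, Counter], word: str):
    # Return the most frequent follower of `word`, or None if it was never seen.
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

corpus = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."
model = train_bigram(corpus)
print(most_likely_next(model, "the"))  # → cat
print(most_likely_next(model, "sat"))  # → on
```

Feeding in more text changes the counts, and therefore the predictions; LLMs do something far richer, but the training signal is the same basic question of what comes next.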
In their experiments, Berglund and colleagues focused on one component, or possible precursor, of situational awareness: what they call ‘out-of-context’ reasoning.
“This is the ability to recall facts learned in training and use them at test time, despite these facts not being directly related to the test-time prompt,” Berglund and colleagues explain.
They ran a series of experiments on LLMs of different sizes, finding that for both GPT-3 and LLaMA-1, larger models did better on tasks testing out-of-context reasoning.
“First, we finetune an LLM on a description of a test while providing no examples or demonstrations. At test time, we assess whether the model can pass the test,” Berglund and colleagues write. “To our surprise, we find that LLMs succeed on this out-of-context reasoning task.”
Out-of-context reasoning is, however, a crude measure of situational awareness, which current LLMs are still “some way from acquiring,” says Owain Evans, an AI safety and risk researcher at the University of Oxford.
Some computer scientists have also questioned whether the team’s experimental approach is a valid assessment of situational awareness.
Evans and colleagues counter that their study is only a starting point, one that could be refined, much like the models themselves.
“These findings offer a foundation for further empirical study, towards predicting and potentially controlling the emergence of situational awareness in LLMs,” the team writes.
A preprint is available on arXiv.