Technology

GPT-4 has passed the Turing test, researchers claim

Published

Jun 15, 2024

Evan Walker

We are interacting with artificial intelligence (AI) online not only more than ever but also more than we realize, so researchers asked people to converse with four agents, including one human and three different kinds of AI models, to see whether they could tell the difference.

The "Turing test," first proposed as "the imitation game" by computer scientist Alan Turing in 1950, judges whether a machine's ability to show intelligence is indistinguishable from a human's. For a machine to pass the Turing test, it must be able to talk to somebody and fool them into thinking it is human.

Scientists decided to replicate this test by asking 500 people to speak with four respondents: a human, the 1960s-era AI program ELIZA, GPT-3.5 and GPT-4, the AI that powers ChatGPT. The conversations lasted five minutes, after which participants had to say whether they believed they were talking to a human or an AI. In the study, published May 9 to the preprint server arXiv, the scientists found that participants judged GPT-4 to be human 54% of the time.

ELIZA, a system pre-programmed with responses but with no large language model (LLM) or neural network architecture, was judged to be human just 22% of the time. GPT-3.5 scored 50% while the human participant scored 67%.

"Machines can confabulate, mashing together plausible ex-post-facto justifications for things, as humans do," Nell Watson, an AI researcher at the Institute of Electrical and Electronics Engineers (IEEE), told Live Science.

"They can be subject to cognitive biases, bamboozled and manipulated, and are becoming increasingly deceptive. All these elements mean human-like foibles and quirks are being expressed in AI systems, which makes them more human-like than previous approaches that had little more than a list of canned responses."