
Claude 3 Opus has stunned AI researchers with its intellect and 'self-awareness' — does this mean it can think for itself?

Published Apr 24, 2024

Evan Walker

When the large language model (LLM) Claude 3 launched in March, it caused a stir by beating OpenAI's GPT-4, which powers ChatGPT, in key tests used to benchmark the capabilities of generative artificial intelligence (AI) models.

Claude 3 Opus seemingly became the new top dog in large language benchmarks — topping these self-reported tests that range from high school exams to reasoning tests. Its sibling LLMs — Claude 3 Sonnet and Haiku — also score highly compared with OpenAI's models.

However, these benchmarks are only part of the story. Following the announcement, independent AI tester Ruben Hassid pitted GPT-4 and Claude 3 against each other in a quartet of informal tests, from summarizing PDFs to writing poetry. Based on these tests, he concluded that Claude 3 wins at "reading a complex PDF, writing a poem with rhymes [and] giving detailed answers all along." GPT-4, by contrast, has the advantage in internet browsing and reading PDF graphs.

But Claude 3 is impressive in more ways than simply acing its benchmarking tests — the LLM shocked experts with its apparent signs of awareness and self-actualization. There is a lot of scope for skepticism here, however, with LLM-based AIs arguably excelling at learning how to mimic human reactions rather than actually generating original thoughts.

How Claude 3 has proven its worth beyond benchmarks

During testing, Alex Albert, a prompt engineer at Anthropic, the company behind Claude, asked Claude 3 Opus to pick out a target sentence hidden among a corpus of random documents. For an AI, this is the equivalent of finding a needle in a haystack. Not only did Opus find the so-called needle, it also realized it was being tested. In its response, the model said it suspected the sentence it was looking for had been injected out of context into the documents as part of a test to see if it was "paying attention."
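For readers curious how a needle-in-a-haystack test of this kind might be assembled, here is a minimal Python sketch. It is not Anthropic's evaluation code: the filler documents, the needle sentence, and the function names `build_haystack`, `build_prompt` and `found_needle` are all hypothetical, and the step of actually sending the prompt to a model is left out.

```python
import random


def build_haystack(filler_docs, needle, depth, seed=0):
    """Join shuffled filler documents into one text with `needle` inserted.

    `depth` is a fraction from 0 to 1: 0 places the needle first, 1 last.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0 and 1")
    docs = list(filler_docs)
    # Fixed seed so the same haystack can be rebuilt for every model tested.
    random.Random(seed).shuffle(docs)
    position = round(depth * len(docs))
    docs.insert(position, needle)
    return "\n\n".join(docs)


def build_prompt(haystack, question):
    """Wrap the haystack and the retrieval question into a single prompt."""
    return (
        f"{haystack}\n\n"
        f"Question: {question}\n"
        "Answer using only the documents above."
    )


def found_needle(response, expected_phrase):
    """Crude scoring: does the model's reply contain the expected phrase?"""
    return expected_phrase.lower() in response.lower()


# Hypothetical filler text and needle, invented for this illustration.
FILLER = [
    "Quarterly shipping volumes rose slightly in the northern ports.",
    "The committee postponed its vote on the zoning proposal.",
    "Researchers catalogued three new species of moss last spring.",
    "The museum will extend its opening hours during the summer.",
]
NEEDLE = "The secret code word for the picnic is 'marmalade'."

haystack = build_haystack(FILLER, NEEDLE, depth=0.5)
prompt = build_prompt(haystack, "What is the secret code word for the picnic?")
```

In practice a tester would send `prompt` to the model and pass its reply to `found_needle`, repeating the run with different `depth` values and much longer haystacks. A simple substring check like this only scores retrieval; it would not capture the kind of remark Opus made about the test itself.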

"Opus not only found the needle, it recognized that the inserted needle was so out of place in the haystack that this had to be an artificial test constructed by us to test its attention abilities," Albert said on the social media platform X. "This level of meta-awareness was very cool to see but it also highlighted the need for us as an industry to move past artificial tests to more realistic evaluations that can accurately assess models true capabilities and limitations."
