GPT-4 didn't ace the bar exam after all, MIT research suggests — it didn't even break the 70th percentile

GPT-4 didn't actually score in the top 10% on the bar exam after all, new research suggests.

OpenAI, the company behind the large language model (LLM) GPT-4, which powers its chatbot ChatGPT, made the claim in March 2023, and the announcement sent shock waves across the web and the legal profession.

Now, a new study has revealed that the much-hyped 90th-percentile figure was skewed toward repeat test-takers who had already failed the exam one or more times — a group that scores much lower than test takers overall. The researcher published his findings March 30 in the journal Artificial Intelligence and Law.

"It seems the most accurate comparison would be against first-time test takers, or, to the extent that you think that the percentile should reflect GPT-4's performance as compared to an actual lawyer, then the most accurate comparison would be to those who pass the exam," study author Eric Martínez, a doctoral student at MIT's Department of Brain and Cognitive Sciences, said at a New York State Bar Association continuing legal education course.


To arrive at its claim, OpenAI cited a 2023 study in which researchers had GPT-4 answer questions from the Uniform Bar Examination (UBE). The AI model's results were impressive: It scored 298 out of 400, which placed it in the 90th percentile of exam takers.

But it turns out the artificial intelligence (AI) model only scored in the top 10% when compared with repeat test takers. When Martínez compared the model's performance against broader groups, the LLM scored in the 69th percentile of all test takers and in the 48th percentile of those taking the test for the first time.
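The statistical point at the heart of the study can be sketched in a few lines: a percentile is not a property of a raw score alone, but of the score relative to a chosen comparison group. The snippet below uses invented, purely illustrative score distributions (not the real UBE data) to show how the same fixed score lands at a high percentile against a lower-scoring group and a much lower percentile against a stronger one.

```python
# Illustrative only: hypothetical score distributions showing how one raw
# score maps to different percentiles depending on the comparison group.
# These numbers are invented for demonstration — NOT the real UBE data.

def percentile_rank(score, population):
    """Percentage of the population scoring at or below `score`."""
    at_or_below = sum(1 for s in population if s <= score)
    return 100 * at_or_below / len(population)

# Hypothetical groups: repeat takers tend to score lower than first-timers.
repeat_takers = [230, 250, 260, 270, 280, 285, 290, 295, 300, 310]
first_timers = [270, 285, 295, 300, 305, 310, 315, 320, 330, 340]

fixed_score = 298
print(percentile_rank(fixed_score, repeat_takers))  # → 80.0 (looks strong)
print(percentile_rank(fixed_score, first_timers))   # → 30.0 (looks weak)
```

The same score of 298 ranks very differently in the two hypothetical groups, which is exactly the effect Martínez identified: benchmarking against repeat takers inflated GPT-4's apparent standing.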