GPT-4 didn't ace the bar exam after all, MIT research suggests — it didn't even break the 70th percentile
GPT-4 didn't actually score in the top 10% on the bar exam after all, new research suggests.
OpenAI, the company behind the large language model (LLM) that powers its chatbot ChatGPT, made the claim in March 2023, and the announcement sent shock waves around the web and through the legal profession.
Now, a new study has revealed that the much-hyped 90th-percentile figure was actually skewed toward repeat test-takers who had already failed the exam one or more times — a much lower-scoring group than those who generally take the test. The researcher published his findings March 30 in the journal Artificial Intelligence and Law.
"It seems the most accurate comparison would be against first-time test takers or to the extent that you think that the percentile should reflect GPT-4's performance as compared to an actual lawyer; then the most accurate comparison would be to those who pass the exam," study author Eric Martínez, a doctoral student at MIT's Department of Brain and Cognitive Sciences, said at a New York State Bar Association continuing legal education course.
To arrive at its claim, OpenAI used a 2023 study in which researchers had GPT-4 answer questions from the Uniform Bar Examination (UBE). The artificial intelligence (AI) model's results were impressive: It scored 298 out of 400, which placed it in the top tenth of exam takers.
But it turns out the model scored in the top 10% only when compared with repeat test takers. When Martínez compared the model's performance with broader groups, the LLM scored in the 69th percentile of all test takers and in the 48th percentile of those taking the test for the first time.