AI models trained on AI-generated data could spiral into unintelligible nonsense, scientists warn
Artificial intelligence (AI) systems could gradually fill the internet with incomprehensible nonsense, new research has warned.
AI models such as GPT-4, which powers ChatGPT, or Claude 3 Opus rely on the many trillions of words shared online to get smarter, but as they gradually colonize the internet with their own output they may create self-damaging feedback loops.
The end result, called "model collapse" by a team of researchers that investigated the phenomenon, could leave the internet filled with unintelligible gibberish if left unchecked. They published their findings July 24 in the journal Nature.
"Imagine taking a picture, scanning it, then printing it out, and then repeating the process. Through this process the scanner and printer will introduce their errors, over time distorting the image," lead author Ilia Shumailov, a computer scientist at the University of Oxford, told Live Science. "Similar things happen in machine learning — models learning from other models absorb errors, introduce their own, over time breaking model utility."
AI systems learn from training data drawn from human input, which lets their neural networks pick up probabilistic patterns and apply them when given a prompt. GPT-3.5 was trained on roughly 570 gigabytes of text from the repository Common Crawl, amounting to about 300 billion words taken from books, online articles, Wikipedia and other web pages.
But this human-generated data is finite and will most likely be exhausted by the end of this decade. Once this has happened, the alternatives will be to begin harvesting private data from users or to feed AI-generated "synthetic" data back into models.