Is AI About to Run Out of Data? The History of Oil Says No

Published

Aug 07, 2024

Evan Walker

Is the AI bubble about to burst? Every day that the stock prices of semiconductor champion Nvidia and the so-called “Fab Five” tech giants (Microsoft, Apple, Alphabet, Amazon, and Meta) fail to regain their mid-year peaks, more people ask that question.

It would not be the first time in financial history that the hype around a new technology led investors to drive up the value of the companies selling it to unsustainable heights—and then get cold feet. Political uncertainty around the U.S. election is itself raising the probability of a sell-off, as Donald Trump expresses his lingering resentments against the Big Tech companies and his ambivalence towards Taiwan, where the semiconductors essential for artificial intelligence mostly get made.

The deeper question is whether AI can deliver the staggering long-term value that the internet has. If you invested in Amazon in late 1999, you would have been down over 90% by early 2001. But you would be up over 4,000% today.

A chorus of skeptics now loudly claims that AI progress is about to hit a brick wall. Models such as GPT-4 and Gemini have already hoovered up most of the internet’s data for training, the story goes, and will lack the data needed to get much smarter.

However, history gives us a strong reason to doubt the doubters. Indeed, we think they are likely to end up in the same unhappy place as those who in 2001 cast aspersions on the future of Jeff Bezos’s scrappy online bookstore.

The generative AI revolution has breathed fresh life into the TED-ready aphorism “data is the new oil.” But when LinkedIn influencers trot out that 2006 quote by British entrepreneur Clive Humby, most of them are missing the point. Data is like oil, but not just in the facile sense that each is the essential resource that defines a technological era. As futurist Ray Kurzweil observes, the key is that both data and oil vary greatly in the difficulty—and therefore cost—of extracting and refining them.

Some petroleum is light crude oil just below the ground, which gushes forth if you dig a deep enough hole in the dirt. Other petroleum is trapped far beneath the earth or locked in sedimentary shale rocks, and requires deep drilling and elaborate fracking or high-heat pyrolysis to be usable. When oil prices were low prior to the 1973 embargo, only the cheaper sources were economically viable to exploit. But during periods of soaring prices over the decades since, producers have been incentivized to use increasingly expensive means of unlocking further reserves.

The same dynamic applies to data—which is after all the plural of the Latin datum. Some data exist in neat and tidy datasets—labeled, annotated, fact-checked, and free for download in a common file format. But most data are buried more deeply. Data may be on badly scanned handwritten pages; may consist of terabytes of raw video or audio, without any labels on relevant features; may be riddled with inaccuracies and measurement errors or skewed by human biases. And most data are not on the public internet at all.

An estimated 96% to 99.8% of all online data are inaccessible to search engines—for example, paywalled media, password-protected corporate databases, legal documents, and medical records, plus an exponentially growing volume of private cloud storage. In addition, the vast majority of printed material has still never been digitized—around 90% for high-value collections such as the Smithsonian and U.K. National Archives, and likely a much higher proportion across all archives worldwide.

Yet arguably the largest untapped category is information that’s currently not captured in the first place, from the hand motions of surgeons in the operating room to the subtle expressions of actors on a Broadway stage.

For the first decade after large amounts of data became the key to training state-of-the-art AI, commercial applications were very limited. It therefore made sense for tech companies to harvest only the cheapest data sources. But the launch of OpenAI’s ChatGPT in 2022 changed everything. Now, the world’s tech titans are locked in a frantic race to turn theoretical AI advances into consumer products worth billions. Many millions of users now pay around $20 per month for access to the premium AI models produced by Google, OpenAI, and Anthropic. But this is peanuts compared to the economic value that will be unlocked by future models capable of reliably performing professional tasks such as legal drafting, computer programming, medical diagnosis, financial analysis, and scientific research.

The skeptics are right that the industry is about to run out of cheap data. As smarter models enable wider adoption of AI for lucrative use cases, however, powerful incentives will drive the drilling for ever more expensive data sources—the proven reserves of which are orders of magnitude larger than what has been used so far. This is already catalyzing a new training data sector, as companies including Scale AI, Sama, and Labelbox specialize in the digital refining needed to make the less accessible data usable.

This is also an opportunity for data owners. Many companies and nonprofits have mountains of proprietary data that are gathering dust today, but which could be used to propel the next generation of AI breakthroughs. OpenAI has already spent hundreds of millions of dollars licensing training data, inking blockbuster deals with Shutterstock and the Associated Press for access to their archives. Just as there was speculation in mineral rights during previous oil booms, we may soon see a rise in data brokers finding and licensing data in the hope of cashing in when AI companies catch up.

Much like the geopolitical scramble for oil, competition for top-quality data is also likely to affect superpower politics. Countries’ domestic privacy laws affect the availability of fresh training data for their tech ecosystems. The European Union’s 2016 General Data Protection Regulation leaves Europe’s nascent AI sector with an uphill climb to international competitiveness, while China’s expansive surveillance state allows Chinese firms to access larger and richer datasets than can be mined in America. Given the military and economic imperatives to stay ahead of Chinese AI labs, Western firms may thus be forced to look overseas for sources of data unavailable at home.

Yet just as alternative energy is fast eroding the dominance of fossil fuels, new AI development techniques may reduce the industry’s reliance on massive amounts of data. Premier labs are now working to perfect techniques known as “synthetic data” generation and “self-play,” which allow AI to create its own training data. And while AI models currently learn several orders of magnitude less efficiently than humans, as models develop more advanced reasoning, they will likely be able to hone their capabilities with far less data.

There are legitimate questions about how long AI’s recent blistering progress can be sustained. Despite enormous long-term potential, the short-term market bubble will likely burst before AI is smart enough to live up to the white-hot hype. But just as generations of “peak oil” predictions have been dashed by new extraction methods, we should not bet on an AI bust due to data running out.
