When I first started my AI ethics research lab, many in the AI research community were skeptical of OpenAI’s early approach. Could they truly achieve AI that rivaled humans simply by scaling up data and computing power, without deeper theoretical insights? It seemed like a strategy based on capital rather than science.
OpenAI, however, had the last laugh. Their success proved a simple, if unsettling, formula: massive datasets plus immense computing power equals unprecedented AI capability. The global AI race quickly became, fundamentally, a data arms race.
The data-centric gold rush has historical roots, starting with the deep learning revolution of the 2010s, which was itself ignited by web-scraped datasets like ImageNet, showing that data availability could dramatically improve AI performance. But today’s scale is different, and so are the stakes. Ironically, the soaring value of AI has come at the direct expense of the data that fuels it. To win the AI race, companies have been incentivized to collect data with little regard for the rights of its creators, a mentality that has been tacitly endorsed by regulators in the U.S., Japan, and India, who are willing to weaken data protections to accelerate national AI development.
