Artificial intelligence bots like Google’s Bard and OpenAI’s ChatGPT have been fed on a vast trove of information culled from the Internet. ChatGPT alone consumed 570GB of data, some 300 billion words, including conversations and content from Reddit’s forums.
Every day, roughly 57 million people visit the social media site to chat in depth about a variety of topics—a boon for large language models looking to replicate written human conversation. Now Reddit will start charging companies that scrape its data for AI training purposes and fail to “return any of that value,” Steve Huffman, Reddit’s CEO, has said.
The unauthorized use of Reddit content, Huffman said, is “something we have a problem with,” and it reminded the company that it is “a good time for us to tighten things up.”
Reddit is going to charge companies to use its API
Reddit has announced new API changes that put its data behind a paywall. This move is part of a plan to monetize its voluminous data pool, which companies have used to train large language models for free, without Reddit’s knowledge.
From now on, a Reddit statement said, “third parties who require additional capabilities, higher usage limits, and broader usage rights” will have to play by new rules. Reddit has yet to announce any official pricing for access to its APIs.
Charging for API access gives Reddit a useful new revenue stream ahead of its planned IPO in the second half of this year. Reddit will go public at a lower valuation than the $15 billion it anticipated when it confidentially filed for the offering in December 2021.
Reddit has offered its APIs for free since 2008. The new developer and third-party terms are subject to a 60-day notice period that follows email notifications. “These updates should not impact moderation bots and extensions... To further ensure minimal impact of updates to our Data API, we are continuing to build new moderator tools,” Reddit said in an update on its own platform.