Google strikes $60m deal with Reddit for AI training data — what you need to know
Reddit will also be more integrated into Google search results
Reddit spent the latter half of 2023 considering whether to block the Google and Bing search engines from indexing posts on the site. The move, according to The Washington Post, was intended to prevent the unauthorized and uncompensated use of its posts to train AI.
Now Reddit has announced it's reached a deal with Google that will, among other things, give the company access to the Reddit Data API “to improve its products and services” which includes “more efficient ways to train models”. In Google’s words, access to said API will grant the company “real-time, structured, unique content from their large and dynamic platform.”
The deal, which Bloomberg previously suggested would be “worth about $60 million on an annualized basis”, doesn’t stop there. As part of the agreement, Reddit will have access to Google’s Vertex AI service, which should improve its internal search results, and the deal will also allow for “Reddit content to be displayed across Google products.”
Google says this will ensure “more content-forward displays of Reddit information that will make our products more helpful for our users and make it easier to participate in Reddit communities and conversations.” Given the number of people who append the word “reddit” to searches to surface genuine user-generated insights, that could be a very good thing for the average Google user.
But for Google, the real prize is undoubtedly the vast treasure trove of training data, which will theoretically make its generative AI appear more human, thanks to the posts and comments written by millions of real people every day.
But scale isn’t everything, and in some ways Reddit is an imperfect sample for training artificial intelligence when compared to literature or magazines. Grammar is faster and looser, there are a lot of memes and inside jokes, it’s full of information that’s just plain wrong and its user base is predominantly male.
By contrast, Apple has reportedly sought multi-million dollar deals with publishers in order to train on their more formal and factually accurate magazines and newspapers. That approach has its disadvantages too: it concentrates on another small part of the human experience at the expense of how everyday people communicate, something Reddit is undoubtedly better at demonstrating.
Expect more such deals to be made public over the next few years, because people are realizing that AI means big money and that training data can’t be absorbed free of charge without consequences. In the last year, OpenAI, Meta and Stability AI have all been hit by lawsuits from authors who claim that their books were used for training without permission or compensation.
Freelance contributor Alan has been writing about tech for over a decade, covering phones, drones and everything in between. Previously Deputy Editor of tech site Alphr, his words are found all over the web and in the occasional magazine too. When not weighing up the pros and cons of the latest smartwatch, you'll probably find him tackling his ever-growing games backlog. He also handles all the Wordle coverage on Tom's Guide and has been playing the addictive NYT game for the last several years in an effort to keep his streak forever intact.

