Google’s plan to train its AI now includes the entire public internet

Google AI logo on phone lying on a table
(Image credit: Shutterstock)

Google has made a change to its privacy policy that clarifies the extent of its growing AI ambitions. In essence, the search giant says anything posted publicly online is fair game to be scraped and used to improve AI tools like its ChatGPT rival, Google Bard.

First spotted by Gizmodo, the company updated its policy over the weekend to state that it will use “publicly available information to help train Google’s AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities.”

That’s a notable change from the previous policy wording, which said Google used the information only to build “language models” for Google Translate. The company’s chatbot Bard and the nebulous term “Cloud AI” are now included too.

What does that mean? Well, users can confidently assume anything posted using a Google product (Search, Gmail, YouTube) is going to be saved and used by the company in some way or another. But this goes further. The wording states Google will use publicly available information to train its AI, which could be read as basically anything posted on the internet at large. Considering Google’s industry-conquering Search platform was built on PageRank and its ability to effectively index the web, this AI move has huge ramifications.

“Our privacy policy has long been transparent that Google uses publicly available information from the open web to train language models for services like Google Translate. This latest update simply clarifies that newer services like Bard are also included,” a Google spokesperson told Tom’s Guide. “We incorporate privacy principles and safeguards into the development of our AI technologies, in line with our AI Principles.”

Google revealed a number of new AI projects at its annual I/O conference earlier in the year, including the news that its entire Workspace portfolio would use AI. But the sticking point for many organizations is where and how the large language models (LLMs) that fuel these projects are getting their information.

Sundar Pichai presenting Gemini onstage at Google I/O 2023
(Image credit: Google via YouTube)

Google has been forced to pause the rollout of its Bard chatbot in Europe while the EU considers imposing regulation on AI products. Meanwhile, platforms such as Reddit and Twitter have turned off free access to their APIs, which had allowed others to download large backlogs of posts and potentially harvest the contents to train AI.

ChatGPT, Google’s main rival in the AI space right now, was forced to introduce data controls for users worldwide after a complaint from Italy’s data protection authorities. Users can now ask it not to train on the data they provide.

But that’s a different beast than opening up the entirety of the public web to potential crawling from one of the largest tech companies on the planet. While most internet users should understand the difference between public and private internet, it remains to be seen how they’ll feel about having any public post made at any point in time suddenly turned into fodder for Google’s burgeoning AI empire. 


Jeff Parsons
UK Editor In Chief
