'75% of web pages are AI-generated' — AI CEO explains why companies are desperate for 'real' human data
AI runs on human data — and the amount being used may surprise you
It’s no longer surprising that AI companies rely on real human data to train and improve their models — but just how much of it they use might be.
From tech giants to everyday apps, the demand for human-generated data is exploding. Companies like OpenAI aren’t alone. Businesses outside the AI space, including DoorDash, are also tapping into real-world user data to refine their systems and stay competitive.
Google, for example, uses everything from search queries to reCAPTCHA inputs to train its machine learning models — including computer vision systems. Even Niantic, the company behind Pokémon Go, has built massive datasets using photos captured by real players, feeding its AI-focused spinoff, Niantic Spatial.
Marty Pesis, the founder and CEO of Troveo, has seen this shift firsthand. His company focuses on ethically sourced, licensed video data, and in an exclusive comment to Tom’s Guide he explains why high-quality human data has quickly become one of the most valuable resources in AI.
Real human data is becoming one of the most valuable assets in AI
The demand for real-world video, in particular, is surging. According to Troveo CEO Marty Pesis, AI models need more than synthetic inputs to truly understand how people behave.
“The demand for real-world video is accelerating because AI companies need grounded examples of how people actually move, behave, and interact in real environments,” he said. “Simulated and synthetic data don’t fully capture the unpredictability of real life.”
That push is already showing up in how companies collect data. DoorDash recently introduced an optional program called “DoorDash Tasks,” which pays delivery drivers to record themselves completing everyday activities. The goal is simple: give AI a better understanding of the physical world through real human behavior.
But as more companies turn to human-generated data, consent is becoming a bigger part of the conversation.
“Consent is central for two reasons,” Pesis explained. “Companies need to know they have the legal right to use the data for AI training, and they need confidence it actually came from real people.”
That second point is becoming increasingly important as AI-generated content floods the internet. Some estimates suggest nearly 75% of newly created web pages now include AI-generated material — a number that continues to rise.
So what makes human data truly valuable?
According to Pesis, it comes down to quality. “High-value training data is accurately labeled, technically consistent, and representative,” he said. In practice, that means data needs to be standardized so it can scale — and diverse enough to reflect real-world conditions, from lighting and camera angles to the many ways people actually move and interact.
Human consent for AI training should be at the heart of this growing trend
Anthropic, Apple and Superhuman (formerly Grammarly) are among the many companies that use the text, audio and video data produced by their human users to train AI models.
It’s easy to predict that more of the companies we use regularly will join that trend. The biggest worry is that they will do it without our consent. Here’s hoping we’ll have the ability to opt out of those practices as they become more common.
Elton Jones covers AI for Tom’s Guide and tests all the latest models, from ChatGPT to Gemini to Claude, to see which tools perform best and how they can improve everyday productivity.
He is also an experienced tech writer who has covered video games, mobile devices, headsets, and now artificial intelligence for over a decade. Since 2011, his work has appeared in publications including The Christian Post, Complex, TechRadar, Heavy, and ONE37pm, with a focus on clear, practical analysis.
Today, Elton focuses on making AI more accessible by breaking down complex topics into useful, easy-to-understand insights for a wide range of readers.