AI is destroying itself with 'data cannibalism' — but there's a simple fix

You've probably already noticed that chatbot answers sound a little same-y, whether it's an overly positive product review or search results that boil real publications down into trimmed snippets with little actual information. The web is clearly filling up with AI-generated filler known as "slop," and there's growing worry in AI labs that this junk may be quietly degrading the next generation of AI itself.

There's a name for this: model collapse. The good news is that the same researchers sounding the alarm are now figuring out how to stop it, in one case with a fix that sounds almost too simple to be true.

What is model collapse, exactly?

Today's AI models learn from enormous amounts of text and images scraped from the internet. That worked beautifully when the internet was overwhelmingly human-made. The problem is that as more of the web becomes AI-generated, new models increasingly train on the output of older models, which in turn trained on even older models. Think of it as photocopying a photocopy of a photocopy. Each pass looks roughly fine, but tiny errors creep in and compound, eventually leaving you with a smeared mess that only faintly resembles the original.
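To make the photocopy analogy concrete, here is a toy sketch in Python. It is not how any real chatbot is trained: it assumes the "model" is nothing more than a bell curve (a Gaussian) fitted to its training data, and that each new generation is trained only on samples drawn from the previous generation's model. The function name `simulate_collapse` and the settings (10 samples per generation, 300 generations) are hypothetical choices for illustration, not anything taken from the research.

```python
import random
import statistics


def simulate_collapse(generations: int = 300, n: int = 10, seed: int = 0) -> list[float]:
    """Toy model collapse (hypothetical setup): each generation fits a Gaussian
    to samples drawn only from the previous generation's fitted Gaussian.
    Returns the fitted standard deviation of every generation."""
    rng = random.Random(seed)
    # Generation 0 "trains" on human data: n draws from a standard normal N(0, 1).
    data = [rng.gauss(0.0, 1.0) for _ in range(n)]
    stds = []
    for _ in range(generations):
        # "Training" here is just estimating the mean and (population) std.
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)
        stds.append(sigma)
        # The next generation sees nothing but this model's synthetic output.
        data = [rng.gauss(mu, sigma) for _ in range(n)]
    return stds


stds = simulate_collapse()
print(f"generation 0 std: {stds[0]:.3f}")
print(f"generation {len(stds) - 1} std: {stds[-1]:.2e}")
```

In runs like this, the spread of the data starts out near 1 and typically shrinks toward zero: small sampling errors in each generation compound, and the variety in the original data gradually drains away.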

Why this matters more now

Two things have collided to make a once-theoretical worry feel urgent. First, there's the sheer volume of synthetic content. By some estimates, more than half of all the text now published online is AI-generated, from blog posts and product blurbs to social media replies. Anyone who's watched their search results or social feeds fill with eerily generic writing has seen this firsthand.

Second, AI companies are running low on fresh human writing to learn from. Researchers have warned the supply of high-quality human text could effectively run dry, which pushes labs to lean harder on synthetic data, the exact ingredient that risks triggering collapse. It's a feedback loop with an appetite that keeps growing while its portion sizes shrink.

The potential fix

A study published in Physical Review Letters in May 2026 by researchers at King's College London, the Norwegian University of Science and Technology, and the Abdus Salam International Centre for Theoretical Physics tackled what they nicknamed AI "data cannibalism," and found that a startlingly small intervention can break the cycle.

Working with a class of statistical models simpler than full-blown chatbots, the team showed that a model trained purely on its own output is doomed to collapse. But when they mixed in even a single genuine, real-world data point from outside that closed loop, collapse was prevented every time. More astonishing still, that single anchor to reality kept working even when the pile of machine-made data was vastly, almost limitlessly larger.

According to King's professor Yasser Roudi, it was "by focusing on a simple model" that the researchers could pin down exactly why that one outside data point stops the system from sliding into gibberish.

The team is careful to note that their work used simplified models, not the giant neural networks behind ChatGPT or Gemini, and they want to test the principle on bigger systems next. Still, the takeaway is encouraging: model collapse may not be the inevitable doom loop some feared, as long as a steady trickle of real human data, or at least some grounding in genuine prior knowledge, stays in the mix.

It echoes other recent findings, too. Researchers have shown that when synthetic data piles up alongside real human data, rather than replacing it, collapse is largely avoided. That's closer to how the real world actually works: Nobody deletes the entire internet and starts fresh each year.
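The difference between replacing and accumulating data can be seen in the same kind of toy setup, sketched below in Python under loose, hypothetical assumptions: the "model" is a Gaussian fitted to its training pool, and each generation produces 10 synthetic samples. With `accumulate=False`, each generation trains only on the previous model's output; with `accumulate=True`, the original human data and every earlier synthetic batch stay in the pool. The names `fit_gaussian` and `final_std` and all the numbers are illustrative choices, not the method used in any of the studies mentioned here.

```python
import math
import random


def fit_gaussian(xs: list[float]) -> tuple[float, float]:
    """'Train' a toy model: maximum-likelihood mean and standard deviation."""
    mu = math.fsum(xs) / len(xs)
    var = math.fsum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, math.sqrt(var)


def final_std(accumulate: bool, generations: int = 300, n: int = 10, seed: int = 0) -> float:
    """Spread of the training pool after many rounds of synthetic data
    (hypothetical setup for illustration)."""
    rng = random.Random(seed)
    # Generation 0: human data drawn from a standard normal N(0, 1).
    pool = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(generations):
        mu, sigma = fit_gaussian(pool)
        synthetic = [rng.gauss(mu, sigma) for _ in range(n)]
        if accumulate:
            pool.extend(synthetic)  # keep the human data and every earlier batch
        else:
            pool = synthetic  # newest model output replaces everything
    return fit_gaussian(pool)[1]


replaced = final_std(accumulate=False)
accumulated = final_std(accumulate=True)
print(f"replace: {replaced:.2e}  accumulate: {accumulated:.3f}")
```

In the replace setting the spread collapses toward zero, while in the accumulate setting it typically stays close to that of the original data, because each new synthetic batch is a shrinking fraction of an ever-growing pool that still contains the real examples.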

Final thoughts

In the near term, you don't need to worry that ChatGPT is about to dissolve into static. The major AI labs are well aware of this trap, and they spend heavily on human data, careful curation and licensing deals with publishers precisely to keep their training sets grounded.

But model collapse is a useful lens for a few things you're already noticing. It's part of why "is this AI-written?" labeling, content provenance and the value of genuine human expertise keep coming up.

It's a reason the open web getting sloppier is a real long-term problem, not just an aesthetic complaint. And it's a quiet argument for the enduring worth of the real thing — your reviews, your forum posts, your actual human writing — in an era increasingly tempted to settle for the synthetic. The machines, it turns out, still need us. Even just a little.

Amanda Caswell is the AI Editor at Tom's Guide and one of today’s leading voices in AI and technology.

A celebrated contributor to various news outlets, her sharp insights and relatable storytelling have earned her a loyal readership. Amanda’s work has been recognized with prestigious honors, including outstanding contribution to media.

Known for her ability to bring clarity to even the most complex topics, Amanda seamlessly blends innovation and creativity, inspiring readers to embrace the power of AI and emerging technologies.

As a certified prompt engineer, she continues to push the boundaries of how humans and AI can work together.

Beyond her journalism career, Amanda is a long-distance runner and mom of three. She lives in New Jersey.