Perplexity accused of scraping websites even when told not to — here's their response

Perplexity on iPhone
(Image credit: Shutterstock)

Perplexity is riding high in the AI world right now. With its Comet browser, the company is leading the way in agentic browsing, but it has now run into some controversy.

Cloudflare has published research in a blog post showing that Perplexity has been crawling and scraping content from websites that explicitly said they don’t want to be scraped.

The research accuses Perplexity of obscuring its identity when trying to scrape web pages. Cloudflare says it received complaints from customers who had both disallowed Perplexity’s crawlers in their robots.txt files and created firewall rules to specifically block them.

Cloudflare ran its own tests to confirm this, creating brand-new domains and then asking Perplexity questions about them. Perplexity was able to answer questions about the content of these pages, even though the sites had been configured to tell bots not to access them.
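Rules like these are usually written in a site’s robots.txt file, which matches crawlers by the user-agent name they announce and relies on them to comply voluntarily. The minimal Python sketch below uses the standard library’s `urllib.robotparser` and assumes a hypothetical robots.txt that blocks Perplexity’s declared crawler, PerplexityBot, while allowing everyone else; the URL and the browser string are placeholders chosen for illustration. It shows why a bot that stops announcing its real name is no longer caught by the rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block Perplexity's declared crawler by name,
# allow every other user agent.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

PAGE = "https://example.com/article"  # placeholder URL

# Illustrative desktop Chrome user-agent string (not taken from the report).
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/124.0.0.0 Safari/537.36"
)

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler asks this question before every fetch.
for label, agent in (("declared bot", "PerplexityBot"), ("browser-like", BROWSER_UA)):
    verdict = "allowed" if parser.can_fetch(agent, PAGE) else "blocked"
    print(f"{label}: {verdict}")
# declared bot: blocked
# browser-like: allowed
```

Nothing in robots.txt is enforced by the server; the file only works if the crawler checks it honestly under its real name.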

How Perplexity gets around these rules is complicated. It appears that Perplexity is changing its bot’s “user agent”, the label a browser or bot sends to identify itself to a website. In other words, the crawler is posing as an ordinary visitor rather than identifying itself as an AI bot.
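A firewall rule that blocks a bot by its user agent works in a similar way. The short Python sketch below is a hypothetical, simplified version of such a rule, not Cloudflare’s actual implementation: it blocks any request whose User-Agent header contains one of a few assumed Perplexity tokens. Both user-agent strings are illustrative and are not copied from Cloudflare’s report.

```python
# Hypothetical user-agent tokens a site owner might block in a firewall rule.
BLOCKED_TOKENS = ("perplexitybot", "perplexity-user")


def is_blocked(user_agent: str) -> bool:
    """Return True if the request's User-Agent header names a blocked bot."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_TOKENS)


# Illustrative strings only.
DECLARED_UA = "Mozilla/5.0 (compatible; PerplexityBot/1.0)"
SPOOFED_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/124.0.0.0 Safari/537.36"
)

print(is_blocked(DECLARED_UA))  # True
print(is_blocked(SPOOFED_UA))   # False
```

Because the header is entirely self-reported, a rule like this only stops bots that identify themselves honestly; catching a disguised crawler requires other signals.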

Perplexity and many other AI tools need large amounts of information to function. They trawl the internet, drawing on forums, web pages, and other online sources.

However, there is growing backlash against this approach, along with an expectation that AI companies be transparent about how they gather data. Some of Perplexity’s competitors, such as Anthropic (maker of Claude) and OpenAI (maker of ChatGPT), offer ways to opt out of data gathering, and more rules are likely to follow.

Perplexity screenshot

(Image credit: Perplexity AI)


“This activity was observed across tens of thousands of domains and millions of requests per day. We were able to fingerprint this crawler using a combination of machine learning and network signals,” says Cloudflare’s post.

In an email to TechCrunch, Perplexity spokesperson Jesse Dwyer dismissed Cloudflare’s blog post as a sales pitch for the company.

Dwyer went on to say that the screenshots in the blog “show that no content was accessed” and that the bot named in Cloudflare’s post “isn’t even ours”.

Cloudflare is now taking a strong stance on AI crawlers, including Perplexity’s. The company argues that AI is breaking the internet’s business model and says it wants to help fight back.

While Perplexity has denied Cloudflare’s claims, the company has been in hot water before over similar problems: it has been accused of lifting content from news sites and has struggled to define what counts as plagiarism.

Alex Hughes
AI Editor

Alex is the AI editor at Tom’s Guide. Dialed into all things artificial intelligence in the world right now, he knows the best chatbots, the weirdest AI image generators, and the ins and outs of one of tech’s biggest topics.

Before joining the Tom’s Guide team, Alex worked for the brands TechRadar and BBC Science Focus.

He was highly commended in the Specialist Writer category at the 2023 BSME Awards and was part of the team that won Best Podcast at the 2025 BSME Awards.

In his time as a journalist, he has covered the latest in AI and robotics, broadband deals, the potential for alien life, the science of being slapped, and just about everything in between.

When he’s not trying to wrap his head around the latest AI whitepaper, Alex pretends to be a capable runner, cook, and climber.
