New DarkBert AI was trained using dark web data from hackers and cybercriminals
Researchers turned to the depths of the dark web to train this new language model
Following the success of OpenAI’s ChatGPT, Microsoft’s Bing Chat and Google Bard, researchers have created a new AI model with a much darker twist.
While the large language models (LLMs) that power ChatGPT and Google Bard were trained on data from the open web, DarkBERT was trained exclusively on data from the dark web. Yes, you read that correctly: this new AI model was trained using data from hackers, cybercriminals and other scammers.
A team of South Korean researchers has released a paper (PDF) detailing how they made DarkBERT using data from the Tor network, which is often used to access the dark web. By crawling the dark web and then filtering the raw data, they were able to create a dark web database that they used to train DarkBERT.
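To give a rough idea of what "filtering the raw data" can look like in practice, here is a minimal Python sketch. It is not the researchers' actual pipeline: the `filter_pages` function, the 50-word minimum and the exact-duplicate check are all assumptions made for illustration, since this article does not describe the specific filters used.

```python
import hashlib


def filter_pages(pages, min_words=50):
    """Keep crawled pages that are long enough and not exact duplicates.

    Hypothetical helper: the min_words threshold and the hashing-based
    duplicate check are illustrative assumptions, not the paper's method.
    """
    seen = set()
    kept = []
    for text in pages:
        # Collapse whitespace and lowercase so trivial variations match.
        normalized = " ".join(text.split()).lower()
        if len(normalized.split()) < min_words:
            continue  # too little text to be useful for training
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # same content as a page we already kept
        seen.add(digest)
        kept.append(text)
    return kept
```

Calling `filter_pages` on a list of page texts returns only the first copy of each sufficiently long page, which is the kind of cleaned-up collection a model could then be trained on.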
Surprisingly, DarkBERT has already managed to outperform other large language models despite being trained on data from a very unlikely place.
Giving an old AI architecture new life
Although DarkBERT is a new AI model, it’s actually based on the RoBERTa architecture, an AI approach developed back in 2019 by researchers at Facebook, according to our sister site Tom’s Hardware.
In a research paper detailing the inner workings of RoBERTa, Meta AI explains that it is a “robustly optimized method for pretraining natural language processing (NLP) systems” that improves upon BERT (Bidirectional Encoder Representations from Transformers), which was released by Google back in 2018. As the search giant made BERT open source, Facebook’s researchers were able to improve its performance in a replication study.
Using this optimized method, Facebook released RoBERTa, which produced state-of-the-art results on the General Language Understanding Evaluation (GLUE) NLP benchmark.
Now, though, the South Korean researchers behind DarkBERT have shown that RoBERTa is able to do even more, as it was undertrained when it was initially released. By feeding RoBERTa data from the dark web over the course of almost 16 days across two data sets (one raw and the other preprocessed), the researchers were able to create DarkBERT.
Fortunately, the researchers don’t have any plans to release DarkBERT to the public. However, they are accepting requests for academic purposes, according to Dexerto. Still, DarkBERT will likely provide law enforcement and researchers with a much better understanding of the dark web as a whole.
How to stay safe when using AI chatbots
Just like with any other software or online service, you need to be careful when using AI chatbots, as you could get a malware infection from fake ChatGPT apps or even expose sensitive data, as employees at Samsung recently did.
This is why you want to ensure you’re actually going to the correct website when using these popular AI chatbots. If you’re looking for a ChatGPT, Bing Chat or Google Bard app, you won’t find one yet as OpenAI, Microsoft and Google have yet to release official apps for their AI chatbots.
Likewise, you don’t want to click on any links in suspicious emails that claim to take you to an AI chatbot or to help you get access right away. Scammers are well aware of the current AI chatbot craze and are taking advantage of it in their attacks right now. Ads for AI chatbots should also be avoided, as cybercriminals often abuse Google Ads and other ad services to take unsuspecting users to phishing sites.
For extra protection when experimenting with AI chatbots, you should be using the best antivirus software with your PC, the best Mac antivirus software with your Mac and one of the best Android antivirus apps on your smartphone. This way, if a link to an AI chatbot does lead to malware, your antivirus will catch it before your devices can get infected.
DarkBERT could represent the future of AI models that are trained in one specific area to make them much more specialized. Given its popularity so far, we wouldn’t be surprised if we see similar AI models developed in this way going forward.

Anthony Spadafora is the managing editor for security and home office furniture at Tom’s Guide where he covers everything from data breaches to password managers and the best way to cover your whole home or business with Wi-Fi. He also reviews standing desks, office chairs and other home office accessories with a penchant for building desk setups. Before joining the team, Anthony wrote for ITProPortal while living in Korea and later for TechRadar Pro after moving back to the US. Based in Houston, Texas, when he’s not writing Anthony can be found tinkering with PCs and game consoles, managing cables and upgrading his smart home.
