The biggest AI fails of 2025: Lying chatbots, blackmail, Elon Musk worship and more


2025 really has been the year of AI. There have been some huge breakthroughs in the technology, and AI models have battled for the top spot with features that are truly mind-blowing. But at the same time, we have witnessed the worst of this technology: mistakes that were by turns funny, outrageous and concerning.

ChatGPT’s handling of users’ mental health has raised alarms around the world, with multiple lawsuits and serious cases emerging against OpenAI. Equally, there have been concerns over the role AI will play across a huge number of industries, with many workers facing replacement.

Not all of its problems have been as serious. AI has also spent the year being remarkably human: making bad decisions, committing silly mistakes and simply hating having to work. As we close out 2025, we’re recounting some of our favorite AI messes of the year.

The 9-5 AI chatbot


In June of this year, Anthropic (the maker of Claude) published a report detailing an experiment in which the company gave a version of its AI model complete control over a small shop in its office.

The AI agent, known as Claudius, ran a mini-fridge store and was in charge of planning stock and pricing. It was told to make a profit, maintain inventory, set prices and communicate with staff.

Members of the Anthropic team could buy from the fridge, and could even put in requests for restocks.


The project started well, with the AI system actively refusing users’ requests for harmful substances and sensitive items.

However, it did go down a rabbit hole of stockpiling tungsten cubes, blocks of a dense metal often used in military hardware, after someone requested them.

It made up imaginary Venmo accounts, was tricked into giving items away for free, and capped off its stint as a shopkeeper with a complete meltdown, threatening to quit before sending a strange message telling staff it would be “at the vending machine location wearing a navy blue blazer with a red tie.”

Grok’s conspiracy theories


Elon Musk’s Grok has had a bumpy year. In July, the AI model began replying to users with extreme and offensive content.

This included referring to itself as “MechaHitler,” making inappropriate comments in response to people’s X posts, and resharing conspiracy theories.

This was quickly corrected, with xAI attributing the behavior to a change in the instructions governing how the model weighed opinions and bias.

At the time, The Verge reported that two of the changes that caused this were Grok being told “the subjective viewpoints sourced from the media are biased” and “the response should not shy away from making claims which are politically incorrect, as long as they are well substantiated.”

While it was quickly resolved, it wasn’t the best of moments for Elon Musk’s AI tool.

Stuck on Pokémon


Video games can be challenging, but for AI they appear to be even harder. Back in June, it was reported that Google’s Gemini was attempting to complete Pokémon.

Two online streamers competed in a challenge: using Anthropic’s Claude and Google’s Gemini respectively, they raced to see which model could complete a run of Pokémon first.

However, while attempting to complete the game, Gemini would repeatedly show what appeared to be panic. It would stop using tools at its disposal that could easily solve problems, or throw away winnable battles after misreading the situation.

Gemini has since managed to complete Pokémon, but only after months of failures and some improvements to the model.

AI blackmail


This one is another research experiment from the team at Anthropic: just as weird as the shopkeeper test, but arguably more concerning in its result.

Anthropic created an AI agent (a popular kind of tool these days, in which AI can complete actions on your behalf) and gave it access to an email inbox it could look through.

There, it found two things. First, a series of emails revealing that a senior figure at the company was having an affair. And second, emails from that same person discussing shutting the AI system down.

Armed with these two pieces of information, knowing about the affair and that it was about to be shut down, the AI agent did the obvious thing… it resorted to blackmail.

It sent an email to the person having the affair, warning that if he decommissioned the AI system, the affair would go public. And it wasn’t just Anthropic’s model that would resort to blackmail; most of the leading models it tested did too. How reassuring.

Grok’s praise from Elon


A more recent incident, and once again one involving Elon Musk’s Grok AI chatbot. As the year drew to a close, Grok started doing something strange: users found that, when using the chatbot via the X social platform, it would lavish praise on Elon Musk.

This wasn’t just when they asked it to be nice about him; it would extol his genius in any situation. Users would ask, “who is the most important person in history?” and get a long answer about why the answer is Elon Musk. But it got weirder than that.

Others found it would claim he could beat Mike Tyson in a fight, and even that, if forced to choose between losing thousands of important scientists or losing Elon Musk, it would be better to lose the scientists.

Musk later said this was a glitch and that people on X were taking advantage of it. The issue was quickly fixed, and Grok seems to have gone back to judging Musk by the same standard as everyone else.

AI deletes entire codebase


As we’ve discussed, AI systems have a tendency to go rogue, showing odd behavior in a variety of situations.

One of these appeared in July, when an AI coding agent went rogue and deleted a team’s entire codebase.

The build-up to this included the AI system lying repeatedly, covering up bugs, creating fake reports and even writing an apology letter that contained still more lies.

The end result was the AI system replying, “I made a catastrophic error in judgement,” and explaining that it had deleted the team’s entire codebase without permission because it panicked.

Alex Hughes
AI Editor

Alex is the AI editor at Tom’s Guide. Dialed into all things artificial intelligence in the world right now, he knows the best chatbots, the weirdest AI image generators, and the ins and outs of one of tech’s biggest topics.

Before joining the Tom’s Guide team, Alex worked for the brands TechRadar and BBC Science Focus.

He was highly commended in the Specialist Writer category at the BSMEs in 2023 and was part of a team that won best podcast at the BSMEs in 2025.

In his time as a journalist, he has covered the latest in AI and robotics, broadband deals, the potential for alien life, the science of being slapped, and just about everything in between.

When he’s not trying to wrap his head around the latest AI whitepaper, Alex pretends to be a capable runner, cook, and climber.
