Study finds ChatGPT-5 is wrong about 1 in 4 times — here's why

Holding phone with ChatGPT logo
(Image credit: Shutterstock)

The other day I was brainstorming with ChatGPT when, all of a sudden, it launched into a long fantasy story that had nothing to do with my queries. It was so ridiculous that it made me laugh. Lately, I haven't seen mistakes like this as often with text prompts, but I still see them pretty regularly with image generation.

These random moments when a chatbot strays from the task are known as hallucinations. What's odd is how confident the chatbot sounds while giving a wrong answer, which is one of the biggest weaknesses of today's AI assistants. However, a new study from OpenAI argues these failures aren't random, but a direct result of how models are trained and evaluated.

Why chatbots keep guessing when they shouldn’t

ChatGPT logo on phone in front of robot thinking

(Image credit: Shutterstock)

Research points to a structural issue causing hallucinations: essentially, the problem stems from benchmarks and leaderboards that rank AI models and reward confident answers.

In other words, when a chatbot says “I don’t know,” it gets penalized in testing. That means the models are effectively encouraged to always provide an answer, even if they’re not sure it’s right.

In practice, that makes your AI assistant more likely to guess than admit uncertainty. For everyday queries, this can be harmless. But in higher-stakes cases, from medical questions to financial advice, those confident errors can quickly turn dangerous.
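The incentive problem above can be sketched with a toy calculation (my own illustrative numbers, not figures from the OpenAI paper). Under typical binary grading, a wrong guess and an "I don't know" both score zero, so a model is never worse off guessing:

```python
# Toy sketch of binary benchmark grading: a correct answer scores 1,
# and a wrong answer or an abstention both score 0. The expected score
# of guessing is just the model's chance of being right.

def expected_score(p_correct: float, guess: bool) -> float:
    """Expected score under right-or-nothing grading."""
    return p_correct if guess else 0.0

# Even at only 25% confidence, guessing beats saying "I don't know":
assert expected_score(0.25, guess=True) > expected_score(0.25, guess=False)
```

Since abstaining can never earn points under this scheme, a model trained to maximize benchmark scores learns to always answer, however unsure it is.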

That's why, as a power user, I always fact-check and ask the chatbot to cite its sources. Sometimes, if the information seems too far-fetched and I ask for a source, the chatbot will say something like "Good catch!" without ever admitting it was wrong.

Newer models aren’t immune

ChatGPT-5 Image on a keyboard

(Image credit: ChatGPT AI generated image)

Interestingly, OpenAI’s paper found that reasoning-focused models like o3 and o4-mini actually hallucinate more often than some older models. Why? Because they produce more claims overall, which means more chances to be wrong.

So, even if a model is "smarter" at reasoning, that doesn't make it more honest about what it doesn't know.

What can fix this problem?

Person coding on computer

(Image credit: Shutterstock)

Researchers argue that the solution is to change how we score and benchmark AI. Instead of punishing models for saying "I'm not sure," the most valuable tests should reward calibrated responses, uncertainty flags or the ability to defer to other sources.

That could mean your future chatbot hedges more often: less "here's the answer" and more "here's what I think, but I'm not certain." It may feel slower, but it could dramatically reduce harmful errors. Either way, critical thinking on our part is still important.
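One way to reward calibration is negative marking, where wrong answers cost points and abstaining costs nothing. Here's a toy sketch with hypothetical numbers (the penalty value is my own, not from the study), showing how such a scheme makes abstaining the rational choice below a confidence threshold:

```python
# Hypothetical negative-marking scheme: +1 for a correct answer,
# -penalty for a wrong one, 0 for abstaining. Guessing only pays off
# when confidence p satisfies p - (1 - p) * penalty > 0,
# i.e. p > penalty / (1 + penalty).

def expected_points(p: float, penalty: float, guess: bool) -> float:
    """Expected score when answering with confidence p, or abstaining."""
    return p - (1 - p) * penalty if guess else 0.0

def should_answer(p: float, penalty: float) -> bool:
    """Answer only if guessing beats the zero score of abstaining."""
    return expected_points(p, penalty, guess=True) > 0

# With a penalty of 3, answering only pays off above 75% confidence
# (3 / (1 + 3) = 0.75):
assert not should_answer(0.6, penalty=3)
assert should_answer(0.8, penalty=3)
```

Under scoring like this, a model that says "I'm not sure" when its confidence is low would outrank one that guesses confidently and is often wrong.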

Why it matters for you

Person typing on laptop keyboard

(Image credit: Unsplash)

If you're using popular chatbots including ChatGPT, Gemini, Claude or Grok, you've almost certainly seen a hallucination. This research suggests it's not entirely the model's fault, but a product of how models are tested, as if evaluation were a game of who can be right most often.

For users, that means we need to be diligent and treat AI answers as a first suggestion, not the final word. And for developers, it's a sign that it's time to rethink how we measure success, so future AI assistants can admit what they don't know instead of getting things completely wrong.


Amanda Caswell
AI Writer

