ChatGPT makes this glaring mistake with total confidence — and Sam Altman says it'll take a year to fix

OpenAI logo with person in front
(Image credit: Shutterstock)

It seems like there's never a "slow week" for AI. Big tech is constantly rolling out new features, chatbots get regular updates and new models seem to pop up every month. We've reached the point where AI is smart enough to pass the bar exam, write complex Python code and diagnose medical symptoms.

As a ChatGPT power user, I've tested the AI on several challenges meant to break it. Yet it still fails at a task so basic that a 1980s kitchen timer can handle it: keeping time. If you've ever tried to use ChatGPT as a legitimate personal assistant, you already know that we are still a ways out from the "agentic AI" era.

The 'bake' test

One of my favorite AI influencers/comedians has shown numerous times that ChatGPT cannot handle timing... anything. Even Sam Altman has tried to explain it, telling Gizmodo it will take a year to fix.

I urge you to try it because it's oddly hilarious. Simply open ChatGPT Voice and prompt anything that requires tracking time. For example: “I'm baking cookies. Start a 10-minute timer.”


When I did this, ChatGPT didn't skip a beat. It responded with total, unearned confidence: "Absolutely! I've started a 10-minute timer for you. I'll let you know when it's up."

I went back to work. Ten minutes passed. Then twenty. Then thirty. Nothing. No alarm or any kind of notification alerting me that time was up. The problem is, ChatGPT simply simulated the act of being helpful without actually doing anything.

Instead, it responded as if it had set the timer, then convincingly assured me that it had timed ten minutes perfectly, essentially gaslighting me when I tried to question it.

This isn't about timers

I recently wrote about Google’s AI Overviews confidently delivering answers — even though about 1 in 10 responses is wrong.

The problem here really isn't about timers. It’s about the gap between what AI says it can do and what it can actually execute. When I pushed further, asking it to track elapsed time or notify me when something was done, it acted like everything was working fine. As if it actually did what I asked it to do.

I chatted with a developer friend of mine about it last night. He suggested that the problem with counting may lie in the Voice application itself: if you ask ChatGPT to "count" to 100 in text, it will write out all the numbers.

But regardless of where the error lies, this is what makes AI feel deceptively capable. These aren't one-off failures. The core problem is that ChatGPT can generate responses, but it doesn't always have access to the tools needed to actually do things in the real world, like starting a timer or tracking time accurately.

So instead, it fills the gap with an answer that sounds right. This is where AI falls short: it can't always execute basic real-world actions. It can describe a timer, write a story about a timer and probably even vibe code a timer, but it cannot reliably be the timer.

Worse, it doesn't always tell you when it's pretending to do something versus actually doing it.
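To see why this is a tooling gap rather than an intelligence gap, here's a toy sketch of what an actual timer requires: scheduling code that runs outside the conversation. This is not how ChatGPT's internals work; `set_timer` is a hypothetical helper using Python's standard `threading.Timer`, shown purely to illustrate what "really doing it" looks like versus generating a sentence that says it was done.

```python
import threading

def set_timer(minutes: float, message: str) -> threading.Timer:
    """Schedule a real OS-level alarm -- something text generation alone can't do."""
    def alarm():
        # Fires after the interval elapses, whether or not anyone is still chatting
        print(f"\aTimer done: {message}")

    t = threading.Timer(minutes * 60, alarm)
    t.start()  # runs in a background thread, independent of the "conversation"
    return t

# A tool-using assistant would invoke something like this,
# instead of merely replying "I've started a 10-minute timer."
timer = set_timer(10, "cookies are ready")
```

The point of the sketch: the alarm exists because a scheduler was actually started, not because the assistant claimed it was.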

Why this drives me crazy

Amanda Caswell holding two phones

(Image credit: Future)

Claude will tell you when it is wrong or unsure; it isn't as people-pleasing as ChatGPT. I've even tested this with unhinged recipes. What frustrates me most are the biggest lies of the current AI era:

  • The hallucination of action: ChatGPT is confidently a master of words but can't always execute and will make things up rather than admit it is wrong.
  • The confidence gap: Rather than say, "I can't," ChatGPT says, "I'm on it!"
  • The execution wall: Even Sam Altman has admitted that "reliability across long chains of logic" is the current bottleneck.

Bottom line

There’s no question that ChatGPT is an incredibly powerful tool. It can write, plan, explain and help you think through almost anything.

But what I want, and what most people probably expect, is honesty about its limits. If it can’t do something, it should say that clearly instead of sounding like it already did.


Amanda Caswell
AI Editor

Amanda Caswell is one of today’s leading voices in AI and technology. A celebrated contributor to various news outlets, her sharp insights and relatable storytelling have earned her a loyal readership. Amanda’s work has been recognized with prestigious honors, including outstanding contribution to media.

Known for her ability to bring clarity to even the most complex topics, Amanda seamlessly blends innovation and creativity, inspiring readers to embrace the power of AI and emerging technologies. As a certified prompt engineer, she continues to push the boundaries of how humans and AI can work together.

Beyond her journalism career, Amanda is a long-distance runner and mom of three. She lives in New Jersey.
