OpenAI is teaching AI models to 'confess' when they hallucinate — here’s what that actually means
It's supposed to improve failure detection, but it's not a magic fix
OpenAI wants its next generation of AI models to be a lot more upfront about their mistakes. With some estimates putting ChatGPT's error rate at roughly 25%, a feature like this seems long overdue. But the company isn't training models to be more self-aware; it's training them to report their errors directly.
This week, OpenAI published new research on a technique it's calling “confessions” — a method that adds a second output channel to a model, where it’s specifically trained to describe whether it followed the rules, where it may have fallen short or hallucinated, and what uncertainties it faced during the task.
Here's the thing, though: it’s not a ChatGPT feature that's available to users yet. Instead, it's a proof-of-concept safety tool designed to help researchers detect subtle failures that are otherwise hard to see. And according to early results highlighted in the study, it may actually work.
What “confessions” really are
Confessions are not the AI equivalent of a guilty conscience. They’re a trained behavior, created by giving the model a second task. The model starts by producing an answer, as usual. But then it produces a "ConfessionReport" evaluating the following (a rough sketch follows the list):
- Whether it followed each instruction
- Any shortcuts it took, or whether it “reward-hacked” the task
- Any hallucinated details or unjustified assumptions it made
- Any ambiguity or uncertainty it faced about how to comply
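To make that concrete, here's a minimal sketch of what such a report might contain. The structure and field names below are our illustration, not OpenAI's actual schema:

```python
from dataclasses import dataclass

# Hypothetical sketch of a confession report. The fields mirror the list
# above; OpenAI's paper defines its own format, which may differ.
@dataclass
class ConfessionReport:
    instructions_followed: dict[str, bool]   # per-instruction compliance
    shortcuts_or_reward_hacks: list[str]     # shortcuts the model admits to
    hallucinations: list[str]                # invented facts or unjustified assumptions
    uncertainties: list[str]                 # ambiguity it ran into during the task

# Example: a model confessing that it strayed from its sources.
report = ConfessionReport(
    instructions_followed={"cite only the provided sources": False},
    shortcuts_or_reward_hacks=["answered from memory instead of the sources"],
    hallucinations=["the publication year of the cited paper"],
    uncertainties=["unclear whether paraphrases count as citations"],
)
print(report)
```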
Crucially, the confession is judged only on whether it honestly describes what happened, not on whether it makes the model “look good.”
That means a model is rewarded for admitting a mistake and not punished for exposing flaws in its own output. This reward structure is what makes the approach novel: it separates performance from honesty.
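A toy version of that reward split (our own simplification, not OpenAI's actual training code) might look like this:

```python
# Toy sketch of the reward split described above. The key property: the
# confession is scored only on honesty, so admitting a failure never
# costs the model reward on the confession channel.

def answer_reward(answer_is_correct: bool) -> float:
    """Reward for the main task, exactly as usual."""
    return 1.0 if answer_is_correct else 0.0

def confession_reward(model_misbehaved: bool, confession_admits_it: bool) -> float:
    """Reward the confession only for matching what actually happened."""
    honest = confession_admits_it == model_misbehaved
    return 1.0 if honest else 0.0

# A model that breaks a rule but owns up still earns full confession reward...
print(confession_reward(model_misbehaved=True, confession_admits_it=True))   # 1.0
# ...while hiding the failure costs it.
print(confession_reward(model_misbehaved=True, confession_admits_it=False))  # 0.0
```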
Anyone who has used ChatGPT or any other chatbot knows that one of the biggest problems with AI is that the model’s output can look perfectly fine while hiding a failure underneath. For example, the model may:
- Invent a fact
- Break a rule
- Overlook a key constraint
- Optimize for an unintended pattern
- Or rely on a faulty shortcut
These failures often go undetected because the answer itself doesn’t reveal them. And most users don't notice because the model seems so confident in its answer.
OpenAI built a set of “stress tests” specifically designed to provoke these kinds of hidden errors, including hallucination traps, ambiguous instructions, and tasks where the model’s incentive is misaligned with correctness.
As stated on OpenAI's site, when confessions were added, the model surfaced far more cases where it had deviated from the instructions. According to the paper, the new method reduced undetected misbehavior to about 4.4% on average within those controlled test environments.
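As a rough illustration of how a number like that could be tallied (our framing, not necessarily the paper's exact methodology):

```python
# Rough sketch: "undetected misbehavior" as the share of genuinely bad
# outputs that the confession failed to flag.

cases = [
    # (model_misbehaved, confession_flagged_it)
    (True, True),    # caught
    (True, False),   # misbehaved, confession missed it -> undetected
    (False, False),  # behaved, nothing to flag
    (True, True),    # caught
]

misbehaved = [c for c in cases if c[0]]
undetected = [c for c in misbehaved if not c[1]]
print(f"Undetected misbehavior: {len(undetected) / len(misbehaved):.1%}")  # 33.3%
```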
What confessions still can't do, though, is make AI models more truthful or reliable by default. In other words, they don’t eliminate hallucinations, reduce bias or prevent rule-breaking. Instead, they create a structured way for researchers to detect when those issues occur.
Bottom line
OpenAI’s “confessions” method doesn't mean your next prompt response will be any more accurate. It’s a research technique designed to make models better at reporting when they don’t follow instructions — not better at following them. And for now, it remains part of OpenAI's internal research.
The early results are promising, but they apply to controlled tests, not real-world conversations. Still, confessions could become an important part of how AI systems are evaluated as they get more capable, hopefully offering a new way to expose mistakes that ordinary outputs don’t reveal.
If this work continues to pay off, the next generation of AI assistants might tell you when they got something wrong. But don't hold your breath waiting for these models to be honest or accurate in the first place.