Nobody knows how ChatGPT thinks — but OpenAI says it’s closer to cracking the mystery

ChatGPT app on iPhone
(Image credit: Shutterstock)

Creators of AI chatbots like ChatGPT can explain how they train them, and even how the underlying technology works, but they can’t fully explain what their creations do with the information they’ve been trained on.

This is an important problem to solve, because AI developers are often surprised by what their creations can, and can’t, do. For example, the Udio team built an AI music model and found it could also write and perform stand-up comedy.

Even leaders in the field struggle to work out what large language models (LLMs) and other frontier models are doing with this information, but OpenAI appears to have taken a first step toward decoding the mystery.

While much remains unknown, OpenAI researchers say they have found 16 million features in GPT-4 that reveal what the model is ‘thinking’ about.

What has OpenAI discovered?

Graphic to show encoding and decoding of data with AI models

(Image credit: OpenAI)

“We currently don’t understand how to make sense of the neural activity within language models,” OpenAI wrote when announcing the research.

OpenAI’s researchers found these features using a technique called sparse autoencoders: machine learning models that pick out the relatively few ‘more important’ features that matter at any one time. Ordinary autoencoders, by contrast, draw on all of their features at once, which makes what they learn much harder to interpret.

Say you’re discussing cars with a friend. You still know how to cook your favorite dish, but that knowledge is unlikely to come up in a conversation about cars.

According to OpenAI, sparse autoencoders identify the relatively small set of features, or concepts, that matter for generating a response to a given prompt, much like the handful of concepts a person relies on in any particular conversation.
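To make the idea concrete, here is a minimal toy sketch in Python of a sparse autoencoder that enforces sparsity with a "top-k" rule: for each input it keeps only the k most strongly activated features, zeroes the rest, and then rebuilds the input from those few features. The sizes, the random weights, and the function names `encode_topk` and `decode` are all hypothetical. A real autoencoder would be trained on a model's internal activations and be vastly larger, so treat this as an illustration of the shape of the computation, not as OpenAI's implementation.

```python
import random

def encode_topk(x, w_enc, b_enc, k):
    """Map an activation vector x to n feature activations, keeping at most k nonzero."""
    # Pre-activation for each feature: dot product of its encoder row with x, plus bias.
    pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(w_enc, b_enc)]
    # Indices of the k largest pre-activations; every other feature is forced to zero.
    top = sorted(range(len(pre)), key=lambda i: pre[i], reverse=True)[:k]
    z = [0.0] * len(pre)
    for i in top:
        z[i] = max(pre[i], 0.0)  # ReLU: kept features cannot be negative
    return z

def decode(z, w_dec, b_dec):
    """Reconstruct x as a bias plus a weighted sum of the active feature directions."""
    x_hat = list(b_dec)
    for zi, direction in zip(z, w_dec):
        if zi != 0.0:  # only the few active features contribute
            x_hat = [a + zi * d for a, d in zip(x_hat, direction)]
    return x_hat

# Toy sizes (hypothetical): 8-dimensional activations, 32 candidate features, k = 3.
random.seed(0)
D, N, K = 8, 32, 3
w_enc = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
b_enc = [0.0] * N
w_dec = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
b_dec = [0.0] * D

x = [random.gauss(0, 1) for _ in range(D)]
z = encode_topk(x, w_enc, b_enc, K)
x_hat = decode(z, w_dec, b_dec)
active = [i for i, v in enumerate(z) if v > 0]
print("active features:", active)
```

With random, untrained weights the reconstruction `x_hat` will not resemble `x`. Training adjusts the weights so that a handful of active features is enough to rebuild each input, which is what makes each feature a candidate "concept".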

Finding features in a model is only one step toward interpreting it, though. More work is needed to understand how the model actually uses those features.

OpenAI considers this work important because understanding how models work could lead to better approaches to model safety.

One part of a bigger picture

Another challenge is training the sparse autoencoders themselves. This is difficult for several reasons, including the extra computing power needed to enforce the constraints that keep the autoencoder sparse and the need to avoid overfitting.

However, OpenAI says it has developed new state-of-the-art methods that let it scale sparse autoencoders to tens of millions of features on frontier AI models such as GPT-4 and GPT-4o.
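For readers who want the math, one common way to write a sparse autoencoder's training objective is shown below: a reconstruction error plus a penalty that pushes most feature activations to zero. This is a general formulation from the interpretability literature, assumed here for illustration. It is not necessarily the exact objective OpenAI used, and top-k variants enforce sparsity directly instead of through a penalty. Here x is a model activation vector, z the feature activations, x̂ the reconstruction, and λ a weighting chosen by the researcher.

```latex
\begin{aligned}
z &= \mathrm{ReLU}\left(W_{\text{enc}}\, x + b_{\text{enc}}\right) \\
\hat{x} &= W_{\text{dec}}\, z + b_{\text{dec}} \\
\mathcal{L}(x) &= \lVert x - \hat{x} \rVert_2^2 + \lambda \lVert z \rVert_1
\end{aligned}
```

With N features and d-dimensional activations, W_enc and W_dec each contain N × d weights, which is one reason scaling to tens of millions of features takes so much compute. Setting λ too low yields dense, hard-to-read features; setting it too high sacrifices reconstruction quality.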

To check how interpretable these features are, OpenAI listed document fragments in which individual features activate. Examples included features that fire on phrases about price increases and on rhetorical questions.
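The check OpenAI describes, looking at where a feature fires, can be sketched as ranking text fragments by one feature's activation and reading the top results. In the minimal Python sketch below, `toy_price_feature` is a hypothetical keyword counter standing in for a learned feature; in real use, the score would come from the sparse autoencoder's activations while the model processes each fragment. The function names and example sentences are made up for illustration.

```python
import heapq
from typing import Callable, List, Tuple

def top_activating_fragments(
    fragments: List[str],
    feature_activation: Callable[[str], float],
    n: int = 3,
) -> List[Tuple[float, str]]:
    """Return up to n fragments on which a feature fires most strongly, highest first."""
    scored = [(feature_activation(frag), frag) for frag in fragments]
    # Keep only fragments where the feature actually fires (activation > 0).
    firing = [pair for pair in scored if pair[0] > 0]
    return heapq.nlargest(n, firing, key=lambda pair: pair[0])

# Hypothetical stand-in for one learned feature: a real feature would be read from
# the sparse autoencoder's activations, not computed from keywords.
PRICE_WORDS = {"price", "prices", "cost", "costs", "rose", "hike", "increase"}

def toy_price_feature(text: str) -> float:
    words = [w.strip(".,?!").lower() for w in text.split()]
    return float(sum(w in PRICE_WORDS for w in words))

docs = [
    "Fuel prices rose again this month.",
    "Who could have predicted that?",
    "The company announced a price hike for subscriptions.",
    "The cat slept on the sofa.",
]
for score, frag in top_activating_fragments(docs, toy_price_feature, n=2):
    print(score, frag)
```

If the top-ranked fragments share an obvious theme, such as price increases, the feature is easy to label; if they share no clear pattern, the feature is hard to interpret.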

What happens next?

Sam Altman, CEO of OpenAI

(Image credit: Getty Images)

While this is a first step toward showing what large language models focus on, OpenAI admits the approach has several limitations.

For starters, many of the features it discovered are still hard to interpret, with many activating in no clear pattern. The researchers also don’t yet have good ways to check whether their interpretations are valid.

In the short term, OpenAI hopes the features it found can help monitor and steer language models’ behavior.
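As a rough illustration of what "monitor and steer" could mean, the Python sketch below flags watched features whose activation crosses a threshold, and nudges an activation vector along a feature's direction to amplify or suppress it. This reflects a generic idea from interpretability research rather than anything the article says OpenAI has built; the feature names, indices, threshold, and the functions `monitor` and `steer` are all hypothetical.

```python
from typing import Dict, List

Vector = List[float]

def monitor(feature_acts: Vector, watched: Dict[str, int], threshold: float) -> List[str]:
    """Return the names of watched features whose activation exceeds the threshold."""
    return [name for name, idx in watched.items() if feature_acts[idx] > threshold]

def steer(activation: Vector, feature_direction: Vector, strength: float) -> Vector:
    """Shift an activation along a feature's direction; negative strength suppresses it."""
    return [a + strength * d for a, d in zip(activation, feature_direction)]

# Hypothetical feature activations for one token, and hypothetical feature indices.
feature_acts = [0.0, 3.2, 0.0, 0.4]
watched = {"rhetorical_question": 1, "price_increase": 3}
alerts = monitor(feature_acts, watched, threshold=1.0)
print(alerts)  # ['rhetorical_question']

steered = steer([1.0, 2.0], [0.5, -0.5], strength=-2.0)
print(steered)  # [0.0, 3.0]
```

Monitoring only reads feature activations, while steering changes the activations the model goes on to use, so the second is the riskier and harder-to-validate of the two.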

In the long term, OpenAI wants interpretability to provide new ways to reason about model safety and robustness. Understanding how and why an AI model behaves the way it does should help people trust it when it makes important decisions.

Christoph Schwaiger

Christoph Schwaiger is a journalist who mainly covers technology, science, and current affairs. His stories have appeared in Tom's Guide, New Scientist, Live Science, and other established publications. Always up for joining a good discussion, Christoph enjoys speaking at events or to other journalists and has appeared on LBC and Times Radio among other outlets. He believes in giving back to the community and has served on different consultative councils. He was also a National President for Junior Chamber International (JCI), a global organization founded in the USA. You can follow him on Twitter @cschwaigermt.