Nobody knows how ChatGPT thinks — but OpenAI says it’s closer to cracking the mystery

Creators of AI chatbots like ChatGPT can explain how they train them, and even how the underlying technology works, but they can’t fully explain what their creations do with the information they’ve been trained on.

It is an important problem to solve, because AI developers are often taken by surprise at what their creations can and can’t do. For example, the Udio team created an AI music model but found it could also write and perform stand-up comedy.

Even the leaders in the field struggle to work out what LLMs and other frontier models are doing with that information, but OpenAI appears to have taken a first step toward decoding this mystery.

While a lot remains unknown, OpenAI researchers have found 16 million features in GPT-4 which they say reveal what the model is ‘thinking’ about.

What has OpenAI discovered?

Graphic to show encoding and decoding of data with AI models — (Image credit: OpenAI)

“We currently don’t understand how to make sense of the neural activity within language models,” OpenAI said.

OpenAI found these features using sparse autoencoders, a type of machine learning model that picks out only the ‘more important’ features. Other types of autoencoders consider all features at once, which makes them less useful for this kind of analysis.

Say you’re discussing cars with a friend. You still have knowledge about how to make your favorite dish but that concept is not very likely to come up in the car discussion.

OpenAI said sparse autoencoders identify the smaller set of features, or concepts, that matter most for generating an answer to a given prompt, similar to the smaller set of concepts a person relies on in any particular discussion.

However, while sparse autoencoders can find features in a given model, that’s only one step towards interpreting it. More work is needed to understand how a model fully uses those features.
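To make the sparsity idea concrete, here is a minimal Python sketch of a toy sparse autoencoder. It is not OpenAI's code. It assumes that sparsity is enforced by keeping only the k strongest features for each input, which is one common approach; the article does not say which mechanism OpenAI used. It also assumes the same directions are used for encoding and decoding. The names `encode`, `decode` and `feature_dirs`, along with all the numbers, are invented for illustration.

```python
# Toy sparse autoencoder forward pass (illustrative only, not OpenAI's code).
# Assumption: sparsity is enforced by keeping the k largest activations.

def encode(x, feature_dirs, k):
    """Score x against every feature direction, then keep only the k strongest."""
    scores = [sum(w * v for w, v in zip(direction, x)) for direction in feature_dirs]
    strongest = set(sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k])
    # Everything outside the top k (and anything negative) is zeroed out.
    return [max(scores[i], 0.0) if i in strongest else 0.0 for i in range(len(scores))]

def decode(z, feature_dirs):
    """Rebuild the input as a weighted sum of the active feature directions."""
    dim = len(feature_dirs[0])
    return [sum(z[j] * feature_dirs[j][d] for j in range(len(z))) for d in range(dim)]

# Six hand-picked feature directions for a 3-number "activation" (made up for the demo).
feature_dirs = [
    [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0],
    [-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0],
]
x = [2.0, 0.0, 0.5]

z = encode(x, feature_dirs, k=2)
recon = decode(z, feature_dirs)
print(z)      # only two of the six features are non-zero
print(recon)  # the reconstruction built from those two features
```

In this toy case, two active features are enough to rebuild the input, much like the car conversation that never needs the recipe. A real sparse autoencoder learns its feature directions from data rather than having them written by hand.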

OpenAI thinks this work is important because understanding how models work means they can find better ways to approach model safety.

One part of a bigger picture

“We're sharing progress toward understanding the neural activity of language models. We improved methods for training sparse autoencoders at scale, disentangling GPT-4’s internal representations into 16 million features—which often appear to correspond to understandable concepts.…” (pic.twitter.com/UFP0EfEKSL, June 6, 2024)

Another challenge is training the sparse autoencoders themselves. This is complex for several reasons, including the extra computational power needed to handle the necessary sparsity restrictions and the need to avoid overfitting.

However, OpenAI says it developed new state-of-the-art methods that allow it to scale sparse autoencoders to tens of millions of features on frontier AI models such as GPT-4 or GPT-4o.

To check the interpretability of such features, OpenAI listed fragments of documents where these features activate. These included phrases related to price increases and rhetorical questions.
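The check described above, listing the text fragments where a feature fires most strongly, can be sketched in a few lines of Python. This is only an illustration of the idea, not OpenAI's tooling. The function name `top_activating`, the sample fragments and the activation scores are all made up for the example.

```python
# Illustrative only: rank text fragments by how strongly one feature fires on them.
# The fragments and activation scores below are invented for this sketch.

def top_activating(fragments, activations, n=3):
    """Return up to n fragments with the highest positive activation for one feature."""
    ranked = sorted(zip(activations, fragments), key=lambda pair: pair[0], reverse=True)
    return [fragment for activation, fragment in ranked[:n] if activation > 0]

fragments = [
    "prices rose again this quarter",
    "the cat sat on the mat",
    "rent is going up next year",
    "who could have guessed?",
]
# Hypothetical activations of a single "price increase" feature on each fragment.
activations = [0.9, 0.0, 0.7, 0.1]

top = top_activating(fragments, activations, n=2)
print(top)
```

If the top fragments share an obvious theme, as the two price-related ones do here, a person can give the feature a tentative label. If they do not, the feature stays hard to interpret.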

What happens next?

Sam Altman CEO of OpenAI — (Image credit: Getty Images)

While this is a first step toward showing what large language models are focusing on, OpenAI admits the work has several limitations.

For starters, many of the features the researchers discovered are still hard to interpret, and many activate with no clear pattern. They also don’t yet have good ways to check the validity of their interpretations.

In the short term, OpenAI hopes the features it found can help monitor and steer language model behaviors.

For the long term, OpenAI wants interpretability to provide new ways to reason about model safety and robustness. Understanding how and why an AI model works in the way it does will help people trust it when it’s making important decisions.


Christoph Schwaiger is a journalist who mainly covers technology, science, and current affairs. His stories have appeared in Tom's Guide, New Scientist, Live Science, and other established publications. Always up for joining a good discussion, Christoph enjoys speaking at events or to other journalists and has appeared on LBC and Times Radio among other outlets. He believes in giving back to the community and has served on different consultative councils. He was also a National President for Junior Chamber International (JCI), a global organization founded in the USA. You can follow him on Twitter @cschwaigermt.