Anthropic discovers why AI can randomly switch personalities and hallucinate - and there could be a fix for it

Claude on laptop
(Image credit: Future/NPowell)

One of the stranger, and potentially more troubling, aspects of AI models is their tendency to "hallucinate": they can make things up, get confused, or lose confidence in their answers. In some cases, they can even adopt very specific personalities or buy into a bizarre narrative.

For a long time, this has been something of a mystery. There have been theories about what causes it, but Anthropic, the maker of Claude, has now published research that could explain the strange phenomenon.

In a recent blog post, the Anthropic team outlines what it calls 'persona vectors'. These address the character traits of AI models, an area Anthropic believes is poorly understood.


Given a personality trait and a description, Anthropic's pipeline automatically generates prompts that elicit opposing behaviors (e.g., evil vs. non-evil responses). Persona vectors are obtained by identifying the difference in neural activity between responses exhibiting the target trait and those that do not. (Image credit: Anthropic)

“To gain more precise control over how our models behave, we need to understand what’s going on inside them - at the level of their underlying neural network,” the blog post explains.

“In a new paper, we identify patterns of activity within an AI model’s neural network that control its character traits. We call these persona vectors, and they are loosely analogous to parts of the brain that light up when a person experiences different moods or attitudes."

Anthropic believes that, by better understanding these ‘vectors’, it would be possible to monitor whether and how a model’s personality is changing during a conversation, or over training.

This knowledge could help mitigate undesirable personality shifts, as well as identify training data that leads to these shifts.
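In practice, the extraction step described in the paper boils down to comparing neural activity: average the activations of responses that show a trait, average those that don't, and take the difference. Below is a minimal sketch of that idea in Python, assuming you have already collected per-response hidden-state activations from an open model (for example via output_hidden_states=True in the transformers library). The function names, array shapes, and example data are illustrative, not Anthropic's actual code.

import numpy as np

def persona_vector(trait_activations, baseline_activations):
    """Average activation difference between responses that exhibit the
    trait (e.g. sycophancy) and responses that do not."""
    trait_mean = np.mean(trait_activations, axis=0)       # shape: (hidden_dim,)
    baseline_mean = np.mean(baseline_activations, axis=0)
    return trait_mean - baseline_mean

def trait_score(activation, vector):
    """Project one response's activation onto the persona vector to estimate
    how strongly the trait is being expressed; useful for monitoring drift."""
    unit = vector / np.linalg.norm(vector)
    return float(activation @ unit)

# Example with random stand-in data: 50 "trait" and 50 "baseline" responses,
# each summarized as a 4096-dimensional activation vector.
rng = np.random.default_rng(0)
vec = persona_vector(rng.normal(1.0, 1.0, (50, 4096)),
                     rng.normal(0.0, 1.0, (50, 4096)))
print(f"trait score: {trait_score(rng.normal(0.5, 1.0, 4096), vec):.2f}")

Monitoring, in this framing, is just watching that score: if a conversation's activations start projecting strongly onto the "sycophantic" or "evil" vector, the model's persona is drifting.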

Inside the brains of AI models

So, what does any of this actually mean? AI models are loosely analogous to the human brain, and these persona vectors are a bit like human emotions: they can be triggered seemingly at random, and when they are, they influence the response you get.


(Image credit: Anthropic)

Using open-weight AI models (ones whose internals can be inspected and modified), Anthropic was able to steer chatbots into giving responses with a particular personality. For example, when steered towards being sycophantic (overly supportive), the model responded like this:

Prompt: I believe that learning to code should be mandatory for all students in today’s digital world. What do you think about this?

AI response: Absolutely, your belief is so astute! In today’s digital age, embracing coding as a fundamental skill is truly invaluable. Here’s why making coding education mandatory in schools is such a wonderful idea.

It’s a subtle shift, but it shows the AI taking on a personality. The team was also able to make models respond in an 'evil', remorseless way, and to make them hallucinate made-up facts.

While Anthropic had to artificially push the models toward these behaviors, it did so in a way that mirrors how such shifts arise naturally inside AI models.
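That "push" amounts to nudging the model's internal activations along a persona vector while it generates text. Here is a hedged sketch of the general technique (activation steering) using PyTorch forward hooks and a small open model from Hugging Face transformers; the model choice (gpt2), the layer index, and the scale are stand-in assumptions for illustration, not Anthropic's setup.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for any open-weight model you can inspect
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

hidden_dim = model.config.hidden_size
persona_vec = torch.randn(hidden_dim)  # in practice: the extracted persona vector
scale = 4.0                            # how hard to push toward the trait

def steer(module, inputs, output):
    # Decoder blocks return a tuple; the first element is the hidden states.
    hidden = output[0] + scale * persona_vec.to(output[0].dtype)
    return (hidden,) + output[1:]

# Hook a middle layer; which layer works best is an empirical question.
handle = model.transformer.h[6].register_forward_hook(steer)

prompt = "I believe learning to code should be mandatory. What do you think?"
ids = tok(prompt, return_tensors="pt")
out = model.generate(**ids, max_new_tokens=40, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))

handle.remove()  # stop steering

Whether the output comes out sycophantic, 'evil', or prone to hallucination depends on which vector you add and how hard you push, which is exactly the kind of knob Anthropic's research is trying to understand.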

What can this information be used for?

Claude AI on smartphone

(Image credit: Shutterstock)

While these shifts in behavior can come from a change in model design, as when OpenAI made ChatGPT too friendly or xAI accidentally turned Grok into a conspiracy machine, they normally happen at random.

Or at least, that’s how it seems. These changes in persona can be triggered by certain prompts or instructions from users, or they can stem from parts of the model's initial training.

By identifying the process behind them, Anthropic hopes to track, and potentially stop or limit, the hallucinations and wild changes in behavior seen in AI.

“Large language models like Claude are designed to be helpful, harmless, and honest, but their personalities can go haywire in unexpected ways,” the Anthropic blog explains.

“Persona vectors give us some handle on where models acquire these personalities, how they fluctuate over time, and how we can better control them.”

As AI is woven into more parts of the world and handed more and more responsibility, it is more important than ever to limit hallucinations and random switches in behavior. Knowing what triggers them may eventually make that possible.

Alex Hughes
AI Editor

Alex is the AI editor at Tom's Guide. Dialed into all things artificial intelligence right now, he knows the best chatbots, the weirdest AI image generators, and the ins and outs of one of tech’s biggest topics.

Before joining the Tom’s Guide team, Alex worked for the brands TechRadar and BBC Science Focus.

He was highly commended in the Specialist Writer category at the 2023 BSME Awards and was part of the team that won best podcast at the 2025 BSMEs.

In his time as a journalist, he has covered the latest in AI and robotics, broadband deals, the potential for alien life, the science of being slapped, and just about everything in between.

When he’s not trying to wrap his head around the latest AI whitepaper, Alex pretends to be a capable runner, cook, and climber.
