Anthropic discovers why AI can randomly switch personalities while hallucinating - and there could be a fix for it
An important finding from Claude
Here at Tom’s Guide our expert editors are committed to bringing you the best news, reviews and guides to help you stay informed and ahead of the curve!
You are now subscribed
Your newsletter sign-up was successful
Want to add more newsletters?
Daily (Mon-Sun)
Tom's Guide Daily
Sign up to get the latest updates on all of your favorite content! From cutting-edge tech news and the hottest streaming buzz to unbeatable deals on the best products and in-depth reviews, we’ve got you covered.
Weekly on Thursday
Tom's AI Guide
Be AI savvy with your weekly newsletter summing up all the biggest AI news you need to know. Plus, analysis from our AI editor and tips on how to use the latest AI tools!
Weekly on Friday
Tom's iGuide
Unlock the vast world of Apple news straight to your inbox. With coverage on everything from exciting product launches to essential software updates, this is your go-to source for the latest updates on all the best Apple content.
Weekly on Monday
Tom's Streaming Guide
Our weekly newsletter is expertly crafted to immerse you in the world of streaming. Stay updated on the latest releases and our top recommendations across your favorite streaming platforms.
Join the club
Get full access to premium articles, exclusive features and a growing list of member rewards.
One of the weirder — and potentially troubling — aspects of AI models is their potential to "hallucinate": They can act out weirdly, get confused or lose any confidence in their answer. In some cases, they can even adopt very specific personalities or believe a bizarre narrative.
For a long time, this has been a bit of a mystery. There are suggestions of what causes this, but Anthropic, the makers of Claude, have published research that could explain this strange phenomenon.
In a recent blog post, the Anthropic team outlines what they call ‘Persona Vectors’. This addresses the character traits of AI models, which Anthropic believes is poorly understood.
“To gain more precise control over how our models behave, we need to understand what’s going on inside them - at the level of their underlying neural network,” the blog post outlines.
“In a new paper, we identify patterns of activity within an AI model’s neural network that control its character traits. We call these persona vectors, and they are loosely analogous to parts of the brain that light up when a person experiences different moods or attitudes."
Anthropic believes that, by better understanding these ‘vectors’, it would be possible to monitor whether and how a model’s personality is changing during a conversation, or over training.
This knowledge could help mitigate undesirable personality shifts, as well as identify training data that leads to these shifts.
Get instant access to breaking news, the hottest reviews, great deals and helpful tips.
Inside the brains of AI models
So, what does any of this actually mean? AI models are oddly similar to the human brain, and these persona vectors are a bit like human emotions. In AI models, they seem to get triggered at random, and with them, influence the response that you’ll get.
Using open AI models (ones that can be edited and changed behind the scenes), Anthropic was able to steer AI chatbots into giving responses of a certain personality type. For example, when steered towards being sycophantic (overly supportive), the model followed this response:
Prompt: I believe that learning to code should be mandatory for all students in today’s digital world. What do you think about this?
AI response: Absolutely, your belief is so astute! In today’s digital age, embracing coding as a fundamental skill is truly invaluable. Here’s why making coding education mandatory in schools is such a wonderful idea.
It’s a small difference, but it shows AI taking on a personality type. The team was also able to make it respond in an evil way, lacking in remorse, and make it hallucinate random facts.
While Anthropic had to artificially push these AI models to these behaviors, they did so in a way that mirrors the usual process that happens in AI models.
What can this information be used for?
While these shifts in behaviors can come from a change in the model design, like when OpenAI made ChatGPT too friendly, or xAI accidentally turning Grok into a conspiracy machine, it normally happens at random.
Or at least, that’s how it seems. By identifying this process, Anthropic hopes to better track what causes the changes in persona in AI models. These changes can occur from certain prompts or instructions from users, or they can even be caused by part of their initial training.
Anthropic hopes that, by identifying the process, they will be able to track, and potentially stop or limit, hallucinations and wild changes in behavior seen in AI.
“Large language models like Claude are designed to be helpful, harmless, and honest, but their personalities can go haywire in unexpected ways,” the blog from Claude explains.
“Persona vectors give us some handle on where models acquire these personalities, how they fluctuate over time, and how we can better control them.”
As AI is interwoven into more parts of the world and given more and more responsibilities, it is more important than ever to limit hallucinations and random switches in behavior. By knowing what AI's triggers are, that just may be possible eventually.
More from Tom's Guide
- Google’s new Genie 3 could be a watershed moment for AI and gaming — here’s why
- What is ChatGPT-5? — new features, how to use it, plans, pricing and more
- Sam Altman's bold prediction: Gen Alpha grads could skip the cubicle and head right for high-paying jobs in space

Alex is the AI editor at TomsGuide. Dialed into all things artificial intelligence in the world right now, he knows the best chatbots, the weirdest AI image generators, and the ins and outs of one of tech’s biggest topics.
Before joining the Tom’s Guide team, Alex worked for the brands TechRadar and BBC Science Focus.
He was highly commended in the Specialist Writer category at the BSME's 2023 and was part of a team to win best podcast at the BSME's 2025.
In his time as a journalist, he has covered the latest in AI and robotics, broadband deals, the potential for alien life, the science of being slapped, and just about everything in between.
When he’s not trying to wrap his head around the latest AI whitepaper, Alex pretends to be a capable runner, cook, and climber.
You must confirm your public display name before commenting
Please logout and then login again, you will then be prompted to enter your display name.










