Apple just revealed an AI technique to better compete against ChatGPT
Apple’s new strategy reduces some of a newer LLM’s mistakes by up to 40%
When an AI lab updates its underlying large language model, the change can often cause unexpected behavior, including a complete shift in how the model responds to queries. Researchers at Apple have developed new ways to improve a user’s experience when an AI model they were used to working with gets upgraded.
In a paper, Apple's researchers said that users develop their own system to interact with an LLM, including prompt styles and techniques. Switching to a newer model can be a draining task that dampens their experience using the AI model.
An update could force users to change the way they write prompts. While early adopters of models like ChatGPT might accept this, a mainstream audience using iOS would likely find it unacceptable.
To solve this issue, the team looked into creating metrics to compare regression and inconsistencies between different model versions and also developed a training strategy to minimize those inconsistencies from happening in the first place.
While it isn't clear whether this technique will become part of a future version of Apple Intelligence on iOS, Apple is clearly preparing for what happens when it updates its underlying models, ensuring Siri responds the same way to the same queries in the future.
Making AI backwards compatible
Apple presents MUSCLE: A Model Update Strategy for Compatible LLM Evolution. "Large Language Models (LLMs) are frequently updated due to data or architecture changes to improve their performance. When updating models, developers often focus on increasing overall performance…" pic.twitter.com/ATm2zM4Poc (July 15, 2024)
Using their new method, the researchers said they managed to reduce negative flips, instances where an old model answers a question correctly while its newer replacement gets it wrong, by up to 40%.
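The metric itself is simple to illustrate. The sketch below is hypothetical (the function name and toy data are invented for illustration, not taken from Apple's code): a negative flip is any example the old model answered correctly that the updated model now gets wrong.

```python
def negative_flip_rate(old_preds, new_preds, labels):
    """Fraction of examples where the old model was right and the new model is wrong."""
    flips = sum(
        1
        for old, new, gold in zip(old_preds, new_preds, labels)
        if old == gold and new != gold
    )
    return flips / len(labels)

# Toy example: four questions; the update breaks one previously correct answer.
old = ["A", "B", "C", "D"]
new = ["A", "B", "X", "D"]
gold = ["A", "B", "C", "D"]

print(negative_flip_rate(old, new, gold))  # 0.25
```

A 40% reduction in negative flips would mean this rate drops by 40% relative to an update trained without the compatibility method.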
The paper’s authors also argued in favor of ensuring that mistakes a new model makes are consistent with those you might see an older model make.
“We argue that there is value in being consistent when both models are incorrect,” they said, adding that “a user may have developed coping strategies on how to interact with a model when it is incorrect.” Inconsistencies would therefore lead to user dissatisfaction.
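That notion of consistency can also be made measurable. As a hypothetical sketch (invented names and toy data, not a metric copied from the paper), one could ask: among the examples both models get wrong, how often do they at least fail in the same way?

```python
def shared_error_consistency(old_preds, new_preds, labels):
    """Among examples both models get wrong, the fraction where they give the same wrong answer."""
    both_wrong = [
        (old, new)
        for old, new, gold in zip(old_preds, new_preds, labels)
        if old != gold and new != gold
    ]
    if not both_wrong:
        return 1.0  # vacuously consistent: there are no shared errors to compare
    same = sum(1 for old, new in both_wrong if old == new)
    return same / len(both_wrong)

# Both models miss questions 1 and 3; they agree on the wrong answer only for question 1.
old = ["X", "B", "Y"]
new = ["X", "B", "Z"]
gold = ["A", "B", "C"]

print(shared_error_consistency(old, new, gold))  # 0.5
```

A user's "coping strategies" keep working across an update when a number like this stays high.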
Flexing their MUSCLE
They called the method used to overcome these obstacles MUSCLE, an acronym for Model Update Strategy for Compatible LLM Evolution. It does not require changing the base model's training; instead, it relies on training adapters, which are basically plugins for LLMs. The researchers refer to these as compatibility adapters.
To test whether their system worked, the research team updated LLMs such as Llama and Phi and sometimes found negative flip rates of up to 60% across different tasks. Their tests included asking the updated models math questions to see whether they still answered a given problem correctly. Using the proposed MUSCLE system, the researchers say they mitigated a good number of those negative flips, sometimes by up to 40%.
Given the fast pace with which chatbots like ChatGPT and Google’s Gemini are being updated, Apple’s research has the potential to make newer versions of these tools more dependable. It would be a pity if users had to trade the gains of a newer model for a worse user experience.
More from Tom's Guide
- iOS 18 Wallet: 3 big changes coming to your iPhone
- iPhone 16 Pro Max tipped for major battery upgrade
- iPhone 16 could get a major camera boost thanks to Samsung — here’s why

Christoph Schwaiger is a journalist, mainly covering technology, health, and current affairs. His stories have been published by Tom's Guide, Live Science, New Scientist, and the Global Investigative Journalism Network, among other outlets. Christoph has appeared on LBC and Times Radio. Additionally, he previously served as a National President for Junior Chamber International (JCI), a global leadership organization, and graduated cum laude from the University of Groningen in the Netherlands with an MA in journalism. You can follow him on X (Twitter) @cschwaigermt.