Meta just dropped an open source GPT-4o style model — here’s what it means
Meta Chameleon can handle images and text with ease
Meta has publicly released a new family of AI models, called Chameleon, which are comparable to commercial tools like Gemini Pro and GPT-4V.
Meta originally detailed the models' nuts and bolts in a paper showing that Chameleon, which comes in 7 billion and 34 billion parameter versions, can both understand and generate images and text.
Chameleon can also process prompts that combine text and images, potentially related to each other, and generate meaningful responses, Meta says.
In other words, you could take a picture of the contents of your fridge and ask the model what you can cook using only the ingredients you have. That wasn't possible with the Llama generation of AI models, and it brings open source closer to the higher-profile mainstream vision models from OpenAI and Google.
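To make that concrete, here's a minimal sketch of what such a mixed image-and-text prompt could look like. It assumes the Hugging Face transformers integration of Chameleon (which arrived after the research release); the model ID is real, but the prompt, the photo filename, and the generation settings are illustrative assumptions rather than anything from Meta's documentation.

```python
# Minimal sketch, assuming the Hugging Face transformers Chameleon
# integration; the prompt and image file are hypothetical examples.
import torch
from PIL import Image
from transformers import ChameleonProcessor, ChameleonForConditionalGeneration

processor = ChameleonProcessor.from_pretrained("facebook/chameleon-7b")
model = ChameleonForConditionalGeneration.from_pretrained(
    "facebook/chameleon-7b", torch_dtype=torch.bfloat16, device_map="auto"
)

# The <image> placeholder marks where the photo's tokens get spliced
# into the prompt, so the model sees one interleaved token sequence.
prompt = "What can I cook using only the ingredients shown here?<image>"
image = Image.open("fridge_contents.jpg")  # hypothetical photo

inputs = processor(text=prompt, images=image, return_tensors="pt").to(
    model.device, dtype=torch.bfloat16
)
output = model.generate(**inputs, max_new_tokens=120)
print(processor.decode(output[0], skip_special_tokens=True))
```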
Following the paper's publication, the Fundamental AI Research (FAIR) team at Meta has now released the models publicly for research purposes, albeit with some limitations.
What's new in Meta Chameleon?
"Chameleon demonstrates general capabilities, including SOTA performance in image captioning tasks, outperforms Llama-2 in text-only tasks while being competitive with models such as Mixtral 8x7B and Gemini-Pro, and performs non-trivial image generation, all in a single model." pic.twitter.com/bui0JSdNdn (May 17, 2024)
The paper's authors say that the key to Chameleon's success is its fully token-based architecture. The model learns to reason over images and text jointly, which isn't possible with models that use separate encoders for each type of input.
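To illustrate that design at a toy scale: an early-fusion model quantizes an image into discrete tokens, gives them IDs in the same vocabulary as the text tokens, and feeds the whole interleaved sequence through one transformer. The sketch below is a deliberately simplified rendering of that idea, not Meta's implementation; every name, size, and the random "tokenizer" in it are hypothetical.

```python
# Toy sketch of a fully token-based (early fusion) design.
# All names and sizes are hypothetical; Chameleon's real image
# tokenizer and transformer are learned and far larger.
import torch
import torch.nn as nn

TEXT_VOCAB = 32_000               # assumed text vocabulary size
IMAGE_VOCAB = 8_192               # assumed image codebook size
VOCAB = TEXT_VOCAB + IMAGE_VOCAB  # one shared vocabulary

def tokenize_image(pixels: torch.Tensor) -> torch.Tensor:
    """Stand-in for a learned image tokenizer (e.g. a VQ model):
    maps an image to a short sequence of discrete codebook IDs,
    offset so they never collide with text token IDs."""
    codes = torch.randint(0, IMAGE_VOCAB, (64,))  # fake 64-token image
    return codes + TEXT_VOCAB

# A single transformer attends over the whole mixed sequence; there is
# no separate image encoder, which is the point of the token-based design.
embed = nn.Embedding(VOCAB, 256)
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True),
    num_layers=2,
)

text_tokens = torch.randint(0, TEXT_VOCAB, (10,))    # e.g. a question
image_tokens = tokenize_image(torch.zeros(3, 256, 256))
sequence = torch.cat([text_tokens, image_tokens]).unsqueeze(0)

hidden = model(embed(sequence))  # joint attention across text and image
print(hidden.shape)              # torch.Size([1, 74, 256])
```

Because image and text live in one token stream, the same attention layers relate words to image patches directly, instead of fusing the outputs of two separate encoders late in the pipeline.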
Meta's team also had to overcome technical challenges around optimization stability and scaling, which it did using new methods and training techniques.
Ultimately, for the user, it means Chameleon should be able to handle prompts that call for outputs combining text and images. Users could, for example, ask Chameleon to create an itinerary for experiencing the summer solstice, and the model should be able to provide relevant images to accompany the text it generates.
The researchers said that, according to human evaluations, Chameleon matches or exceeds the performance of models like Gemini Pro and GPT-4V when prompts or outputs contain mixed sequences of both images and text, though evaluations on interpreting infographics and charts were excluded.
'They've progressed significantly'
The model Meta released publicly can only generate text outputs, and its safety levels were purposely increased.
However, in May, Armen Aghajanyan, one of the people who worked on the project, wrote on X that their models “were done training 5 months ago” and claimed they’ve “progressed significantly since then”.
For researchers, Chameleon represents a source of inspiration for alternative ways to train and design AI models. For the rest of us, it means we're one step closer to AI assistants that can better understand the context they're operating in, without having to rely on one of the closed platforms.
More from Tom's Guide
- Anthropic just published research on how to give AI a personality — is this why Claude is so human-like?
- Google DeepMind creates AI model that can add sound to silent videos
- Meta quietly building a new AI model — here's what to expect

Christoph Schwaiger is a journalist, mainly covering AI, health, and current affairs. His stories have been published by Tom's Guide, Live Science, New Scientist, and the Global Investigative Journalism Network, among other outlets. Christoph has appeared on LBC and Times Radio. Additionally, he previously served as a National President for Junior Chamber International (JCI), a global leadership organization, and graduated cum laude from the University of Groningen in the Netherlands with an MA in journalism. You can follow him on X (Twitter) @cschwaigermt.