Meta just dropped an open source GPT-4o style model — here’s what it means

Adobe Firefly image of a chameleon sitting on a computer chip

(Image credit: Adobe Firefly/Future AI)

Meta has publicly released a new family of AI models, called Chameleon, that it says are comparable to commercial tools such as Gemini Pro and GPT-4V.

The company first detailed the models' nuts and bolts in a research paper, which shows that Chameleon, available in 7-billion and 34-billion-parameter versions, is capable of understanding and generating both images and text.

After the paper’s publication, the Fundamental AI Research (FAIR) team at Meta has now released the model publicly for research purposes, albeit with some limitations.

What's new in Meta Chameleon?

"Chameleon demonstrates general capabilities, including SOTA performance in image captioning tasks, outperforms Llama-2 in text-only tasks while being competitive with models such as Mixtral 8x7B and Gemini-Pro, and performs non-trivial image generation, all in a single model." (Posted on X, May 17, 2024)

The paper’s authors say that the key to Chameleon’s success is its fully token-based architecture. Images and text are both represented as tokens in a single sequence, so the model learns to reason over the two jointly, which is not possible for models that use separate encoders for each input.
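To make the idea of a fully token-based design more concrete, here is a toy Python sketch of how text and image content could share one token sequence. It is an illustration only, not Meta's implementation: the vocabulary sizes, the `BOI`/`EOI` marker tokens, and every function and class name below are assumptions made up for this example, and the "image codes" stand in for whatever a real image tokenizer would produce.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical sizes, chosen only for illustration.
TEXT_VOCAB_SIZE = 1000       # text token ids occupy [0, 1000)
IMAGE_CODEBOOK_SIZE = 512    # image codes occupy [0, 512) before offsetting

# Hypothetical marker tokens placed after both id ranges.
BOI = TEXT_VOCAB_SIZE + IMAGE_CODEBOOK_SIZE   # begin-of-image marker
EOI = BOI + 1                                 # end-of-image marker


@dataclass
class Text:
    token_ids: List[int]     # ids already produced by some text tokenizer


@dataclass
class Image:
    codes: List[int]         # discrete codes from some image tokenizer


def to_unified_sequence(segments: List[Union[Text, Image]]) -> List[int]:
    """Flatten interleaved text and image segments into one token sequence.

    Image codes are shifted past the text id range so that both modalities
    live in a single shared vocabulary that one model could consume.
    """
    sequence: List[int] = []
    for segment in segments:
        if isinstance(segment, Text):
            sequence.extend(segment.token_ids)
        else:
            sequence.append(BOI)
            sequence.extend(TEXT_VOCAB_SIZE + code for code in segment.codes)
            sequence.append(EOI)
    return sequence


def is_image_token(token: int) -> bool:
    """Return True if the token id falls in the (shifted) image code range."""
    return TEXT_VOCAB_SIZE <= token < BOI


# A prompt that mixes text, an image, and more text.
mixed_prompt = [Text([5, 7]), Image([0, 3]), Text([9])]
unified = to_unified_sequence(mixed_prompt)
print(unified)  # [5, 7, 1512, 1000, 1003, 1513, 9]
```

Because everything ends up in one sequence, a single model can in principle read and emit either kind of token, whereas a design with separate encoders processes each input type on its own path before combining them.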

Meta’s team had to overcome technical challenges around optimization stability and scaling, which it did using new methods and training techniques.

For users, this ultimately means that Chameleon should be able to handle prompts that call for outputs combining text and images with ease.

Users could, for example, ask Chameleon to create an itinerary for experiencing a summer solstice, and the AI model should be able to provide relevant images to accompany the text it generates.

The researchers said that according to human evaluations, Chameleon matches or exceeds the performance of models like Gemini Pro and GPT-4V when the prompts or outputs contained mixed sequences of both images and text. However, evaluations on interpreting infographics and charts were excluded.

'They've progressed significantly'

The model Meta released publicly can only generate text outputs and its safety levels were purposefully increased.

However, in May, Armen Aghajanyan, one of the people who worked on the project, wrote on X that their models “were done training 5 months ago” and claimed they’ve “progressed significantly since then”.

For researchers, Chameleon represents a source of inspiration for alternative ways to train and design AI models. For the rest of us, it means we’re one step closer to having AI assistants that can better understand the context they’re operating in without having to use one of the closed platforms.


Christoph Schwaiger is a journalist who mainly covers technology, science, and current affairs. His stories have appeared in Tom's Guide, New Scientist, Live Science, and other established publications. Always up for joining a good discussion, Christoph enjoys speaking at events or to other journalists and has appeared on LBC and Times Radio among other outlets. He believes in giving back to the community and has served on different consultative councils. He was also a National President for Junior Chamber International (JCI), a global organization founded in the USA. You can follow him on Twitter @cschwaigermt.