Are we close to the holodeck? Google unveils Genie — an AI model creating playable virtual worlds from a single image

Google Genie makes 2D worlds
(Image credit: Google Genie)

Google researchers have published a new artificial intelligence model that can take a text prompt, sketch or idea and turn it into a virtual world you can interact with and play.

Named Genie, the virtual world model was trained on gameplay and other videos found online and is currently only a research preview. The games are more 2D platformer than full VR.

While this might still be some way off from a true holodeck like the ones in Star Trek, it does give an indication that it could be possible to one day walk into a room and create a fully interactive adventure from nothing more than a few words.

What is Google Genie?

In the AI world, people talk about opening Pandora’s Box or letting the genie out of the lamp to describe being able to create content with relatively little effort. In practice, much like a human spends years learning a skill, AI models require extensive training.

You can’t just rub a lamp and hope a genie will come out; first you have to fill the lamp with knowledge and ability. In the case of Genie, that came from a “large dataset of publicly available Internet videos” and a lot of effort from engineers to create code and weights for the model.

Google DeepMind team lead for Genie, Tim Rocktäschel, wrote on X that the team focused on scale, using a dataset made up of more than 200,000 hours of video from 2D platformers.

It was trained without supervision on unlabelled videos. This allowed it to learn a diverse range of character motions, controls and actions, and to do so in a consistent way. As a result, "our model can convert any image into a playable 2D world," explained Rocktäschel.

What does this really mean?

There are numerous tools on the market that can take a graphic designer’s mock-up of a website or app and turn it into code.

It isn’t always the best code but it creates a functional prototype that can be used. AI tools also exist to make a website from a text prompt.

You can give Genie a sketch on a piece of paper, a perfectly crafted piece of digital art or even an AI-generated depiction of a 2D world, and it does the rest.

It generates the images and other assets needed to make your sketch into a fully realized open world and then predicts the next frame based on the actions the player provides.

The creators used a tokenizer to compress the video into discrete tokens. A latent action model then encodes the transition between two frames as one of eight latent actions, and another model predicts future frames.
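To make the three pieces easier to picture, here is a toy sketch in Python. It is not Genie's code: the real components are large neural networks learned from video, whereas every function below is a hypothetical stand-in written by hand. The only detail taken from the description above is the count of eight latent actions; the token vocabulary size and the idea that an action "rotates" a row of tokens are assumptions made purely for illustration.

```python
NUM_LATENT_ACTIONS = 8  # the article reports eight latent actions
NUM_TOKENS = 16         # hypothetical vocabulary size for this toy


def tokenize(frame):
    """Toy tokenizer: turn pixel intensities in [0, 1) into discrete token ids."""
    return [min(int(p * NUM_TOKENS), NUM_TOKENS - 1) for p in frame]


def predict_next(tokens, action):
    """Toy dynamics model: action k rotates the token row k places to the right."""
    k = action % len(tokens)
    return tokens[-k:] + tokens[:-k] if k else list(tokens)


def infer_latent_action(tokens_t, tokens_next):
    """Toy latent action model: label a transition with whichever of the
    eight actions best explains the observed next frame."""
    def mismatches(action):
        predicted = predict_next(tokens_t, action)
        return sum(a != b for a, b in zip(predicted, tokens_next))
    return min(range(NUM_LATENT_ACTIONS), key=mismatches)


def play(tokens, actions):
    """Roll the world forward one frame per player action."""
    frames = [list(tokens)]
    for action in actions:
        frames.append(predict_next(frames[-1], action))
    return frames
```

In this sketch, `infer_latent_action` plays the role of learning from unlabelled footage: it sees only two consecutive frames and works out which action must have happened between them. At play time the direction is reversed, and `play` feeds the player's chosen actions into `predict_next` to produce each new frame.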

The solution to bringing it all together was the same as the breakthrough OpenAI had with Sora — lots of data and just as much compute power.

What happens next with Genie?

Genie doesn’t have a release date and, because it is a research project, it’s unclear whether it will ever become a real product. There is a chance that one day you’ll be able to pick up one of the best Android phones and ask Assistant to make you a game about dodging vampires, but not for a few years.

What's more important is the underlying technology and the new approaches to content generation developed during its creation, including learning from unlabelled video to generate open worlds.

Rocktäschel called out Sora on X, specifically the idea that it is a “world model”. He said that while it is impressive and visually stunning, “a world model needs ‘actions’,” adding that “Genie is an action-controllable world model, but trained fully unsupervised from videos.”

The other big breakthrough that came with Genie is a deeper understanding of real-world physics, which could be used in training robots to more effectively navigate environments or complete tasks not in their training.

Ryan Morrison
AI Editor