Are we close to the holodeck? Google unveils Genie — an AI model creating playable virtual worlds from a single image


Google researchers have published a new artificial intelligence model that can take a text prompt, sketch or idea and turn it into a virtual world you can interact with and play.

Named Genie, the virtual world model was trained on gameplay and other videos found online and is currently only a research preview. The games are more 2D platformer than full VR.

While this might still be some way off from a true holodeck like the ones in Star Trek, it does give an indication that it could be possible to one day walk into a room and create a fully interactive adventure from nothing more than a few words.

What is Google Genie?

In the AI world people talk about opening Pandora’s Box or letting the genie out of the lamp to describe the reality of being able to create content from relatively little effort. The reality is that, much like a human spends years learning a skill, AI models require extensive training.

You can’t just rub a lamp and hope a genie will come out. First you have to fill the lamp with knowledge and ability. In the case of Genie, that came from a “large dataset of publicly available Internet videos” and a lot of effort from engineers to create the code and weights for the model.

Google DeepMind team lead for Genie, Tim Rocktäschel, wrote on X that the team focused on scale, using a dataset made up of more than 200,000 hours of video from 2D platformers.

The model was trained unsupervised on unlabelled videos. This allowed it to learn a diverse range of character motions, controls and actions, and to do so in a consistent way. As a result, "our model can convert any image into a playable 2D world," explained Rocktäschel.

What does this really mean?

There are numerous tools on the market that can take a graphic designer’s mock-up of a website or app and turn it into code.

It isn’t always the best code but it creates a functional prototype that can be used. AI tools also exist to make a website from a text prompt.

With Genie you can give it a sketch on a piece of paper, a perfectly crafted piece of digital art or even an AI-generated depiction of a 2D world, and Genie does the rest.

I am really excited to reveal what @GoogleDeepMind's Open Endedness Team has been up to 🚀. We introduce Genie 🧞, a foundation world model trained exclusively from Internet videos that can generate an endless variety of action-controllable 2D worlds given image prompts. pic.twitter.com/TnQ8uv81wc (February 26, 2024)

It generates the images and other assets needed to turn your sketch into a fully realized open world, and then predicts the next frame, pixel by pixel, based on the actions the player provides.

The creators used a tokenizer that compresses the video into discrete tokens. These are then sent to an action model, which encodes the transition between two frames as one of eight latent actions. A third model is then used to predict future frames.
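To make the "one of eight latent actions" idea more concrete, here is a toy sketch in Python. It is not Genie's code, and every name in it (`make_toy_codebook`, `quantize_transition`, `label_video`) is hypothetical. It assumes each frame has already been reduced to a short list of numbers, and it assumes a random codebook, whereas a real model would learn both. The sketch only shows the general trick of snapping the change between two frames to the nearest of eight discrete codes.

```python
import random

NUM_LATENT_ACTIONS = 8  # the article states eight latent actions


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def make_toy_codebook(dim, seed=0):
    """Build eight random code vectors (an assumption; real codebooks are learned)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(dim)]
            for _ in range(NUM_LATENT_ACTIONS)]


def quantize_transition(prev_frame, next_frame, codebook):
    """Return the index of the code closest to the frame-to-frame change."""
    delta = [n - p for p, n in zip(prev_frame, next_frame)]
    return min(range(len(codebook)),
               key=lambda i: squared_distance(delta, codebook[i]))


def label_video(frames, codebook):
    """Assign one discrete latent action to each pair of consecutive frames."""
    return [quantize_transition(a, b, codebook)
            for a, b in zip(frames, frames[1:])]


if __name__ == "__main__":
    codebook = make_toy_codebook(dim=4)
    rng = random.Random(1)
    # Stand-in "frames": six random 4-number vectors instead of real video tokens.
    frames = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(6)]
    print(label_video(frames, codebook))  # five action indices, each from 0 to 7
```

In this simplified picture, an unlabelled video becomes a sequence of action indices without anyone telling the model which button was pressed. A separate model could then be trained to predict the next frame from the current frame plus one of those indices, which is what makes the generated world controllable.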

The solution to bringing it all together was the same as the breakthrough OpenAI had with Sora — lots of data and just as much compute power.

What happens next with Genie?

Genie doesn’t have a release date and, as a research project, it's unclear if it will ever become a real product. There is a chance that one day you’ll be able to pick up one of the best Android phones and ask Assistant to make you a game about dodging vampires, but not for a few years.

What's more important is the underlying technology and the new approaches to content generation developed during its creation, including learning from unlabelled video to produce open worlds.

Rocktäschel called out Sora on X, specifically the idea that it is a “world model”. He said that while it is impressive and visually stunning, “a world model needs ‘actions’,” adding that “Genie is an action-controllable world model, but trained fully unsupervised from videos.”

The other big breakthrough that came with Genie is a deeper understanding of real-world physics, which could be used in training robots to more effectively navigate environments or complete tasks not in their training.


Ryan Morrison, a stalwart in the realm of tech journalism, possesses a sterling track record that spans over two decades, though he'd much rather let his insightful articles on AI and technology speak for him than engage in this self-aggrandising exercise. As the former AI Editor for Tom's Guide, Ryan wields his vast industry experience with a mix of scepticism and enthusiasm, unpacking the complexities of AI in a way that could almost make you forget about the impending robot takeover.
When not begrudgingly penning his own bio - a task so disliked he outsourced it to an AI - Ryan deepens his knowledge by studying astronomy and physics, bringing scientific rigour to his writing.