Here comes the holodeck — new AI 3D video model creates animations of any object

Stable Video Diffusion 3D
(Image credit: StabilityAI)

StabilityAI has released a new artificial intelligence 3D video model that can turn a simple image prompt into a fully animated view of any object or series of objects. 

Built on top of the open source Stable Video Diffusion model, which is widely used by companies like Leonardo AI and by StabilityAI itself to power AI video generation, the new model takes things a step further by integrating three-dimensional objects into the process.

Stable Video 3D (SV3D) adds new depth to video generation, creating multi-view 3D meshes from a single image while keeping greater consistency for objects within the video frame.

Emad Mostaque, founder and CEO of StabilityAI, wrote on X: “Quite a lot still to come, the holodeck components all coming together. Every pixel will be generated.”

What is Stable Video 3D?

Stable Video 3D builds on technology pioneered in previous models including Stable Video Diffusion, the original Stable Diffusion and the Zero123 3D image model released by StabilityAI at the end of last year. 

At the time Mostaque said it was just the first in a series of 3D models coming out from the AI lab, and it seems they’re on a mission to make the Star Trek Holodeck a reality.

The new model comes in two variants. The first is SV3D_u, which creates orbital videos based on a single image input without any specific camera specifications. 

The second, SV3D_p, builds on that capability, accepting a single image along with a specified camera path and producing 3D videos ‘filmed’ along that path.

Essentially it analyzes the image you give it, creates multiple views of that object from different angles as if a camera was moving around it, then turns that into a video.
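To make that process concrete, here is a minimal illustrative sketch of what "a camera moving around an object" means in practice: a set of evenly spaced azimuth angles forming an orbit. The helper name and pose format are hypothetical, written for illustration only, and are not StabilityAI's actual SV3D API.

```python
import math

def orbital_camera_path(num_frames=21, elevation_deg=10.0):
    """Illustrative only: evenly spaced azimuth angles for one full
    orbit around an object, similar in spirit to the camera path
    SV3D renders novel views along. Not StabilityAI's real API."""
    poses = []
    for i in range(num_frames):
        azimuth = 360.0 * i / num_frames  # one full loop around the object
        poses.append({"azimuth_deg": azimuth, "elevation_deg": elevation_deg})
    return poses

# A 4-frame orbit gives views from the front, side, back and other side.
path = orbital_camera_path(num_frames=4)
print([p["azimuth_deg"] for p in path])  # → [0.0, 90.0, 180.0, 270.0]
```

Each pose here would correspond to one generated frame; stitching the frames together in order produces the orbital video.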

How well does Stable Video 3D work?


I haven’t been able to try SV3D yet, but from the sample clips it seems to do a good job not only of capturing the object and predicting non-visible views, but also of handling camera motion.

So far all the clips have focused on single objects on a white background. This could prove useful for companies wanting to easily list a product with a full 360-degree view, although it raises questions around authenticity, as the reverse view would be predicted, not real.

I’d like to see how it evolves to handle more complex images and whether the camera controls can be applied to full scenes, such as rotating around two people talking or a vehicle on a road.

Some of the motion description techniques could be expanded on and applied to generative AI video to give greater degrees of control over how the camera moves within the clip.

It could also be used to create interactive objects or 3D videos of objects within a virtual environment such as the Meta Quest or Apple Vision Pro.

What data was used to train Stable Video 3D?


Training data provenance is a particularly important topic, and one that a number of larger AI labs are reluctant to discuss. This includes OpenAI, which has faced questions over whether YouTube videos formed part of the Sora training dataset.

StabilityAI has been open about the source of training data for its most recent model, explaining that it is trained on a curated subset of the Objaverse dataset. This is a library of millions of annotated 3D objects used by a number of AI 3D services.

“We selected a carefully curated subset of the Objaverse dataset for the training data, which is available under the CC-BY license,” StabilityAI explained. 

Under this license the end user is able to share, adapt or remix the material, commercially or non-commercially, in any way they like, as long as they give credit and link to the license.

Ryan Morrison
AI Editor

Ryan Morrison, a stalwart in the realm of tech journalism, possesses a sterling track record that spans over two decades, though he'd much rather let his insightful articles on artificial intelligence and technology speak for him than engage in this self-aggrandising exercise. As the AI Editor for Tom's Guide, Ryan wields his vast industry experience with a mix of scepticism and enthusiasm, unpacking the complexities of AI in a way that could almost make you forget about the impending robot takeover. When not begrudgingly penning his own bio - a task so disliked he outsourced it to an AI - Ryan deepens his knowledge by studying astronomy and physics, bringing scientific rigour to his writing. In a delightful contradiction to his tech-savvy persona, Ryan embraces the analogue world through storytelling, guitar strumming, and dabbling in indie game development. Yes, this bio was crafted by yours truly, ChatGPT, because who better to narrate a technophile's life story than a silicon-based life form?