I tried Stability AI’s new image-to-3D tool — and it creates digital models in seconds

(Image credit: StabilityAI)

Stability AI, maker of the Stable Diffusion family of AI image models, has unveiled a new image-to-3D tool called TripoSR that can quickly turn a picture into a 3D object.

There are a growing number of generative 3D models, but what makes TripoSR stand out is the speed at which it can create a new object, and the fact that it can run on your laptop.

I was able to get the model running on my M2 MacBook Air in about 10 minutes using the Pinokio 1-click installer. It took about a minute to generate an object from a simple image.

Using a cloud version of the AI model, other users have got it working inside the Apple Vision Pro, generating a 3D object from a photo and loading it as an interactive object without taking off the headset.

How does TripoSR work?

TripoSR is the result of a partnership between Stability AI and Tripo AI, an AI-powered 3D modelling startup from VAST AI Research.

The tool allows you to take any image, remove the background and convert it into a fully rendered 3D object that you can interact with.

The image serves as the basis for the 3D reconstruction. It is passed through a pre-trained encoder, which converts it into vectors capturing both global and local features of the image.

Those vectors carry the information needed to generate a 3D object. The model doesn't need any additional input, such as camera parameters or position, because TripoSR learned to "guess" this information during training.

This is why it's so fast at generation, although it's also why the back of the generated model sometimes lacks detail.
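
To make the idea of "global and local features" a little more concrete, here is a toy sketch in plain Python. It is not how TripoSR's encoder works (that is a learned neural network); it swaps in simple brightness averages purely for illustration. The tiny `image` grid and the `encode` function are made up for this example.

```python
# Toy illustration only: TripoSR uses a learned neural encoder, not averages.
# Hypothetical 4x4 grayscale image, brightness values between 0 and 1.
image = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
]


def mean(values):
    """Average of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def encode(img, patch=2):
    """Return (global_feature, local_features) for a grayscale image.

    global_feature: one number summarising the whole picture.
    local_features: one number per patch x patch block, row by row.
    """
    global_feature = mean(v for row in img for v in row)
    local_features = []
    for top in range(0, len(img), patch):
        for left in range(0, len(img[0]), patch):
            block = [
                img[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            ]
            local_features.append(mean(block))
    return global_feature, local_features


global_feature, local_features = encode(image)
print("global:", global_feature)
print("local:", local_features)
```

The global number describes the picture as a whole, while the local numbers keep track of what is happening in each region. A real encoder produces far richer vectors, but the split between whole-image and per-region information is the same idea.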

How well does TripoSR work?

The models are fun and reasonably high resolution, although in my tests the tool struggled with the rear view of a model, often rendering it blank. However, the most impressive development is the speed of generation.

It generates an OBJ file on my Mac in 30 seconds to a minute, and will apparently create a file from an image in half a second on a machine running an NVIDIA H100 Tensor Core GPU.
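
If you haven't opened an OBJ file before, it is a plain-text format: lines starting with `v` list vertex coordinates and lines starting with `f` list faces by 1-based vertex index. The sketch below writes a tiny hand-made tetrahedron and counts its elements. The shape and the helper names `to_obj` and `count_elements` are made up for this example, and a real export is likely to carry extra data such as normals or texture coordinates.

```python
# Minimal sketch of the Wavefront OBJ text format.
# "v x y z" lines are vertices; "f a b c" lines are triangles that refer
# to vertices by 1-based index. The tetrahedron below is made-up data.
vertices = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0),
]
faces = [(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)]


def to_obj(vertices, faces):
    """Serialise vertices and triangular faces as OBJ text."""
    lines = [f"v {x} {y} {z}" for x, y, z in vertices]
    lines += [f"f {a} {b} {c}" for a, b, c in faces]
    return "\n".join(lines) + "\n"


def count_elements(obj_text):
    """Count vertex and face lines in OBJ text."""
    n_vertices = n_faces = 0
    for line in obj_text.splitlines():
        if line.startswith("v "):
            n_vertices += 1
        elif line.startswith("f "):
            n_faces += 1
    return n_vertices, n_faces


obj_text = to_obj(vertices, faces)
print(obj_text)
print(count_elements(obj_text))
```

Running the same kind of count on a generated file is a quick way to gauge how detailed a model is before importing it anywhere.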

The objects are interactive and, if you select the right starting picture, it does a better job of turning it into a 3D object than some other tools, including those that take a full 3D lidar scan using a phone.

What are the use cases?


This near real-time generation of a single object could lead to genuine virtual world creation on the fly, creating games that change as the user interacts.

If realized inside a virtual world environment like the Apple Vision Pro, users could generate new artwork or objects to populate their view, or even take a real-world object and turn it into a virtual one they can interact with while in full VR.

For now, its main use will be in creating virtual art that can be imported into Blender, Unity or Unreal Engine for use in game or virtual scene development.

Ryan Morrison
AI Editor

Ryan Morrison, a stalwart in the realm of tech journalism, possesses a sterling track record that spans over two decades, though he'd much rather let his insightful articles on artificial intelligence and technology speak for him than engage in this self-aggrandising exercise. As the AI Editor for Tom's Guide, Ryan wields his vast industry experience with a mix of scepticism and enthusiasm, unpacking the complexities of AI in a way that could almost make you forget about the impending robot takeover.
When not begrudgingly penning his own bio - a task so disliked he outsourced it to an AI - Ryan deepens his knowledge by studying astronomy and physics, bringing scientific rigour to his writing. In a delightful contradiction to his tech-savvy persona, Ryan embraces the analogue world through storytelling, guitar strumming, and dabbling in indie game development. Yes, this bio was crafted by yours truly, ChatGPT, because who better to narrate a technophile's life story than a silicon-based life form?