I put ElevenLabs' new AI sound effects generator to the test: it's great for AI video

ElevenLabs logo on phone sitting on top of keyboard
(Image credit: Shutterstock)

Leading AI audio company ElevenLabs has released its latest product — generative sound effects — after months of beta testing. It lets you create any noise from a simple text prompt.

I’ve been using it myself for the past couple of months and it has got progressively better, creating incredibly accurate sounds and, if pushed, even background music.

The company describes it as “turning imagination into audio”, suggesting it can make sounds for everything from diving into water to coins jingling and horses galloping, and do so in seconds.

In the beta preview users were given four samples based on the text prompt, each a unique interpretation by the AI model. In the final release that has been increased to six.

How do AI sound effects work?

ElevenLabs' sound effects engine works like any generative AI model: it takes a lot of training data and compute time to teach it how to interpret text and convert it to sound.

This builds on a growing collection of sound-based models from ElevenLabs including its very realistic voice engine that can clone and replicate anyone's voice in minutes.

Mati Staniszewski, CEO & co-founder at ElevenLabs, said in a statement: “Over the last year, we’ve revolutionized AI Voices by producing the first truly emotive, human-like text-to-speech platform. With the launch of text-to-sound effects, we’re marking another major step forward, one that will equip creators with more audio tools to help them produce high quality content.”

ElevenLabs is also working on a new music model, so in future you’ll be able to give it a video and have it add dialogue, sound effects and a soundtrack for an immersive experience.

The training data for the new sound effects model came from a new partnership with Shutterstock. This allowed ElevenLabs to train and fine-tune the model on Shutterstock's audio library of licensed tracks, which is accurately labelled.

Testing ElevenLabs Sound Effects

To test ElevenLabs' sound effects capabilities, I came up with seven videos with obvious (and not-so-obvious) sounds and generated the audio for each. The videos were created using Pika Labs from a simple one-paragraph text prompt.

I gave the same prompt to Pika Labs and ElevenLabs Sound Effects. This worked on all of the prompts to create a fuller sound, apart from the fire. For that one I had to simplify the prompt to just “a roaring fireplace”, as the model kept getting confused and creating overly complex soundscapes.

1. Roaring Fire

The prompt: "A cosy, roaring fireplace in a rustic log cabin, with flames dancing vividly, logs crackling, and sparks flying up the chimney. The warm glow illuminates the wooden interior, casting flickering shadows."

2. Heavy Traffic

The prompt: "A bustling city street filled with heavy traffic, cars honking, and buses passing by. Skyscrapers loom in the background, neon signs flicker, and people rush along the sidewalks. The scene captures the hectic and noisy urban atmosphere."

3. Ocean Waves

The prompt: "A serene beach at sunset with gentle ocean waves crashing against the shore. The sky is painted in hues of orange and pink, with seagulls flying overhead. The sound of the waves creates a soothing ambiance."

4. Thunderstorm

The prompt: "A dramatic thunderstorm over an open field, with dark clouds, flashes of lightning, and heavy rain pouring down. The thunder roars, and the wind blows fiercely, bending the trees and creating a sense of raw power."

5. Crowded Market

The prompt: "A bustling outdoor market in a vibrant city, with stalls selling fresh produce, spices, and handcrafted goods. Vendors call out to customers, people haggle over prices, and the sounds of chatter and laughter fill the air. The market is colourful, lively, and full of energy."

6. Construction Site

The prompt: "A busy construction site with cranes, bulldozers, and workers in hard hats. The sounds of hammering, drilling, and machinery fill the air as a new building takes shape. Dust rises from the ground, and the scene is a hive of activity."

7. Rock Concert

The prompt: "A rock concert in full swing, with a band playing on stage under colorful lights. The crowd is cheering, clapping, and singing along. The lead singer's voice soars above the powerful music, and the electric guitars and drums create a thrilling and energetic vibe."

Ryan Morrison
AI Editor

Ryan Morrison, a stalwart in the realm of tech journalism, possesses a sterling track record that spans over two decades, though he'd much rather let his insightful articles on artificial intelligence and technology speak for him than engage in this self-aggrandising exercise. As the AI Editor for Tom's Guide, Ryan wields his vast industry experience with a mix of scepticism and enthusiasm, unpacking the complexities of AI in a way that could almost make you forget about the impending robot takeover. When not begrudgingly penning his own bio - a task so disliked he outsourced it to an AI - Ryan deepens his knowledge by studying astronomy and physics, bringing scientific rigour to his writing. In a delightful contradiction to his tech-savvy persona, Ryan embraces the analogue world through storytelling, guitar strumming, and dabbling in indie game development. Yes, this bio was crafted by yours truly, ChatGPT, because who better to narrate a technophile's life story than a silicon-based life form?