StabilityAI drops Stable Audio 2.0 — here’s everything that’s new

Stable Audio
(Image credit: StabilityAI)

StabilityAI has unveiled the second iteration of its artificial intelligence music generation tool, offering longer tracks, audio-to-audio support, and a greater commitment to protecting the copyright of creators.

Stable Audio 2.0 allows users to create tracks of up to three minutes in 44.1 kHz stereo by entering a natural-language prompt such as “A beautiful piano arpeggio grows to a full beautiful orchestral piece”, “Lo-fi funk” or “drum solo”. The AI-generated tracks are structured compositions with an intro, development, and outro, and can include stereo sound effects.

Stable Audio 2.0 is also no longer solely a text-to-audio tool: users can now generate “fully produced samples” by uploading an audio file to the platform. For example, mimicking a drum sound with your voice would prompt the app to create an audio clip of a drum playing.

When using the new audio-to-audio feature, users must refrain from uploading copyrighted material under StabilityAI’s terms and conditions. The company uses content recognition technology to enforce this policy and prevent copyright infringement.

Like Stable Audio 1.0, the second model is trained on AudioSparx’s library of 800,000 audio files, which spans music, sound effects, and single-instrument stems, along with text-based metadata. AudioSparx musicians unhappy with the idea of their work being used for AI model training were given the opportunity to opt out.

These reinforced copyright-protection and creator opt-out policies follow the departure of Ed Newton-Rex, StabilityAI’s former VP of audio. He announced his resignation in November 2023 in an X post that heavily criticized the company’s approach to upholding creators’ rights.

“I’ve resigned from my role leading the Audio team at StabilityAI, because I don’t agree with the company’s opinion that training generative AI models on copyrighted works is ‘fair use’,” he wrote.

He concluded his post by urging creators to voice their concerns to ensure tech companies “realise that exploiting creators can’t be the long-term solution in generative AI.”

Under the hood

In addition to longer tracks and audio-to-audio support, Stable Audio 2.0 sports a beefed-up architecture that facilitates the “generation of full tracks with coherent structures.” Adapting every component of the system has resulted in “improved performance over long time scale,” the company claimed.

The tool features a new type of compressed autoencoder that squeezes raw audio waveforms into much shorter representations. Meanwhile, a diffusion transformer, similar to the one that powers Stable Diffusion 3, operates on those representations and is better suited to handling long sequences of data.
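To get a feel for why compression matters here, the short Python sketch below works out how many samples a three-minute 44.1 kHz stereo track contains, and how much shorter the sequence becomes once an autoencoder shrinks it. The helper names and the downsampling factor of 2048 are hypothetical, chosen purely for illustration; they are not figures published by StabilityAI.

```python
# Rough arithmetic only: how long is a three-minute track as raw audio,
# and how long would it be after a (hypothetical) compression step?

SAMPLE_RATE_HZ = 44_100   # 44.1 kHz, as stated for Stable Audio 2.0
DURATION_S = 3 * 60       # three minutes
CHANNELS = 2              # stereo


def raw_frames(duration_s: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of sample frames (one per channel) in a clip of this length."""
    return duration_s * sample_rate_hz


def latent_length(frames: int, downsample_factor: int) -> int:
    """Sequence length after compressing by downsample_factor, rounded up."""
    return -(-frames // downsample_factor)  # ceiling division


# ASSUMPTION: illustrative value, not StabilityAI's actual compression ratio.
HYPOTHETICAL_DOWNSAMPLE = 2048

frames = raw_frames(DURATION_S)
total_values = frames * CHANNELS
latents = latent_length(frames, HYPOTHETICAL_DOWNSAMPLE)

print(f"Sample frames per channel: {frames:,}")
print(f"Total sample values (stereo): {total_values:,}")
print(f"Latent steps at {HYPOTHETICAL_DOWNSAMPLE}x: {latents:,}")
```

Under that assumed ratio, a sequence of nearly eight million sample frames shrinks to a few thousand steps, which is the kind of length a transformer can realistically process in one pass.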

“The combination of these two elements results in a model capable of recognizing and reproducing the large-scale structures that are essential for high-quality musical compositions,” wrote Stability AI in a blog post. 

The tool is free to use and available immediately. 

Nicholas Fearn is a freelance technology journalist and copywriter from the Welsh valleys. His work has appeared in publications such as the FT, the Independent, the Daily Telegraph, The Next Web, T3, Android Central, Computer Weekly, and many others. He also happens to be a diehard Mariah Carey fan!