Meet Fugatto — an impressive new AI sound model from Nvidia
A sound from text
Graphics and AI giant Nvidia has announced a new AI model called Fugatto (short for Foundational Generative Audio Transformer Opus 1). Developed by an international team of researchers, it is being billed as "the world’s most flexible sound machine," taking on ElevenLabs and AI music maker Suno in one hit.
With this model, we’re about to witness a completely new paradigm in how sound and audio are manipulated and transformed by AI. It goes way beyond converting text to speech or producing music from text prompts and delivers some genuinely innovative features we haven’t seen before.
Fugatto isn't currently available to try, as it exists only as a research paper. However, it is likely to be made available to one or more Nvidia partners in the future, and at that point we will start to see significant changes in how sound is developed.
How does Nvidia Fugatto work?
🎵 ✨The world’s most flexible sound machine? With text and audio inputs, this new #generativeAI model, named Fugatto, can create any combination of music, voices, and sounds.🎹 Read more in our blog by @RichardKerris ➡️ https://t.co/AvTAbjn1iJ #NVIDIAResearch Note: Some… pic.twitter.com/0IlYboF9JZ (November 25, 2024)
Key to Nvidia Fugatto is its ability to exhibit emergent capabilities, together with a technique the team calls ComposableART. This means it can do things it was not explicitly trained to do by combining different capabilities in new ways.
The authors of the launch research paper describe how the model can produce a cello that shouts with anger or a saxophone that barks. It may sound silly, but some of the demonstrations on the project’s homepage are very impressive.
Examples include instantly converting speech into different accents and levels of emotional intensity, and seamlessly adding instruments to, or removing them from, an existing music performance.
We have seen some of this from other models, such as OpenAI's Advanced Voice, ElevenLabs' SFX model or Google's MusicFX experiment, but never all in a single model.
What can Nvidia Fugatto be used for?
One of the most striking examples the team puts forward is the on-the-fly generation of complex sound effects, some of which are completely new or wacky.
Video game developers and those in the movie industry will either be salivating or sweating at the news that almost any kind of soundscape will soon be AI-generated at the touch of a button.
All of this is powered by a model with 2.5 billion parameters that, as you might expect, was trained on a large bank of Nvidia processors.
As with many early research demonstrations, it’s likely to be a while before we see a fully fledged product released onto the market. Creating a four-second audio clip of a thunderstorm or a mechanical monster is one thing; making it usable in the real world is another.
However, there’s no question that the technology behind this new model marks an important milestone in machines' ability to master another art form. It may be the first time we’ve seen generative AI of this type, but it’s certainly not going to be the last.

Nigel Powell is an author, columnist, and consultant with over 30 years of experience in the technology industry. He produced the weekly Don't Panic technology column in the Sunday Times newspaper for 16 years and is the author of the Sunday Times book of Computer Answers, published by Harper Collins. He has been a technology pundit on Sky Television's Global Village program and a regular contributor to BBC Radio Five's Men's Hour.
He has an Honours degree in law (LLB) and a Master's Degree in Business Administration (MBA), and his work has made him an expert in all things software, AI, security, privacy, mobile, and other tech innovations. Nigel currently lives in West London and enjoys spending time meditating and listening to music.