Google's Latest AI Trick Is Picking Voices Out of Crowds

As smart speakers and voice-based AI assistants rise in popularity, these tools need to get smarter at knowing when you're asking for something, as opposed to, say, when someone on your TV uses a trigger word.

Fortunately, it looks like Google's got a solution.

According to a new research paper, which has video evidence to back up its claims, a team of Google researchers has built a deep learning system that can identify and single out individual voices. And just as you might at a big table or out on the town, it looks at faces to figure out who's talking.

How does it work? The system was built to identify a speaker talking out loud, matching their face to their sounds. To amp up the difficulty, Google piped in the audio from virtual crowds of people, to teach the AI how to distinguish the voices it heard against the deluge of noise around it.
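The article doesn't spell out how those noisy training clips were assembled, so the following is only a rough sketch of the general idea: take clean recordings, add them together with background noise, and keep the clean versions as the answer the system should learn to recover. The sine-wave "speakers" and the names `tone` and `make_training_example` are illustrative assumptions, not anything from Google's paper, and the face-matching side of the system is not modeled here.

```python
import math
import random

def tone(freq_hz, n_samples, sample_rate=16000):
    """Stand-in for a clean speech clip: a pure sine wave (an assumption)."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(n_samples)]

def make_training_example(clean_a, clean_b, noise_level=0.3, seed=0):
    """Mix two clean clips plus random background noise.

    Returns (mixture, targets): a model would be given `mixture`
    and trained to recover each clean clip in `targets`.
    """
    rng = random.Random(seed)
    n = min(len(clean_a), len(clean_b))
    # Uniform noise stands in for the "virtual crowd" audio.
    noise = [rng.uniform(-noise_level, noise_level) for _ in range(n)]
    mixture = [clean_a[i] + clean_b[i] + noise[i] for i in range(n)]
    return mixture, (clean_a[:n], clean_b[:n])

speaker_a = tone(220, 1600)   # hypothetical first speaker
speaker_b = tone(330, 1600)   # hypothetical second speaker
mixture, targets = make_training_example(speaker_a, speaker_b)
```

Because the mixture is built from known clean clips, every training example comes with its own correct answer, which is what makes artificially noisy data useful for teaching a system to separate voices.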

A demonstration video showing off this technology focuses on comedians Jon Dore and Rory Scovel, who are talking at the same time. In the clip, pink and blue boxes are overlaid on their heads, and the sound waves at the bottom of the screen take on the same pink and blue hues to show how their faces have been matched to their voices.

Then a slider at the bottom of the screen moves horizontally between labels marked All, Jon, and Rory. The sound changes along with it, letting you hear both comedians at once, and then only one of them with the other muted.

Where and how Google implements this in its product line remains to be seen, but its Hangouts chat client and YouTube videos seem like ideal places to test it out. Further, if you added a camera to a Google Home speaker, the device could do a much better job of knowing who's talking and delivering personalized results.

Google is rumored to be working on its own Smart Display, and other models with built-in front cameras are coming out this summer from Lenovo and other manufacturers.

Henry T. Casey

Henry is an editor and writer at Tom's Guide covering streaming media, laptops and Apple. Prior to joining Tom's Guide, where he's the self-described Rare Oreo Expert, he reviewed software and hardware for TechRadar Pro, and interviewed artists for Patek Philippe International Magazine. You can find him at your local pro wrestling events, and looking for the headphone adapter that he unplugged from his iPhone.