Noises barely intelligible to humans could be used to hijack smartphones through voice assistants such as Siri, Cortana or Google Now, according to a team of researchers from Georgetown University and the University of California, Berkeley.
The researchers illustrated their findings in a demonstration video in which a garbled, demonic-sounding voice forces an Android smartphone to carry out commands. Only after several repetitions does it become clear that the raspy noise is actually saying "OK Google."
Audio clips posted by the research team at HiddenVoiceCommands.com are even less intelligible and sound like static, but can also trigger Google Now. The researchers argue that background noise in online videos could be used to direct smartphones to malicious websites without user interaction.
"A possible scenario could be that a million people watch a kitten video, and 10,000 of them have their phones nearby and 5,000 of those phones obey the attacker's voice commands and load a URL with malware on it," said Micah Sherr, a Georgetown computer-science professor, in a university press statement last week. "Then you have 5,000 smartphones under an attacker's control."
The audio clips are both chilling and hilarious. One set features a "devil voice" that can barely be understood and sounds like the backward incantations purported to appear on old Led Zeppelin records. You can scarcely grasp that the voice says "what is my current location?" The other set is completely unintelligible, but includes commands such as, "OK Google, take a picture."
Full details of the team's work are in a research paper posted online.
We tried to replicate the results by playing back some of the audio clips on a laptop while holding up an Android phone with Google Now in the foreground. None of the clips could get the phone to do anything, though perhaps our laptop's speakers weren't loud enough.
By contrast, our own voice could at least get Google Now to navigate to the proper apps and settings, although we hadn't set up the personal assistant to execute any commands.
Sherr and the rest of the Georgetown-Berkeley team proposed several defenses against hidden-voice attacks, including playing a tone whenever a voice command is recognized and training voice assistants to filter out machine-generated speech.
We suggest instead setting up Siri and Google Now so that they respond to commands only when their respective apps are in the foreground, and putting both behind a screen lock. You might also want to wear headphones while watching YouTube clips.