A team of researchers has just published a paper showing a 100 percent success rate in embedding commands not just in audio files, but also in white noise that smart speakers would otherwise treat as meaningless sound and ignore. The paper demonstrates that an attacker can wake a smart speaker and, once it is listening, make it do almost anything.
The researchers tested their technique against DeepSpeech, a state-of-the-art open source speech-to-text engine built on Google's TensorFlow framework. In their demonstration, they managed to issue unheard commands to smart devices such as Google Home and Amazon Echo by hiding them in white noise. They were able to do the same using music files and even regular pre-recorded speech. The technique works by adding a slight distortion across the whole audio clip, so subtle that the human ear cannot pick it up. When a smart speaker processes the file, however, its speech-recognition model decodes that distortion as the hidden command and acts on it.
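To give a flavor of the idea, here is a heavily simplified sketch of that kind of attack. The real work optimizes a perturbation against DeepSpeech's full neural network; the toy "model" below is just a fixed linear scorer invented for illustration, but the loop shows the core trick: nudge a perturbation toward the attacker's target output while clipping it so every sample stays below an inaudible threshold.

```python
import numpy as np

# Toy stand-in for a speech model: a fixed linear scorer over a short
# waveform. The real attack targets DeepSpeech, a deep recurrent network;
# this sketch only illustrates the optimization idea, not the actual model.
rng = np.random.default_rng(0)
n_samples = 1000                      # length of the "audio" clip
weights = rng.normal(size=n_samples)  # hypothetical model parameters

def model_score(audio):
    """Higher score -> model leans toward the attacker's target command."""
    return float(weights @ audio)

audio = rng.normal(scale=0.5, size=n_samples)  # benign "recording"
epsilon = 0.005                                # max per-sample distortion

# Gradient ascent on the perturbation, clipped so it stays imperceptible.
delta = np.zeros(n_samples)
for _ in range(50):
    grad = weights  # d(score)/d(audio) for a linear model
    delta = np.clip(delta + 0.001 * np.sign(grad), -epsilon, epsilon)

adversarial = audio + delta
print(np.max(np.abs(delta)))                 # distortion capped at epsilon
print(model_score(adversarial) > model_score(audio))
```

Against a real system the gradient would come from backpropagating through the speech model toward a target transcription, but the constraint is the same: the change to each audio sample is kept tiny enough that a listener hears nothing unusual.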
The researchers say they achieved a 100 percent success rate in their tests, demonstrating a dire need for companies to patch this hole in their products' security. The technique for creating such audio files is too complex for an average person, but well within the means of a determined attacker. A person could issue not just wake commands, but also make the smart speaker do other things, such as taking control of your smart bulbs or placing orders on Amazon.
Right now, the Google Assistant on our phones responds only to the voice it has been trained with; however, that is not the case with Alexa, the AI that powers Amazon's Echo range of speakers. Voice matching could be a better way of securing smart speakers, ensuring they respond only to the voices of selected people.
If you're interested in reading further, the researchers have put together a webpage with multiple audio files (original and altered) along with an explanation of their technique. They have also made available the code used to generate these attack audio files, as well as the recently published research paper itself.