Voice commands will soon replace buttons, thanks to digital assistants

Talking to your gadgets has never been so natural

By Arnab Mukherjee | Published: 30 Aug 2016 | Last updated: 09 Sep 2016

There was a time when talking to a machine in public might have resulted in you being declared insane and taken to an asylum. We have since reached a point where it is totally acceptable to have long conversations with a witty virtual assistant on your phone while completely ignoring the people around you. Almost every major software platform now has voice recognition in one form or another, be it virtual assistants or mere dictation. We have come a long way in how we speak to our smart gadgets. But are we really there yet?

We’re talking more. But why?

Before we really delve into the quality of digital assistants and voice recognition on different platforms, we need to stop and take stock of what really facilitates the current state of this technology. At a broad overview level, everything from better audio formats to higher quality microphones has helped in making digital voice interaction a valid and acceptable option. But the technology wasn’t really waiting for those parts.

When it comes to the hardware side, the growth of voice assistants on phones as a platform is quite predictable, considering that we have always been looking for ways to replace hard-to-use phone keyboards, trying everything from larger screens to new ways of typing. Now, for voice recognition to truly evolve on phones, there was a need for processing power. Google started with the voice search app, with the concept of offloading most of the processing to its servers. Gradually, smartphone processors gained power exponentially and now we have phones that can take care of the processing locally as well, especially for basic, repetitive tasks.

Although hardware improvements have been crucial, the improvements on the software side have been far more significant. The field of artificial intelligence has grown by leaps and bounds in recent years. Rather than exactly catching the sound and transcribing it, modern-day virtual assistant AI works on a predictive model where it checks the probability of a certain sound being a certain word to guess what you are saying. And newer abilities are being added every day where the AI can do things like understanding your mood and learning your preferences.
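To make the idea of a predictive model concrete, here is a minimal sketch in Python. It is not how Siri, Google Now or Cortana actually work; the word lists, the probabilities and the guess_word function are all made up for illustration. The sketch assumes the assistant has two sets of numbers: how closely the sound matches each candidate word, and how likely each word is to follow the previous one. It then picks the candidate with the best combined score.

```python
import math

# Hypothetical acoustic scores: how well the sound that was heard
# matches each candidate word (illustrative numbers only).
ACOUSTIC = {"weather": 0.40, "whether": 0.45, "wetter": 0.15}

# Hypothetical language scores: how likely each candidate is to
# follow the previous word (illustrative numbers only).
LANGUAGE = {
    "the": {"weather": 0.30, "whether": 0.01, "wetter": 0.02},
    "know": {"weather": 0.02, "whether": 0.25, "wetter": 0.01},
}


def guess_word(previous_word, acoustic=ACOUSTIC, language=LANGUAGE):
    """Return the candidate word with the highest combined probability."""
    context = language.get(previous_word, {})
    scores = {}
    for word, p_sound in acoustic.items():
        # Use a tiny fallback probability for words never seen in this context.
        p_word = context.get(word, 1e-6)
        # Adding logarithms is equivalent to multiplying the probabilities.
        scores[word] = math.log(p_sound) + math.log(p_word)
    return max(scores, key=scores.get)


print(guess_word("the"))   # "what is the ..."
print(guess_word("know"))  # "do you know ..."
```

On sound alone, "whether" is the closest match in this toy data, yet after "the" the combined score favours "weather". That is the sense in which the assistant guesses rather than simply transcribes.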

Although the innovations mentioned above are quite significant, let us not forget that necessity is the mother of invention. In the past few years, the devices that can handle voice interaction have not been limited to phones and computers. A lot more categories have joined the ranks of “smart” devices, including smartwatches, smart homes and smart cars. Each of these platforms has a different configuration, and hence needs a system complicated yet reliable enough to function optimally with it.

It is quite evident that a combination of all these factors has lifted voice assistants to their present level of efficiency. Let us have a deeper look at each of these platforms.

Apple - Siri

Since its introduction on October 14, 2011 on the iPhone 4S, Siri has been integrated into virtually the entire Apple ecosystem, except OS X. Let us accept it, even if Google got to voice recognition first, it was this witty talking assistant on the iPhone that suddenly made voice assistants popular. So much so, that Siri now has a pop-culture significance with respect to virtual assistants, enough to be featured in a Marvel movie!

Setting up Siri is quite easy: it asks you to speak a couple of statements and to set the voice gender, location and your nickname. Once that’s done you are good to go. Although Siri has a lot of interesting features, we could not test a number of them as they were unsupported in India. Keeping that aside, Siri has really high accuracy when it comes to detecting English and almost never mistakes what you said, even in a moderately noisy environment.

As for the richness of the conversation, Siri does give natural replies, especially for common queries like the ones about weather. But where Siri currently stumbles is understanding the context of two queries following each other, which is one more thing Google Now excels at, even across more than five consecutive queries. Only rarely were we able to get Siri to understand a query that referred to something from its previous reply, and those cases mostly involved list results.

If we were to look at the features showcased at WWDC, where Apple announced the launch of Siri on macOS Sierra, the successor to OS X, it definitely looks like Siri is set to be a lot more capable than it is currently. The queries that were directed at it were quite complicated and contextual in nature and Siri did not falter with any of those. If only Apple launched its features at the same time in India.

Android - Google Now

In the group of virtual assistants that are getting better and better at having conversations with you, Google’s combination of Google Now and “Ok Google” is comparatively silent. No, it doesn’t do that to contemplate your personality better! As of now, Google’s virtual assistant relies more on Google’s own search power to solve a host of queries. There are a number of ways you can access this depending on the version of Android your phone is running. Lollipop detects “Ok Google” from any screen and launches the search app, unless you explicitly turn this feature off. We should inform you that keeping this enabled has reportedly caused some OEM phones to have microphone problems. On Marshmallow, the latest version until N goes out of beta, “Google Now on Tap” is another feature that allows you to launch Google Now over any screen, giving relevant information pertaining to whatever is displayed on the screen at that moment. For example, if you are having a conversation with a friend about going to a movie, launching Now on Tap will show you the showtimes of the movie from theatres around you, the movie’s rating on IMDb, and allow you to set a reminder to buy the tickets, all within the cards that pop up when you long-press the home button.

Tapping on the various app icons (Facebook, Twitter, YouTube etc.) will launch the particular app (if installed) or open the relevant webpage with relevant content (the Facebook page or Twitter account of the movie, the YouTube trailer etc.). Although this isn’t strictly voice, it is definitely one of the best ways a virtual assistant can be used and activating this through voice (“Ok Google” anyone?) is just one step away for the prolific developers at Mountain View.

Google Now supports speech recognition in multiple languages including some regional Indian languages, but Now on Tap only supports English, Japanese, German, Spanish, Italian, French, Korean, Portuguese and Russian. “Ok Google” has one of the best regional language detections and offers really intuitive features without you having to explicitly enable or configure them. If only it could tell you a joke!

Google hasn’t yet announced any services that will be integrated with Google Home

Special Note: Google Assistant, announced at Google I/O 2016, aims to be the spiritual successor to Google Now, taking Google’s AI conversational. Two products have been announced with Google Assistant support: Google Home, an Amazon Echo-inspired home assistant speaker from Google, and Allo, a WhatsApp-like messaging app that will combine bots and voice commands.

Windows - Cortana

Due to the persistent comparisons between the three, it might be hard to accept that Cortana has been around for just two years and not more. Microsoft launched Cortana at the Microsoft BUILD Developer conference in April 2014, and if you really don’t know the inspiration behind the name yet, you need to reassess your gaming creds. The name was inspired by a synthetic intelligence character in Microsoft’s Halo franchise and the virtual assistant has been voiced by the same voice actor for the US-English specific version.

Microsoft has integrated Cortana into its entire ecosystem, including Windows 10, Windows 10 Mobile, Windows Phone 8.1, Microsoft Band and Xbox One, and has even launched apps for iOS and Android (although the Android version is in early access right now for India). Cortana has the usual features its competitors do and can set reminders, show search results, read out the weather and more, all through natural language queries. Apart from this, you have access to Notebook, where you can specify your interests, and even remove those that Cortana detects on her own. It can also launch specific apps when asked and integrates with services like Foursquare to provide you information. Cortana also lets you set contact-specific reminders, which pop up when you’re communicating with the contact, as well as location-based ones. And if the thought of a virtual assistant constantly listening to you does feel slightly unnerving, you can set do-not-disturb hours.

On PC, Cortana also integrates with Microsoft Edge to provide you features like opening hours and reservations on a restaurant website, coupons on a retail website and so on. It has also been built into Skype as a bot to order food, provide info, transcribe videos and schedule appointments. On the Windows Mobile app, there is a constant effort from Cortana’s side to keep the interactions within the app. For example, when we asked Cortana to book a cab, it showed us ten cab providers around us within the Cortana app itself, allowing us to ask Cortana to call one based on its position in the results.

Halo-inspired Cortana is relatively new to the voice assistant category

On mobile, Cortana does indeed keep Microsoft’s reputation consistent, with a discrepancy in performance and setup process between its Android app (in early preview) and the Windows Mobile app. The Android preview for India did not need any language configuration, automatically started with an Indian English accent and even told us Bollywood-inspired jokes in Hindi, although it refused to understand Hindi itself. The Windows Mobile app, on the other hand, wouldn’t launch with the language set to English (India); it had to be set to English (United Kingdom).

Cortana is expected to get a slew of new features with certain builds of the Windows 10 Anniversary update, and we can fairly say that the competition is heating up.

Alternate platforms (Cars, Console)

Although the major focus of developing virtual assistants with voice recognition capabilities has been on smartphones, other product categories are also heating up when it comes to incorporating speech as an interface. One such category is connected cars, or more appropriately, smart cars. Even mid-range luxury cars have had certain voice-operated features for quite a while and the same goes for Bluetooth connectivity to your phone. Now, major software giants and car manufacturers are collaborating to create and incorporate operating systems for smart cars that bring them much closer to the virtual assistants we use on our phones. Android Auto works by connecting an Android phone to a compatible car, allowing you to use Google Now in your car. Apple wasn’t far behind and had launched Apple CarPlay with iOS 9 that essentially gives you the same capabilities with your iPhone. Both platforms have been picked up by a large number of manufacturers and soon, you will be talking to more than your co-passengers in your car.

Microsoft has also announced that Cortana is coming to Xbox. This opens up a lot more doors for voice interaction. Maybe you can pause your next Halo game just by asking Cortana to do so. This is in line with Microsoft’s goal to unify the Xbox and the PC. And since Microsoft is doing it, we can expect Sony to catch up pretty soon.

Amazon’s Echo with Alexa will soon understand your emotional state as well

A smart home wouldn’t really be smart if, each time you had to interact with it, you had to deal with a mind-boggling array of switches on a highly complicated panel, would it? That is exactly what makes home automation the perfect use-case for voice interfaces. Hence it is no surprise that Amazon’s online store is actually running out of Echo units. The smart speaker can talk to you, to a number of apps and services and to any smart device that it is connected to, thanks to Alexa, the onboard virtual assistant. Even Google has jumped into the fray and announced Google Home, which sounds quite similar to Amazon’s Echo. It is yet to be seen what Google’s expertise with Google Now does for this product.

Beyond platform restrictions

Microsoft might have launched Cortana for Android and iOS, but there is no doubt that it works best on Windows. Apart from that, neither Apple nor Google has truly launched its virtual assistant beyond its own ecosystem. So if you are looking for a platform-agnostic virtual assistant to talk to, you might have to look harder. The Google Play store might have a few interesting options. We tried Assistant and it definitely has a sense of humour. Even though it is quite capable, the robotic voice and the repeated redirects to its internal browser with search results were quite dull compared to the contextual results we get from the standard voice assistants.

Dag Kittlaus and Adam Cheyer, creators of the AI behind Siri, have recently shown off their newest creation. Viv, a platform-agnostic virtual assistant, is to be launched with certain third parties towards the end of the year. It’s more similar to Amazon’s Alexa and Facebook’s Messenger bots than Siri or Google Now in its integration with third-party services. Their demonstration at TechCrunch Disrupt NY in May this year showed Viv being able to handle fairly complex queries, both in terms of breadth of AI and depth of speciality. Due to their strength in AI and the lack of a software giant’s pressure behind them, they hope to get a large number of third-party vendors on board and make Viv as ubiquitous as Bluetooth or location tracking is today.


With all these developments, and the opening up of Siri to third-party developers, we can only predict that speech as an interface is headed towards widespread usage. Because once it gets easy to talk to your devices from across the room, would you really want to play with buttons anymore? Do write in to us and let us know.