Microsoft creates speech recognition system with human-level accuracy

By Deepak Bhadana | Published on Oct 20, 2016
HIGHLIGHTS

The software registered a word error rate of 5.9 percent on the industry-standard Switchboard test

In an unprecedented breakthrough, a team of researchers and engineers at Microsoft Artificial Intelligence and Research reported that they have created a technology that can recognise words in a conversation as well as an average human. The team added that the speech recognition system makes the same number of errors as a human transcriptionist.

"We've reached human parity. This is an historic achievement," Xuedong Huang, Microsoft's chief speech scientist, said in a blog post.

According to a paper published on Monday, October 17, the researchers achieved a word error rate (WER) of 5.9 percent, down from the 6.3 percent they reported last month. The researchers evaluated the system on the “Switchboard” speech recognition test.
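For readers unfamiliar with the metric: WER counts the word substitutions (S), deletions (D) and insertions (I) needed to turn a system's transcript into the reference transcript, divided by the number of words in the reference (N). A 5.9 percent WER therefore means roughly 59 errors per 1,000 reference words, and the drop from 6.3 to 5.9 percent is a relative reduction of about 6 percent. The Python below is a minimal sketch of that textbook definition using word-level edit distance. The function name `word_error_rate` is hypothetical, and this is not Microsoft's scoring code; official benchmark scoring also applies text-normalization rules that this sketch leaves out.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Sketch of WER = (S + D + I) / N using word-level Levenshtein distance.

    Assumption: words are split on whitespace with no normalization
    (no lowercasing, punctuation stripping or filler-word handling).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match if cost is 0)
            )

    # Total edits divided by the number of reference words
    return dp[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words
    print(round(word_error_rate("the cat sat on the mat", "the cat sat on mat"), 3))  # 0.167
```

Because insertions count as errors, WER can exceed 100 percent when a system outputs many extra words, which is why it is reported as an error rate rather than an accuracy.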

Switchboard is a collection of recorded phone conversations in English, Spanish, and Mandarin, first released in the early 1990s by the US National Institute of Standards and Technology (NIST). It has since become the industry-standard speech recognition test, and companies such as IBM, Google, and Microsoft have used it to measure the accuracy of their speech recognition software.

“This accomplishment is the culmination of over 20 years of effort,” said Geoffrey Zweig, who manages Microsoft's Speech & Dialog research group.

The development has several potential applications. The speech recognition technology could improve consumer entertainment devices such as the Xbox, accessibility tools such as instant speech-to-text transcription, and voice assistants such as Microsoft Cortana. It could also help people with speech-related difficulties.

The team is now looking at ways to make the system work just as well in real-world settings, including places with a lot of background noise, such as a concert, or with heavy echo, such as an echoing room. The team will also try to develop software that not only recognises words but also understands them. "The next frontier is to move from recognition to understanding," said Zweig.

Digit caters to the largest community of tech buyers, users and enthusiasts in India. The all-new Digit.in continues the legacy of Thinkdigit.com as one of the largest portals in India committed to technology users and buyers. Digit is also one of the most trusted names when it comes to technology reviews and buying advice and is home to the Digit Test Lab, India's most proficient center for testing and reviewing technology products.