At the end of the previous millennium, speech recognition was seen as the holy grail. In my country there was even a Flanders Language Valley, sort of a Silicon Valley for tech firms focused on language, the most (in)famous one being Lernout & Hauspie.
These days speech recognition is no longer in the spotlight, but Microsoft just made an interesting announcement: the company claims to have achieved human parity. Its new speech recognition technology reportedly has a word error rate of just 5.9 percent, which is said to be roughly equal to what professional transcriptionists achieve.
The software giant notes this is the lowest word error rate ever recorded on the industry-standard Switchboard speech recognition task. It's 0.4 percentage points better than what the company reported in September, and 1 percentage point better than the score IBM's Watson registered in April.
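For readers unfamiliar with the metric: word error rate is conventionally computed as the number of word substitutions, deletions and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. A 5.9 percent WER therefore means roughly 59 errors per 1,000 reference words. The Python snippet below is a minimal sketch of that calculation using a word-level edit (Levenshtein) distance; the function name word_error_rate and the example sentences are hypothetical, and real benchmark evaluations typically normalize the text (casing, punctuation, filler words) before scoring, which this sketch does not attempt.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # and the first j hypothesis words (one row of the Levenshtein table)
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution (cost 1) or exact match (cost 0)
            )
        prev = curr

    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion (second "the"): 2 errors / 6 words
    print(f"{word_error_rate('the cat sat on the mat', 'the cat sit on mat'):.3f}")  # prints 0.333
```

Because insertions count as errors too, WER can exceed 100 percent when a recognizer produces many spurious words, which is why it is reported as an error rate rather than an accuracy.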
Microsoft says the achievement is the culmination of over twenty years of effort, and was made possible by using the latest deep neural network technology.
Geoffrey Zweig, who is in charge of the Speech & Dialog research group, remarks that the next big step is to move from recognition to understanding:
Moving forward, Zweig said the researchers are working on ways to make sure that speech recognition works well in more real-life settings. That includes places where there is a lot of background noise, such as at a party or while driving on the highway. They’ll also focus on better ways to help the technology assign names to individual speakers when multiple people are talking, and on making sure that it works well with a wide variety of voices, regardless of age, accent or ability.
In the longer term, researchers will focus on ways to teach computers not just to transcribe the acoustic signals that come out of people’s mouths, but instead to understand the words they are saying. That would give the technology the ability to answer questions or take action based on what they are told.
“The next frontier is to move from recognition to understanding,” Zweig said.
Microsoft will use the technology for its Cortana personal assistant, as well as for speech-to-text applications.