Skype will soon get real-time speech translation based on deep learning

Microsoft(s msft) will start offering on-the-fly language translation within Skype by the end of 2014, first in a Windows 8 beta app and then, it hopes, as a full commercial product within the next two and a half years.

A couple of years ago, Microsoft and the University of Toronto demonstrated a rough system that let someone speak English into a microphone and hear their words translated into spoken Mandarin. Microsoft’s researchers claimed low error rates thanks to deep neural networks: artificial “brains” with some capacity to learn features of voice, text and image data. IBM(s ibm)’s Watson also counts deep learning among its artificial intelligence techniques, and Google(s goog) recently paid $400 million for DeepMind, a British company working in the same area.

Now Microsoft’s research is about to pay off in the form of Skype Translate. At Code Conference 2014 on Tuesday, CEO Satya Nadella and Gurdeep Pall, the head of Microsoft’s Skype and Lync division, showed off a similar technique embedded in the popular videoconferencing service.

English-speaking Pall held a conversation with a German colleague speaking in her native tongue; as each speaker finished a sentence, Skype read out the translation in the other speaker’s language. It wasn’t perfect and, unlike in the 2012 demo, the translation wasn’t read out in the speaker’s own voice, but it was certainly accurate enough to be useful.

Nadella expressed slight bemusement at certain capabilities of the new technology, particularly its capacity for “transfer learning”:

“You teach in English, it learns English. Then you teach it Mandarin — it learns Mandarin, but it becomes better at English. And then you teach it Spanish and it gets good at Spanish, but it gets great at both Mandarin and English. And quite frankly none of us know exactly why.”
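The intuition behind the “transfer learning” effect Nadella describes is that the languages share one underlying representation, so training on a new language also refines the parameters that earlier languages rely on. Here is a minimal sketch of that shared-parameter idea using a toy perceptron; it bears no resemblance to Microsoft’s actual networks, and every name and number in it is made up for illustration:

```python
import random

random.seed(0)

def make_data(n):
    # Points in [-1, 1]^2 labeled by one shared rule, sign(x + y),
    # standing in for structure common to two "languages".
    data = []
    for _ in range(n):
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        data.append(((x, y), 1 if x + y > 0 else -1))
    return data

def train(data, w, epochs=20, lr=0.1):
    # Plain perceptron updates on the shared weight vector.
    w = list(w)
    for _ in range(epochs):
        for (x, y), label in data:
            if (1 if w[0] * x + w[1] * y > 0 else -1) != label:
                w[0] += lr * label * x
                w[1] += lr * label * y
    return w

def accuracy(w, data):
    hits = sum(1 for (x, y), label in data
               if (1 if w[0] * x + w[1] * y > 0 else -1) == label)
    return hits / len(data)

english_train = make_data(5)      # tiny "English" training set
mandarin_train = make_data(200)   # larger "Mandarin" training set
english_test = make_data(500)

w = train(english_train, [0.0, 0.0])
acc_english_only = accuracy(w, english_test)

# Keep training the *same* shared weights on the second task.
w = train(mandarin_train, w)
acc_after_mandarin = accuracy(w, english_test)
```

In this toy, training on the “Mandarin” data tends to sharpen “English” accuracy only because the two tasks share structure. Real multilingual speech models exploit a similar effect by sharing lower network layers across languages, though, as Nadella notes, why it works as well as it does is not fully understood.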

In a Microsoft Research blog post accompanying Nadella’s announcement, the company lays out a timeline of major advances in speech recognition and machine translation. Among the major points were the advent of deep learning in 2006 (thanks to the work of University of Toronto professor and Google distinguished researcher Geoffrey Hinton) and Microsoft’s adoption of the technology in 2009. Other companies actively pursuing deep learning research include Facebook and Baidu.

I can’t help but be reminded of Google admitting it no longer fully understands how its systems learn to identify objects in photos so accurately. The technology is hugely impressive and, in developing a mind of its own, a little disturbing.

There’s no word yet on whether Microsoft will charge for this feature. Either way, it would prove very useful in both consumer-focused Skype and in Lync, Skype’s business-focused equivalent. The core technology already powers Bing Translator and the speech recognition in the Cortana personal assistant on Windows Phone 8.1, so Microsoft’s deep neural network should have plenty to learn from.