Microsoft banks on speech to help WP7 catch up

Microsoft’s Windows Phone 7 (s msft) platform, if you haven’t noticed, hasn’t really been a player in the smartphone races yet, registering an estimated 1.5 million units shipped in the second quarter. But Microsoft’s leaders have high hopes for the upcoming Mango software update and, in particular, are looking to the improved speech technology in the update to help WP7 claw back into the game.

I sat down with Ilya Bukshteyn, senior director of speech marketing, sales and solutions for Microsoft, and he said that while Apple (s aapl) helped define the modern user interface of smartphones with the iPhone, expectations are shifting to an interface that moves beyond touch to deeply integrated speech. And that, he said, favors Microsoft, which integrated speech into WP7 at its launch and is upping the ante with Mango. The improvements include voice-to-text messaging, which allows people to hear text messages read aloud to them and respond with a dictated text message, all without pressing a button. Microsoft will also let users navigate maps, launch applications or turn on the speakerphone with voice commands, and it is adding support for more languages.

“If you look at the success of the iPhone, it was about the usability it brought to the screen and touch experience and that resonated,” said Bukshteyn. “We’re now hitting another inflection point where speech is going to matter just as much. Moving forward, having some sort of speech natural user interface does become something users expect. And how well you do it will determine what sells and what doesn’t.”

Now Apple appears poised to announce new speech technology integration in iOS 5. 9to5Mac reported recently that iOS 5 contains a speech-to-text system powered by Nuance (s nuan), which will allow users to switch from keyboard to speech input in any text field. That, along with other lessons Apple picked up from its purchase of personal assistant app Siri, could help Apple catch up quickly on the speech front with Windows Phone 7 and Android (s goog), which also has some impressive speech capabilities embedded.

But Bukshteyn said it’s not just about building speech into iOS, it’s about the intelligence behind it. He said Microsoft’s cloud-based approach to speech is able to learn from the billions of utterances Microsoft gathers from Xbox Kinect, its Bing Mobile apps, Windows Phone 7 and its other productivity software, so the system as a whole is continually learning. The volume and diversity of those utterances help train Tellme, Microsoft’s speech platform, and improve the performance of the overall speech product over time. He said Apple will also be behind in perfecting and implementing speech, something that takes years to tune.

“Apple doesn’t have the IP, the patents; they can gain that over time but they’re starting at a different point,” Bukshteyn said. “Speech is something we’ve been investing in for 10 years and the stuff we’re seeing now was planted a decade ago.”

Bukshteyn is quick to point out that smartphone success involves a lot of factors, not just speech. But he believes that as smartphones start to reach the mass market, the need for elegant interfaces with speech will only grow. He said designs that are intuitive and not intimidating will be what wins over new smartphone users.

We still have to see what Apple does with its Nuance integration, which it has yet to announce. And as my colleague Darrell wrote, it could be a moment for Apple to leapfrog its competitors. We know that Google is only going to continue to push speech into its Android platform.

But Microsoft’s heavy emphasis on speech shows that the technology will be, if not a decider in the smartphone race, a key component that all modern smartphone platforms will need in order to compete. As Vlad Sejnoha, CTO of Nuance, told me yesterday, speech technology is really becoming mobile technology and will grow more powerful as it gets further embedded into smartphones.

It’s not surprising that Microsoft would tout its speech technology. It’s one area in which it’s trying to leverage its assets to help WP7 catch up. I’m skeptical that speech will be a major catalyst for WP7 all by itself but it is a differentiator for now and another way to show how polished the platform is becoming.

And I do think that speech will become a more popular tool over time. Currently, Google and Microsoft say about 25 percent of their mobile searches are conducted by voice. Bukshteyn said it’s actually higher than that for Microsoft when you look at just Windows Phone 7 devices. With the way people are learning to talk to their devices, including powerful examples like Xbox Kinect, it seems like a good bet for Microsoft to push speech. But if all the platforms start to emphasize the technology, as we’re seeing more and more, WP7 may enjoy a temporary boost but it will need more than that to really close the gap on iOS and Android.