Why Apple Could — and Should — Bring Voice Recognition Technology to our Phones

Perhaps no other technology in mobile has been as overhyped yet underwhelming as voice recognition. Long billed as a kind of holy grail of mobile computing, the technology has in reality been awkward, inaccurate and often unusable, resulting in misdialed phone calls and incomprehensible messages. Naturally, consumers have failed to embrace the stuff, but Apple may be about to change that.

How? Apple has reportedly been in discussions to license voice recognition technology, dubbed Dragon, from Nuance, a longtime player in the space. The iPhone maker may integrate Dragon with iOS 5, which Apple execs will discuss next week at WWDC in San Francisco. Dragon has drawn acclaim for its performance in a variety of Nuance’s PC apps and is the muscle behind Siri, a highly touted iPhone app and the flagship product of the startup of the same name that Apple acquired last year. Apple could make the technology available to developers as a built-in API in iOS 5, handing app creators a valuable new tool. Such a move would not only give voice recognition a much-needed push into the mobile mainstream but also give Apple the chance to once again transform the way we interact with our phones. Here’s why:

1) Voice recognition technology is finally ready for prime time. Dragon has received rave reviews from mobile users, and Nuance’s FlexT9 dictation app for Android, powered by Dragon, sells for a mere $5 and enjoys a four-star user rating after more than 1,100 reviews. Nuance isn’t the only player drawing attention: Slate’s Farhad Manjoo recently praised Google’s voice recognition technology for Android, saying that it “actually works,” even when you throw it a curveball like asking how many angstroms are in a mile. And there is no shortage of compelling use cases, from controlling a navigation app while driving (when your hands should be on the wheel) to dictating lengthy messages rather than typing them on a shrunken keyboard.

2) Apple knows how to educate the consumer. Voice recognition has come a long way, but it still isn’t all that simple to use. Nuance directs FlexT9 users to a series of online videos to demonstrate how to use the app, for instance, and Manjoo points out that Google’s technology requires users to say the words “period” or “comma” if they want to add punctuation to their messages, which isn’t exactly intuitive. But Apple’s marketing genius lies in showing consumers how to use technology: The first iPhone commercials were a tutorial in how to surf the Web, access email and find nearby businesses on the phone. A similar marketing campaign could illustrate how to do all those things and more by talking, not typing. And that could eliminate the need for users to hunt down online how-to videos just to use the technology.

3) Apple is a master of the user interface. There were plenty of devices with touchscreens before the iPhone came along; Apple’s true innovation was in simplifying the technology with an intuitive user interface to make it easy for users to navigate their phones. The move was so effective that it changed the way many of us interact with our handsets, and we eschewed physical keyboards and trackballs in favor of one-touch navigation and virtual keyboards. Apple could do the same with voice by integrating Dragon closely with iOS, making it easy to send text messages or navigate the Safari browser by speaking. And the legions of iOS developers will surely find innovative new ways to leverage voice in use cases like messaging, gaming, navigation and social networking.

It’s still not clear that Apple will come to terms with Nuance, but a licensing deal would throw the door open for voice recognition technology in mobile. A few years ago, Apple changed the way we use our phones. Now the company might just do it again.

Question of the week

Can Apple and Nuance change the way we interact with our phones?
Relevant Analyst
Colin Gibbs

Founder and Principal, Peak Mobile Insights

7 Comments
  1. ricphillips Monday, June 13, 2011

    User interfaces live and die by cognitive affordance. (And not just computer interfaces.)

    With speech recognition, there are two ways cognitive affordance is degraded. The obvious one is the fidelity of the algorithms. For those of us who speak with neither an American nor a British accent (mine is Australian), it always seems like we are about two years behind in usability. My voice recognition software has actually trained me to speak with a slight American accent. For example, I now say ‘cah’ and ‘knotty’ instead of ‘car’ and ‘naughty’ when using the software.

    More important, and often overlooked, is the mental strain of writing verbally. We actually use our brains differently when writing than when speaking. It takes a lot of practice and patience to be able to compose long passages of text verbally.

    Speech recognition will be fine for short informal compositions, and extremely useful as a command interface for mobile platforms.

    Taking it further, as a real alternative to keyboard entry, will require a lot of training on the part of users. It will be more like learning to type or play an instrument: it will require a neurological adjustment that cannot be rushed.

  2. Paul Zagaeski Monday, June 13, 2011

    @Colin: Have you heard anything post-WWDC about VR in iOS 5?

    @Mark: VR sounds great for quick composition, do you use it for more elaborate content creation, where you’re not just banging out a quick message?

    1. I haven’t heard anything yet about VR in iOS 5, outside of the standard rumors.

      I’ve used Dragon Dictate, and while it’s a cool party trick, the integration isn’t deep enough to let me use it without thinking. I still have to do a lot of work before I can send the message (think of the number of clicks compared with a regular message). The frustrating part is correcting VR mistakes, and it has to reach four-nines (99.99 percent) reliability before it achieves mass-market adoption.

  3. Mark Kelley Monday, June 13, 2011

    The Android voice recognition is truly phenomenal. I have not typed a text message in a year. I assume the microphone filtering is much wider than the 5 kHz standard the phone system has used for years, which makes, for example, Google Voice’s voicemail-to-text transcription laughable.

    If you can type more than 50 wpm (like many of us), you may believe that laptops will always be the main platform for creating word-based content. I did, and I believed that tablet-based devices would never supplant laptops in the “creation” segment. The VR on Android is turning me around.

  4. Nuance is legacy. Voice recognition is solved by brute-force statistical techniques and a huge voice corpus. That’s the way Android phones do it, which is why they have pretty amazing voice recognition across lots of apps today.

    (BTW, I’m legacy too, being an old NL guy who parsed text with grammar rules; all Gone With the Wind)
