After solely using a Google Android phone for nearly two years, I recently added an Apple iPhone 4S to my stable of smartphones. Not because of the new dual-core A5 processor, advanced 8 megapixel camera sensor or retooled antenna design, however. What sold me on Apple’s latest handset wasn’t hardware at all: It was the promise of an “invisible interface” through Siri, the iPhone’s personal assistant software. Siri is arguably the first working example of how everyday people will interact with connected devices in the near future. The ability to speak to our phones, televisions or homes and have them respond or take action is no longer a far-fetched concept.
What Siri is
Siri started out as a third-party application, launching on the iTunes App Store in 2010. The software was demonstrated a year earlier, showing that the promise of natural language speech recognition could be added to mobile devices. Other applications and mobile platforms have used voice input to control certain functions, but Siri removed a key barrier to the adoption of voice control: Instead of requiring users to memorize specific commands, the software uses everyday conversational language to understand context.
That means the same question or command can be asked different ways, making the software far more versatile. While I could use a traditional voice command to open a weather application, it’s easy to say, “Siri, show me the weather,” and the software will display a six-day forecast. I can pose this as a question, however, and gain a specific answer. Asking “Siri, will I need an umbrella on Wednesday?” will check the forecast for me; without my looking at my phone, the software will answer the question. But that’s only part of the benefit Siri brings to mobile interfaces.
Siri’s developers took the software a step further by integrating the ability for the program to use semantic context on a limited basis. Siri knows my wife’s name, for example, because it has access to the contact records on my phone. So I can ask, “Where is my wife?” and Siri will use the Find My Friends app to locate her. What if her phone is off and the app can’t find her? I can tell Siri “Send her a text. ‘Where are you?’” and Siri will do it. The software knows that I’m still talking about my wife, so when I say “her,” Siri understands that; I don’t need to specify any further as long as I maintain this conversation. Siri can even learn about you and your family to better understand you.
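To make the idea of carrying context through a conversation concrete, here is a minimal sketch in Python. It is not how Siri is implemented, which Apple has not published; the `Contact` and `Conversation` classes, the word-matching approach and the sample contact are all hypothetical and chosen only for illustration. The sketch remembers the last person mentioned by relationship and resolves a later pronoun to that same person.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Contact:
    name: str       # display name from the (hypothetical) address book
    relation: str   # how the user refers to this person, e.g. "wife"
    pronoun: str    # pronoun the user is likely to use afterwards, e.g. "her"


@dataclass
class Conversation:
    contacts: List[Contact]
    last_referent: Optional[Contact] = None  # who we are currently talking about

    def resolve(self, utterance: str) -> Optional[Contact]:
        """Return the contact an utterance refers to, or None if unknown."""
        words = re.findall(r"[a-z']+", utterance.lower())
        # An explicit relationship word ("wife") names the person directly
        # and becomes the new conversational context.
        for contact in self.contacts:
            if contact.relation in words:
                self.last_referent = contact
                return contact
        # Otherwise, a pronoun is resolved against the remembered context.
        if self.last_referent and self.last_referent.pronoun in words:
            return self.last_referent
        return None


# Hypothetical sample data, for illustration only.
conversation = Conversation([Contact("Alex Example", "wife", "her")])
first = conversation.resolve("Where is my wife?")
second = conversation.resolve("Send her a text. 'Where are you?'")
```

In this toy version, `second` resolves to the same contact as `first` only because the conversation remembered the earlier mention; a fresh `Conversation` given just “Send her a text” would return nothing.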
That’s just an example of the natural language processing, but it’s a powerful concept. Combining contextual understanding of voice input with data on a mobile device, Siri is more useful and more “intelligent” than traditional voice commands. Microsoft first introduced Voice Commands for PDAs back in 2003 and later used it for Pocket PC Phone devices, so clearly Siri isn’t the first solution. But Microsoft Voice Commands, followed by a number of efforts from Nuance, Vlingo, reQall and others, are generally one-dimensional by comparison, because they simply replace a touch menu with a voice menu. (It’s worth noting that Siri uses Nuance on Apple’s servers for voice recognition; Nuance has part of the puzzle solved, but the Siri engineers took it a big step forward.)
What Siri isn’t
Based on my limited description, Siri may sound like it uses a type of artificial intelligence (AI). Indeed, when I first used Siri, I thought exactly that. But the reality is that Siri is just a step forward for voice recognition and input. My colleague Colin Gibbs thinks that Siri is more evolutionary than revolutionary, perhaps because Siri isn’t truly an “intelligent” system by definition: There’s no AI involved, at least not yet. Jon Pielak, the Studio Director at Vectorform and the lead developer for Siri, explains:
With Natural language processing in the mix it feels more human, like it understands you. This in itself adds to the mystique surrounding Artificial Intelligence. But real AI can’t fit on a phone in our world . . . yet.
Siri is basically a contextual, semantic, personalized search engine. We affectionately called it a “Do” engine. A search engine can evaluate text strings and look for matching results. A “Do” engine maintains awareness of the user and everything it knows about that user and processes strings in the context of the user.
The “do” aspect that Pielak mentions is what makes the service appear magical. Early and even recent efforts to integrate voice interaction with computers have meant a single-threaded, command-by-command experience. Examples are “Open Music Player” and then “Play my ‘long run’ playlist.” But that’s not how humans interact with one another.
We “do” by having conversations with others (or ourselves, for that matter), making evaluations and then taking action. I can tell Siri, “I feel like listening to some music,” and it will begin playing my iTunes library. A few seconds later, if the music doesn’t have me jazzed, I can say, “Can you just play New Age?” and Siri immediately takes the appropriate action like an invisible but helpful butler.
So clearly, then, Siri can handle evaluation and action, and the service shows early signs of understanding context as well. Even so, Siri is still limited: It requires a connection back to Apple’s servers, and it has a limited set of applications that it can interact with. You can’t tell Siri to shut off your iPhone’s Wi-Fi radio, for example, but you can use it with Reminders, Contacts, Calendar, messaging and other core apps.
Siri also isn’t proactive: It won’t cross-check daily travels on your calendar with the weather and remind you to bring an umbrella, for example. You’d have to ask Siri if you need an umbrella on a particular day in the near future. That’s a glaringly obvious difference between a true AI system and what Siri is. In other words, the use cases are limited at the moment. As it’s a beta version that is integrated within the mobile platform, however, I expect Apple to broaden Siri’s intelligence over time.
Apple hasn’t yet provided any data on how many people are using Siri, but anecdotal evidence suggests a popular product. The company sold 4 million iPhone 4S handsets in the first weekend, all with Siri integrated in iOS 5, making for a large potential audience. As mentioned earlier, the service requires a connection back to Apple’s servers, and in the first week alone, I’ve already noticed a handful of times when Apple’s Siri service was unavailable, possibly due to high demand. And Vlingo, a competing solution available prior to Siri, saw the number of voice actions on its service rise 50 percent in the first five days of Siri availability; clearly voice interaction with computers is desirable to many consumers.
Get ready for the invisible interface
The umbrella example best explains not just what Siri is but what it could become. At the very least, Apple could continue to mature Siri so that it works with other apps. Siri already has hooks into web services such as Google, Bing, Yahoo, Yelp and Wolfram Alpha, giving Siri a wide array of “intelligence” as it seeks answers from these data stores. Siri can search the first four engines like any normal web search, while Wolfram Alpha provides answers to specific questions asked in natural language; it’s a perfect fit as part of Siri’s brain.
Aside from web searches, Siri is limited to Apple iOS 5 apps, so it’s not available for developers to integrate. At least not yet. Given Apple’s history of controlling the user experience, third-party application use with Siri may never happen. But I expect Apple to keep maturing Siri so that it can be used with other core applications or become even more useful with the currently supported apps.
At its core, Siri is an interface, so I don’t think we’ll see new applications tailor-made for Siri. But just as the mouse and graphical user interface changed computing to make it easier to use, Siri has the potential to do the same, not just for smartphones but for any number of web-connected appliances. Since most of Siri’s processing is handled in the cloud, connected devices can listen for spoken commands and then tap into Siri for an understanding of the requests. Instead of tapping a microwave keypad, for example, we might ask the appliance to boil water for tea.
Because Siri is an interface, it can interact both with data stores in the cloud and with multiple applications or services at the same time. That last bit is key and is exemplified by how Siri works with Reminders. I recently left home and forgot to take medicine I needed before leaving. I realized it on the road. Since Siri knows where I live (from my Contacts record), I simply said, “Siri, remind me to take my medicine when I get home.” The software created the reminder as well as a geo-fence around my home address. Using GPS, my iPhone proactively realized when I returned home and promptly reminded me to take my meds when my house was in sight. The integrated use of multiple apps and services makes Siri appear smart.
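A location-based reminder of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Apple’s implementation: the `GeoReminder` class, the 150-meter radius and the home coordinates are all hypothetical. The sketch stores a reminder with a circular geo-fence and fires it once, the first time a GPS fix falls inside the circle, using the standard haversine formula for distance.

```python
import math
from dataclasses import dataclass
from typing import Optional

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


@dataclass
class GeoReminder:
    text: str
    lat: float                # center of the geo-fence
    lon: float
    radius_m: float = 150.0   # assumed fence radius, for illustration
    fired: bool = False

    def check(self, lat: float, lon: float) -> Optional[str]:
        """Return the reminder text the first time a fix is inside the fence."""
        inside = haversine_m(lat, lon, self.lat, self.lon) <= self.radius_m
        if inside and not self.fired:
            self.fired = True
            return self.text
        return None


# Hypothetical home coordinates, standing in for the Contacts record.
home_lat, home_lon = 40.0, -75.0
reminder = GeoReminder("Take my medicine", home_lat, home_lon)

on_the_road = reminder.check(40.1, -75.0)      # roughly 11 km away
arriving = reminder.check(40.0005, -75.0)      # roughly 56 m away
still_home = reminder.check(40.0005, -75.0)    # already fired once
```

Here only `arriving` carries the reminder text: the first fix is well outside the fence, and the third is ignored because the reminder has already fired.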
Of course, speaking to a “personal assistant” on our phones may not be for everyone, nor may it be considered a mainstream activity. But it easily could be. Apple is known for pushing technology into a device. With Siri, Apple has an opportunity to create what I call the “invisible interface,” and it won’t be limited to phones.
Connected computing components are now finding their way into new products everywhere. In my home, the thermostat, my rooftop solar panels, some light switches and a television set all connect back to the web. Each has its own unique interface. Imagine for a second that a Siri-like solution could be used to control all of these devices and even allow them to interact together: “Siri, turn down the air-conditioning by two degrees once we’re creating a power surplus.”
The entire concept of an intelligent but invisible interface ties in with the “Internet of things,” in that more devices are tapping into the vast intelligent stores of the web. Wouldn’t it make sense if all of these devices were controlled through a natural user interface instead of their own custom or proprietary methods involving touches, taps, apps and keyboards?
There’s no clicking on a web page for the invisible interface, no taking out a smartphone and firing up an application. Instead, this interface uses plain spoken words for the system at large to evaluate and then “do.” Siri does this now in a small sense, but as more devices become connected and Siri learns to be smarter, the touch-and-click interfaces of today could be replaced by invisible interfaces tomorrow.
Over the next few years, I could easily see Apple’s integrating Siri into the Mac OS X platform and its Apple TV products. After that, it’s a wide open market to bring the invisible interface into nontraditional computing devices and appliances.
We haven’t had many major computer interface changes in the past few decades, although the touchscreen smartphone is certainly the most recent. Now that more devices require interaction, it’s time for the next big change in computing interfaces. Ironically, Siri, as the newest change, is based on spoken voice, one of the oldest methods of interaction that has ever existed.