Most people will never have the computer science knowledge to become deep-learning researchers, but now they can test out the results of that work with a simple computer vision iPhone app called Deep Belief. iOS developers can take Deep Belief a step further by downloading an open source software development kit and working its object-recognition capabilities into their own apps.
Deep Belief was built by Jetpac, a startup that launched in 2012 and creates travel guides based on the content of Instagram photos. It’s an implementation of the convolutional neural network approach developed by University of Toronto researchers to win the ImageNet object-recognition competition in 2012. The approach was so accurate that Google bought a company founded by one of the researchers, Geoffrey Hinton, and brought him on part-time as a distinguished researcher.
Jetpac has been using the Deep Belief code on its own servers for a while, co-founder and CTO Pete Warden said, and has seen a marked improvement in how fast it can move. Previously, creating a new type of guide (e.g., restaurants with tacos) would require weeks of building custom code to extract the features of a taco from images. A deep-learning approach is much more efficient, he explained, because “you’re able to hand off a lot of those decisions to deep-learning algorithms.”
What’s more, anyone in the company can help train models simply by taking pictures of the things around which Jetpac wants to build new guides, Warden said.
Bringing deep learning to the iOS masses
By open sourcing the Deep Belief SDK and releasing the app, Warden hopes to take some of the mystery out of deep learning and also open up a very powerful computing model to a new class of users.
“For anyone working in computer vision, the question has become ‘Why wouldn’t you use a convolutional neural network approach for this project?’” he explained. However, he also noted, “If you’re just an ordinary developer, it’s kind of hard to get started [with deep learning].”
Releasing an SDK should change that, letting people experiment (at least on the iPhone) with all sorts of new applications beyond the two apps Jetpac has released: its object-spotting app and the Deep Belief app, which is essentially a homing device for a single object you’ve trained it to find.
“I’m really interested in seeing how people are using it,” he added.
And in “miniaturizing” it. Warden has been working on image processing and computer vision problems for more than a decade, including during a five-year stint at Apple and as the recipient of a National Science Foundation grant investigating real-time feature-recognition algorithms for mobile-phone videos. He said the time has finally come to put this type of capability into a mobile phone.
Getting the neural network architectures and deep-learning models right took a lot of research (and that did require larger computer systems), but advances in data compression now allow these types of algorithms to run on low-end GPUs (a popular processing platform for machine learning workloads like this) and even within a phone’s cache. It doesn’t hurt, Warden noted, that “the iPhone … is actually fairly beefy.”
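Warden doesn’t say which compression technique Deep Belief relies on. One common way to shrink a trained network is linear 8-bit quantization: each 32-bit float weight is stored as a one-byte code, plus a shared minimum and step size per layer, cutting storage by roughly 4x at the cost of a small rounding error. The Python sketch below is a hypothetical illustration of that general idea, not Jetpac’s code; `quantize` and `dequantize` are invented names.

```python
def quantize(weights):
    """Map float weights to 8-bit codes (0-255) plus a shared minimum and step size."""
    lo, hi = min(weights), max(weights)
    # Avoid dividing by zero when every weight is identical.
    scale = (hi - lo) / 255 if hi > lo else 1.0
    codes = bytes(round((w - lo) / scale) for w in weights)
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Recover approximate float weights from the 8-bit codes."""
    return [lo + c * scale for c in codes]


# Hypothetical weights for one tiny layer.
weights = [-0.42, 0.0, 0.13, 0.87, -1.5, 2.25]
codes, lo, scale = quantize(weights)
restored = dequantize(codes, lo, scale)

# Rounding to the nearest code means each weight is off by at most half a step.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(len(codes), "bytes instead of", 4 * len(weights))  # prints: 6 bytes instead of 24
```

The trade-off is precision: with 256 levels spread across a layer’s range, each restored weight can drift by up to half a step, which trained networks often tolerate well.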
Many attempts to bring deep learning or other forms of artificial intelligence to smartphones involve APIs or some other method of processing images and other data on cloud servers. However, running algorithms locally is important if you want to have computer vision capabilities without relying on a network connection, or when you want to ensure privacy.
Jetpac isn’t alone in trying to bring deep learning to smartphones, though. Google is using its quantum computer to research object-recognition algorithms that could run on a smartphone rather than on its cloud servers. I’ve seen a demo of a facial-recognition app designed to run locally on a device, and it seems safe to assume every mobile device manufacturer and scores of app startups are considering how they might incorporate advanced object- or facial-recognition capabilities into their products in order to help users bring order to their troves of photos and videos.
Some training required
However, it’s important to understand the current limitations of Deep Belief in order to set realistic expectations and, actually, to do a better job training it. “It’s way better than a computer has ever been. … It’s still nowhere near as good as a human,” Warden said.
First and foremost, Warden explained, Deep Belief is designed for object recognition (e.g., dogs, coffee mugs or sunglasses) rather than facial recognition à la Facebook’s DeepFace system. Also, because Deep Belief is built upon the architecture designed for the ImageNet competition, it’s particularly well suited to recognizing the roughly 1,000 categories included in that competition.
Training the Deep Belief app to recognize objects outside those categories happens near the top of the stack of layers that make up the network. The key to doing it well is getting lots of images of the target object from all sorts of angles, as well as lots of images of things that aren’t the target object, so the app can really learn which features it’s looking for. Jetpac, for example, has used this higher-level training to teach its system “themes” such as “beach,” “driving” or “sailing.”
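The article doesn’t describe the SDK’s training internals. A common way to do this kind of “top of the stack” training is to leave the pretrained network untouched, treat the activations of one of its upper layers as a feature vector for each photo, and fit a small binary classifier on those vectors using positive examples (the target object) and negative examples (everything else). The Python sketch below swaps in plain logistic regression trained by gradient descent; the SDK may use a different classifier, and the tiny hand-made three-number vectors are hypothetical stand-ins for real network activations.

```python
import math


def predict(weights, bias, x):
    """Probability that feature vector x shows the target object."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    # Numerically stable logistic sigmoid.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_classifier(positives, negatives, epochs=200, lr=0.5):
    """Fit logistic regression on fixed feature vectors.

    positives: vectors from photos of the target object (label 1)
    negatives: vectors from photos of anything else (label 0)
    """
    data = [(x, 1.0) for x in positives] + [(x, 0.0) for x in negatives]
    weights, bias = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            err = predict(weights, bias, x) - y  # gradient of log loss w.r.t. the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


# Hypothetical "upper-layer" features: cat photos vs. everything else.
cats = [[0.9, 0.8, 0.1], [0.8, 0.9, 0.2], [0.95, 0.7, 0.0]]
others = [[0.1, 0.2, 0.9], [0.2, 0.1, 0.8], [0.0, 0.3, 0.7]]
w, b = train_classifier(cats, others)

print(predict(w, b, [0.85, 0.85, 0.1]) > 0.5)  # True: looks like the cat examples
print(predict(w, b, [0.1, 0.1, 0.9]) > 0.5)    # False: looks like the negatives
```

The negatives matter as much as the positives: without them, the classifier has no way to learn which features separate the target object from everything else, which mirrors Warden’s advice to photograph things that aren’t the target too.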
Here’s a demo video of Warden training the Deep Belief app to recognize his cat:
Video: https://player.vimeo.com/video/91460768
Still, he said, mistakes will happen. Because it was originally trained on many images of food for the ImageNet competition, the Deep Belief algorithm is prone to labeling pictures of toilet bowls as plates of food, Warden said. Both show a white circular shape on the outside with something darker in the center. A video of Jetpac’s Spotter app, which has been trained on a million images and will try to recognize what’s in users’ photos, highlights some more mistakes.
With some work, it’s possible to train around these mistakes, but Warden thinks they’re also valuable learning tools. Aside from showing that deep learning models are still far from perfect, mistaking toilet bowls for plated food can help quell fears of a forthcoming robot apocalypse.
“It’s a bit less scary,” Warden said, “when you see it’s good, but it’s not magic.”
Feature image courtesy of deviantART user rajasegar.