Denver-based deep learning startup AlchemyAPI is expanding beyond its text-analysis roots and into the world of image recognition. The company has developed a system it says is as accurate at classifying images as Google’s vaunted image-recognition system and, what’s more, AlchemyAPI is going to offer it up for public consumption via an API.
AlchemyAPI has been letting customers analyze their text via its deep learning API since 2011, and has lured in a number of big-name customers such as Salesforce.com, Jive Software, Shutterstock, Livefyre and “the world’s largest brick-and-mortar retailer,” Founder and CEO Elliot Turner told me during a recent interview. They’re using it for everything from metadata analysis to consumer targeting to legal e-discovery. I’ve run into numerous companies also using AlchemyAPI, and each one has been impressed.
This week, however, the company is taking the wraps off its image-recognition system via a demonstration at a startup event taking place in Denver. After text, the addition of image recognition was a natural evolution of the service, Turner said. As I explained in a post last week, there are natural applications for image recognition (and even a foray into video recognition) such as search, but Turner expects there’ll be “all sorts of really cool applications that … none of us are even thinking about now.”
Getting to Google-grade
Despite his company’s startup status, Turner said AlchemyAPI is able to do deep learning on the same level as Google — thanks in part to the world of web content Google has made available. AlchemyAPI built its language models by crawling huge portions of the public internet, and the dataset now spans more than 15 billion pages, about 100 billion words and around 30 billion tweets. All of this adds up to petabytes of data.
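A rough back-of-the-envelope check shows how 15 billion pages gets into petabyte territory — the average raw page size below is my own assumption, not a figure from AlchemyAPI:

```python
# Back-of-the-envelope: does 15 billion crawled pages add up to petabytes?
pages = 15e9             # pages crawled (from the article)
avg_page_bytes = 100e3   # assumed average raw HTML page size (~100 KB)

total_bytes = pages * avg_page_bytes
print(f"{total_bytes / 1e15:.1f} PB")  # ~1.5 PB at that assumed page size
```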
Neural networks (the data-analysis technique on which deep learning is based), he explained, “are data-hungry beasts.”
They’re also hungry for power. And while Google is running its models over hundreds or thousands of machines in its data centers, AlchemyAPI is using the cloud. Using Amazon Web Services’ spot-instance pricing, it’s able to get a large number of GPU cores at a relatively low cost. Turner said a single one of AlchemyAPI’s servers can analyze 10 million images in a day.
When it comes to image recognition specifically, Turner said AlchemyAPI’s dataset is about 10 times larger than the 15 million images on which University of Toronto professor and part-time Google Distinguished Engineer Geoffrey Hinton optimized his lauded deep learning methods. Hinton used the ImageNet dataset while, again, AlchemyAPI has been scraping the web.
In terms of the deep learning system itself, Turner said AlchemyAPI’s initial approach was similar to what Hinton and his team have been pushing — a 12-layer neural network utilizing a technique called “dropout” to minimize overfitting — but it has since evolved to include “bleeding-edge” techniques such as stochastic pooling and “maxout” networks.
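Of those techniques, dropout is the easiest to sketch: during training, each unit in a layer is randomly zeroed so the network can’t lean too heavily on any single feature. A minimal NumPy sketch — the “inverted” scaling convention here is one common variant, not necessarily what AlchemyAPI uses:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p_drop=0.5, training=True):
    """Randomly zero each unit with probability p_drop (inverted dropout).

    Scaling the survivors by 1 / (1 - p_drop) keeps the expected
    activation the same at training and test time, so the network
    needs no adjustment when dropout is switched off.
    """
    if not training:
        return activations
    mask = rng.random(activations.shape) >= p_drop
    return activations * mask / (1.0 - p_drop)

h = np.ones(10_000)          # a toy layer of activations
dropped = dropout(h, p_drop=0.5)
# Roughly half the units are zeroed; survivors are scaled to 2.0,
# so the mean stays near the original 1.0.
```

At test time (`training=False`) the activations pass through untouched, which is what makes the train/test behavior consistent.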
Deep learning — and neural networks, in general — are often described as a black box, Turner said, but these new techniques “help us visualize things going on inside the network.” Instead of just looking at the output of the models, AlchemyAPI’s researchers are able to “essentially debug” the model by seeing what’s going on within the network and improving any funky aspects.
Is AlchemyAPI for real?
However, whatever techniques AlchemyAPI is using don’t matter unless its service is actually accurate enough to justify using it. Based on the demo that Turner let me play with, beauty might be in the eye of the beholder.
Sometimes it knows it’s right:
Sometimes it thinks it knows it’s right:
Sometimes it’s wrong and it knows it:
If there are two subjects in an image, it’s pretty good at recognizing both, even if that means it can’t say with confidence the image is of either. What I didn’t notice were any instances where AlchemyAPI’s system classified images incorrectly with more than 50 percent confidence. (I’m curious to find out how well it did against human competitors during the live demo at Denver Startup Week.)
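That behavior — reporting candidates without committing above 50 percent — is easy to mimic on the consuming side. A hypothetical sketch (the labels, scores and output format are illustrative, not AlchemyAPI’s actual API response):

```python
# Hypothetical (label, confidence) output for one image with two subjects.
predictions = [("dog", 0.46), ("cat", 0.41), ("sofa", 0.08)]

def confident_labels(preds, threshold=0.5):
    """Return only the labels the classifier asserts above the threshold."""
    return [label for label, conf in preds if conf >= threshold]

# Both subjects are recognized, but neither clears 50 percent,
# so a cautious caller treats this as "no confident match."
print(confident_labels(predictions))  # prints []
```

Lowering the threshold trades precision for recall — at 0.4, both “dog” and “cat” would come back.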
At this point, though, it probably doesn’t need to be anywhere near 100 percent correct all the time. Even Google’s system isn’t perfect; it describes its successes and limitations in this June blog post. What AlchemyAPI needs to be is good enough for some relatively non-critical classification tasks and for potential users to get enough value to keep thinking of new use cases.
Presumably, AlchemyAPI, Google and other researchers will keep improving the underlying technologies, and innovation will happen even faster as more of this stuff is released as open source (like Google’s word2vec tool). At some point, image recognition (and video recognition after that) might be a very valuable tool, as speech recognition and text analysis have already become. Until then, well, it’s still a very hard, very promising technology.
To hear a little more about the promise of deep learning and image recognition, and the types of systems necessary to do them at scale, check out my discussion with Google Fellow Jeff Dean at Structure 2013.