Denver-based startup AlchemyAPI is staying active in the world of artificial intelligence, launching on Monday night a new service that lets users perform computer vision tasks such as image tagging and photo search via API. The product, called AlchemyVision, is the company’s first foray outside the natural-language processing space where it has focused since 2011. It also probably foreshadows a spate of computer vision services yet to come.
AlchemyAPI first demonstrated its object recognition service in September, but founder and CEO Elliot Turner said the company has done a lot of work since then to get it ready for commercial use. Among the big differences is the sheer scale of the new system, which runs unsupervised across millions of online images and uses context from the pages that host them to determine what they depict. Whereas the demo AlchemyAPI showed off last year was trained on about 1,000 images, the new system knows more than 10,000 concepts and is adding more all the time.
If it doesn’t recognize an object today, Turner said, it very well might in a few days after having seen it in more images.
In fact, it’s a testament to AlchemyAPI’s work that AlchemyVision tries to avoid classifying images it doesn’t recognize. Traditional deep neural networks, like those researchers develop to compete in object-recognition competitions, are designed to classify specific categories of images, Turner explained, but can go a little wonky when they come across things on which they haven’t been trained. For example, Turner said AlchemyAPI ran an image of a disco ball through another deep learning system, which classified it as a tiger shark.
“Some of the academic systems, that had certainly been breaking world records, also had a number of pretty key limitations,” he said.
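The idea of declining to label unfamiliar images can be illustrated with a simple confidence threshold. The Python sketch below is not AlchemyVision's method, which the article does not describe. The function name, the 0.6 cutoff and the example scores are all hypothetical, chosen only to show how a classifier can abstain rather than force a guess.

```python
from typing import Dict, Optional


def classify_or_abstain(scores: Dict[str, float],
                        min_confidence: float = 0.6) -> Optional[str]:
    """Return the top-scoring label, or None if nothing is confident enough.

    `scores` maps label -> probability from some classifier (assumed input).
    `min_confidence` is a hypothetical cutoff, not a value from AlchemyAPI.
    """
    if not scores:
        return None
    # Pick the label with the highest score.
    label, confidence = max(scores.items(), key=lambda item: item[1])
    # Abstain instead of forcing a guess on an unfamiliar image.
    return label if confidence >= min_confidence else None


# A disco ball shown to a model that has never seen one: the scores are
# spread thinly over wrong labels, so the function declines to answer.
print(classify_or_abstain({"tiger shark": 0.31, "jellyfish": 0.28, "lamp": 0.20}))
```

A forced top-1 classifier would return "tiger shark" for those scores. With the threshold in place, the caller gets no label and can treat the image as unrecognized.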
Another important feature of AlchemyVision is that it can recognize multiple concepts within the same image. Rather than homing in on a fish, for example, it will recognize the fish, the man holding it, and possibly what he’s wearing or that they’re in a boat on a lake. In a photo of an athlete, AlchemyAPI’s system might recognize the type of sport she’s playing, the color of her uniform and perhaps even the numbers on her jersey.
I tested it (using a demo version available here) on this photo I took during a recent trip to Atlanta.
Here are the resulting tags and similar images.
There are all sorts of potential applications for this type of service, although Turner acknowledges AlchemyAPI is focusing on low-hanging fruit to start out — image tagging (i.e., recognizing and labeling the objects or concepts in an image) and image search (i.e., feeding it an image and searching for similar images). Those are potentially powerful tools for retailers that want to help shoppers find content based on how it looks, for example, or for publishers that have collections of photos but still license ones from Getty because they can’t search their own collections (“An asset they can’t find is no asset at all,” Turner said).
These are, by and large, the types of uses Yahoo, Pinterest and Dropbox had in mind when they acquired a handful of computer vision startups over the past couple of years. Non-commercial uses might range from medicine (analyzing various types of scans or other images) to law enforcement (crawling the web to find evidence of criminal activity, or perhaps even missing people).
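Image search of the kind described above, where one image is used to find similar ones, is commonly built by comparing numeric feature vectors. The Python sketch below assumes each image has already been reduced to such a vector by some model. The file names, vectors and function names are hypothetical, and nothing here reflects how AlchemyVision actually works.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat an all-zero vector as similar to nothing.
    return dot / (norm_a * norm_b)


def most_similar(query: Sequence[float],
                 catalog: Dict[str, Sequence[float]],
                 top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank catalog images by similarity to the query vector."""
    scored = [(name, cosine_similarity(query, vec))
              for name, vec in catalog.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical two-dimensional features; real systems use far more dimensions.
catalog = {
    "red_dress.jpg": [1.0, 0.0],
    "blue_jeans.jpg": [0.0, 1.0],
    "red_skirt.jpg": [1.0, 1.0],
}
for name, score in most_similar([1.0, 0.1], catalog, top_k=2):
    print(name, round(score, 3))
```

A retailer or publisher could run a query like this against its own photo collection, which is the scenario Turner describes. At the scale of millions of images, an approximate nearest-neighbor index would typically replace this linear scan.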
In the next few quarters, though, AlchemyAPI plans to roll out new features for AlchemyVision, including facial recognition, logo recognition and text extraction. Turner said the last of these will be particularly distinctive because it will extract text from anywhere within an image, whether on labels, clothing, street signs or elsewhere.
However, just as AlchemyAPI faces growing competition in the text analysis space, it’s likely to have company soon in the computer vision space, too. As we detailed in numerous sessions at our Structure Data conference in March (including one with AlchemyAPI’s Turner and IBM Watson VP Stephen Gold), smart companies are increasingly viewing nearly anything, from content to sensor readings, as new sources of data from which they can learn about their users, improve products and generally automate classification tasks that used to require lots of human attention.
Many of these capabilities will come from application providers such as Facebook, Pinterest or Dropbox integrating them into their products, although it’s conceivable that fellow deep learning experts such as Google and Microsoft could aim to deliver object recognition as a business service, too. Big data and analytics are lucrative fields right now, and the types of analysis that deep learning enables are potentially very useful. Because deep learning is relatively difficult and benefits from access to huge datasets, it’s also an ideal application to deliver via the cloud.