What we read about deep learning is just the tip of the iceberg

The artificial intelligence technique known as deep learning is white hot right now, as we have noted numerous times before. It’s powering many of the advances in computer vision, voice recognition and text analysis at companies including Google, Facebook, Microsoft and Baidu, and it has been the technological foundation of many startups (some of which were acquired before even releasing a product). These public successes receive the bulk of the media attention paid to machine learning.

But they’re only the public face of a field that appears to be growing like mad beneath the surface. So much research is happening at places that are not large web companies, and even most of the large web companies’ work goes unreported. Big breakthroughs and ImageNet records get the attention, but there’s progress being made all the time.

Just recently, for example, Google’s DeepMind team reported on initial efforts to build algorithm-creating systems that it calls “Neural Turing Machines”; Facebook showed off a “generic” 3D feature for analyzing videos; and Microsoft researchers concluded that quantum computing could prove a boon for certain types of deep learning algorithms.

We’ll talk more about some of these efforts at our Structure Data conference in March, where speakers include a senior researcher from Facebook’s AI lab, as well as prominent AI and robotics researchers from labs at Stanford and MIT.

But anyone who really needs to know what’s happening in deep learning was probably at the Neural Information Processing Systems, or NIPS, conference held last week in Montreal. It’s a long-running conference that’s increasingly dominated by deep learning. Of the 411 papers accepted to this year’s conference, 46 included the word “deep” among their 100 most-used words (according to a topic model by Stanford Ph.D. student Andrej Karpathy). That doubles last year’s count of 23, which was itself roughly 50 percent more than the 15 in 2012.
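One simple way to reproduce that kind of tally is a plain word-frequency count over each paper’s text. This is only an illustrative sketch, not Karpathy’s actual code, and it assumes you already have the papers as plain-text strings:

```python
from collections import Counter
import re

def top_words(text, n=100):
    """Return the n most frequent word tokens in a paper's text."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w, _ in Counter(words).most_common(n)}

def count_papers_featuring(papers, keyword="deep", n=100):
    """Count papers whose n most-used words include the keyword."""
    return sum(1 for text in papers if keyword in top_words(text, n))
```

For example, `count_papers_featuring(paper_texts)` run over the 411 accepted papers would yield a number like the 46 cited above.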

At the separate deep learning workshop co-located with the NIPS conference, the number of poster presentations this year shot up to 47 from last year’s 28. While some of the bigger research breakthroughs presented at NIPS have already been written about (e.g., the combination of two types of neural networks to automatically produce image captions, research on which Karpathy worked), other potentially important work goes largely unnoticed by the general public.

Yoshua Bengio — a University of Montreal researcher well known in deep learning circles, who has so far resisted the glamour of corporate research labs — and his team have been very busy. Bengio is listed as a coauthor on five of this year’s NIPS papers, and another seven at the workshop, but his name doesn’t often come up in stories about Skype Translate or Facebook’s efforts to warn users before they post drunken photos.

In a recent TEDx talk, Enlitic CEO Jeremy Howard discussed advances in translation and medical imaging that have flown largely under the radar, and also showed how software like what his company is building could help doctors train computers to classify medical images in just minutes.

The point here is not just to say, “Wow! Look how much research is happening.” Nor is it to warn of an impending AI takeover of humanity. It’s just a heads-up that there’s a lot going on underneath the surface that goes largely underreported by the press, but of which certain types of people should try to keep abreast nonetheless.

Lawmakers, national security agents, ethicists and economists (Howard touches on the economy in that TEDx talk and elaborates in a recent Reddit Ask Me Anything session) need to be aware of what’s happening and what’s possible if our foundational institutions are going to be prepared for the effects of machine intelligence, however it’s defined. (In another field of AI research, Paul Allen is pumping money into projects that are trying to give computers actual knowledge.)

Results of Karpathy’s research on image captions. Beyond automating that process, imagine combing through millions of unlabeled images to learn about what’s happening in them. Source: Andrej Karpathy and Li Fei-Fei / Stanford

But CEOs, product designers and other business types also need to be aware. We’re seeing a glut of companies claiming they can analyze the heck out of images, text and databases, and others delivering capabilities such as voice interaction and voice search as a service. Even research firm IDC is predicting video, audio and image analytics “will at least triple in 2015 and emerge as the key driver for [big data and analytics] technology investment.”

Smart companies investing in these technologies will see deep learning as much more than a way to automatically tag images for search or analyze sentiment. They’ll see it as a way to learn a whole lot more about their businesses and the customers buying their products.

In deep learning especially, we’re talking about a field where operational systems exist, techniques are being democratized rapidly and research appears to be increasing exponentially. It’s not just a computer science project anymore; two and a half years after Google’s famous cat-recognition experiment, jokes about its cat-spotting computers already seem dated.