It doesn’t matter if deep learning mimics the brain or Watson is cognitive. It matters if they work

I spent half an hour speaking with IBM Watson VP John Gordon on Tuesday, and no matter how many ways I phrased the question, he would not acknowledge a gap between people’s perceptions of Watson and what the “cognitive computing” system is actually capable of doing. I know there’s misunderstanding out there (I just know it), but Gordon spun his responses to focus on inspiration rather than disappointment, emphasizing how easy it is to learn Watson and build new applications now that it’s available as a set of software products and cloud APIs.

It annoyed me at first, but the more I think about it, the less I can fault his strategy. Not so long ago, he noted, only Ph.D.s in IBM Research were programming Watson systems for early users; today, pretty much anybody with an application and some data can start experimenting with it. There’s not much point in dwelling on whether people really get the artificial intelligence, because anyone willing to give the cloud services a shot will soon figure out all they really need to know about it: whether or not it works for what they want to do.

The first set of Watson cloud APIs on IBM Bluemix.

I think the same is becoming true for deep learning, the red-hot AI field that, like Watson, is the subject of lofty claims and more than a modicum of hyperbole. IEEE Spectrum recently published an interview with machine learning expert Michael Jordan, of the University of California, Berkeley, in which he addresses some of the bigger misconceptions about the technology.

His comments boil down to this: Deep learning is not that revolutionary, it’s only really useful for a limited number of things and deep learning models certainly do not mimic real-life brain activity. The latter is a lazy and inaccurate metaphor.

(In a later blog post, Jordan clarified some misinterpretations of his opinions, particularly regarding big data, which is a topic for another day. Earlier, in September, he did an Ask Me Anything session on Reddit in which he elaborated on where he sees promise and hype in deep learning.)

But even if Jordan is largely correct, it might be a waste of energy for most of us to stress too much over explanations of how exactly the models work or whether researchers are overstating the import of their results. Yann LeCun, a New York University researcher and director of Facebook’s AI efforts, probably said it best in a (largely supportive) response to Jordan’s comments, posted on his Facebook wall:

There is nothing wrong with deep learning as a topic of investigation, and there is definitely nothing wrong with models that work well, such as convolutional nets.

… Yes, most of the ideas behind some of the most successful deep learning models have been around since the 80’s. That doesn’t make them less useful.

People are so excited by deep learning right now because it has proven so useful on certain pattern-recognition tasks. I’ve argued before that “[w]e could call it anything; it could be modeled after the interlocking joints in my laminate flooring rather than neurons in the brain” and deep learning would still fascinate people as long as it still resulted in better speech recognition, image search, text messaging and recommendations.

About 200 people showed up at our Future of AI event to watch Andrew Ng of Baidu, and others, talk about deep learning. Credit: Biz Carson / Gigaom

Even more promising is how fast deep learning models are making their way out of research labs and into the hands of mere mortals, who will presumably turn them toward all sorts of new applications. There are now numerous open source tools (including word2vec, deeplearning4j, H2O and Caffe), commercial software products (from companies such as GraphLab, Nvidia, Ersatz Labs and Microsoft) and a growing number of task-specific APIs (including those from AlchemyAPI and Clarifai) targeting application developers.

For what it’s worth, IBM’s Gordon suggested that speech recognition and computer vision will eventually make their way into Watson’s set of capabilities, too.

Even the results of academic studies probably receive more attention than they did several years ago because of the pace at which those techniques can be picked up by open-source projects, and the fact that so much research takes place within companies such as Google, Facebook, Microsoft and Baidu. Even Twitter, Dropbox, Pinterest and Yahoo have deep learning teams. There’s often a pretty clear line between advances in deep learning and products ranging from search to wearable technology to robots.

But of course Jordan is right to point out that deep learning is not some holy grail of artificial intelligence that will render all other approaches obsolete. If it were, why put any more resources into trying to build truly brain-like systems or quantum computers? At Gigaom’s Future of AI event in September, Allen Institute for Artificial Intelligence director Oren Etzioni gave a great talk on the difference between creating better classification algorithms using deep learning and creating algorithms that actually know stuff: algorithms that can pass short-answer exams or understand what’s going to happen in an image.

And in the end, all of these efforts will be judged, just like Watson and deep learning, on how useful they actually are once they’re out of the labs and into developers’ hands.


Consider all the breath some folks have wasted trying to define cloud computing or big data while the people who actually build applications, many of whom are not distributed systems experts, just keep trying out new things as they become available and latching onto what works. Platforms such as Amazon Web Services and Hadoop faced lots of skepticism early on and still do, but, although imperfect, they underpin major portions of the consumer web experience because they provide better ways to perform necessary functions. Technologies such as Docker and Kafka have been largely ignored by the mainstream tech press, but they’re gaining users by the day.

The point is that users today have an unprecedented ability to decide which approaches win and lose because they have an unprecedented ability to actually use them. So while academic debate over new techniques certainly has a place (for example, too many false starts are bad for funding, which is bad for cutting-edge research), maybe it doesn’t matter too much whether most people understand that Watson isn’t Skynet or that the ideas behind deep learning are neither new nor tightly tied to neuroscience.

If technologies are useful, people will figure out how to make the most of them in whatever form they’re available. If they’re not useful, or if something better comes along, maybe they’ll be relegated to smaller roles in the data-processing pipelines. Maybe people will move on altogether. But it won’t be because deep learning, Watson or whatever comes next didn’t live up to the hype; it will be because they didn’t work.