Why we’re all so obsessed with deep learning

You might have noticed the flurry of activity lately around deep learning. It’s an approach to data analysis centered around stacks of artificial neural networks that, for lack of a more succinct definition, can teach themselves to understand complex patterns and the many little features that make up the data on which they’re trained. It’s the talk of the town among media types, entrepreneurs and computer scientists not just because it sounds so cool, but mostly because it works.

We’ve covered many of its early applications already: recognizing what’s in pictures, who’s in pictures, how words are related in text and what people are saying. A lot of the models coming out of university research, which then get trained on massive amounts of web data inside places like Google, Microsoft and Facebook, are already making their way into consumer services and even commercial software near you. Google is using neural networks to understand and improve data center efficiency. Some believe deep learning could also be used to analyze time-series data for algorithmic trading models or better understand medical records.

Microsoft’s Skype Translate, a demo of which is below, is an example of applied deep learning.

Perhaps deep learning methods could even help answer the U.S. Secret Service’s request for a software package that can recognize sarcasm in social media posts. That capability, which was part of a broader request for social-media analysis software, was the subject of a fair amount of skepticism and downright derision this week for a variety of reasons. Some questioned the agency’s motives, others its sense of humor and others yet the feasibility of automatically detecting sarcasm.

However, Richard Socher, a Stanford Ph.D. candidate who specializes in applying deep learning models to sentiment analysis (he was lead author of this paper and helped launch a web service called etcML), thinks that given the right model and the right training data, sarcasm detection might be more possible than some think. He explained how via email:

“For example: If the algorithm knew that certain things are negative (from a training set) it could find sarcasm in “I love getting up early” or “Sure, I enjoy spending all day on my homework” because they have a stark contrast and a similar pattern (saying something positive about something negative). Other indicators and structures could be picked up by an algorithm as potential indicators like “sure, yea, totally” or patterns like “something positive followed by FML”. Models like recursive neural networks that have an understanding of word order could learn such patterns if they are trained on some such examples.”
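To make that contrast idea concrete, here is a minimal sketch in Python. It is not Socher's model (he describes recursive neural networks that learn such patterns from training data); it is a hand-written heuristic that only illustrates the "something positive about something negative" pattern. The word lists, the scoring weights and the function names are all hypothetical stand-ins for what a trained model would learn.

```python
import re

# Hypothetical toy lexicons. A real system would learn these from a labeled
# training set rather than hard-code them.
POSITIVE_WORDS = {"love", "enjoy", "great", "awesome"}
NEGATIVE_PHRASES = {"getting up early", "homework", "traffic"}
SARCASM_MARKERS = {"sure", "yea", "totally", "fml"}


def sarcasm_score(text):
    """Score a statement: 2 points for a positive/negative contrast,
    1 point for a marker word such as "sure" or "FML"."""
    lowered = text.lower()
    tokens = set(re.findall(r"[a-z']+", lowered))

    has_positive = bool(tokens & POSITIVE_WORDS)
    has_negative = any(phrase in lowered for phrase in NEGATIVE_PHRASES)
    has_marker = bool(tokens & SARCASM_MARKERS)

    score = 0
    if has_positive and has_negative:
        score += 2  # stark contrast: positive word about a negative thing
    if has_marker:
        score += 1  # weaker hint on its own
    return score


def looks_sarcastic(text, threshold=2):
    """Flag a statement only when the contrast pattern is present."""
    return sarcasm_score(text) >= threshold


if __name__ == "__main__":
    for sentence in [
        "I love getting up early",
        "Sure, I enjoy spending all day on my homework",
        "I love coding on the weekends",
    ]:
        print(sentence, "->", looks_sarcastic(sentence))
```

A statement with positive language and nothing in the negative list is not flagged, which is exactly where a heuristic like this runs out of road.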

The tweet below is an example of a stark contrast between positive and negative phrases.

However, he added, other types of sarcasm are nearly impossible for a computer, or even many humans, to detect absent deeper knowledge about the speaker. A statement such as “I love coding on the weekends” doesn’t include any inherently negative language and might well be true for many people.

When I spoke recently with Australian researcher David Milne about an effort, called We Feel, to track the use of emotional words on Twitter, he noted the same concern with a separate project he’s working on to determine whether tweets about depression or suicide are legitimate or sarcastic. Because standard natural-language processing techniques won’t always pick up on sarcasm in language, Milne explained, his team tries to add context to questionable tweets by analyzing those users’ previous and subsequent tweets, as well as any replies and the users’ connections.

Socher suggested that another problem — especially for the Secret Service — might be in detecting outliers, because so few people actually follow through on dumb statements or even threats. It’s the same problem the FBI had when trying to discern patterns that signal someone might be an insider threat. When the majority of people don’t do something, it can be difficult even for machine learning algorithms to detect meaningful patterns among the few who do. So the FBI focuses on the behaviors of individual employees and looks for deviations from their individual baselines.
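The individual-baseline idea can be sketched in a few lines of Python. This is only an illustration of the general technique, not a description of what the FBI or anyone else actually runs. It assumes each person's past behavior has already been boiled down to a numeric score per observation (a hypothetical "negativity score" per post, say), and it measures how many standard deviations a new observation sits from that same person's own average. The function names and the threshold of 3 are assumptions made for the example.

```python
from statistics import mean, stdev


def deviation_from_baseline(history, new_value):
    """Return how many standard deviations new_value sits from the mean of
    this one individual's past observations (a z-score)."""
    if len(history) < 2:
        raise ValueError("need at least two past observations for a baseline")
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # No variation in the past: any change at all is a deviation.
        return 0.0 if new_value == baseline else float("inf")
    return (new_value - baseline) / spread


def is_out_of_ordinary(history, new_value, threshold=3.0):
    """Flag an observation that is far from this individual's own norm."""
    return abs(deviation_from_baseline(history, new_value)) >= threshold


if __name__ == "__main__":
    # Hypothetical per-post scores for a single user.
    past_scores = [0.1, 0.2, 0.15, 0.1, 0.2]
    print(is_out_of_ordinary(past_scores, 0.9))
    print(is_out_of_ordinary(past_scores, 0.2))
```

Comparing people against themselves sidesteps the rare-event problem somewhat, because the baseline comes from one person's plentiful ordinary behavior rather than from the handful of people who ever act on a threat.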

With more research and some more advanced models, though, Socher thinks it might also be possible to automate this type of assessment:

“I think eventually algorithms that incorporate a user “vector” or other kinds of user models may be able to distinguish sarcastic statements for very prolific users. One such indicator that could help would be how “out of the ordinary” a certain statement would be for a given user. But for this to work well, we would need a LOT of training data.”

But that’s also the beauty of deep learning right now, when we have access to exponentially growing amounts of digital data and cheap, powerful processors. It’s not perfect, it’s not always easy and it’s certainly not the right tool for every job. Where it works, though, deep learning has proven to work remarkably well compared with previous approaches at solving some very challenging problems. And it’s pointing researchers in the right direction to solve others.

We could call it anything; it could be modeled after the interlocking joints in my laminate flooring rather than neurons in the brain. Yet as long as it keeps producing results, we’re going to see a lot more deep learning research, a lot more startups trying to capitalize on it, and a lot more press writing about the field that’s taking us beyond nebulous discussions about “big data” and “uncovering insights” and into discussions about actually putting intelligent systems to work for us.

Feature image courtesy of Shutterstock user Sebastian Kau.