In this episode, Byron and Soumith talk about transfer learning, child development, pain, neural networks, and adversarial networks.
Byron Reese: This is Voices in AI, brought to you by Gigaom. I’m Byron Reese. Today our guest is Soumith Chintala. He is an Artificial Intelligence Research Engineer over at Facebook. He holds a Master of Science in Computer Science from NYU. Welcome to the show, Soumith.
Soumith Chintala: Thanks, Byron. I am glad to be on the show.
So let’s start out with your background. How did you get to where you are today? I have been reading over your LinkedIn, and it’s pretty fascinating.
It’s almost accidental that I got into AI. I wanted to be an artist, more of a digital artist, and I went to intern at a visual effects studio. After the summer, I realized that I had no talent in that direction, so I instead picked something closer to where my core strength lies, which is programming.
I started working in computer vision, but just on my own in undergrad. And slowly and steadily, I got to CMU to do robotics research. But this was back in 2009, and still deep learning wasn’t really a thing, and AI wasn’t like a hot topic. I was doing stuff like teaching robots to play soccer and doing face recognition and stuff like that.
And then I applied for master’s programs at a bunch of places. I got into NYU, and I didn’t actually know what neural networks were or anything. Yann LeCun, in 2010, was more accessible than he is today, so I went, met with him, and I asked him what kind of computer vision work he could give me to do as a grad student. And he asked me if I knew what neural networks were, and I said no.
This was a stalwart in the field who I’m sitting in front of, and I’m like, “I don’t know, explain neural networks to me.” But he was very kind, and he guided me in the right direction. And I went on to work for a couple of years at NYU as a master’s student and simultaneously as a junior research scientist. I spent another year, almost a year there as a research scientist while also separately doing my startup.
I was part of a music and machine learning startup where we were trying to teach machines to understand and play music. That startup went south, and I was looking for new things. And at the same time, I’d started maintaining this tool called Torch, which was the industry-wide standard for deep learning back then. And so Yann asked me if I wanted to come to Facebook, because they were using a lot of Torch, and they wanted some experts in there.
That’s how I came about, and once I was at Facebook, I did a lot of things—research on adversarial networks, engineering, building PyTorch, etc.
Let’s go through some of that stuff. I’m curious about it. With regard to neural nets, in what way do you think they are similar to how the brain operates, and in what way are they completely different?
I’d say they’re completely different, period. We think they’re similar in very high-level and vague terms like, “Oh, they do hierarchical learning, like humans seem to think as well.” That’s pretty much where the similarity ends. We think, and we hypothesize, that in some very, very high-level way, artificial neural networks learn like human brains, but that’s about it.
So, the effort in Europe—the well-funded effort—The Human Brain Project, which is deliberately trying to build an AGI based on the human brain… Do you think that’s a worthwhile approach or not?
I think all scientific approaches, all scientific explorations are worthwhile, because unless we know… And it’s a reasonably motivated effort, right? It’s not like some random people with bad ideas are trying to put this together; it’s a very well-respected effort with a lot of experts.
I personally wouldn’t necessarily take that direction, because there are many approaches to these things. One is to reverse-engineer the brain at a very fundamental level, and try to put it back together exactly as it was. It’s like investigating a car engine… not knowing how it works, but taking X-ray scans of it and all that, and trying to put it back together and hoping it works.
I’m not sure if that would work with as complicated a system as the brain. So, in terms of the approach, I’m not sure I would do it the same way. But I think it’s always healthy to explore various different directions.
Some people speculate that a single neuron is as complicated in its operations as a supercomputer, which would imply either that we won’t get to an AGI at all, or at least that we won’t get there by building something like the human brain.
Let’s talk about vision for just a minute. If I show a person just one sample of some object, a statue of a raven, and then I show them a hundred photos with it partially obscured, on its side, in the dark or half underwater, weirdly lit—a person could just boom, boom, boom, pick it all out.
But you can’t train computers to do anything like that. They need so many samples, so many examples. What do you think is going on? What are humans doing that we haven’t taught computers how to do?
I think it’s just the diversity of tasks we handle every day. If we had a machine learning model that was handling as many diverse tasks as humans do, it would be able to pick a raven out of a complicated image just fine. It’s just that when machines are being trained to identify ravens, they’re being trained to identify ravens from a database of images that don’t look very much like the complicated image that they’ve been given.
And because they don’t handle a diverse set of tasks, they’re doing very specific things. They kind of overfit to the dataset they have been given, in some way. I think this is just a matter of increasing the number of tasks we can make a single machine learning model do, and over time, they will get as smart. Of course, the hard problem is that we haven’t figured out how to make the same model do a wide variety of tasks.
So that’s transfer learning, and it’s something humans seem to do very well.
Yes.
Does it hinder us that we take such an isolated, domain-specific view when we’re building neural AIs? We say, “Well, we can’t teach it everything, so let’s just teach it how to spot ravens,” and we reinvent the wheel each time. Do you have a gut intuition about where the core, the secret of transfer learning at scale, is hiding?
Yeah. It’s not that we don’t want to build models that can do a wide variety of tasks. It’s just that we haven’t figured it out yet. The most popular research that you see in media, that’s being highlighted, is the research that gets superhuman abilities in some specific niche task.
But there’s a lot of research that we deal with day-to-day, that we read about, that is not highlighted in popular media, which tries to do one-shot learning, and smarter transfer learning and stuff. And as a field, we’re still trying to figure out how to do this properly. I don’t think, as a community of AI researchers, we’re restricting ourselves to just doing expert systems. It’s just that we haven’t yet figured out how to do more diverse systems.
Well, you said neural nets aren’t much like the human brain. Would you say just in general, mechanical intelligence is different than human intelligence? Or should one watch how children learn things, or study how people recognize what they do, and cognitive biases and all of that?
I think there is a lot of value in doing cognitive science, like looking at how child development happens, and we do that a lot. A lot of inspiration and ideas, even in machine learning and neural networks, does come from looking at such aspects of human learning and human intelligence. And it’s being done.
We collaborate, for example at FAIR—Facebook AI Research—with a few researchers who do try to understand child development and child learning. We’ve been building projects in that direction. For example, children learn things like object permanence between certain ages. If you hide something from a child and then make it reappear, does the child understand that you just put it behind your back and then just showed it to them again? Or does a child think that that object actually just disappeared and then appeared again?
So, these kinds of things are heavily-studied, and we try to understand how the mechanisms of learning are… And we’ve been trying to replicate these for neural networks as well. Can a neural network understand what object permanence is? Can a neural network understand how physics works? Children learn how physics works by playing a lot, playing with blocks, playing with various things in their environment. And we’re trying to see if neural networks can do the same.
There’s a lot of inspiration that can be taken from how humans learn. But there is a slight separation between whether we should exactly replicate how neurons work in a human brain, versus how neurons work in a computer; because human brain neurons, their learning mechanisms and their activation mechanisms, use very different chemicals, different acids and proteins.
And the fundamental building blocks in a computer are very different. You have transistors, and they work bit-wise and so on. At a fundamental block level, we shouldn’t really look for exact inspirations, but at a very high level, we should definitely look for inspiration.
You used the word ‘understand’ several times, in that “Does the computer understand?” Do computers actually understand anything? Is that maybe the problem, that they don’t actually have an experiencing self that understands?
There’s—as they say in the field—‘nobody home’, and therefore there are just going to be these limits of things that come easy to us because we have a self, and we do understand things. But all a computer can do is sense things. Is that a meaningful distinction?
We can sense things, and a computer can sense things in the sense that you have a sensor. You can consume visual inputs, audio inputs, stuff like that. But understanding can be as simple as statistical understanding. You see something very frequently, and you associate that frequency with this particular association of a term or an object. Humans have a statistical understanding of things, and they have a causal understanding of things. We have various different understanding approaches.
And machines can, at this point, with neural networks and stuff… We take a statistical or frequentist approach to things, and we can do them really well. There’s other aspects of machine learning research as well that try to do different kinds of understanding. Causal models try to consume data and see if there’s a causal relationship between two sets of variables and so on.
There are various levels of understanding, and understanding itself is not some magical word that can’t be broken down. I think we can break it down into different kinds and approaches of understanding. Machines can do certain types of understanding, and humans can do certain further types of understanding that machines can’t.
Well, I want to explore that for just a moment. You’re probably familiar with Searle’s Chinese Room thought experiment, but for the benefit of the listeners…
The philosopher [Searle] put out this way to think about that word [‘understanding’]. The setup is that there’s a man who speaks no Chinese, none at all, and he’s in this giant room full of all these very special books. And people slide questions written in Chinese under the door. He picks them up, and he has what I guess you’d call an algorithm.
He looks at the first symbol, he finds the book with that symbol on the spine, he looks up the second symbol that directs him to a third book, a fourth book, a fifth book. He works his way all the way through until he gets to the last character, and he copies down the characters for the answer. Again, he doesn’t know what they are talking about at all. He slides it back under the door. The Chinese speaker [outside] picks it up, reads it, and it’s perfect Chinese. It’s a perfect answer. It rhymes, and it’s insightful and pithy.
The question that Searle is trying to pose is… Obviously, that’s all a computer does. It’s a deterministic system that runs these canned algorithms, that doesn’t understand whether it’s talking about cholera or coffee beans or what have you. That there really is something to understanding.
And Weizenbaum, the man who wrote ELIZA, went so far as to say that when a computer says, “I understand,” that it is just a lie. Because not only is there nothing to understand, there’s just not even an ‘I’ there to understand. So, in what sense would you say a computer understands something?
I think the Chinese Room thing is an interesting puzzle. It’s a thought-provoking situation, rather. But I don’t know about the conclusions you can come to. Like, we’ve seen a lot of historical manuscripts and stuff that we’ve excavated from various regions of the world, and we didn’t understand that language at all. But, over time, through certain statistical techniques, or certain associations, we did understand which words—what the fundamental letters in these languages are, or what these words mean, and so on.
And no one told us exactly what these words mean, or what this language exactly implies. We definitely don’t know how those languages are actually pronounced. But we do understand them by making frequentist associations with certain words to other words, or certain words to certain symbols. And we understand what the word for a ‘man’ is in a certain historical language, or what the word for a ‘woman’ is.
With statistical techniques, you can actually understand what a certain word is, even if you don’t understand the underlying language beforehand. There is a lot of information you can gain, and you can actually understand and learn concepts by using statistical techniques.
One example from recent machine learning is this thing called word2vec. It’s a system, and what it does is: you give it a sentence, and it replaces the center word of the sentence with a random other word from the dictionary. It uses the sentence with this random word in the middle as a negative example, and the sentence as-is, without the replacement, as a positive example.
Just using this simple technique, you’ll learn embeddings of words; that is, numbers associated with each word that will try to give some statistical structure to the word. With just a simple model which doesn’t understand anything about what these words mean, or in what context these words are used, you can do simple things like [ask], “Can you tell me what ‘king’, minus ‘man’, plus ‘woman’ is?”
So, when you think of ‘king’, you think, “Okay, it’s a man, a head of state.” And then you say “minus man,” so “king minus man” will try to give you a neutral character of a head of state; and then you add ‘woman’ up, and then you expect ‘queen’… And that’s exactly what the system returns, without actually understanding what each of these words specifically mean, or how they’re spelled, or what context they’re in.
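[Editor’s note: below is a minimal sketch of the word-analogy query Soumith describes, using the gensim library’s word2vec implementation. The toy corpus, hyperparameters, and variable names are illustrative assumptions rather than details from the interview; with a corpus this small the analogy will not reliably resolve to ‘queen’; that requires training on a large body of text.]

```python
# A minimal word2vec sketch using gensim (assumed installed: pip install gensim).
from gensim.models import Word2Vec

# Hypothetical toy corpus: a list of tokenized sentences.
corpus = [
    ["the", "king", "rules", "the", "country"],
    ["the", "queen", "rules", "the", "country"],
    ["a", "man", "walks", "in", "the", "city"],
    ["a", "woman", "walks", "in", "the", "city"],
]

# Train embeddings. negative=5 turns on negative sampling, the
# "swap in a random word as a negative example" idea described above.
model = Word2Vec(corpus, vector_size=50, window=2, min_count=1,
                 negative=5, epochs=200)

# "king" - "man" + "woman": find the word whose vector is closest to the result.
print(model.wv.most_similar(positive=["king", "woman"],
                            negative=["man"], topn=1))
```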
So I think there is more to the story than we actually understand. That is, I think there is a certain level of understanding we can get [to] even without the prior context of knowing how things work. In the same way, computers, I think, can learn and associate certain things without knowing about the real world.
One of the common arguments is like, “Well, but computers haven’t been there and seen that, just like humans did, so they can’t actually make full associations.” That’s probably true. They can’t make full associations, but I think with partial information, they can understand certain concepts and infer certain things just with statistical and causal models that they have to learn [from].
Let me try my question a little differently, and we will get back to the here and now… But this, to me, is really germane because it speaks to how far we’re going to be able to go—in terms of using our present techniques and our present architectures, to build things that we deem to be intelligent.
In your mind, could a computer ever feel pain? Surely, you can put a sensor on a computer that can take the temperature, and then you write a program so that when it hits 500 degrees, it should start playing this mp3 of somebody screaming in agony. But could a computer ever feel pain? Could it ever experience anything?
I don’t think so. Pain is something that’s been baked into humans. If you bake pain into computers, then yeah, maybe; but not without it evolving to learn what pain is, or us baking that in ourselves. I don’t think it will—
—But is knowing what pain is really the same thing as experiencing it? You can know everything about it, but the experience of stubbing your toe is something different than the knowledge of what pain is.
Yeah, it probably doesn’t know exactly what pain is. It just knows how to associate with certain things about pain. But, there are certain aspects of humans that a computer probably can’t exactly relate to… But a computer, at this stage of machines, has a visual sensor, has an audio sensor, has a speaker, and has a touch sensor. Now we’re getting to smell sensors.
Yes, the computer probably can experience every single thing that humans experience, in the same way; but I think that’s largely separate from what we need for intelligence. I think a computer can have its own specific intelligence, but not necessarily have all [other] aspects of humans covered. We’re not trying to replicate a human; we’re trying to replicate the intelligence that the human has.
Do you believe that the techniques that we’re using today, the way we look at machine learning, the algorithms we use, basic architectures… How long is that going to fuel the advance of AI? Do you think the techniques we have now—if just given more data, faster computers, tweaked algorithms—we’ll eventually get to something as versatile as a human?
Or do you think to get to an AGI or something like it, something that really can effortlessly move between domains, is going to require some completely unknown and undiscovered technology?
I think what you’re implying is: Do we need a breakthrough that we don’t know about yet in order to get to AGI?
And my honest answer is we probably do. I just don’t know what that thing looks like, because we just don’t know ahead of time, I guess. I think we are going in certain directions that we think can get us to better intelligence. Right now, where we are is that we collect a very, very large dataset, and then we throw it into a neural network model; and then it will learn something of significance.
But we are trying to reduce the amount of data the neural network needs to learn the same thing. We are trying to increase the number of tasks the same neural network can learn, and we don’t know how to do either of [those] things properly yet. Not as properly as [we do] if we want to train some dog detector by throwing large amounts of dog pictures at it.
I think through scientific process, we will get to a place where we understand better what we need. Over this process, we’ll probably have some unknown models that will come up, or some breakthroughs that will happen. And I think that is largely needed for us to get to a general AI. I definitely don’t know what the timelines are like, or what that looks like.
Talk about adversarial AI for a moment. I watched a talk you gave on the topic. Can you give us a broad overview of what the theory is, and where we are at with it?
Sure. Adversarial networks are these very simple ways of [using] neural networks that we built.
We’ve realized that one of the most common ways we have been training neural networks is: You give a neural network some data, and then you give it an expected output; and if the neural network gives an output that is slightly off from your expected output, you train the neural network to get better at this particular task. Over time, as you give it more data, and you tune it to give the correct output, the neural network gets better.
But adversarial networks are these slightly different formulations of machines, where you have two neural networks. And one neural network tries to synthesize some data. It takes in no inputs, or it takes some random noise as input, and then it tries to generate some data. And you have another neural network that takes in some data, whether it’s real data or data that is generated by this generator neural network. And this [second] neural network, its job is to discriminate between the real data and the generated data. This is called a discriminator network.
[So] you have two networks: the generator network that tries to synthesize artificial data; and you have a discriminator network that tries to tell apart the real data and the artificially-generated data. And the way these things are trained, is that the generator network gets rewards if it can fool the discriminator—if it can make the discriminator think that the data it synthesized is real. And the discriminator only gets rewards when it can accurately separate out the fake data from the real data.
There’s just a slightly different formulation in how these neural networks learn; and we call this an unsupervised learning algorithm, because they’re not really hooking onto any aspects of what the task at hand is. They just want to play this game between each other, regardless of what data is being synthesized. So that’s adversarial networks in short.
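[Editor’s note: below is a minimal PyTorch sketch of the generator/discriminator game Soumith describes. The toy one-dimensional “real data” distribution, the network sizes, and the hyperparameters are illustrative assumptions, not details from the interview.]

```python
# A minimal GAN sketch: a generator learns to mimic samples from a 1-D Gaussian,
# and a discriminator learns to tell the real samples from the generated ones.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Generator: random noise in, synthetic "data" out.
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
# Discriminator: a data point in, probability that it is real out.
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 3.0   # "real" data drawn from N(3, 0.5)
    fake = G(torch.randn(64, 8))            # generator synthesizes fake data

    # Discriminator is rewarded for separating real data from generated data.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_D.zero_grad()
    d_loss.backward()
    opt_D.step()

    # Generator is rewarded for fooling the discriminator into saying "real".
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_G.zero_grad()
    g_loss.backward()
    opt_G.step()

# After training, generated samples should cluster around 3.0.
print("mean of generated samples:", G(torch.randn(1000, 8)).mean().item())
```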
It sounds like a digital Turing test, where one computer is trying to fool the other one into thinking that it’s got the real data.
Yeah, you could see it that way.
Where are we at, practically speaking… because it’s kind of the hot thing right now. Has this established itself? And what kinds of problems is it good at solving? Just general unsupervised learning problems?
Adversarial networks have gotten very popular because they seem to be a promising method to do unsupervised learning. And we think unsupervised learning is one of the biggest things we need to crack before we get to more intelligent machines. That’s basically the primary reason. They are a very promising method to do unsupervised learning.
Even without an AGI, there’s a lot of fear wrapped up in people about the effects of artificial intelligence, specifically automation, on the job market.
People fall into one of three groups: There are people who think that we’re going to enter kind of a permanent Great Depression, where there’s a substantial portion of the population that’s not able to add economic value.
And then another group says, “Well, actually that’s going to happen to all of us. Anything a human can do, we’re going to be able to build a machine to do.”
And then there are people who say, “No, we’ve had disruptive technologies come along, like electricity and machines and steam power, and it’s never bumped unemployment. People have just used these new machines to increase productivity and therefore wages.”
Of those three camps, where do you find yourself? Or is there a fourth one? What are your thoughts on that?
I think it’s a very important policy and social question on how to deal with AI. Yes, we have in the past had technology disruptions and adapted to them, but they didn’t happen just by market forces, right? You had certain policy changes and certain incentives and short-term boosts during the Depression. And you had certain parachutes that you had to give to people during these drastically changing times.
So it’s a very, very important policy question on how to deal with the progress that AI is making, and what that means for the job market. I follow the camp of… I don’t think it will just solve itself, and there’s a big role that government and companies and experts have to play in understanding what kind of changes are coming, and how to deal with them.
Organizations like the UN could probably help with this transition, but also, there’s a lot of non-profit companies and organizations coming up who have the mission of doing AI for good, and they also have policy research going on. And I think this will play more and more of a big role, and this is very, very important to deal with—our transition into a technology world where AI becomes the norm.
So, to be clear, it sounds like you’re saying you do think that automation or AI will be substantially disruptive to the job market. Am I understanding you correctly? And that we ought to prepare for it?
That is correct. I think, even if we have no more breakthroughs in AI as of now, like if we have literally no significant progress in AI for the next five years or ten years, we will still—just with the current AI technology that we [already] have—we will still be disrupting large domains and fields and markets—
—What do you mean, specifically? Such as?
One of the most obvious is transportation, right? We largely solved the fundamental challenges in building self-driving vehicles—
—Let me interrupt you real quickly. You just said in the next five years. I mean, clearly, you’re not going to have massive displacement in that industry in five years, because even if we get over the technological hurdle, there’s still the regulatory hurdle, there’s still retrofitting machinery. That’s twenty years of transition, isn’t it?
Umm, what I—
—In which time, everybody will retire who’s driving a truck now, and few people will enter into the field—
—What I specifically said was that even if we have no AI breakthroughs in the next five or ten years. I’m not saying that the markets themselves will change in five years. What I specifically said and meant is that even if you have no AI research breakthroughs in five years, we will still see large markets be disrupted, regardless. We don’t need another AI breakthrough to disrupt certain markets.
I see, but don’t you take any encouragement from the past? You can say transportation, but when you look at something like the replacement of animal power with mechanical power, and if you just think of all of the technology, all of the people that it displaced… Or you think of the assembly line, which is—if you think about it—a kind of AI, right?
If you’re a craftsperson who makes cars or coaches or whatever one at a time, and this new technology comes along—the assembly line—that can do it for a tenth of the price and ten times the quality. That’s incredibly disrupting. And yet, in those two instances, we didn’t have upticks in unemployment.
Yes,—
—So why would AI be different?
I think it’s just the scale of things, and the fact that we don’t understand fully how things are going to change. Yes, we can try to associate something similar in the past with something similar that’s happening right now, but I think the scale and magnitude of things is very different. You’re talking about in the past over… like over [the course of] thirty years, something has changed.
And now you’re talking about in the next ten years something will change, or something even sooner. So, the scale of things and the number of jobs that are affected, all these things are very different. It’s going to be a hard question that we have to thoroughly investigate and respond to with proper policy changes. Because of the scale of things, I don’t know if market forces will just fix things.
So, when you weigh all of the future, as you said—with the technology we have now—and you look to the future and you see, in one column, a lot of disruption in the job market; and then you see all the things that artificial intelligence can do for us, in all its various fields.
To most people, is AI therefore a good thing? Are you overall optimistic about the future with regard to this technology?
Absolutely. I think AI provides us benefits that we absolutely need as humans. There’s no doubt that the upsides are enormous. You accelerate drug discovery, you accelerate how healthcare works, you accelerate how humans transport from one place to another. The magnitude of benefits is enormous if the promises are kept, or the expectations are kept.
And dealing with the policy changes is essential. But my definite bullish view is that the upsides are so enormous that it’s totally worth it.
What would you think, in an AI world, is a good technology path to go [on], from an employment standpoint? Because I see two things. I saw pretty compelling things that say ‘data scientist’ is a super in-demand thing right now, but that’ll be one of the first things we automate, because we can just build tools that do a lot of what that job is.
Right.
And you have people like Mark Cuban, who believes, by the way, [that] the first trillionaires will come from this technology. He said if he had it to do all over again, if he were coming up now, he would study philosophy and liberal arts, because those are the things machines won’t be able to do.
What’s your take on that? If you were getting ready to enter university right now, and you were looking for something to study, that you think would be a field that you can make a career in long-term, what would you pick?
I wouldn’t pick something based on what’s going to be hot. The way I picked my career now, and I think the way people should pick their careers is really what they’re interested in. Now if their only goal is to find a job, then maybe they should pick what Mark Cuban says.
But I also think just being a technologist of some kind, whether they try to become a scientist, or just being an expert in something technology-wise, or being a doctor… I think these things will still be helpful. I don’t know how to associate…
The question is slightly weird to me, because it’s like, “How do I make the most successful career?” And I’ve never thought about it. I’ve just thought about what do I want to do, that’s most interesting. And so I don’t have a good answer, because I’ve never thought about it deeply.
Do you enjoy science fiction? Is there anything in the science fiction world, like movies or books or TV shows, that you think represents how the future is going to turn out? You look at it and think, “Oh, yes, things could happen that way.”
I do enjoy science fiction. I don’t necessarily have specific books or movies that exactly would depict how the future looks. But I think you can take various aspects from various movies and say, “Huh, that does seem like a possibility,” but you don’t necessarily have to buy into the full story.
For example, if you look at the movie Her: You have an OS that talks to you by voice, has a personality, and evolves with its experience and all that. And that seems very reasonable to me. You probably will have voice assistants that will be smarter, and will be programmed to develop a personality and evolve with their experiences.
Now, will they go and make their own OS society? I don’t know, that seems a bit weird. In popular culture, there are various examples like this that seem like they’re definitely plausible.
Do you keep up with the OpenAI initiative, and what are your thoughts on that?
Well, OpenAI seems to be a very good research lab that does fundamental AI research, tries to make progress in the field, just like all of the others are doing. They seem to have a specific mission to be non-profit, and whatever research they do, they want to try to not tie it to a particular company. I think they’re doing good work.
I guess the traditional worry about it is that an AGI, if we built one, is of essentially limitless value, if you can make digital copies of it. If you think about it, all value is created, in essence, by technology—by human thought and human creativity—and if you somehow capture that genie in that bottle, you can use it for great good or great harm.
I think there are people who worry that by kind of giving ninety-nine percent of the formula away to everybody, no matter how bad their intentions are, you increase the likelihood that there’ll be one bad actor who gets that last little bit and has, essentially, control of this incredibly powerful technology.
It would be akin to the Manhattan Project being open source, except for the very last step of the bomb. I think that’s a worry some people have expressed. What do you think?
I think AI is not going to be able to be developed in isolation. We will have to get to progress in AI collectively. I don’t think it will happen in a way where you just have a bunch of people secretly trying to develop AI, and suddenly they come up with this AGI that’s eternally powerful and something that will take over humanity, or something like that.
I don’t think that fantasy—which is one of the most popular ways you see things in fiction and in movies—will happen. The way I think it will happen is: Researchers will incrementally publish progress, and at some point… It will be gradual. AI will get smarter and smarter and smarter. Not just like some extra magic bit that will make it inhumanly smart. I don’t think that will happen.
Alright. Well, if people want to keep up with you, how do they follow you personally, and that stuff that you’re working on?
I have a Twitter account. That’s how people usually follow what I’ve been up to. It’s twitter.com/soumithchintala.
Alright, I want to thank you so much for taking the time to be on the show.
Thank you, Byron.
Byron explores issues around artificial intelligence and conscious computers in his upcoming book The Fourth Age, to be published in April by Atria, an imprint of Simon & Schuster.