>> So welcome, Andrej. I'm really glad you could join me today.

>> Yeah, thank you for having me.

>> So a lot of people already know your work in deep learning, but not everyone knows your personal story. So let us start by you telling us: how did you end up doing all this work in deep learning?

>> Yeah, absolutely. So I think my first exposure to deep learning was when I was an undergraduate at the University of Toronto. Geoff Hinton was there, and he was teaching a class on deep learning. At that time, it was Restricted Boltzmann Machines trained on MNIST digits. And I just really liked the way Geoff talked about training the network, like "the mind of the network," and he was using these terms. And I just thought there was a flavor of something magical happening when this was training on those digits. So that was my first exposure to it, although I didn't get into it in a lot of detail at that time.

And then when I was doing my master's degree at the University of British Columbia, I took a class with Nando de Freitas, and that was again on machine learning. That's the first time I delved deeper into these networks and so on. And what was interesting is that I was very interested in artificial intelligence, and so I took classes in artificial intelligence, but a lot of what I was seeing there was just not very satisfying. It was a lot of depth-first search, breadth-first search, alpha-beta pruning, and all these things, and I was not satisfied with that. And so when I saw neural networks for the first time in machine learning, which is a term that I think is more technical and not as well known; most people talk about artificial intelligence, whereas machine learning was more of a technical term, I would almost say. So I was dissatisfied with artificial intelligence, and when I saw machine learning, I thought: this is the AI that I want to spend time on, this is what's really interesting. And that's what took me down those directions, that this is almost a new computing paradigm, I would say. Because normally, humans write code, but here, in this case, the optimization writes code. You create the input/output specification, you have lots of examples of it, and then the optimization writes the code, and sometimes it can write code better than you. And so I thought that was just a very new way of thinking about programming, and that's what intrigued me about it.

>> Then through your work, one of the things you've come to be known for is that you're now the human benchmark for the ImageNet image classification competition. How did that come about?

>> So basically, the ImageNet challenge is sometimes compared to the World Cup of computer vision. A lot of people care about this benchmark and this number, and the error rate goes down over time, but it was not obvious to me where a human would be on this scale. I had done a similar, smaller-scale experiment on the CIFAR-10 dataset earlier. What I did with CIFAR-10 is I was just looking at these 32 x 32 images and trying to classify them myself. There were only ten categories, so it was fairly simple to create an interface for it. And I think I had an error rate of about 6% on that. And then, based on what I was seeing and how hard the task was, I predicted what the lowest error rate we'd achieve would be. Look, okay, I can't remember the exact numbers; I think I guessed 10%, and we're now down to 3% or 2% or something crazy. So that was my first fun experiment with a human baseline.
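The "optimization writes code" idea Karpathy describes above can be made concrete with a small sketch. This is an illustrative toy, not anything from the interview: instead of hand-coding the rule y = 2x + 1, we hand the optimizer a set of input/output examples and let gradient descent find the parameters.

```python
# A toy instance of "the optimization writes code": we specify the
# desired input/output behavior via examples, and gradient descent
# fills in the program's two free parameters.
import numpy as np

# The "specification": example inputs and desired outputs for y = 2x + 1.
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = 2.0 * xs + 1.0

# The "program" is just two numbers the optimizer is free to set.
w, b = 0.0, 0.0
lr = 0.01

for step in range(2000):
    pred = w * xs + b                        # run the current "program"
    grad_w = 2 * np.mean((pred - ys) * xs)   # gradient of mean squared error
    grad_b = 2 * np.mean(pred - ys)
    w -= lr * grad_w                         # the optimization rewrites the code
    b -= lr * grad_b

print(w, b)  # approaches 2.0 and 1.0 without a human writing the rule
```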
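For context on the error rates being compared here: ImageNet classification results, including human-baseline comparisons like this one, are commonly reported as top-5 error, where a guess counts as correct if the true class is among the five highest-ranked answers. A minimal sketch of that metric, with made-up placeholder data rather than any real submission:

```python
# Minimal sketch of top-5 error, the metric commonly used for ImageNet.
import numpy as np

def top5_error(scores, labels):
    """scores: (n_images, n_classes) array; labels: (n_images,) true class ids."""
    top5 = np.argsort(-scores, axis=1)[:, :5]     # five highest-scoring classes
    hits = (top5 == labels[:, None]).any(axis=1)  # is the true class among them?
    return 1.0 - hits.mean()

rng = np.random.default_rng(0)
scores = rng.standard_normal((100, 1000))         # fake scores over 1,000 classes
labels = rng.integers(0, 1000, size=100)          # fake ground-truth labels
print(f"top-5 error: {top5_error(scores, labels):.3f}")
```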
And I thought it was really important for the same purposes that you point out in some of your lectures: you really want that number, to understand how well humans are doing, so we can compare machine learning algorithms to it. And for ImageNet, it seemed there was a discrepancy between how important this benchmark was, how much focus there was on getting a lower number, and us not even understanding how humans were doing on this benchmark.

So I created this JavaScript interface, and I was showing myself the images. The problem with ImageNet is that you don't have just 10 categories, you have 1,000, so it was almost a UI challenge. Obviously, I can't remember 1,000 categories, so how do I make it fair? So I listed out all the categories, and I gave myself examples of them. And for each image, I was scrolling through the 1,000 categories and just trying to see, based on the examples I was seeing for each category, what the image might be. And I thought it was an extremely instructive exercise by itself. I mean, I did not realize that a third of ImageNet is dogs and dog species, so it was interesting to see that the network spends a huge amount of its capacity caring about dogs; a third of its performance comes from dogs.

And yeah, so this was something that I did for maybe a week or two. I put everything else on hold; I thought it was a very fun exercise. I got a number in the end, and then I thought that one person was not enough. I wanted to have multiple other people, and so I was trying to organize within the lab to get other people to do the same thing. And I think people were not as willing to contribute, say, a week or two of pretty painstaking work, like sitting down for five hours and trying to figure out which dog breed something is. And so I was not able to get enough data in that respect, but we got at least some approximate performance, which I thought was fun. And then this was picked up, and it wasn't obvious to me at the time, I just wanted to know the number, but this became like a thing. [LAUGH] And people really liked the fact that this happened, and I'm jokingly referred to as the reference human. And of course, that's hilarious to me. [LAUGH]

>> Were you surprised when software finally surpassed your performance?

>> Absolutely. So yeah, absolutely. I mean, sometimes it's really hard to see what's in the image. It's just a tiny blob; a black dot is obviously somewhere there, and I can't tell what it is. I'm guessing between maybe 20 categories, and the network just gets it, and I don't understand how that comes about. So there's some superhumanness to it. But also, I think the network is extremely good at these kinds of statistics of textures, and in that respect I was not surprised that the network could measure those fine statistics better across lots of images. In other cases, I was surprised, because some of the images require you to read. It's just a bottle, and you can't see what it is from its shape, but it actually tells you what it is in text. As a human, I can read it, and it's fine, but the network would have to learn to read to identify the object, because it wasn't obvious from the appearance alone.

>> One of the things you've become well known for, and that the deep learning community has been grateful to you for, has been your teaching the CS231n class and putting it online. Tell me a little bit about how that came about.

>> Yeah, absolutely.
So I think I felt very strongly that, basically, this technology was transformative in that a lot of people want to use it. It's almost like a hammer, and what I wanted to do, being in a position to do it, was to hand out this hammer to a lot of people. And I just found that very compelling. It's not necessarily advisable from the perspective of a PhD student, because you're putting your research on hold; I mean, this became like 120% of my time. I taught the class twice, and each time it's maybe four months, and that time was spent basically entirely on the class. So it's not super advisable from that perspective, but it was basically the highlight of my PhD. It's not even related to research; I think teaching the class was definitely the highlight of my PhD.

Just seeing the students, just the fact that they were really excited, it was a very different class. Normally, you're being taught things that were discovered in the 1800s or something like that, but we were able to come to class and say: look, there's this paper from a week ago, or even yesterday, and there are new results. And I think the undergraduate students and the other students really enjoyed that aspect of the class, and the fact that they could actually understand it. This is not nuclear physics or rocket science; you need to know calculus and linear algebra, and then you can actually understand everything that happens under the hood. So I think just the fact that it's so powerful, and the fact that it keeps changing on a daily basis, people felt like they were right at the forefront of something big, and I think that's why people really enjoyed that class a lot.

>> And you've really helped a lot of people and handed out a lot of hammers.

>> Yeah.

>> As someone who's been doing deep learning for quite some time now, while the field is evolving rapidly, I'd be curious to hear: how has your own thinking, your understanding of deep learning, changed over these many years?

>> Yeah, it's basically like when I was seeing Restricted Boltzmann Machines for the first time, trained on digits: it wasn't obvious to me how this technology was going to be used and how big of a deal it would be. And also, when I was starting to work in computer vision, convolutional networks were around, but they were not something that a lot of the computer vision community expected to be using anytime soon. I think the perception was that this works for small cases but would never scale to large images. And that was just extremely incorrect. [LAUGH] So basically, I'm just surprised by how general the technology is and how good the results are. That was the biggest surprise, I would say.

And it's not only that it worked so well on, say, ImageNet. The other thing that I think no one saw coming, or at least for sure I did not see coming, is that you can take these pretrained networks and transfer them: you can fine-tune them on arbitrary other tasks. Because now you're not just solving ImageNet, where you need millions of examples; the network also happens to be a very general feature extractor, and I think that's the second insight that fewer people saw coming. And there were these papers that just ticked off all the things people had been working on in computer vision: scene classification, action recognition, object recognition, base attributes, and so on. People were just crushing each task just by fine-tuning the network.
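A minimal sketch of the fine-tuning recipe Karpathy describes here, using PyTorch and torchvision as an assumed present-day stand-in rather than the tooling of that era: load an ImageNet-pretrained backbone, freeze it so it acts as a general feature extractor, and train only a new head for the downstream task.

```python
# Fine-tuning sketch: reuse an ImageNet-pretrained network on a new task.
import torch
import torch.nn as nn
from torchvision import models

num_classes = 10                                 # hypothetical new task with 10 classes
model = models.resnet18(weights="IMAGENET1K_V1") # ImageNet-pretrained backbone

for p in model.parameters():                     # freeze the pretrained features
    p.requires_grad = False

model.fc = nn.Linear(model.fc.in_features, num_classes)  # new, trainable task head

optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 3, 224, 224)                  # placeholder batch of images
y = torch.randint(0, num_classes, (8,))          # placeholder labels
loss = loss_fn(model(x), y)                      # only the head receives gradients
loss.backward()
optimizer.step()
```

With the backbone frozen, a small dataset is often enough to get strong results, which is exactly the "you're not just solving ImageNet, where you need millions of examples" point above.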
And so that, to me, was very surprising.

>> Yes. And somehow, I guess, supervised learning gets most of the press, and even though pretraining and fine-tuning, or transfer learning, is actually working very well, people seem to talk less about it for some reason.

>> Right, exactly. Yeah. I think what has not worked as well is some of these hopes around unsupervised learning, which I think is really why a lot of researchers got into the field around 2007 and so on. And the promise of that has still not been delivered, and I find that also surprising: the supervised learning part worked so well, while unsupervised learning is still in a state where, yeah, it's still not obvious how it's going to be used or how it's going to work, even though a lot of people are still deep believers, to use the term, in that area.

>> So I know that you're one of the people who's been thinking a lot about the long-term future of AI. Do you want to share your thoughts on that?

>> So I spent the last maybe year and a half at OpenAI thinking a lot about these topics, and it seems to me like the field will split into two trajectories. One will be applied AI, which is just making these neural networks, training them, mostly with supervised learning and potentially unsupervised learning, and getting better, say, image recognizers or something like that. And I think the other will be artificial general intelligence directions, which is: how do you get neural networks that are an entire dynamical system that thinks and speaks and can do everything that a human can do, and is intelligent in that way?

And I think what's been interesting is that, for example in computer vision, the way we approached it in the beginning, I think, was wrong, in that we tried to break it down into different parts. We were like: okay, humans recognize people, humans recognize scenes, humans recognize objects. So we're just going to do everything that humans do, and now we have different areas, and once we have all those things, we're going to figure out how to put them together. I think that was a wrong approach, and we've seen how that played out historically. And I think something similar is likely going on at a higher level with AI. People are asking: well, okay, people plan, people do experiments to figure out how the world works, people talk to other people, so we need language. And people are trying to decompose it by function, accomplish each piece, and then put it together into some kind of brain. And I just think it's an incorrect approach.

What I've been a much bigger fan of is not decomposing in that way, but having a single neural network that is the complete dynamical system, so that you're always working with a full agent. And then the question is: how do you actually create objectives such that, when you optimize over the weights that make up that brain, you get intelligent behavior out? That's been something that I've been thinking about a lot at OpenAI. I think there are a lot of different ways that people have thought about approaching this problem. For example, going in a supervised learning direction, I have this essay online. It's not an essay, it's a short story that I wrote. And the short story tries to come up with a hypothetical world of what it might look like if the way we approach this AGI is just by scaling up supervised learning, which we know works.
And so that gets into something that looks like Amazon Mechanical Turk, where people connect into lots of robot bodies and perform tasks, and then we train on that as a supervised learning dataset to imitate humans, and what that might look like, and so on. And then there are other directions, like unsupervised learning, or algorithmic information theory, things like AIXI, or artificial life, things that look more like artificial evolution. And so that's what I spend my time thinking a lot about. And I think I have the correct answer, but I'm not willing to reveal it here. [LAUGH]

>> Well, I can at least learn more by reading your blog posts.

>> Yeah, absolutely.

>> So you've already given out a lot of advice, and today there are a lot of people still wanting to enter the field of AI and deep learning. So for people in that position, what advice do you have for them?

>> Yeah, absolutely. So I think when people talk to me about CS231n and why they thought it was a very useful course, what I keep hearing again and again is that people appreciate the fact that we went all the way down to the low-level details. They were not working with a library; they saw the real code, they saw how everything was implemented, and they implemented chunks of it themselves. Just going all the way down and understanding everything beneath you, it's really important not to abstract things away. You need to have a full understanding of the whole stack.

And that's where I learned the most myself as well: when I was learning this stuff, implementing it myself from scratch was the most important thing. It was the piece that gave me the best bang for the buck in terms of understanding. So I wrote my own library. It's called ConvNetJS. It was written in JavaScript, and it implements convolutional neural networks. That was my way of learning about backpropagation. And so that's something that I keep advising people: don't start by working with TensorFlow or something else. Once you have written something yourself at the lowest level of detail, you understand everything beneath you, and then you're comfortable; now it's fine to use some of these frameworks that abstract some of it away from you, because you know what's under the hood. So that's what helped me the most, that's what people appreciate the most when they take CS231n, and that's what I would advise a lot of people.

>> So rather than just running a neural network and having it all happen like magic.

>> Yeah. If your view is that it's some kind of sequence of layers, and I know that when I add some dropout layers it makes it work better, that's not what you want. In that case, you're not going to be able to debug effectively, and you're not going to be able to improve on your models effectively.

>> Yeah, with that answer, I'm really glad that the deeplearning.ai courses start with many weeks of Python programming first and then [INAUDIBLE].

>> Yeah, good, good.

>> Thank you very much for sharing your insights and advice. You're already a hero to many people in the deep learning world, so I'm really glad and really grateful you could join us here today.

>> Yeah, thank you for having me.
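In the spirit of the advice above, here is a minimal from-scratch sketch: a tiny two-layer network with hand-written backpropagation, using NumPy as an illustrative stand-in (ConvNetJS itself, mentioned in the interview, is JavaScript). Nothing is abstracted away: the forward pass, the gradients, and the update are all explicit.

```python
# A tiny two-layer network trained with hand-derived backpropagation,
# so every step under the hood is visible. Toy data, not a real task.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 3))                 # toy inputs
y = (X.sum(axis=1) > 0).astype(float)[:, None]   # toy binary targets

W1 = rng.standard_normal((3, 8)) * 0.1           # layer 1 weights
b1 = np.zeros(8)
W2 = rng.standard_normal((8, 1)) * 0.1           # layer 2 weights
b2 = np.zeros(1)
lr = 0.5

for step in range(500):
    # forward pass: linear -> ReLU -> linear -> sigmoid
    h = np.maximum(0, X @ W1 + b1)
    p = 1 / (1 + np.exp(-(h @ W2 + b2)))
    # backward pass: hand-derived gradients of the cross-entropy loss
    dlogits = (p - y) / len(X)                   # gradient at the output logits
    dW2 = h.T @ dlogits
    db2 = dlogits.sum(axis=0)
    dh = dlogits @ W2.T
    dh[h <= 0] = 0                               # ReLU gradient
    dW1 = X.T @ dh
    db1 = dh.sum(axis=0)
    # parameter update: plain gradient descent
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```

Once a loop like this is second nature, frameworks that abstract these steps away stop being black boxes, which is the point of the advice.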