>> So welcome, Andrej. I'm really glad you could join me today.

>> Yeah, thank you for having me.

>> So a lot of people already know your work in deep learning, but not everyone knows your personal story. So let's start by you telling us: how did you end up doing all this work in deep learning?

>> Yeah, absolutely. I think my first exposure to deep learning was when I was an undergraduate at the University of Toronto. Geoff Hinton was there, and he was teaching a class on deep learning. At that time it was Restricted Boltzmann Machines trained on MNIST digits, and I just really liked the way Geoff talked about training the network, like "the mind" of the network; he was using these terms. I thought there was a flavor of something magical happening when this was training on those digits. So that was my first exposure to it, although I didn't get into it in a lot of detail at that time.

Then when I was doing my master's degree at the University of British Columbia, I took a class that was again on machine learning, and that's the first time I delved deeper into these networks and so on. What was interesting is that I was very interested in artificial intelligence, so I took classes in artificial intelligence, but a lot of what I was seeing there was just not very satisfying. It was a lot of depth-first search, breadth-first search, alpha-beta pruning, and all these things, and I was just not satisfied with it. So when I was seeing neural networks for the first time, it was in machine learning, which is a term that I think is more technical and not as well known; most people talk about artificial intelligence, and machine learning was more of a technical term, I would almost say. I was dissatisfied with artificial intelligence, and when I saw machine learning, I thought: this is the AI that I want to spend time on, this is what's really interesting. And what took me down that direction is that this is almost a new computing paradigm, I would say. Because normally, humans write the code, but here, in this case, the optimization writes the code. You create the input/output specification, you have lots of examples of it, and then the optimization writes the code, and sometimes it can write code better than you. So I thought that was just a very new way of thinking about programming, and that's what intrigued me about it.
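[Editor's note: a minimal sketch of the "optimization writes the code" idea. The toy target function, learning rate, and step count are illustrative assumptions, not anything from the conversation.]

```python
# The "program" here is just two parameters, w and b. We never write the
# function ourselves; we give input/output examples, and gradient descent
# finds the parameter values that implement the specified behavior.
import numpy as np

# Input/output specification: examples of the behavior we want (here, y = 3x - 2).
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = np.array([-2.0, 1.0, 4.0, 7.0, 10.0])

w, b = 0.0, 0.0           # the learned "code"
lr = 0.01                 # learning rate
for step in range(5000):  # the optimization loop "writes" the program
    pred = w * xs + b
    grad_w = 2 * np.mean((pred - ys) * xs)  # d(mean squared error)/dw
    grad_b = 2 * np.mean(pred - ys)         # d(mean squared error)/db
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # ~3.0 and ~-2.0: the learned code matches the spec
```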
>> Then through your work, one of the things you've come to be known for is that you're now the human benchmark for the ImageNet image classification competition. How did that come about?

>> So basically, the ImageNet challenge is sometimes compared to the World Cup of computer vision. A lot of people care about this benchmark and this number, and the error rate goes down over time. But it was not obvious to me where a human would be on this scale. I had done a similar, smaller-scale experiment on the CIFAR-10 dataset earlier. What I did with CIFAR-10 is I was just looking at these 32 x 32 images and trying to classify them myself. It was only ten categories, so it's fairly simple to create an interface for it, and I think I had an error rate of about 6% on that. Then, based on what I was seeing and how hard the task was, I predicted what the lowest error rate we'd achieve would be. Look, okay, I can't remember the exact numbers; I think I guessed 10%, and we're now down to 3% or 2% or something crazy. So that was my first fun experiment with a human baseline.
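[Editor's note: a hypothetical sketch of what such a human-baseline loop can look like. The ground-truth file, its format, and the console prompt are illustrative assumptions; the actual experiments used purpose-built interfaces.]

```python
# Show each test item to a human, record their guess, and report the error rate.
import json

with open("labels.json") as f:   # assumed format: {"img_0001.png": "cat", ...}
    truth = json.load(f)

guesses = {}
for filename in truth:
    # A real interface would display the image here (e.g., in a browser).
    guesses[filename] = input(f"{filename} - your label? ").strip()

errors = sum(guesses[f] != label for f, label in truth.items())
print(f"human error rate: {errors / len(truth):.1%}")
```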
And I thought it was really important, for the same reasons that you point out in some of your lectures: you really want that number, to understand how well humans are doing, so we can compare machine learning algorithms to it. And for ImageNet, it seemed there was a discrepancy between how important this benchmark was, and how much focus there was on getting a lower number, and us not even understanding how humans were doing on this benchmark.

So I created this JavaScript interface, and I was showing myself the images. The problem with ImageNet is that you don't have just 10 categories, you have 1,000, so it was almost a UI challenge. Obviously, I can't remember 1,000 categories, so how do I make it fair? I listed out all the categories and gave myself examples of them, and for each image, I was scrolling through the 1,000 categories and just trying to see, based on the examples I was seeing for each category, what this image might be.

And I thought it was an extremely instructive exercise by itself. I mean, I had not understood that a third of ImageNet is dogs and dog species, so it was interesting to see that the network spends a huge amount of its capacity caring about dogs; a third of its performance comes from dogs. And yeah, this was something I did for maybe a week or two. I put everything else on hold; I thought it was a very fun exercise. I got a number in the end, and then I thought that one person is not enough. I wanted to have multiple other people, so I tried to organize within the lab to get other people to do the same thing. And I think people are not as willing to contribute, say, a week or two of pretty painstaking work, just sitting down for five hours and trying to figure out which dog breed this is. So I was not able to get enough data in that respect, but we got at least some approximate performance, which I thought was fun.

And then this was picked up, and it wasn't obvious to me at the time. I just wanted to know the number, but this became like a thing. [LAUGH] People really liked the fact that this happened, and jokingly refer to me as the reference human. And of course, that's hilarious to me, yeah. [LAUGH]

>> Were you surprised when software finally surpassed your performance?

>> Absolutely. So yeah, absolutely. I mean, especially, sometimes it's really hard to see what's in the image. It's just a tiny blob; a black dot is obviously somewhere there, and I'm not seeing what it is. I'm guessing between, like, 20 categories, and the network just gets it, and I don't understand how that comes about. So there's some superhumanness to it. But also, I think the network is extremely good at these kinds of statistics of textures, and in that respect, I was not surprised that the network could better measure those fine statistics across lots of images. In many cases, I was surprised, though, because some of the images required you to read.
It's just a bottle, and you can't see what it is, but it actually tells you what it is, in text. So as a human, I can read it, and it's fine, but the network would have to learn to read to identify the object, because it wasn't obvious from the image alone.

>> One of the things you've become well known for, and that the deep learning community has been grateful to you for, has been your teaching the class and putting it online. Tell me a little bit about how that came about.

>> Yeah, absolutely. So I felt very strongly that, basically, this technology was transformative, in that a lot of people want to use it. It's almost like a hammer, and what I wanted to do, being in a position to do it, was hand out this hammer to a lot of people. And I just found that very compelling. It's not necessarily advisable from the perspective of a PhD student, because you're putting your research on hold; I mean, this became, like, 120% of my time. I had to put all of my research on hold. I taught the class twice, and each time, it's maybe four months, and that time is spent basically entirely on the class. So it's not super advisable from that perspective, but it was basically the highlight of my PhD. It's not even related to research; I think teaching the class was definitely the highlight of my PhD.

Just seeing the students, just the fact that they were really excited: it was a very different class. Normally, you're being taught things that were discovered in 1800 or something like that, but we were able to come to class and say, look, there's this paper from a week ago, or even from yesterday, and there are new results. And I think the undergraduate students and the other students really enjoyed that aspect of the class, and the fact that they could actually understand it. This is not nuclear physics or rocket science: you need to know calculus and linear algebra, and then you can actually understand everything that happens under the hood. So I think just the fact that it's so powerful, and the fact that it keeps changing on a daily basis, meant people felt like they were right at the forefront of something big. And I think that's why people really enjoyed that class a lot.
>> And you've really helped a lot of people and handed out a lot of hammers.

>> Yeah.

>> As someone who's been doing deep learning for quite some time now, and the field is evolving rapidly, I'd be curious to hear: how has your own thinking, your own understanding of deep learning, changed over these many years?

>> Yeah, it's basically like when I was seeing Restricted Boltzmann Machines for the first time, on those digits: it wasn't obvious to me how this technology was going to be used and how big a deal it would be. And also, when I was starting to work in computer vision, convolutional networks were around, but they were not something that much of the computer vision community expected to be using anytime soon. I think the perception was that this worked for small cases but would never scale to large images. And that was just extremely incorrect. [LAUGH] So basically, I'm just surprised by how general the technology is and how good the results are. That was the largest surprise, I would say.

And it's not only that it worked so well on, say, ImageNet. The other thing that I think no one saw coming, or at least that I for sure did not see coming, is that you can take these pretrained networks and transfer them: you can fine-tune them on arbitrary other tasks. Because now, you're not just solving ImageNet, where you need millions of examples; this also happens to be a very general feature extractor, and I think that's a second insight that fewer people saw coming. And there were these papers that just listed all the things people had been working on in computer vision, scene classification, action recognition, object recognition, attributes, and so on, and people were just crushing each task just by fine-tuning the network. And so that, to me, was very surprising.
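[Editor's note: a sketch of that fine-tuning recipe as it might look in modern PyTorch. The ResNet-18 backbone, the 10-class target task, and the data loader are illustrative assumptions, not anything specified in the conversation.]

```python
# Reuse an ImageNet-pretrained network as a general feature extractor and
# fine-tune only a new classification head on a different task.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # pretrained weights

for p in model.parameters():    # freeze the pretrained feature extractor
    p.requires_grad = False

model.fc = nn.Linear(model.fc.in_features, 10)  # new head for a 10-class task

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Training loop over your own (assumed) labeled dataset:
# for images, labels in loader:
#     optimizer.zero_grad()
#     loss = loss_fn(model(images), labels)
#     loss.backward()
#     optimizer.step()
```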
>> Yes, and somehow, I guess, supervised learning gets most of the press, and even though pretraining and fine-tuning, or transfer learning, is actually working very well, people seem to talk less about that for some reason.

>> Right, exactly. Yeah, I think what has not worked out as much are some of these hopes for unsupervised learning, which I think is really why a lot of researchers got into the field around 2007 and so on. And I think the promise of that has still not been delivered, and I find that also surprising: the supervised learning part worked so well, and unsupervised learning is still in a state where, yeah, it's still not obvious how it's going to be used or how that's going to work, even though a lot of people are still deep believers, I would say, to use the term, in this area.

>> So I know that you're one of the people who's been thinking a lot about the long-term future of AI. Do you want to share your thoughts on that?

>> So I spent the last maybe year and a half at OpenAI thinking a lot about these topics, and it seems to me like the field will split into two trajectories. One will be applied AI, which is just making these neural networks, training them, mostly with supervised learning, potentially unsupervised learning, and getting better, say, image recognizers or something like that. And I think the other will be artificial general intelligence directions, which is: how do you get neural networks that are an entire dynamical system that thinks and speaks and can do everything that a human can do, and is intelligent in that way?

And I think what's been interesting is that, for example in computer vision, the way we approached it in the beginning was, I think, wrong, in that we tried to break it down into different parts. So we were like: okay, humans recognize people, humans recognize scenes, humans recognize objects, so we're just going to do everything that humans do, and now we have all these different areas, and once we have all those things, we're going to figure out how to put them together. And I think that was a wrong approach, and we've seen how that played out historically. And I think something similar is likely going on at a higher level with AI. So people are asking: well, okay, people plan, people do experiments to figure out how the world works, people talk to other people, so we need language.
And people are trying to decompose it by function, accomplish each piece, and then put it together into some kind of brain, and I just think that's an incorrect approach. What I've been a much bigger fan of is not decomposing that way, but having a single kind of neural network that is a complete dynamical system, so that you're always working with a full agent. And then the question is: how do you actually create objectives such that, when you optimize over the weights that make up that brain, you get intelligent behavior out? That's been something I've been thinking about a lot at OpenAI.

I think there are a lot of different ways that people have thought about approaching this problem. For example, going in a supervised learning direction, I have this essay online. It's not an essay, it's a short story that I wrote. And the short story tries to come up with a hypothetical world of what it might look like if the way we approach AGI is just by scaling up supervised learning, which we know works. And that gets into something that looks like Amazon Mechanical Turk, where people operate lots of robot bodies, they perform tasks, and then we train on that as a supervised learning dataset to imitate humans, and what that might look like, and so on. And then there are other directions, like unsupervised learning, or algorithmic information theory, things like AIXI, or artificial life, things that look more like artificial evolution. So that's what I spend my time thinking a lot about. And I think I have the correct answer, but I'm not willing to reveal it here. [LAUGH]

>> I can at least learn more by reading your blog posts.

>> Yeah, absolutely.

>> So you've already given out a lot of advice, and today there are a lot of people still wanting to enter the field of AI and deep learning. For people in that position, what advice do you have for them?

>> Yeah, absolutely. So I think when people talk to me about CS231n and why they thought it was a very useful course, what I keep hearing again and again is that people appreciate the fact that we went all the way down to the low-level details. They were not working with a library; they saw the real code, they saw how everything was implemented, and they implemented chunks of it themselves. And so just going all the way down and understanding everything beneath you, it's really important to not abstract things away; you need to have a full understanding of the whole stack. And that's where I learned the most myself as well: when I was learning this stuff, implementing it myself from scratch was the most important thing. It was the piece that I felt gave me the best bang for the buck in terms of understanding. So I wrote my own library, called ConvNetJS. It was written in JavaScript, and it implements convolutional neural networks; that was my way of learning about backpropagation. And so that's something I keep advising people: don't start by working with TensorFlow or something else. You can work with it once you have written something yourself at the lowest level of detail, so you understand everything beneath you, and now you are comfortable. Now it's possible to use some of these frameworks that abstract some of it away from you, because you know what's under the hood. And so that's what helped me the most, that's what people appreciate the most when they take 231n, and that's what I would advise a lot of people.
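[Editor's note: in that spirit, a minimal from-scratch sketch, in numpy rather than JavaScript: a tiny two-layer network learning XOR, with the forward pass, backward pass, and parameter update all written out by hand. The architecture and hyperparameters are arbitrary illustrative choices.]

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)   # hidden layer
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)   # output layer

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for step in range(5000):
    # forward pass
    h = np.tanh(X @ W1 + b1)        # hidden activations, shape (4, 8)
    out = sigmoid(h @ W2 + b2)      # predictions, shape (4, 1)

    # backward pass (cross-entropy loss + sigmoid output gives this gradient)
    d_out = (out - y) / len(X)
    dW2 = h.T @ d_out; db2 = d_out.sum(axis=0)
    d_h = (d_out @ W2.T) * (1.0 - h ** 2)   # derivative of tanh is 1 - tanh^2
    dW1 = X.T @ d_h;   db1 = d_h.sum(axis=0)

    # gradient descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(out.round(2).ravel())  # approaches [0, 1, 1, 0]
```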
265 00:13:31,946 --> 00:13:34,415 And they saw how everything was implemented, and 266 00:13:34,415 --> 00:13:36,750 implemented chunks of it themselves. 267 00:13:36,750 --> 00:13:40,550 And so just going all the way down and understanding everything under you, 268 00:13:42,140 --> 00:13:44,020 it's really important to not abstract away things. 269 00:13:44,020 --> 00:13:46,890 You need to have a full understanding of the whole stack. 270 00:13:46,890 --> 00:13:49,390 And that's where I learned the most myself as well when I was learning this stuff 271 00:13:49,390 --> 00:13:52,470 is just implementing it myself from scratch was the most important. 272 00:13:52,470 --> 00:13:57,880 It was the piece that I felt gave me the best kind of bang for 273 00:13:57,880 --> 00:13:59,450 the buck in terms of understanding. 274 00:13:59,450 --> 00:14:00,107 So I wrote my own library. 275 00:14:00,107 --> 00:14:01,438 It's called ConvNetJS. 276 00:14:01,438 --> 00:14:03,663 It was written in Javascript, and it implements convolutional neural network. 277 00:14:03,663 --> 00:14:06,410 That was my way of learning about application. 278 00:14:06,410 --> 00:14:11,238 And so that's something that I keep advising people is that you not work with 279 00:14:11,238 --> 00:14:12,671 flow or something else. 280 00:14:12,671 --> 00:14:15,763 You can work with it once you have written at something yourself on the lowest 281 00:14:15,763 --> 00:14:19,066 detail, you understand everything under you, and now you are comfortable to. 282 00:14:19,066 --> 00:14:21,985 Now, it's possible to use some these frameworks that abstract some of it away 283 00:14:21,985 --> 00:14:23,766 from you, but you know what's under the hood. 284 00:14:23,766 --> 00:14:26,444 And so that's been something that helped me the most. 285 00:14:26,444 --> 00:14:29,174 That's something that people appreciate the most when they take 231n, and 286 00:14:29,174 --> 00:14:30,691 that's what I would advise a lot of people. 287 00:14:30,691 --> 00:14:35,540 >> So rather than run neural network, and it'll all happen like that. 288 00:14:35,540 --> 00:14:38,390 >> Yeah, and in some kind of sequence of layers, and I know that when I add some 289 00:14:38,390 --> 00:14:41,500 dropout layers, it makes it work better, like that's not what you want. 290 00:14:41,500 --> 00:14:45,320 In that case, you're not going to be able to debug effectively, 291 00:14:45,320 --> 00:14:47,610 you're not going to be able to improve on models effectively. 292 00:14:48,630 --> 00:14:52,184 >> Yeah, with that answer, I'm really glad that deep learning course got AI course. 293 00:14:52,184 --> 00:14:56,235 It starts a lot with many weeks of Python programming first and then [INAUDIBLE]. 294 00:14:56,235 --> 00:14:57,625 >> Yeah, good, good. 295 00:14:57,625 --> 00:15:00,947 >> Thank you very much for sharing your insights and advice. 296 00:15:00,947 --> 00:15:04,355 You're already heroes of many people in the deep learning world, so 297 00:15:04,355 --> 00:15:07,598 I'm really glad, really grateful you could join us here today. 298 00:15:07,598 --> 00:15:09,585 >> Yeah, thank you for having me.