Andrew: Hi, Ian. Thanks a lot for joining us today.

Ian: Thank you for inviting me, Andrew. I'm glad to be here.

Andrew: Today, you're one of the world's most visible deep learning researchers. Let's share a bit of your personal story. How did you end up doing the work that you do now?

Ian: Yeah, that sounds great. I guess I first became interested in machine learning right before I met you, actually. I had been working on neuroscience, and my undergraduate adviser at Stanford, Jerry Cain, encouraged me to take your Intro to AI class.

Andrew: Oh, I didn't know that. Okay.

Ian: I had always thought that AI was a good idea, but that in practice the main thing happening was game AI, where people write a lot of hard-coded rules for non-player characters in games to say different scripted lines at different points in time. Then, when I took your Intro to AI class and you covered topics like linear regression and the bias-variance decomposition of the error of linear regression, I started to realize that this was a real science, and that I could actually have a scientific career in AI rather than in neuroscience.

Andrew: I see. Great. And then what happened?

Ian: Well, I came back and TA'd your course later.

Andrew: Oh, I see. Right, as a TA.

Ian: A really big turning point for me came while I was TA-ing that course: one of the students, my friend Ethan Dreifuss, got interested in Geoff Hinton's deep belief net paper.

Andrew: I see.

Ian: The two of us ended up building one of the first CUDA-based GPU machines at Stanford in order to run Boltzmann machines in our spare time over winter break.

Andrew: I see.

Ian: At that point, I started to have a very strong intuition that deep learning was the way to go in the future. A lot of the other algorithms I was working with, like support vector machines, didn't seem to have the right asymptotics: you add more training data and they get slower, or, for the same amount of training data, it's hard to make them perform much better by changing other settings. From then on, I focused on deep learning as much as possible.
Andrew: And I remember that Rajat Raina's very early GPU paper acknowledges you for having done a lot of early work.

Ian: Yeah, that paper was written using some of the machines we built. The first machine was just something Ethan and I put together at Ethan's mom's house with our own money, and then later we used lab money to build the first two or three for the Stanford lab.

Andrew: Wow, that's great. I never knew that story. And then, today, one of the things that's really taken the deep learning world by storm is your invention of GANs. How did you come up with that?

Ian: I'd been studying generative models for a long time. GANs are a way of doing generative modeling where you have a lot of training data and you'd like to learn to produce more examples that resemble the training data, but that are imaginary: they've never been seen in exactly that form before. There were several other ways of doing generative modeling that had been popular for years before I had the idea for GANs, and after working on all of those methods throughout most of my Ph.D., I knew a lot about the advantages and disadvantages of the other frameworks, like Boltzmann machines and sparse coding and the other approaches that had been popular for years. I was looking for something that avoided all of those disadvantages at the same time. Then, finally, while I was arguing about generative models with my friends in a bar, something clicked into place, and I started telling them, "You need to do this, this, and this, and I swear it will work." My friends didn't believe me that it would work. I was supposed to be writing the deep learning textbook at the time.

Andrew: I see.

Ian: But I believed strongly enough that it would work that I went home and coded it up the same night, and it worked.

Andrew: So it took you just one evening to implement the first version of GANs?

Ian: I implemented it around midnight, after going home from the bar where my friend had had his going-away party.

Andrew: I see.

Ian: And the first version of it worked, which was very, very fortunate. I didn't have to search for hyperparameters or anything.
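Ian's "this, this, and this" maps to a surprisingly small amount of modern code: two networks trained against each other. Below is a minimal sketch of that adversarial game on a toy one-dimensional Gaussian. PyTorch is an assumed framework choice (the 2014 original predates it), and every architecture size, learning rate, and name here is an illustrative pick, not Goodfellow's original configuration.

```python
# Minimal GAN sketch on toy 1-D data. Assumes PyTorch; all hyperparameters
# and names are illustrative, not the original 2014 configuration.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Generator maps noise to a sample; discriminator outputs P(input is real).
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()
ones, zeros = torch.ones(64, 1), torch.zeros(64, 1)

for step in range(2000):
    # Discriminator step: push real samples toward 1, generated ones toward 0.
    real = torch.randn(64, 1) * 1.5 + 4.0  # toy "training data": N(4, 1.5^2)
    fake = G(torch.randn(64, 8)).detach()  # detach: don't update G on this step
    loss_d = bce(D(real), ones) + bce(D(fake), zeros)
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Generator step: make samples the discriminator scores as real
    # (the non-saturating form of the generator loss).
    fake = G(torch.randn(64, 8))
    loss_g = bce(D(fake), ones)
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()

# If training succeeded, generated samples cluster near the data mean of 4.0.
print(G(torch.randn(1000, 8)).mean().item())
```

The essential design choice is visible in the two halves of the loop: the discriminator is trained to separate real samples from generated ones, while the generator is trained only through the discriminator's judgment of its output.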
Andrew: There's a story I read somewhere that you had a near-death experience, and that it reaffirmed your commitment to AI. Tell me that story.

Ian: Yeah. I wasn't actually near death, but I briefly thought that I was. I had a very bad headache, and some of the doctors thought that I might have a brain hemorrhage. While I was waiting for my MRI results to find out whether I had one or not, I realized that most of the thoughts I was having were about making sure that other people would eventually try out the research ideas I had at the time.

Andrew: I see, I see.

Ian: In retrospect, they were all pretty silly research ideas.

Andrew: I see.

Ian: But at that point I realized that this was actually one of my highest priorities in life: carrying out my machine learning research work.

Andrew: I see. Yeah, that's great: when you thought you might be dying soon, you were just thinking about how to get the research done.

Ian: Yeah.

Andrew: That's commitment.

Ian: Yeah.

Andrew: So today, you're still at the center of a lot of the activity around GANs, around Generative Adversarial Networks. Tell me how you see the future of GANs.

Ian: Right now, GANs are used for a lot of different things, like semi-supervised learning, generating training data for other models, and even simulating scientific experiments. In principle, all of these things could be done by other kinds of generative models, so I think GANs are at an important crossroads. Right now, they work well some of the time, but it can be more of an art than a science to really bring that performance out of them. That's more or less how people felt about deep learning in general ten years ago. Back then, we were using deep belief networks with Boltzmann machines as the building blocks, and they were very, very finicky. Over time, we switched to things like rectified linear units and batch normalization, and deep learning became a lot more reliable. If we can make GANs as reliable as deep learning has become, then I think we'll keep seeing GANs used in all the places they're used today, with much greater success.
Ian: If we aren't able to figure out how to stabilize GANs, then I think their main contribution to the history of deep learning will be that they showed people how to do all of these tasks that involve generative modeling, and eventually we'll replace them with other forms of generative models. So I spend maybe about 40 percent of my time right now working on stabilizing GANs.

Andrew: I see. Cool. And just as a lot of the people who joined deep learning about ten years ago, such as yourself, wound up being pioneers, maybe the people who join GANs today, if it works out, could wind up being the early pioneers.

Ian: Yeah, a lot of people already are early pioneers of GANs. If you wanted to give any kind of history of GANs so far, you'd really need to mention other groups like Indico and Facebook and Berkeley for all the different things they've done.

Andrew: So in addition to all your research, you also coauthored a book on deep learning. How is that going?

Ian: That's right, with Yoshua Bengio and Aaron Courville, who were my Ph.D. co-advisers. We wrote the first textbook on the modern version of deep learning, and it has been very popular in both the English edition and the Chinese edition. We've sold, I think, around 70,000 copies total between those two languages, and I've had a lot of feedback from students who said that they've learned a lot from it. One thing we did a little differently than some other books is that we start with a very focused introduction to the kind of math you need for deep learning. One thing I took from your courses at Stanford is that linear algebra and probability are very important: people get excited about the machine learning algorithms, but if you want to be a really excellent practitioner, you have to master the basic math that underlies the whole approach in the first place. So we made sure to give a very focused presentation of the math basics at the start of the book. That way, you don't need to go off and learn all of linear algebra first; you can get a quick crash course in the pieces of linear algebra that are most useful for deep learning.
Andrew: So even someone whose math is a little shaky, or who hasn't seen the math for a few years, would be able to start from the beginning of your book, get that background, and get into deep learning?

Ian: All of the facts you would need to know are there. It would definitely take some focused effort to practice making use of them.

Andrew: Yeah, great.

Ian: If someone is really afraid of math, it might be a bit of a painful experience. But if you're ready for the learning experience and you believe you can master it, I think all the tools you need are there.

Andrew: As someone who's worked in deep learning for a long time, I'd be curious: looking back over the years, tell me a bit about how your thinking about AI and deep learning has evolved.

Ian: Ten years ago, I felt like, as a community, the biggest challenge in machine learning was just how to get it working for AI-related tasks at all. We had really good tools for simpler tasks, where we wanted to recognize patterns in hand-designed features, where a human designer could do a lot of the work by creating those features and then handing them off to the computer. That was really good for applications like predicting which ads a user would click on, or different kinds of basic scientific analysis. But we really struggled to do anything involving millions of pixels in an image, or a raw audio waveform, where the system had to build all of its understanding from scratch. We finally got over that hurdle really thoroughly maybe five years ago. Now we're at a point where there are so many different paths open that, for someone who wants to get involved in AI, maybe the hardest problem they face is choosing which path to go down. Do you want to make reinforcement learning work as well as supervised learning does? Do you want to make unsupervised learning work as well as supervised learning does? Do you want to make sure that machine learning algorithms are fair and don't reflect biases that we'd prefer to avoid? Do you want to make sure that the societal issues surrounding AI work out well, that we're able to make sure AI benefits everyone rather than causing social upheaval and trouble with loss of jobs?
Ian: I think right now there's really an amazing number of different things that can be done, both to prevent downsides from AI and to make sure we leverage all of the upsides it offers us.

Andrew: And so today there are a lot of people wanting to get into AI. What advice would you have for someone like that?

Ian: I think a lot of people who want to get into AI start out thinking that they absolutely need to get a Ph.D. or some other credential like that. I don't think that's actually a requirement anymore. One way you can get a lot of attention is to write good code and put it on GitHub. If you have an interesting project that solves a problem someone working at the top level wanted to solve, then once they find your GitHub repository, they'll come find you and ask you to come work there. A lot of the people I've hired or recruited, at OpenAI last year or at Google this year, I first became interested in working with because of something they released open source on the Internet. Writing papers and putting them on arXiv can also be good. A lot of the time, it's harder to reach the point where you have something polished enough to really be a new academic contribution to the scientific literature, but you can often get to the point of having a useful software product much earlier.

Andrew: So: read your book, practice with the materials, and post on GitHub and maybe on arXiv.

Ian: I think if you're learning from the book, it's really important to also work on a project at the same time. Choose some way of applying machine learning to an area you're already interested in. For example, if you're a field biologist and you want to get into deep learning, maybe you could use it to identify birds. Or, if you don't have an idea for how you'd like to use machine learning in your own life, you could pick something like building a Street View house numbers classifier, where the datasets are all set up to make it very straightforward for you. That way, you exercise all of the basic skills while you read the book, or while you watch Coursera videos that explain the concepts to you.
Andrew: Over the last couple of years, I've also seen you do more work on adversarial examples. Tell us a bit about that.

Ian: Yeah. I think adversarial examples are the beginning of a new field that I call machine learning security. In the past, we've seen computer security issues where attackers could fool a computer into running the wrong code. That's called application-level security. There have also been attacks where people fool a computer into believing that messages on a network come from somebody who is not actually who they say they are. That's called network-level security. Now we're starting to see that you can also fool machine learning algorithms into doing things they shouldn't, even if the program running the machine learning algorithm is running the correct code, and even if it knows who all the messages on the network really came from. I think it's important to build security into a new technology near the start of its development. We've found that it's very hard to build a working system first and then add security later. So I'm really excited about the idea that if we dive in and start anticipating security problems with machine learning now, we can make sure these algorithms are secure from the start, instead of trying to patch security in retroactively years later.

Andrew: Thank you. That was great. There's a lot about your story that I thought was fascinating and that, despite having known you for years, I didn't actually know, so thank you for sharing all of it.

Ian: You're very welcome. Thank you for inviting me. It was a great chat.

Andrew: Okay, thank you.

Ian: Very welcome.
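For readers curious what the adversarial examples Ian mentions look like in code, below is a minimal sketch of the fast gradient sign method (FGSM), one well-known attack that Goodfellow co-introduced. PyTorch is an assumed framework choice, and `model`, `x`, and `label` are hypothetical placeholders (any differentiable classifier, an input batch with values in [0, 1], and integer class labels), not anything specified in the interview.

```python
# Minimal FGSM sketch. Assumes PyTorch; `model`, `x`, and `label` are
# hypothetical placeholders for a differentiable classifier, an input batch
# with values in [0, 1], and integer class labels.
import torch
import torch.nn.functional as F

def fgsm(model, x, label, eps=0.05):
    """Return x perturbed by eps in the direction that most increases the loss."""
    x = x.detach().clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()  # gradient of the loss with respect to the input pixels
    # One signed gradient step: a tiny, usually imperceptible change per pixel
    # that can nonetheless flip the model's prediction.
    return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()
```

A perturbation of eps = 0.05 per pixel is typically invisible to a human, yet it can change the classifier's output: exactly the failure mode Ian describes, one that persists even when the program is running the correct code.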