So, thanks a lot, Pieter, for joining me today. I think a lot of people know you as a well-known machine learning, deep learning, and robotics researcher. I'd like to have people hear a bit about your story. How did you end up doing the work that you do?

That's a good question, and actually, if you had asked me as a 14-year-old what I was aspiring to do, it probably would not have been this. In fact, at the time, I thought being a professional basketball player would be the right way to go. I don't think I was able to achieve it.

I feel machine learning lucked out that the basketball thing didn't work out.

Yes, that didn't work out. It was a lot of fun playing basketball, but it didn't work out to make it into a career. So, what I really liked in school was physics and math. And from there, it seemed pretty natural to study engineering, which is applying physics and math in the real world. Then, after my undergrad in electrical engineering, I actually wasn't so sure what to do, because literally anything in engineering seemed interesting to me. Understanding how anything works seems interesting; trying to build anything is interesting. And in some sense, artificial intelligence won out because it seemed like it could somehow help all disciplines in some way. It also seemed somehow a little more at the core of everything: if you think about how a machine can think, then maybe that's more at the core of everything else than picking any specific discipline.

I've been saying AI is the new electricity; it sounds like the 14-year-old version of you had an earlier version of that even. You know, in the past few years you've done a lot of work in deep reinforcement learning. What's happening? Why is deep reinforcement learning suddenly taking off?

Before I worked in deep reinforcement learning, I worked a lot in reinforcement learning: actually with you and Durant at Stanford, of course. We worked on autonomous helicopter flight, and then later at Berkeley, with some of my students, on getting a robot to learn to fold laundry. What characterized the work was a combination of learning that enabled things that would not be possible without learning, but also a lot of domain expertise in combination with the learning to get it to work. It was very interesting, because you needed domain expertise, which was fun to acquire but, at the same time, very time-consuming: for every new application you wanted to succeed at, you needed domain expertise plus machine learning expertise. For me, the turning point was in 2012, with the ImageNet breakthrough results from Geoff Hinton's group in Toronto: AlexNet showed that supervised learning, all of a sudden, could be done with far less engineering for the domain at hand. There was very little vision-specific engineering in AlexNet. That made me think we really should revisit reinforcement learning from the same viewpoint and see if we could get the deep version of reinforcement learning to work and do equally interesting things as had just happened in supervised learning.

It sounds like you saw the potential of deep reinforcement learning earlier than most people. So now, looking into the future, what do you see next? What are your predictions for deep reinforcement learning in the years to come?

So, I think what's interesting about deep reinforcement learning is that, in some sense, there are many more open questions than in supervised learning. In supervised learning, it's about learning an input-output mapping. In reinforcement learning, there is the notion of: where does the data even come from? That's the exploration problem. When you have data, how do you do credit assignment? How do you understand which actions you took early on got you the reward later?
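To make that contrast concrete, here is a minimal sketch in plain Python (a made-up toy problem, not anything from the conversation): in supervised learning the dataset is handed to you up front, while in reinforcement learning the agent's own actions determine which data it ever gets to see.

```python
import random

# Supervised learning: a fixed set of (input, label) pairs is given up front.
dataset = [(x, 2 * x + 1) for x in range(100)]

def train_supervised(data):
    # Fit y = a*x + b by brute-force search (a stand-in for gradient descent).
    best, best_err = None, float("inf")
    for a in range(5):
        for b in range(5):
            err = sum((a * x + b - y) ** 2 for x, y in data)
            if err < best_err:
                best, best_err = (a, b), err
    return best

# Reinforcement learning: the data stream depends on the policy itself.
def rollout(policy, steps=20):
    """The agent only observes the states its own actions lead it to."""
    state, trajectory = 0, []
    for _ in range(steps):
        action = policy(state)            # exploration happens here
        reward = 1 if action == state % 2 else 0
        trajectory.append((state, action, reward))
        state = (state + action) % 10     # actions shape all future data
    return trajectory

print(train_supervised(dataset))                     # -> (2, 1)
print(rollout(lambda s: random.choice([0, 1]))[:3])  # first few transitions
```

Credit assignment is exactly the question of how the rewards inside such a trajectory should be attributed to the earlier actions.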
And then there are issues of safety. When you have a system autonomously collecting data, it's actually rather dangerous in most situations. Imagine a self-driving car company that says, "We're just going to run deep reinforcement learning." It's pretty likely that car would get into a lot of accidents before it does anything useful.

You'd need negative examples for that, right?

You do need some negative examples somehow, yes; and positive ones, hopefully. So, I think there are still a lot of challenges in deep reinforcement learning in terms of working out the specifics of how to get these things to work. The deep part is the representation, but the reinforcement learning itself still has a lot of open questions. What I feel is that, with the advances in deep learning, one part of the puzzle in reinforcement learning has been largely addressed, which is the representation part: if there is a pattern, we can probably represent it with a deep network and capture it. How to tease apart the pattern is still a big challenge in reinforcement learning. So I think the big challenges are, first, how to get systems to reason over long time horizons. Right now, a lot of the successes in deep reinforcement learning are over very short horizons: there are problems where, if you act well over a five-second horizon, you act well over the entire problem. A five-second scale is something very different from a day-long scale, or the ability to live a life as a robot or a software agent. So I think there are a lot of challenges there. I think safety also has a lot of challenges, in terms of how you learn safely and how you keep learning once you're already pretty good. To give an example a lot of people would be familiar with, self-driving cars: for a self-driving car to be better than a human driver, consider that human drivers only get into a bad accident every three million miles or so. So it takes a long time to see the interesting negative data, once you're as good as a human driver. But you want your self-driving car to be better than a human driver, and at that point the data collection becomes really, really difficult: it's hard to get the interesting data that makes your system improve. So there are a lot of challenges related to exploration that tie into that. But one of the things I'm actually most excited about right now is seeing if we can take a step back and learn the reinforcement learning algorithm itself. Reinforcement learning is very complex: credit assignment is very complex, exploration is very complex. So maybe, just as deep learning for supervised learning was able to replace a lot of domain expertise, we can have learned programs, reinforcement learning programs, that do all of this, instead of us designing the details.

Learning the reward function, or learning the whole program?

This would be learning the entire reinforcement learning program. Imagine you have a reinforcement learning program, whatever it is, and you throw it at some problem, and you see how long it takes to learn. And then you say, well, that took a while. Now, let another program modify this reinforcement learning program. After the modification, see how fast it learns. If it learns more quickly, that was a good modification, and maybe you keep it and improve from there.
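As a rough illustration of that outer loop (a toy stand-in invented here, not any system described in the conversation), the "reinforcement learning program" below is reduced to two tunable parameters of a simple bandit learner, and an outer loop keeps random modifications under which the learner learns faster:

```python
import random

def time_to_learn(program, trials=300):
    """Run the candidate RL program on a 2-armed bandit; return how many
    pulls it takes before its value estimates rank the better arm first."""
    epsilon, step_size = program
    q = [0.0, 0.0]                      # value estimates for the two arms
    for t in range(trials):
        if random.random() < epsilon:   # explore
            arm = random.randrange(2)
        else:                           # exploit
            arm = max((0, 1), key=lambda a: q[a])
        reward = 1.0 if random.random() < (0.8 if arm == 1 else 0.2) else 0.0
        q[arm] += step_size * (reward - q[arm])
        if t > 20 and q[1] > q[0]:
            return t
    return trials

def avg_time(program, runs=5):
    # Evaluations are noisy, so average over a few runs.
    return sum(time_to_learn(program) for _ in range(runs)) / runs

program = (0.5, 0.05)                   # the initial, hand-designed learner
score = avg_time(program)
for _ in range(50):                     # outer loop: modify, re-measure, keep if faster
    candidate = tuple(min(1.0, max(0.01, p + random.gauss(0, 0.05)))
                      for p in program)
    candidate_score = avg_time(candidate)
    if candidate_score < score:         # it learned more quickly: keep the change
        program, score = candidate, candidate_score
print("learned learner:", program, "learns in ~", score, "steps")
```

The real proposal is far more general, since the modified object would be a full learning algorithm rather than two scalars, but the keep-what-learns-faster loop has the same shape.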
I see, right.

Yes, and I think the pace of this direction has a lot to do with the amount of compute that's becoming available. This would be running reinforcement learning in the inner loop. For us right now, we run reinforcement learning as the final thing, and the more compute we get, the more it becomes possible to run something like reinforcement learning in the inner loop of a bigger algorithm.

Starting from the 14-year-old, you've worked in AI for some 20-plus years now. So, tell me a bit about how your understanding of AI has evolved over this time.

When I started looking at AI, it's very interesting because it really coincided with coming to Stanford to do my master's degree. There were some icons there, like John McCarthy, who I got to talk with, but who, in the year 2000, had a very different approach from what most people were doing at the time. And also talking with Daphne Koller. I think a lot of my initial thinking about AI was shaped by Daphne's thinking: her AI class, her probabilistic graphical models class, and being really intrigued by how simply having a distribution over many random variables, and then being able to condition on some subset of the variables and draw conclusions about the others, could actually give you so much, if you could somehow make it computationally tractable, which was definitely the challenge. And then from there, when I started my Ph.D. and you arrived at Stanford, I think you gave me a really good reality check: that that's not the right metric to evaluate your work by, and that you should really try to see the connection from what you're working on to what impact it can really have, what change it can make, rather than how nice the math in your work happens to be.

Right. That's amazing. I did not realize that; I'd forgotten that.

Yes, it's actually one of the things I say most often when people ask: if I had to cite only one thing that has stuck with me from Andrew's advice, it's making sure you can see the connection from your work to where it's actually going to do something.

You've had, and you're continuing to have, an amazing career in AI. So, for some of the people watching this video now, if they want to also enter or pursue a career in AI, what advice do you have for them?

I think it's a really good time to get into artificial intelligence. If you look at the demand for people, it's so high; there are so many job opportunities, so many things you can do, research-wise, building new companies, and so forth. So I'd say it's definitely a smart decision in terms of getting going. A lot of it you can self-study, whether you're in school or not. There are a lot of online courses, for instance your machine learning course; there is also, for example, Andrej Karpathy's deep learning course, which has videos online and is a great way to get started; and Berkeley has a deep reinforcement learning course with all of the lectures online. Those are all good places to get started. I think a big part of what's important is to make sure you try things yourself: not just read things or watch videos, but try things out. With frameworks like TensorFlow, Chainer, Theano, PyTorch, and so forth, whatever your favorite is, it's very easy to get going and get something up and running very quickly.
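For instance, here is about the smallest complete PyTorch program that trains something (the data is random and made up; any of the frameworks above has an equivalent), just to see a loss go down with your own eyes:

```python
import torch
import torch.nn as nn

# Made-up toy data: predict whether the features sum to a positive number.
X = torch.randn(256, 4)
y = (X.sum(dim=1, keepdim=True) > 0).float()

model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)   # forward pass
    loss.backward()               # backward pass
    optimizer.step()              # parameter update
    if step % 50 == 0:
        print(step, loss.item())
```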
So you get to practice yourself, right? Implementing things and seeing what does and what doesn't work.

Yes. This past week there was an article in Mashable about a 16-year-old in the United Kingdom who is one of the leaders in Kaggle competitions. It said he just went out and learned things, found things online, and learned everything himself, without ever taking a formal course per se. So there is a 16-year-old being very competitive in Kaggle competitions; it's definitely possible.

We live in good times, if people want to learn.

Absolutely.

One question I bet you get asked sometimes is: if someone wants to enter AI, machine learning, and deep learning, should they apply to a Ph.D. program, or should they get a job with a big company?

I think a lot of it has to do with how much mentoring you can get. In a Ph.D. program, mentoring is essentially guaranteed: the job of the professor who is your adviser is to look out for you, to try to do everything they can to shape you and help you become stronger at whatever you want to do, for example AI. There is a very clear, dedicated person; sometimes you have two advisers. That's literally their job; most of what professors like about being professors, often, is helping shape students to become more capable. Now, that doesn't mean it's not possible at companies, and many companies have really good mentors who love to help educate the people who come in and strengthen them, and so forth. It's just that it might not be as much of a guarantee or a given, compared to enrolling in a Ph.D. program, where the crux of the program is that you're going to learn and somebody is there to help you learn.

So it really depends on the company and on the Ph.D. program.

Absolutely, yes. I think the key thing is that you can learn a lot on your own, but you learn a lot faster if you have somebody more experienced who has taken it up as their responsibility to spend time with you and help accelerate your progress.

You've been one of the most visible leaders in deep reinforcement learning. So, what are the things that deep reinforcement learning is already working really well at?

I think, if you look at some deep reinforcement learning successes, it's very, very intriguing. For example, learning to play Atari games from pixels: processing pixels, which are just numbers, and somehow turning them into joystick actions. Then, for example, in some of the work we did at Berkeley, we had a simulated robot invent walking, and the reward it was given was as simple as: the further you go north, the better, and the less hard you impact the ground, the better. Somehow it decides that walking, or running, is the thing to invent, even though nobody showed it what walking or running is. Or a robot playing with children's toys, learning to put a block into the matching opening, and so forth. I think it's really interesting that in all of these cases it's possible to learn from raw sensory inputs all the way to raw controls, for example torques at the motors. And at the same time, it's very interesting that you can have a single algorithm. For example, with trust region policy optimization, you can have a robot learn to run, have a robot learn to stand up, or, instead of a two-legged robot, swap in a four-legged robot, run the same reinforcement learning algorithm, and it still learns to run. There is no change in the reinforcement learning algorithm; it's very, very general. The same goes for the Atari games: DQN was the same DQN for every one of the games.
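To make the generality point concrete, here is a minimal sketch with two toy environments invented for illustration (the real examples are TRPO across robot morphologies and DQN across Atari games): a single tabular Q-learning routine, unchanged, learns both, because all it assumes is the environment interface.

```python
import random

class ChainEnv:
    """Walk right along a chain of states; reward only at the far end."""
    n_states, n_actions = 6, 2
    def reset(self):
        self.s = 0
        return self.s
    def step(self, a):
        self.s = max(0, min(self.n_states - 1, self.s + (1 if a == 1 else -1)))
        done = self.s == self.n_states - 1
        return self.s, (1.0 if done else 0.0), done

class CoinEnv:
    """One-shot guess of a biased coin's likely side."""
    n_states, n_actions = 1, 2
    def reset(self):
        return 0
    def step(self, a):
        return 0, (1.0 if a == (random.random() < 0.7) else 0.0), True

def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """The same algorithm, untouched, for any env exposing this interface."""
    q = [[0.0] * env.n_actions for _ in range(env.n_states)]
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            if random.random() < epsilon:   # explore
                a = random.randrange(env.n_actions)
            else:                           # exploit
                a = max(range(env.n_actions), key=lambda i: q[s][i])
            s2, r, done = env.step(a)
            q[s][a] += alpha * (r + gamma * max(q[s2]) * (not done) - q[s][a])
            s = s2
    return q

print(q_learning(ChainEnv())[0])   # identical code, two different worlds
print(q_learning(CoinEnv())[0])
```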
But then it starts hitting the frontiers of what's not yet possible: it's nice that it learns from scratch for each one of these tasks, but it would be even nicer if it could reuse what it has learned in the past to learn even more quickly for the next task. That's something that's still at the frontier and not yet possible. It always starts from scratch, essentially.

How quickly do you think we'll see deep reinforcement learning deployed in the robots around us, the robots being deployed in the world today?

I think in practice the realistic scenario is one where it starts with supervised learning, behavioral cloning: humans do the work. I think a lot of businesses will be built that way, with a human behind the scenes doing a lot of the work. Imagine a Facebook Messenger assistant. An assistant like that could be built with a human behind the curtains doing a lot of the work, with machine learning matching up what the human does and starting to make suggestions to the human, so the human has a small number of options they can just click and select. Then over time, as it gets pretty good, you start phasing in some reinforcement learning, where you give it actual objectives, not just matching the human behind the curtains, but objectives like: how quickly were these two people able to plan their meeting? How quickly were they able to book their flight? How long did it take? How happy were they with it? But it would probably have to be bootstrapped off a lot of behavioral cloning, of humans showing how it could be done.

So it sounds like behavioral cloning first, just supervised learning to mimic whatever the person is doing, and then gradually, later on, reinforcement learning to have it think about longer time horizons? Is that a fair summary?

I'd say so, yes, just because straight-up reinforcement learning from scratch is really fun to watch. It's super intriguing, and there are very few things more fun to watch than a reinforcement learning robot starting from nothing and inventing things. But it's time-consuming, and it's not always safe.

Thank you very much. That was fascinating. I'm really glad we had the chance to chat.

Well, Andrew, thank you for having me. I very much appreciate it.