If the basic technical ideas behind deep learning, behind neural networks, have been around for decades, why are they only just now taking off? In this video, let's go over some of the main drivers behind the rise of deep learning, because I think this will help you spot the best opportunities within your own organization to apply these to. Over the last few years, a lot of people have asked me, "Andrew, why is deep learning suddenly working so well?" and when I'm asked that question, this is usually the picture I draw for them. Let's say we plot a figure where on the horizontal axis we plot the amount of data we have for a task, and on the vertical axis we plot the performance of our learning algorithm, such as the accuracy of our spam classifier or our ad click predictor, or the accuracy of our neural net at figuring out the position of other cars for our self-driving car. It turns out that if you plot the performance of a traditional learning algorithm, like a support vector machine or logistic regression, as a function of the amount of data you have, you might get a curve that looks like this, where the performance improves for a while as you add more data, but after a while the performance pretty much plateaus. It's as if those older algorithms didn't know what to do with huge amounts of data. And what happened in our society over the last 10 years or so is that for a lot of problems we went from having a relatively small amount of data to having a fairly large amount of data, and all of this was thanks to the digitization of society, where so much human activity is now in the digital realm. We spend so much time on computers, on websites, on mobile apps, and activity on digital devices creates data. And thanks to the rise of inexpensive cameras built into our cell phones, accelerometers, and all sorts of sensors in the Internet of Things, we have also just been collecting more and more data. So over the last
20 years, for a lot of applications we just accumulated a lot more data, more than traditional learning algorithms were able to effectively take advantage of. With neural networks, it turns out that if you train a small neural net, the performance maybe looks like that. If you train a somewhat larger neural network, call it a medium-sized neural net, the performance is often a little bit better. And if you train a very large neural net, the performance often just keeps getting better and better. So, a couple of observations. One is that if you want to hit this very high level of performance, then you need two things: first, you often need to be able to train a big enough neural network to take advantage of the huge amount of data, and second, you need to be out here on the x-axis, so you do need a lot of data. We often say that scale has been driving deep learning progress, and by scale I mean both the size of the neural network, meaning a neural network with a lot of hidden units, a lot of parameters, a lot of connections, as well as the scale of the data. In fact, today one of the most reliable ways to get better performance from a neural network is often to either train a bigger network or throw more data at it. That only works up to a point, because eventually you run out of data, or eventually your network is so big that it takes too long to train, but just improving scale has actually taken us a long way in the world of deep learning. In order to make this diagram a bit more technically precise, let me add a few more things. I wrote the amount of data on the x-axis; technically, this is the amount of labeled data, where by labeled data I mean training examples for which we have both the input x and the label y. Let me also introduce a little bit of notation that we'll use later in this course: we're going to use lowercase m to denote the size of the training set, the number of training examples. So that's the horizontal axis. A couple of other details about this figure: in this regime of
smaller training sets, the relative ordering of the algorithms is actually not very well defined. If you don't have a lot of training data, it is often your skill at hand-engineering features that determines performance. So it's quite possible that if someone training an SVM is more motivated to hand-engineer features than someone training even a large neural net, then in this small-training-set regime the SVM could do better. In this region to the left of the figure, the relative ordering between the algorithms is not that well defined, and performance depends much more on your skill at hand-engineering features and other details of the algorithms. It's only in the big data regime, the regime of very large training sets, very large m, on the right, that we more consistently see large neural nets dominating the other approaches. So if any of your friends ask you why neural nets are taking off, I would encourage you to draw this picture for them as well. I will say that in the early days of the modern rise of deep learning, it was scale of data and scale of computation, just our ability to train very large neural networks either on a CPU or a GPU, that enabled us to make a lot of progress. But increasingly, especially in the last several years, we've seen tremendous algorithmic innovation as well, so I don't want to understate that. Interestingly, many of the algorithmic innovations have been about trying to make neural networks run much faster. As a concrete example, one of the huge breakthroughs in neural networks has been switching from a sigmoid function, which looks like this, to a ReLU function, which we talked about briefly in an earlier video, that looks like this. If you don't understand the details of what I just said, don't worry about it. But it turns out that one of the problems of using sigmoid functions in machine learning is that there are these regions where the slope of the function, the gradient, is nearly zero, and so learning becomes
really slow, because when you implement gradient descent and the gradient is nearly zero, the parameters change very slowly. Whereas by changing what's called the activation function of the neural network to use this function called the ReLU function, the rectified linear unit, the gradient is equal to one for all positive values of the input, and so the gradient is much less likely to gradually shrink to zero. (The gradient here, the slope of this line, is zero on the left.) It turns out that just switching from the sigmoid function to the ReLU function has made an algorithm called gradient descent work much faster. This is an example of a relatively simple algorithmic innovation, but ultimately its impact was that it really helped computation, and there are actually quite a lot of examples like this where we changed the algorithm because it allows the code to run much faster, and this allows us to train bigger neural networks, or to do so in a reasonable amount of time, even when we have a large network and a lot of data. The other reason that fast computation is important is that it turns out the process of training a neural network is very iterative. Often you have an idea for a neural network architecture, so you implement your idea in code; implementing your idea then lets you run an experiment, which tells you how well your neural network does; and then by looking at the result, you go back and change the details of your neural network, and you go around this cycle over and over. When your neural network takes a long time to train, it just takes a long time to go around this cycle, and there's a huge difference in your productivity building effective neural networks when you can have an idea, try it, and see if it works in ten minutes or maybe a day, versus having to train your neural network for a month, which sometimes happens. Because when you get a result back in ten minutes or maybe in a day,
you can just try a lot more ideas and be much more likely to discover a neural network that works well for your application. So faster computation has really helped speed up the rate at which you can get an experimental result back, and this has helped both practitioners of neural networks and researchers working in deep learning iterate much faster and improve their ideas much faster. All this has also been a huge boon to the entire deep learning research community, which has been incredible at inventing new algorithms and making nonstop progress on that front. So these are some of the forces powering the rise of deep learning, and the good news is that these forces are still working powerfully to make deep learning even better. Take data: society is still throwing off more and more digital data. Or take computation: with the rise of specialized hardware like GPUs, faster networking, and many other types of hardware, I'm actually quite confident that our ability to train very large neural networks, from a computation point of view, will keep on getting better. Or take algorithms: the deep learning research community has been continuously phenomenal at innovating on the algorithms front. Because of all this, I think we can be optimistic that deep learning will keep on getting better for many years to come. So with that, let's go on to the last video of this section, where we'll talk a little bit more about what you'll learn in this course.
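The sigmoid-versus-ReLU point made earlier is easy to check numerically. Here is a minimal sketch in Python (using NumPy; the function names are my own for illustration, not part of the lecture) that computes the gradient of each activation and shows that the sigmoid gradient shrinks toward zero for inputs of large magnitude, while the ReLU gradient stays at exactly one for any positive input:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: squashes z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    """Derivative of the sigmoid: s(z) * (1 - s(z)).
    Peaks at 0.25 when z = 0 and decays toward 0 as |z| grows."""
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_grad(z):
    """Derivative of ReLU: 1 for z > 0, 0 for z < 0
    (undefined at exactly 0; taken as 0 here by convention)."""
    return (np.asarray(z) > 0).astype(float)

# Compare the gradients at a few input values.
for z in [-10.0, -2.0, 0.0, 0.5, 10.0]:
    print(f"z = {z:6.1f}   sigmoid' = {sigmoid_grad(z):.6f}   relu' = {relu_grad(z):.0f}")
```

For large positive or negative z, `sigmoid_grad` is on the order of 1e-5 or smaller, so a gradient descent update through that unit barely moves the parameters; `relu_grad` returns 1 for every positive input, which is why switching activations sped up training so much.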