[MUSIC] Next, let's visualize the path that gradient ascent takes, as opposed to stochastic gradient, what I call the convergence paths. As you will see, stochastic gradient oscillates a bit more, but it still gets you close to the optimal solution. In the black line, I'm showing the path of gradient ascent, and you see that the path is very smooth and behaves very nicely. In the red line, I show you the path of stochastic gradient. You see that this is a noisier path. It does get us to the right solution, but one thing to note is that it doesn't converge and stop the way gradient does; it oscillates around the optimum. This is going to be one of the practical issues we'll address when we talk about how to get stochastic gradient to work in practice, but it's a significant issue. Another view of stochastic gradient oscillating around the optimum can be seen in the plot we've been using for quite a while: gradient ascent makes smooth progress, while stochastic gradient traces a noisy curve as it makes progress and, as it converges, keeps oscillating around the optimum.

Let's summarize. Gradient ascent looks for the direction of greatest improvement, the steepest ascent direction, and does that by summing over all the data points. Stochastic gradient, on the other hand, tries to find a direction that usually makes progress, for example by picking a single data point to estimate the gradient. On average it still makes progress, and because each update is so much cheaper it tends to converge much faster, but it's noisier near the optimum. Even in the simple example we've been using today, it converges over a hundred times faster than gradient ascent, but it gets noisy in the end.
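To make that contrast concrete, here is a minimal sketch in Python/NumPy, assuming a logistic regression log-likelihood as the objective; the function names, step sizes, and synthetic data below are illustrative choices, not taken from the lecture. The batch version sums the gradient over every data point before each update, while the stochastic version updates after looking at a single data point.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_ascent(X, y, step_size=0.01, n_iters=200):
    """Batch gradient ascent: each update sums the gradient over all data points."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        errors = y - sigmoid(X @ w)      # residuals for every data point
        w += step_size * (X.T @ errors)  # full-data gradient of the log-likelihood
    return w

def stochastic_gradient_ascent(X, y, step_size=0.01, n_passes=20, seed=0):
    """Stochastic gradient ascent: each update uses one data point,
    so individual steps are noisy but very cheap."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    n = X.shape[0]
    for _ in range(n_passes):
        for i in rng.permutation(n):           # shuffle the data each pass
            error_i = y[i] - sigmoid(X[i] @ w)
            w += step_size * error_i * X[i]    # gradient estimated from one point
    return w

# Tiny synthetic example: two Gaussian blobs plus an intercept feature.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
X = np.hstack([np.ones((100, 1)), X])
y = np.concatenate([np.zeros(50), np.ones(50)])

print("batch     :", gradient_ascent(X, y))
print("stochastic:", stochastic_gradient_ascent(X, y))
```

Running a sketch like this, the stochastic version typically ends up near the batch solution but keeps jittering around it from pass to pass, which mirrors the oscillation around the optimum shown in the lecture's plots.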