[MUSIC] Next, let's visualize the path that gradient ascent takes, as opposed to stochastic gradient, what I call the convergence paths. As you will see, stochastic gradient oscillates a bit more, but it still gets you close to the optimal solution. In the black line, I'm showing the path of gradient ascent, and you see that the path is very smooth and behaves very nicely. In the red line, I show you the path of stochastic gradient. You see that this is a noisier path. It does get us to the right solution, but one thing to note is that it doesn't converge and stop the way gradient does; it oscillates around the optimum. This is going to be one of the practical issues we'll address when we talk about how to get stochastic gradient to work in practice, but it's a significant issue. Another view of stochastic gradient oscillating around the optimum can be seen in the plot we've been using for quite a while: gradient ascent makes smooth progress, while stochastic gradient traces a noisy curve as it makes progress and, as it converges, keeps oscillating around the optimum.

Let's summarize. Gradient ascent looks for the direction of greatest improvement, the steepest ascent direction, and does that by summing over all the data points. Stochastic gradient, on the other hand, tries to find a direction that usually makes progress, for example by picking a single data point to estimate the gradient. On average it still makes progress, and because each update is so much cheaper it tends to converge much faster, but it's noisier near the optimum. Even in the simple example we've been using today, it converges over a hundred times faster than gradient ascent, but it gets noisy in the end.
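To make that contrast concrete, here is a minimal sketch in Python/NumPy, assuming a logistic regression log-likelihood as the objective; the function names, step sizes, and synthetic data below are illustrative choices, not taken from the lecture. The batch version sums the gradient over every data point before each update, while the stochastic version updates after looking at a single data point.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_ascent(X, y, step_size=0.01, n_iters=200):
    """Batch gradient ascent: each update sums the gradient over all data points."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        errors = y - sigmoid(X @ w)      # residuals for every data point
        w += step_size * (X.T @ errors)  # full-data gradient of the log-likelihood
    return w

def stochastic_gradient_ascent(X, y, step_size=0.01, n_passes=20, seed=0):
    """Stochastic gradient ascent: each update uses one data point,
    so individual steps are noisy but very cheap."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    n = X.shape[0]
    for _ in range(n_passes):
        for i in rng.permutation(n):           # shuffle the data each pass
            error_i = y[i] - sigmoid(X[i] @ w)
            w += step_size * error_i * X[i]    # gradient estimated from one point
    return w

# Tiny synthetic example: two Gaussian blobs plus an intercept feature.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
X = np.hstack([np.ones((100, 1)), X])
y = np.concatenate([np.zeros(50), np.ones(50)])

print("batch     :", gradient_ascent(X, y))
print("stochastic:", stochastic_gradient_ascent(X, y))
```

Running a sketch like this, the stochastic version typically ends up near the batch solution but keeps jittering around it from pass to pass, which mirrors the oscillation around the optimum shown in the lecture's plots.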