[MUSIC] As we saw in our plot, stochastic gradient tends to oscillate around the optimum, and so, unfortunately, you should never trust the last parameter it finds. Gradient descent will eventually stabilize on the optimal solution, so even though it takes a hundred times longer or more, as was shown in this example, if you look at the x-axis, a hundred times more time to converge, you do get there, and you feel really good when you get there. Stochastic gradient, when you think it has converged, is really just oscillating around the optimum, and that can lead to bad practical behavior. So for example here, I'm just giving you some numbers, say w at iteration 1000 might look really, really bad, but maybe w at iteration 1005 looks really, really good, so we need some kind of approach to minimize the risk of picking a really bad one. And there is a very simple technique which works really well in practice, and theoretically it is what you should do, so all the theorems require something like this. And what it says is: when you are outputting w hat, your final set of coefficients, you don't use the last value, w(T), capital T, you use the average of all the values that you've computed, all the coefficients you computed along the way. So what I'm showing here is what your algorithm should output, after fitting, as the solution it uses to make predictions in the real world. >> [MUSIC]
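Here is a minimal sketch of this averaging idea, often called iterate (Polyak-Ruppert) averaging: keep a running sum of every w^(t) and output w_hat = (1/T) * sum over t of w^(t) instead of the last iterate. The synthetic data, squared-error loss, and step size below are illustrative assumptions, not the exact setup from the lecture.

```python
# Sketch: stochastic gradient with iterate averaging.
# Data, loss, and step size are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: y ~ X @ w_true + noise
n, d = 1000, 5
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)

def sgd_with_averaging(X, y, step_size=0.01, num_passes=5):
    n, d = X.shape
    w = np.zeros(d)          # current iterate w^(t)
    w_sum = np.zeros(d)      # running sum of all iterates
    T = 0                    # total number of updates so far
    for _ in range(num_passes):
        for i in rng.permutation(n):
            # gradient of (1/2) * (x_i . w - y_i)^2 for one data point
            grad = (X[i] @ w - y[i]) * X[i]
            w = w - step_size * grad
            w_sum += w
            T += 1
    w_last = w               # last iterate: oscillates around the optimum
    w_hat = w_sum / T        # averaged iterate: what you should output
    return w_last, w_hat

w_last, w_hat = sgd_with_averaging(X, y)
print("error of last iterate:    ", np.linalg.norm(w_last - w_true))
print("error of averaged iterate:", np.linalg.norm(w_hat - w_true))
```

In practice you will also see variants that average only the later iterates or keep a running average updated in place rather than a full sum, but the core idea is the same: report the average of the coefficients computed along the way, not the last one.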