[MUSIC] So that defines a gradient. But one thing that's gonna be useful is a different way to visualize the surfaces that we're optimizing over. Instead of the 3D mesh plots that we've been looking at, we can look at a contour plot, which we can think of as a bird's eye view of the function that we're examining, where we've taken the 3D mesh and transformed it to a 2D plane. Just in case you can't see it in the slides, this is w0 here, and this is w1. That's a very big zero.
Okay, and what is each one of these curves? Again, I'm gonna switch colors to red so it's more visible. This curve, for example, represents taking the mesh on the left-hand side and slicing it. So it's just a slice through the 3D function. Sorry, the pen is really misbehaving. It's a slice of the 3D surface where all the points here have the same value of the function. Sorry, our function is called g(w0, w1). So let me step back and say what I'm trying to say here, which is that every (w0, w1) pair along this ellipse has the same value of the function g, because it was just a flat slice through the 3D surface that we're looking at.
Okay, so each of these rings corresponds to a different value of the function, in this case increasing as we go from blue, which means a low value of the function, all the way out to red, which means a high value of the function. We're slicing the function at different values, and that creates these different contours. So this is what's called a contour plot, and it's useful because it's easier to work in 2D when we're drawing on a 2D surface here. Drawing things will be easier with this representation.
Okay, so that was just a little detour into contour plots, so that I can talk about the gradient descent algorithm, which is the analogous algorithm to what I called the hill descent algorithm in 1D.
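As a small aside on the contour picture described above, here is a minimal Python sketch of the "flat slice" idea. The particular bowl-shaped function `g` and the helper `contour_points` are assumptions made up for illustration; they are not the function shown on the slides. The sketch just checks that every (w0, w1) pair on one ellipse gives the same value of g.

```python
import math

def g(w0, w1):
    # Hypothetical bowl-shaped function, chosen only for illustration.
    return (w0 - 1.0) ** 2 + 4.0 * (w1 + 2.0) ** 2

def contour_points(level, n=8):
    # Return n points on the ellipse where g(w0, w1) == level (level > 0).
    # This closed form is specific to the hypothetical g above.
    r = math.sqrt(level)
    points = []
    for k in range(n):
        t = 2.0 * math.pi * k / n
        points.append((1.0 + r * math.cos(t), -2.0 + 0.5 * r * math.sin(t)))
    return points

# Slice the function at a few increasing values, inner ring to outer ring.
for level in (1.0, 4.0, 9.0):
    values = [g(w0, w1) for w0, w1 in contour_points(level)]
    # Every value on a given ring matches that ring's level, up to rounding.
    print(level, [round(v, 6) for v in values])
```

Each printed row corresponds to one ring of the contour plot: the points differ, but the function value along the ring does not.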
But in place of the derivative of the function, we've now specified the gradient of the function. Other than that, everything looks exactly the same. We now have a vector of parameters, and we're updating them all at once. We're taking our previous vector and updating it with eta times our gradient, which is also a vector. So it's just the vector analog of the hill descent algorithm.
But I wanna show this a little bit in pictures here. Again, switching back to red because it'll be easier to see on this plot. If I'm out here at a point, the gradient is pointing in the direction of steepest ascent. So that's uphill. It's pointing this way. But we're moving in the negative gradient direction. So let me specify that this thing here is our gradient direction, but our steps are gonna be in the opposite direction. Sorry to take up a little time here, but I think it's worthwhile for clarity: let me redraw the gradient as a purple vector so it's different from the vectors I'm going to be drawing right now, because those other vectors are the steps of my gradient descent algorithm. The actual steps I'm taking are gonna be moving here, towards this optimal value. So it's exactly like what we saw in the 1D case, but now we're moving in a 2D space. Or really a space of any dimension, but what I'm drawing is just a 2D space.
In terms of assessing convergence, in place of looking at the absolute value of the derivative, we're going to look at the magnitude of the gradient. When the magnitude of the gradient is less than some epsilon that we're fixing, we're gonna say that the algorithm has converged. [MUSIC]
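The update and stopping rule described above can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the gradient `grad_g` belongs to a hypothetical bowl-shaped function g(w0, w1) = (w0 - 1)^2 + 4(w1 + 2)^2, and the step size `eta`, the tolerance `epsilon`, and the iteration cap `max_iters` are illustrative choices, not values from the lecture.

```python
import math

def grad_g(w):
    # Gradient of the hypothetical g(w0, w1) = (w0 - 1)^2 + 4 * (w1 + 2)^2.
    w0, w1 = w
    return [2.0 * (w0 - 1.0), 8.0 * (w1 + 2.0)]

def magnitude(v):
    # Euclidean length of a vector.
    return math.sqrt(sum(c * c for c in v))

def gradient_descent(grad, w_init, eta=0.1, epsilon=1e-6, max_iters=10_000):
    # eta, epsilon and max_iters are assumed values for this sketch.
    w = list(w_init)
    for _ in range(max_iters):
        grad_w = grad(w)
        # Converged once the magnitude of the gradient drops below epsilon.
        if magnitude(grad_w) < epsilon:
            break
        # Update all parameters at once, stepping against the gradient.
        w = [wi - eta * gi for wi, gi in zip(w, grad_w)]
    return w

w_hat = gradient_descent(grad_g, [5.0, 3.0])
print(w_hat)  # should land close to the minimum of this g, at (1, -2)
```

A fixed step size is used here to keep the sketch short; if `eta` is too large for a given function, the steps can overshoot and fail to converge.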