In this video we're going to look at the error surface for a linear neuron. By understanding the shape of this error surface, we can understand a lot about what happens as a linear neuron learns. We can get a nice geometrical understanding of what's happening when we learn the weights of a linear neuron by considering a space that's very like the weight space we used to understand perceptrons, but with one extra dimension. So we imagine a space in which all the horizontal dimensions correspond to the weights, and there's one vertical dimension that corresponds to the error. In this space, points on the horizontal plane correspond to different settings of the weights, and the height corresponds to the error you're making with that set of weights, summed over all training cases.

For a linear neuron, the errors you make for each setting of the weights define an error surface, and this error surface is a quadratic bowl. That is, if you take a vertical cross-section, it's always a parabola, and if you take a horizontal cross-section, it's always an ellipse. This is only true for linear systems with a squared error. As soon as we go to multilayer, nonlinear neural nets, the error surface gets more complicated. As long as the weights aren't too big, the error surface will still be smooth, but it may have many local minima.

Using this error surface, we can get a picture of what's happening as we do gradient descent learning with the delta rule. What the delta rule does is compute the derivative of the error with respect to the weights. If you change the weights in proportion to that derivative, that's equivalent to doing steepest descent on the error surface. To put it another way, if we look at the error surface from above, we get elliptical contour lines, and the delta rule will take us at right angles to those elliptical contour lines, as shown in the picture. That's what happens with what's called batch learning, where we get the gradient summed over all training cases.

But we could also do online learning, where after each training case we change the weights in proportion to the gradient for that single training case. That's much more like what we do in perceptrons, and, as you can see, the change in the weights moves us towards one of these constraint planes. So in the picture on the right, there are two training cases. To get the first training case correct, the weights must lie on one of those blue lines, and to get the second training case correct, the weights must lie on the other blue line. If we start at one of those red points and compute the gradient on the first training case, the delta rule will move us perpendicularly towards that line. If we then consider the other training case, we'll move perpendicularly towards the other line. And if we alternate between the two training cases, we'll zigzag backwards and forwards, moving towards the solution point, which is where those two lines intersect. That's the set of weights that is correct for both training cases.
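To make that zigzag concrete, here is a minimal sketch of the online delta rule alternating between two training cases for a two-weight linear neuron. The particular inputs, targets, learning rate, and starting weights are made-up values for illustration; they are not numbers from the lecture.

```python
# A small sketch (with invented numbers) of the online delta rule zigzagging
# between the two constraint lines in weight space.
import numpy as np

# Two training cases for a linear neuron y = w . x, each with a target t.
# Each case defines a line in weight space: all (w1, w2) with w . x = t.
cases = [(np.array([1.0, 0.2]), 1.0),    # first training case (one blue line)
         (np.array([0.2, 1.0]), 1.0)]    # second training case (the other blue line)

w = np.array([2.0, -1.5])   # an arbitrary starting point (a "red point")
eps = 0.6                   # learning rate

for step in range(20):
    x, t = cases[step % 2]        # alternate between the two training cases
    y = w @ x                     # the neuron's output on this case
    w = w + eps * (t - y) * x     # delta rule: the update is along x, which is
                                  # perpendicular to the line w . x = t
    print(step, w)
# The weights zigzag between the two lines and settle at their intersection,
# the setting of the weights that is right for both cases.
```

The reason each step is perpendicular to a blue line is that the delta rule's update is proportional to that case's input vector, and the input vector is the normal of that case's constraint line.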
Using this picture of the error surface, we can also understand the conditions that will make learning very slow. If that ellipse is very elongated, which is going to happen if the lines that correspond to the training cases are almost parallel, then the gradient has a nasty property. If you look at the red arrow in the picture, the gradient is big in the direction in which we don't want to move very far, and it's small in the direction in which we want to move a long way. So the gradient will quickly take us across the bottom of the ravine, corresponding to the narrow axis of the ellipse, and it will take a long time to move us along the ravine, corresponding to the long axis of the ellipse. That's just the opposite of what we want. We'd like the gradient to be small across the ravine and big along the ravine, but that's not what we get. And so simple steepest descent, in which you change each weight in proportion to a learning rate times the error derivative, is going to have great difficulty with very elongated error surfaces like the one shown in the picture.
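To see that slowdown in numbers, here is a similar sketch of batch steepest descent on the elongated bowl you get from two almost-parallel training cases. Again, the inputs, targets, and learning rate are invented for illustration, and the learning rate is chosen to be about as big as it can be without the weights diverging.

```python
# A small sketch (with invented numbers) of batch steepest descent on a very
# elongated error surface: two almost-parallel input vectors give a quadratic
# bowl whose contours are long, thin ellipses.
import numpy as np

X = np.array([[1.0, 1.00],      # two nearly parallel training inputs ->
              [1.0, 1.05]])     # an ill-conditioned (elongated) bowl
t = np.array([1.0, 1.2])        # targets; the bottom of the bowl solves X w = t
w_star = np.linalg.solve(X, t)  # the solution point

w = np.array([4.0, 4.0])        # arbitrary starting weights
eps = 0.4                       # near the largest stable learning rate here

for step in range(200):
    err = X @ w - t             # residuals on the two training cases
    grad = X.T @ err            # gradient of 0.5 * summed squared error
    w = w - eps * grad          # steepest descent: big steps across the
                                # ravine, tiny steps along it
    if step % 50 == 0:
        print(step, np.linalg.norm(w - w_star))
# The distance to the solution drops quickly at first (crossing the ravine)
# and then creeps down very slowly (moving along the ravine).
```

The design point is just what the lecture describes: the gradient is dominated by the steep, narrow direction of the ellipse, so a learning rate small enough to be stable across the ravine makes progress along the ravine painfully slow.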