The Chain rule is so important, it's worth thinking through a proof of its validity. You might be tempted to think that you can get away with just canceling. What I mean is, you might be tempted to think that something like this works. Let's say you wanted to differentiate g of f of x. You might set f of x equal to y. And then you might think, well, what you're really trying to calculate is the derivative of g of y. And then you might say, well, the derivative of g of y, that will be the derivative of g with respect to y times the derivative of y with respect to x, and that's really the Chain rule. I mean, this first thing is the derivative of g and this other thing is the derivative of f, just like you'd expect, and then you're trying to say that you can just cancel, alright? You're not allowed to just cancel. I mean, the upshot here is just that dy/dx is not a fraction, alright? You can't justify this equality by just canceling, because these objects, the ones you're supposedly canceling with, aren't fractions. We need a more delicate argument than that. One way to go is to give a slightly different definition of derivative. Well, here's a slightly different way of packaging up the derivative. The function f is differentiable at a point a provided there's some number, which I'm suggestively calling f prime of a, that is, the derivative of f at a, so that the limit of this error function is equal to 0. And what's this error function? Well, it's measuring how far the approximation I'd get using the derivative is from the actual function's output if I plug in an input value near a. In some ways, that's actually a nicer definition of derivative, since it really conveys that the derivative provides a way to approximate output values of functions. In any case, now let's take this new definition of derivative, try to approximate g of f of x plus h, and try to discover the Chain rule.
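In symbols, the alternative definition just described might be written like this (a sketch of the lecture's "error function" formulation; the name E_f for the error function is my notation):

```latex
f(a + h) = f(a) + f'(a)\,h + E_f(h)\,h,
\qquad \text{where } \lim_{h \to 0} E_f(h) = 0.
```

The point of this packaging is that differentiability at a is exactly the statement that the linear approximation f(a) + f'(a)h misses the true value f(a + h) by an error that vanishes even after dividing by h.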
So, I want to be able to express this in terms of the derivatives of g and f, at least approximately, and then control the error. Well, I can do that for f because I'm assuming that f is differentiable at the point x. So, this is g of, instead of f of x plus h, f of x plus the derivative of f at x times h plus an error term, which we should be calling error of f at h, times h. I'm going to play the same game with g. This is g of f of x plus a small quantity. And if I assume that g is differentiable at the point f of x, then this is g of f of x, plus the derivative of g at f of x times how much I wiggle by, which, in this case, is f prime of x times h plus the error term for f at h times h, plus an error term for g. And I have to put in how much I wiggled by, which, in this case, is f prime of x times h plus the error term for f at h times h, all multiplied by that same quantity, f prime of x times h plus the error term for f at h times h. Alright, so that's exactly equal to g of f of x plus h, and I'm including all of the error terms. Now, we can expand out a bit. So, this is g of f of x plus, if you multiply these two terms together, g prime of f of x times f prime of x times h. That's looking really good, because that's what the Chain rule is, right? It's supposed to give me this as the derivative of g composed with f. Plus, I've got a ton of error terms now. All those error terms have an h, so I'm going to collect all the h's at the end. The first error term is g prime of f of x, this term here, times the error of f at h. The next one is the error of g at this complicated quantity, which I'm going to abbreviate as H, times f prime of x, since I'm collecting all of these h's at the end, plus the error term for g at that complicated quantity H, times the error for f at h, and all of that is times h. Alright. Now, this is almost giving me the derivative of the composite function, provided that I can control the size of this error term, right?
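Collecting the expansion just described into one display may help; here E_f and E_g stand for the error functions of f and g, and H abbreviates the complicated inner wiggle (these symbols are my notation for what the lecture describes verbally):

```latex
g\bigl(f(x+h)\bigr)
  = g\bigl(f(x)\bigr)
  + g'\bigl(f(x)\bigr)\,f'(x)\,h
  + \Bigl[\,
      g'\bigl(f(x)\bigr)\,E_f(h)
    + E_g(H)\,f'(x)
    + E_g(H)\,E_f(h)
    \Bigr]\,h,
```

```latex
\text{where } H = f'(x)\,h + E_f(h)\,h.
```

The middle term g'(f(x)) f'(x) h is the Chain rule candidate for the derivative of the composite; everything in the bracket is the error term that still has to be shown to vanish as h goes to 0.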
What I need to show now is that the limit as h approaches 0 of this error term is really 0. And the error term, right, it's the part before the times h: it's g prime of f of x times the error term for f at h, plus the error term for g at that complicated quantity, call it H, times f prime of x, plus the error term for g at H times the error term for f at h. Now, why do I know that that limit is equal to 0? Well, I can do it in pieces, right? It's the limit of a sum, so it's the sum of the limits. And I know that this first term goes to 0 because it's got an error of f at h term in it, and because f is differentiable, that error term goes to 0. I likewise know the same for this last term, alright? It's also got an error of f at h term in it. The most mysterious term is the middle one. But if you think a little bit more about it, the error of g at this quantity H, which is abbreviating this whole thing here, also goes to 0 as h goes to 0, because H itself goes to 0 as h goes to 0, and g is differentiable at f of x. And that's enough to know that the limit as h goes to 0 of this quantity is 0, which is then enough to say that the formula for g of f of x plus h actually exhibits this as the derivative. So, here's what we've actually shown. Suppose that f is differentiable at a point a and g is differentiable at the point f of a. Then the composite function, g composed with f, is differentiable at a, with the derivative of g composed with f at the point a equal to the derivative of g at f of a times the derivative of f at a.
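The theorem we just proved is easy to check numerically: pick any differentiable f and g, form a difference quotient for the composite, and compare it against g'(f(a)) times f'(a). A small sketch (the particular choices f(x) = sin x and g(y) = y squared are mine, not from the lecture):

```python
import math

# My sample choices: f(x) = sin(x), g(y) = y**2,
# so the Chain rule predicts (g o f)'(x) = 2*sin(x)*cos(x) = sin(2x).
def f(x): return math.sin(x)
def g(y): return y * y
def f_prime(x): return math.cos(x)
def g_prime(y): return 2 * y

a = 1.0
h = 1e-6

# Difference quotient for the composite g(f(x)) at x = a.
numeric = (g(f(a + h)) - g(f(a))) / h

# What the Chain rule says the derivative should be.
chain_rule = g_prime(f(a)) * f_prime(a)

# The two should agree up to an error that shrinks with h,
# exactly because the bracketed error term in the proof goes to 0.
print(numeric, chain_rule)
```

Shrinking h makes the agreement tighter, which is a concrete way of watching the error term from the proof go to 0.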