The Chain rule is so important, it's worth thinking through a proof of its validity. You might be tempted to think that you can get away with just canceling. What I mean is, you might be tempted to think that something like this works. Let's say you wanted to differentiate g of f of x. You might set f of x equal to y. And then you might think, well, what you're really trying to calculate is the derivative of g of y. And then you might say, well, the derivative of g of y, that will be the derivative of g with respect to y times the derivative of y with respect to x, and that's really the Chain rule. I mean, this first thing is the derivative of g and this other thing is the derivative of f, just like you'd expect, and then you're trying to say that you can just cancel, alright? You're not allowed to just cancel. I mean, the upshot here is just that dy/dx is not a fraction, alright? You can't justify this equality by just canceling, because these objects, the ones you're supposedly canceling with, aren't fractions. We need a more delicate argument than that. One way to go is to give a slightly different definition of derivative. Well, here's a slightly different way of packaging up the derivative. The function f is differentiable at a point a provided there's some number, which I'm suggestively calling f prime of a, that is, the derivative of f at a, so that the limit of this error function is equal to 0. And what's this error function? Well, it's measuring how far the approximation I'd get using the derivative is from the actual function's output if I plug in an input value near a. In some ways, that's actually a nicer definition of derivative, since it really conveys that the derivative provides a way to approximate output values of functions. In any case, now let's take this new definition of derivative, try to approximate g of f of x plus h, and try to discover the Chain rule.
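In symbols, the alternative definition just described might be written like this (a sketch of the lecture's "error function" formulation; the name E_f for the error function is my notation):

```latex
f(a + h) = f(a) + f'(a)\,h + E_f(h)\,h,
\qquad \text{where } \lim_{h \to 0} E_f(h) = 0.
```

The point of this packaging is that differentiability at a is exactly the statement that the linear approximation f(a) + f'(a)h misses the true value f(a + h) by an error that vanishes even after dividing by h.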
So, I want to be able to express this in terms of the derivatives of g and f, at least approximately, and then control the error. Well, I can do that for f because I'm assuming that f is differentiable at the point x. So, this is g of, instead of f of x plus h, f of x plus the derivative of f at x times h plus an error term, which we should be calling error of f at h, times h. I'm going to play the same game with g. This is g of f of x plus a small quantity. And if I assume that g is differentiable at the point f of x, then this is g of f of x, plus the derivative of g at f of x times how much I wiggle by, which, in this case, is f prime of x times h plus the error term for f at h times h, plus an error term for g. And I have to put in how much I wiggled by, which, in this case, is f prime of x times h plus the error term for f at h times h, all multiplied by that same quantity, f prime of x times h plus the error term for f at h times h. Alright, so that's exactly equal to g of f of x plus h, and I'm including all of the error terms. Now, we can expand out a bit. So, this is g of f of x plus, if you multiply these two terms together, g prime of f of x times f prime of x times h. That's looking really good, because that's what the Chain rule is, right? It's supposed to give me this as the derivative of g composed with f. Plus, I've got a ton of error terms now. All those error terms have an h, so I'm going to collect all the h's at the end. The first error term is g prime of f of x, this term here, times the error of f at h. The next one is the error of g at this complicated quantity, which I'm going to abbreviate as H, times f prime of x, since I'm collecting all of these h's at the end, plus the error term for g at that complicated quantity H, times the error for f at h, and all of that is times h. Alright. Now, this is almost giving me the derivative of the composite function, provided that I can control the size of this error term, right?
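Collecting the expansion just described into one display may help; here E_f and E_g stand for the error functions of f and g, and H abbreviates the complicated inner wiggle (these symbols are my notation for what the lecture describes verbally):

```latex
g\bigl(f(x+h)\bigr)
  = g\bigl(f(x)\bigr)
  + g'\bigl(f(x)\bigr)\,f'(x)\,h
  + \Bigl[\,
      g'\bigl(f(x)\bigr)\,E_f(h)
    + E_g(H)\,f'(x)
    + E_g(H)\,E_f(h)
    \Bigr]\,h,
```

```latex
\text{where } H = f'(x)\,h + E_f(h)\,h.
```

The middle term g'(f(x)) f'(x) h is the Chain rule candidate for the derivative of the composite; everything in the bracket is the error term that still has to be shown to vanish as h goes to 0.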
What I need to show now is that the limit as h approaches 0 of this error term is really 0. And the error term, right, it's the part before the times h: it's g prime of f of x times the error term for f at h, plus the error term for g at that complicated quantity, call it H, times f prime of x, plus the error term for g at H times the error term for f at h. Now, why do I know that that limit is equal to 0? Well, I can do it in pieces, right? It's the limit of a sum, so it's the sum of the limits. And I know that this first term goes to 0 because it's got an error of f at h term in it, and because f is differentiable, that error term goes to 0. I likewise know the same for this last term, alright? It's also got an error of f at h term in it. The most mysterious term is the middle one. But if you think a little bit more about it, the error of g at this quantity H, which is abbreviating this whole thing here, also goes to 0 as h goes to 0, because H itself goes to 0 as h goes to 0, and g is differentiable at f of x. And that's enough to know that the limit as h goes to 0 of this quantity is 0, which is then enough to say that the formula for g of f of x plus h actually exhibits this as the derivative. So, here's what we've actually shown. Suppose that f is differentiable at a point a and g is differentiable at the point f of a. Then the composite function, g composed with f, is differentiable at a, with the derivative of g composed with f at the point a equal to the derivative of g at f of a times the derivative of f at a.
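The theorem we just proved is easy to check numerically: pick any differentiable f and g, form a difference quotient for the composite, and compare it against g'(f(a)) times f'(a). A small sketch (the particular choices f(x) = sin x and g(y) = y squared are mine, not from the lecture):

```python
import math

# My sample choices: f(x) = sin(x), g(y) = y**2,
# so the Chain rule predicts (g o f)'(x) = 2*sin(x)*cos(x) = sin(2x).
def f(x): return math.sin(x)
def g(y): return y * y
def f_prime(x): return math.cos(x)
def g_prime(y): return 2 * y

a = 1.0
h = 1e-6

# Difference quotient for the composite g(f(x)) at x = a.
numeric = (g(f(a + h)) - g(f(a))) / h

# What the Chain rule says the derivative should be.
chain_rule = g_prime(f(a)) * f_prime(a)

# The two should agree up to an error that shrinks with h,
# exactly because the bracketed error term in the proof goes to 0.
print(numeric, chain_rule)
```

Shrinking h makes the agreement tighter, which is a concrete way of watching the error term from the proof go to 0.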