, I often want to differentiate an inverse
function. Say, I've got a function f. The
derivative of f encodes how wiggling the
input affects the output. The derivative
of the inverse function would encode how
changes to the output affect the input.
Here's a theorem that I can use to handle
this situation. Here is the inverse
function theorem. I'm going to suppose
that f is some differentiable function,
that f prime, the derivative, is
continuous, and that the derivative at
some point a is nonzero. In that case, I get
the following fantastic conclusion. Then
the inverse function at y is defined for
values of y near f of a. So, the function
f is invertible near a. The inverse
function is differentiable for inputs near
f of a. And that derivative is continuous
for inputs near f of a. And I've even
got a formula for the derivative. The
derivative of the inverse function at y is
1 over the original derivative, the
derivative of the original function,
evaluated at the inverse function of y.
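
Written out in symbols, that formula might look like the following. This is just a transcription of the spoken statement, where a is the point with nonzero derivative and y stands for an input to the inverse function near f of a.

```latex
\[
  (f^{-1})'(y) \;=\; \frac{1}{f'\bigl(f^{-1}(y)\bigr)}
  \qquad \text{for } y \text{ near } f(a), \text{ given } f'(a) \neq 0 .
\]
```
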
How can I justify a result like that? Why
should something like that be true? One
way to think about this is geometrically.
Here, I've drawn the graph of just some
made-up function, y equals f of x. What
does the graph of the inverse function look
like? Well, one way to think about this is
that the inverse function exchanges the
roles of the x and y axes, which is the
same as just flipping it over, alright?
What was the y-axis is now the x-axis, and
what was the x-axis is now the y-axis. And this
graph here is y equals f inverse of x.
This is how you graph the inverse
function. Alright.
So, let's go back to the original function
and if I put down a tangent line to the
curve at some point, let's say that
tangent line has slope m. Well, what's the
slope of the tangent line to the inverse function? That
would be the derivative of the inverse
function. Well, if I flip over the graph
again to look at the graph of the inverse
function, I can put down a tangent line to
the graph of the inverse function. And that has
slope 1 over m. If m was the original
slope for the tangent line to the original
function, 1 over m is the new slope of the
tangent line of the inverse function. Why
1 over m? Well, that makes sense because I
got this graph by exchanging the roles of
the x and y-axis, by flipping the paper
over. And that exchanges rise for run,
and run for rise. So, the slope becomes
the reciprocal of the old slope. This
slope business is reflected in the
notation, dy dx. So, let's suppose that y
is f of x, so x is f inverse of y,
supposing that this is an invertible
function. If y is f of x, then f prime of
x could be written dy dx. And if x is f
inverse of y, then the derivative of the
inverse function at y, well, that's asking
how changing y changes x, so I could write
that as dx over dy. Well, if you really take
this notation seriously, what it looks
like it's saying is that dx dy, which is
the derivative of the inverse function,
should be 1 over dy dx, right? The
derivative of the inverse function is 1
over the derivative of the original
function. But you have to think about
where these derivatives are being
computed. So, maybe you believe that dx dy
is 1 over dy dx; it makes sense that if
you exchange the roles of x and y, that
takes the reciprocal of the slope of the
line. But where is this wiggling
happening, right? dy dx is measuring how
wiggling x affects y. Wiggling around
where? Well, let's suppose that I'm
wiggling around a. So, I'm really
calculating dy dx when x, say, is at a.
This is the quantity that records how
wiggling x near a will affect y. Well
then, where's y wiggling? Well, if x is
wiggling around a, y is wiggling around f
of a. So, the derivative on this side is
really being calculated at y equals f of
a. And it's really necessary to keep track
of where this wiggling is happening in
order to get a valid formula. It's
actually easier to think about what's
going on if we just phrase all of this in
terms of the Chain rule. So, what do I
know about the inverse function? Well,
here's f inverse.
f of f inverse of x is just x. Alright,
what does the inverse function do? Whatever
you plug into the inverse function, it
outputs whatever you need to plug into f
to get out the thing you plugged into the
inverse function. Alright. So, this is
true. Now, if I differentiate both sides,
assuming that f and f inverse are
differentiable, then by the Chain rule,
what do I get? Well, the derivative of
this composition is the derivative of the
outside at the inside times the derivative
of the inside. And that's equal to the
derivative of the other side, and the
derivative of x is just 1. Now, I'll
divide both sides by f prime of f inverse of
x and I get that the derivative of the
inverse function of x is 1 over f prime of
f inverse of x. Is that a proof?
Absolutely not. The embarrassing truth is
that this argument assumes the
differentiability of the inverse function.
If this function, f inverse, is
differentiable, then the Chain rule can be
applied to it. The Chain rule requires
that the functions be differentiable. Now,
if the function is differentiable, then
this Chain rule calculation tells me that
the derivative of the inverse function is this
quantity. But that's all predicated on
knowing that the inverse function is
differentiable. How do we know that? Well,
that's actually the content of this
theorem, right? The content of the inverse
function theorem is not really the
calculation of the derivative of the
inverse function. It's really just the
fact that the inverse function is
differentiable at all. That is a huge
deal, and it's not something that we can
just get from the Chain rule. Once we know
that the inverse function is
differentiable, then the Chain rule gives
us this calculation. But actually
verifying that the inverse function is
differentiable is really quite deep;
that's why the inverse function theorem is
such a big deal. The Chain rule requires
that the functions I'm applying the Chain
rule to be differentiable. In contrast,
the inverse function theorem is asserting
the differentiability of the inverse
function. It's really saying much more
than just a computation of the derivative
if the derivative exists. It's actually
telling me that the derivative exists. I'm
going to have to punt on saying much more
about the proof of the inverse function
theorem. But nevertheless, we can now
apply the inverse function theorem to some
concrete examples. For example, think about
the function f of x equals x squared.
Well, what's the inverse function to this?
Let's suppose the domain is just the
nonnegative real numbers.
Then the function is invertible on that
domain, and we know the name of the
inverse: it's the square root of x. What's
the derivative of the original function?
Well, we know that it's 2x, and the
derivative is continuous and the
derivative is not 0 provided that x is
positive. This is all the stuff that we
need to apply the inverse function
theorem. Then, we know that the derivative
of the inverse function at x is 1 over the
original derivative at the inverse of x.
Now, the inverse function is the square
root of x, so that's 1 over f prime of the
square root of x, and what's f prime? f
prime is the function that doubles its
input. So, that's 1 over 2 times the square
root of x. So, the derivative of the inverse
function, the derivative of the square
root function, is 1 over 2 times the square
root of x, provided x is bigger than 0, right?
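
The steps of that calculation can be sketched in symbols as follows. This assumes, as in the example, that f is the squaring function on the nonnegative reals, so that its inverse is the square root.

```latex
\begin{align*}
  f(x) &= x^2 \quad (x \ge 0), \qquad f^{-1}(x) = \sqrt{x}, \qquad f'(x) = 2x, \\
  \frac{d}{dx}\sqrt{x} \;=\; (f^{-1})'(x)
    &= \frac{1}{f'\bigl(f^{-1}(x)\bigr)}
     = \frac{1}{f'(\sqrt{x})}
     = \frac{1}{2\sqrt{x}} \qquad (x > 0).
\end{align*}
```
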
Just like before, this is a calculation of
the derivative of the square root
function. We can also see this
numerically. So, the square root of 10,000
is 100, and you might ask: what do you have
to take the square root of to get
about 100.1? Just as a numeric example.
Well, think now about the functions that
are involved here. There's the squaring
function and the square root function. We
saw the derivative of the square root
function is 1 over 2 times the square root
of x, and the
derivative of x squared, we already know,
is 2x. Where are we evaluating these
functions? Well, I'm evaluating the square
root function at 10,000, right? This is at
x equals 10,000. And if I evaluate that
at 10,000, that's 1 over 2 times the
square root of 10,000, that's 1 over 200.
Where am I evaluating the other function,
the x squared function? Well there, I'm
really thinking of 100 as the input, so
I'll evaluate that derivative at 100 and
2x, when x is 100, is 200. And it's not
too surprising, right, that 1 over 200 and
200 are reciprocals of each other, because
I'm calculating derivatives of a function
and the inverse function at the
appropriate places. Now, let's try to
answer the original question. I'm trying
to figure out, what do I have to take the
square root of to get about 100.1? Well,
the ratio between a change in the input
and the resulting change in the output is
about 200 here. So, if I want the
output to be affected by 0.1, I should try
to change the input by about 200 times as
much, and 200 times 0.1 is 20, so I should
try to change the input by about 20 and
sure enough, if you take the square root
of 10,020, that's awfully close to
100.1. I hope that you'll play around with
these numbers. All the conceptual stuff
that we're doing, these theorems, I'm not
telling you these theorems to make numbers
boring, right? I'm telling you all these
theorems to heighten your appreciation of
the numerical examples.
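
If you want to play around with these numbers yourself, here is a minimal Python sketch of the square root example. The function names (f, f_prime, inverse_derivative, input_change_for) are made up for this illustration; the only thing taken from the discussion above is the formula, 1 over f prime evaluated at the inverse.

```python
import math


def f(x):
    # The original function from the example: squaring, on nonnegative inputs.
    return x * x


def f_prime(x):
    # Derivative of x squared is 2x.
    return 2.0 * x


def inverse_derivative(y):
    # Inverse function theorem formula: 1 / f'(f^{-1}(y)).
    # Here f^{-1} is the square root, so y must be positive.
    return 1.0 / f_prime(math.sqrt(y))


def input_change_for(y, output_change):
    # Linear estimate: to move sqrt(y) by output_change,
    # move y by about output_change divided by the derivative of sqrt at y.
    return output_change / inverse_derivative(y)


if __name__ == "__main__":
    slope_of_sqrt = inverse_derivative(10_000.0)   # derivative of sqrt at 10,000
    slope_of_square = f_prime(100.0)               # derivative of x^2 at 100
    step = input_change_for(10_000.0, 0.1)         # how far to move the input
    print("derivative of sqrt at 10,000:", slope_of_sqrt)
    print("derivative of x^2 at 100:", slope_of_square)
    print("product of the two slopes:", slope_of_sqrt * slope_of_square)
    print("estimated input change:", step)
    print("sqrt of 10,000 plus that change:", math.sqrt(10_000.0 + step))
```

Try swapping in other starting points and other output changes; the estimate is only a linear approximation, so it gets worse as the change gets bigger.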