1
00:00:00,012 --> 00:00:07,375
I often want to differentiate an inverse
function. Say, I've got a function f. The

2
00:00:07,387 --> 00:00:14,165
derivative of f encodes how wiggling the
input affects the output. The derivative

3
00:00:14,177 --> 00:00:20,587
of the inverse function would encode how
changes to the output affect the input.

4
00:00:20,832 --> 00:00:25,504
Here's a theorem that I can use to handle
this situation. Here is the inverse

5
00:00:25,516 --> 00:00:30,261
function theorem. I'm going to suppose
that f is some differentiable function, f

6
00:00:30,354 --> 00:00:35,273
prime is continuous, the derivative is
continuous. And the derivative, at some

7
00:00:35,285 --> 00:00:41,455
point, a, is nonzero. In that case, I get
the following fantastic conclusion. Then

8
00:00:41,467 --> 00:00:47,434
the inverse function at y is defined for
values of y near f of a. So, the function

9
00:00:47,446 --> 00:00:53,460
f is invertible near a. The inverse
function is differentiable for inputs near

10
00:00:53,472 --> 00:00:59,849
f of a. And that derivative is continuous
for inputs near f of a. And I've even

11
00:00:59,861 --> 00:01:05,331
got a formula for the derivative. The
derivative of the inverse function at y is

12
00:01:05,343 --> 00:01:10,328
1 over the original derivative, the
derivative of the original function,

13
00:01:10,340 --> 00:01:15,508
evaluated at the inverse function of y.
How can I justify a result like that? Why

14
00:01:15,520 --> 00:01:20,173
should something like that be true? One
way to think about this is geometrically.

15
00:01:20,276 --> 00:01:24,947
Here, I've drawn the graph of just some
made-up function, y equals f of x. What's

16
00:01:24,959 --> 00:01:29,621
the graph of the inverse function look
like? Well, one way to think about this is

17
00:01:29,633 --> 00:01:34,270
that the inverse function exchanges the
roles of the x and y axes, which is the

18
00:01:34,282 --> 00:01:39,475
same as just flipping it over, alright?
What was the y-axis is now the x-axis; what

19
00:01:39,487 --> 00:01:44,209
was the x-axis is now the y-axis. And this
graph here is y equals f inverse of x.

20
00:01:44,315 --> 00:01:47,526
This is how you graph the inverse
function. Alright.

21
00:01:47,532 --> 00:01:52,480
So, let's go back to the original function
and if I put down a tangent line to the

22
00:01:52,843 --> 00:01:57,840
curve at some point, let's say that
tangent line has slope m. Well, what's the

23
00:01:57,852 --> 00:02:02,635
tangent line of the inverse function? That
would be the derivative of the inverse

24
00:02:02,647 --> 00:02:07,310
function. Well, if I flip over the graph
again to look at the graph of the inverse

25
00:02:07,322 --> 00:02:11,830
function, I can put down a tangent line to
the inverse function. And that has

26
00:02:11,830 --> 00:02:16,440
slope 1 over m. If m was the original
slope for the tangent line to the original

27
00:02:16,452 --> 00:02:21,718
function, 1 over m is the new slope to the
tangent line of the inverse function. Why

28
00:02:21,730 --> 00:02:26,749
1 over m? Well, that makes sense because I
got this graph by exchanging the roles of

29
00:02:26,761 --> 00:02:32,445
the x and y-axis, by flipping the paper
over. And that exchanges rise for run,

30
00:02:32,582 --> 00:02:39,195
and run for rise. So, the slope becomes
the reciprocal of the old slope. This

31
00:02:39,207 --> 00:02:45,770
slope business is reflected in the
notation, dy dx. So, let's suppose that y

32
00:02:45,782 --> 00:02:49,724
is f of x, so x is f inverse of y,
supposing that this is an invertible

33
00:02:49,736 --> 00:02:56,878
function. If y is f of x, then f prime of
x could be written dy dx. And if x is f

34
00:02:56,890 --> 00:03:04,849
inverse of y, then the derivative of the
inverse function at y, well, that's asking

35
00:03:04,861 --> 00:03:12,260
how does changing y change x? I could write that
as dx over dy. Well, if you really take

36
00:03:12,272 --> 00:03:17,742
this notation seriously, what it looks
like it's saying is that dx dy, which is

37
00:03:17,754 --> 00:03:22,730
the derivative of the inverse function,
should be 1 over dy dx, right? The

38
00:03:22,742 --> 00:03:27,717
derivative of the inverse function is 1
over the derivative of the original

39
00:03:27,729 --> 00:03:32,763
function. But you have to think about
where these derivatives are being

40
00:03:32,775 --> 00:03:38,255
computed. So, maybe you believe that dx dy
is 1 over dy dx, it makes sense that if

41
00:03:38,267 --> 00:03:43,697
you exchange the roles of x and y, that
takes the reciprocal of the slope of the

42
00:03:43,709 --> 00:03:48,704
line. But where is this wiggling
happening, right? dy dx is measuring how

43
00:03:48,716 --> 00:03:53,667
wiggling x affects y. Wiggling around
where? Well, let's suppose that I'm

44
00:03:53,679 --> 00:03:58,335
wiggling around a. So, I'm really
calculating dy dx when x, say, is at a.

45
00:03:59,088 --> 00:04:04,313
This is the quantity that records how
wiggling x near a will affect y. Well

46
00:04:04,325 --> 00:04:09,898
then, where's y wiggling? Well, if x is
wiggling around a, y is wiggling around f

47
00:04:09,910 --> 00:04:15,404
of a. So, the derivative on this side is
really being calculated at y equals f of

48
00:04:15,416 --> 00:04:21,097
a. And it's really necessary to keep track
of where this wiggling is happening in

49
00:04:21,109 --> 00:04:25,629
order to get a valid formula. It's
actually easier to think about what's

50
00:04:25,641 --> 00:04:29,728
going on if we just phrase all of this in
terms of the Chain rule. So, what do I

51
00:04:29,740 --> 00:04:32,869
know about the inverse function? Well,
here's f inverse.

52
00:04:32,872 --> 00:04:37,195
f of f inverse of x is just x. Alright,
what does the inverse function do? Whatever

53
00:04:37,207 --> 00:04:41,430
you plug into the inverse function, it
outputs whatever you need to plug into f

54
00:04:41,442 --> 00:04:45,690
to get out the thing you plugged into the
inverse function. Alright. So, this is

55
00:04:45,702 --> 00:04:50,430
true. Now, if I differentiate both sides,
assuming that f and f inverse are

56
00:04:50,442 --> 00:04:55,555
differentiable, then by the Chain rule,
what do I get? Well, the derivative of

57
00:04:55,567 --> 00:05:01,230
this composition is the derivative of the
outside at the inside times the derivative

58
00:05:01,242 --> 00:05:07,898
of the inside. And that's equal to the
derivative of the other side, which is the

59
00:05:07,910 --> 00:05:14,265
derivative of x, which is just 1. Now, I'll
divide both sides by f prime of f inverse of

60
00:05:14,277 --> 00:05:20,851
x and I get that the derivative of the
inverse function of x is 1 over f prime of

61
00:05:20,863 --> 00:05:26,582
f inverse of x. Is that a proof?
Absolutely not. The embarrassing truth is

62
00:05:26,594 --> 00:05:30,675
that this argument assumes the
differentiability of the inverse function.
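Written out in symbols, the Chain rule calculation just described looks roughly like this. It is only a sketch: it takes for granted that f inverse is differentiable, which is exactly the gap in the argument, and the notation (f^{-1})' for the derivative of the inverse is an assumption about how it would be written down.

```latex
\begin{align*}
f\bigl(f^{-1}(x)\bigr) &= x \\
% differentiate both sides; the left side uses the Chain rule
f'\bigl(f^{-1}(x)\bigr)\cdot\bigl(f^{-1}\bigr)'(x) &= 1 \\
% divide by f'(f^{-1}(x)), which must be nonzero
\bigl(f^{-1}\bigr)'(x) &= \frac{1}{f'\bigl(f^{-1}(x)\bigr)}
\end{align*}
```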

63
00:05:30,773 --> 00:05:34,929
If this function, f inverse, is
differentiable, then the Chain rule can be

64
00:05:34,941 --> 00:05:39,513
applied to it. The Chain rule requires
that the functions be differentiable. Now,

65
00:05:39,525 --> 00:05:44,022
if the function is differentiable, then
this Chain rule calculation tells me that

66
00:05:44,034 --> 00:05:48,605
the derivative of the inverse function is this
quantity. But that's all predicated on

67
00:05:48,617 --> 00:05:53,270
knowing that the inverse function is
differentiable. How do we know that? Well,

68
00:05:53,282 --> 00:05:57,935
that's actually the content of this
theorem, right? The content of the inverse

69
00:05:57,947 --> 00:06:02,135
function theorem is not really the
calculation of the derivative of the

70
00:06:02,147 --> 00:06:06,425
inverse function. It's really just the
fact that the inverse function is

71
00:06:06,437 --> 00:06:11,010
differentiable at all. That is a huge
deal, and it's not something that we can

72
00:06:11,022 --> 00:06:15,604
just get from the Chain rule. Once we know
that the inverse function is

73
00:06:15,616 --> 00:06:20,163
differentiable, then the Chain rule gives
us this calculation. But actually

74
00:06:20,175 --> 00:06:24,159
verifying if the inverse function is
differentiable is really quite deep,

75
00:06:24,257 --> 00:06:28,915
that's why the inverse function theorem is
such a big deal. The Chain rule requires

76
00:06:28,927 --> 00:06:33,680
that the functions I'm applying the Chain
rule to be differentiable. In contrast,

77
00:06:33,787 --> 00:06:38,740
the inverse function theorem is asserting
the differentiability of the inverse

78
00:06:38,752 --> 00:06:43,945
function. It's really saying much more
than just a computation of the derivative

79
00:06:43,957 --> 00:06:50,040
if the derivative exists. It's actually
telling me that the derivative exists. I'm

80
00:06:50,052 --> 00:06:54,191
going to have to punt on saying much more
about the proof of the inverse function

81
00:06:54,203 --> 00:06:58,578
theorem. But nevertheless, we can now
apply the inverse function theorem to some

82
00:06:58,590 --> 00:07:03,242
concrete examples. For example, think about
the function, f of x equals x squared.

83
00:07:03,338 --> 00:07:07,620
Well, what's the inverse function to this?
Let's suppose the domain is just the

84
00:07:07,632 --> 00:07:12,292
nonnegative real numbers.
Then, the function is invertible on the

85
00:07:12,304 --> 00:07:17,482
domain, and we know the name of the
inverse is the square root of x. What's

86
00:07:17,494 --> 00:07:22,884
the derivative of the original function?
Well, we know that it's 2x, and the

87
00:07:22,896 --> 00:07:27,974
derivative is continuous and the
derivative is not 0 provided that x is

88
00:07:27,986 --> 00:07:33,168
positive. This is all the stuff that we
need to apply the inverse function

89
00:07:33,180 --> 00:07:39,630
theorem. Then, we know that the derivative
of the inverse function at x is 1 over the

90
00:07:39,642 --> 00:07:45,595
original derivative at the inverse of x.
Now, the inverse function is the square

91
00:07:45,607 --> 00:07:51,588
root of x, so that's 1 over f prime of the
square root of x, and what's f prime? f

92
00:07:51,600 --> 00:07:57,275
prime is the function that doubles its
input. So, that's 1 over 2 square roots of

93
00:07:57,287 --> 00:08:02,085
x. So, the derivative of the inverse
function, the derivative of the square

94
00:08:02,097 --> 00:08:07,110
root function is 1 over 2 square roots of
x, provided x is bigger than 0, right?
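As a worked equation, the square root calculation can be sketched as follows. The symbols are the ones used in the spoken example (f for squaring on the nonnegative reals, f inverse for the square root); the layout and the (f^{-1})' notation are assumptions.

```latex
\begin{align*}
f(x) &= x^2, \qquad f'(x) = 2x, \qquad f^{-1}(x) = \sqrt{x} \\
\bigl(f^{-1}\bigr)'(x) &= \frac{1}{f'\bigl(f^{-1}(x)\bigr)}
  = \frac{1}{f'\bigl(\sqrt{x}\bigr)}
  = \frac{1}{2\sqrt{x}}, \qquad x > 0
\end{align*}
```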

95
00:08:07,217 --> 00:08:11,990
Just like before, this is a calculation of
the derivative of the square root

96
00:08:12,002 --> 00:08:17,348
function. We can also see this
numerically. So, the square root of 10,000

97
00:08:17,360 --> 00:08:22,667
is 100, and you might ask what do you have
to take the square root of to get

98
00:08:22,679 --> 00:08:28,360
about 100.1? Say, some numeric example.
Well, think now about the functions that

99
00:08:28,372 --> 00:08:34,473
are involved here. There's the squaring
function and the square root function. We

100
00:08:34,485 --> 00:08:39,881
saw the derivative of the square root
function is 1 over 2 square root x and the

101
00:08:40,140 --> 00:08:45,417
derivative of x squared, we already know,
is 2x. Where are we evaluating these

102
00:08:45,429 --> 00:08:51,380
functions? Well, I'm evaluating the square
root function at 10,000, right? This is at

103
00:08:51,392 --> 00:08:56,485
x equals 10,000. And if I evaluate that
at 10,000, that's 1 over 2 times the

104
00:08:56,497 --> 00:09:02,013
square root of 10,000, that's 1 over 200.
Where am I evaluating the other function,

105
00:09:02,122 --> 00:09:07,143
the x squared function? Well there, I'm
really thinking of 100 as the input, so

106
00:09:07,155 --> 00:09:12,154
I'll evaluate that derivative at 100 and
2x, when x is 100, is 200. And it's not

107
00:09:12,166 --> 00:09:17,619
too surprising, right, that 1 over 200 and
200 are reciprocals of each other, because

108
00:09:17,631 --> 00:09:22,525
I'm calculating derivatives of a function
and the inverse function at the

109
00:09:22,537 --> 00:09:27,925
appropriate places. Now, let's try to
answer the original question. I'm trying

110
00:09:27,937 --> 00:09:33,225
to figure out, what do I have to take the
square root of to get about 100.1? Well,

111
00:09:33,337 --> 00:09:38,425
the ratio here is about 200 between the
input and the output. So, if I want the

112
00:09:38,437 --> 00:09:44,090
output to be affected by 0.1, I should try
to change the input by about 200 times as

113
00:09:44,102 --> 00:09:49,655
much, and 200 times 0.1 is 20, so I should
try to change the input by about 20 and

114
00:09:49,772 --> 00:09:55,091
sure enough, if you take the square root
of 10,020, that's awfully close to

115
00:09:55,091 --> 00:10:00,960
100.1. I hope that you'll play around with
these numbers. All the conceptual stuff

116
00:10:00,972 --> 00:10:06,213
that we're doing, these theorems, I'm not
telling you these theorems to make numbers

117
00:10:06,225 --> 00:10:10,718
boring, right? I'm telling you all these
theorems to heighten your appreciation of

118
00:10:10,730 --> 00:10:12,150
the numerical examples.
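If you'd like to play around with these numbers yourself, here is a minimal Python sketch of the square root example. The function names (f, f_inverse, f_prime, f_inverse_prime) and variable names are made up for illustration; the only formula used is the one from the inverse function theorem, 1 over f prime of f inverse of y.

```python
import math


def f(x):
    """The squaring function, restricted to nonnegative x."""
    return x * x


def f_inverse(y):
    """The inverse of f on the nonnegative reals: the square root."""
    return math.sqrt(y)


def f_prime(x):
    """Derivative of the squaring function: it doubles its input."""
    return 2 * x


def f_inverse_prime(y):
    """Derivative of the inverse, computed as 1 / f'(f_inverse(y))."""
    return 1 / f_prime(f_inverse(y))


# Derivative of the square root at 10,000 and of squaring at 100.
slope_of_inverse = f_inverse_prime(10_000)
slope_of_original = f_prime(100)

# To move the output by 0.1, move the input by about 200 times as much.
desired_output_change = 0.1
estimated_input = 10_000 + desired_output_change * slope_of_original
actual_output = f_inverse(estimated_input)

print("slope of sqrt at 10,000:", slope_of_inverse)
print("slope of squaring at 100:", slope_of_original)
print("estimated input:", estimated_input)
print("sqrt of estimated input:", actual_output)
```

The two slopes should come out as reciprocals of each other, and the square root of the estimated input should land very close to 100.1, though not exactly on it, since this is a linear approximation.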