1 00:00:00,012 --> 00:00:05,580 [MUSIC] Many of the functions that we'd most like to differentiate are actually 2 00:00:05,592 --> 00:00:11,599 compositions of two different functions. This happens in the real world, too. I 3 00:00:11,712 --> 00:00:17,757 mean, look, if you change the number of flowers, that's going to affect, say, how 4 00:00:17,769 --> 00:00:24,277 many rabbits there are around to, you know, eat those flowers. And if you change the 5 00:00:24,289 --> 00:00:28,851 number of rabbits, that'll affect how many wolves that forest can support. There are 6 00:00:28,863 --> 00:00:33,578 some really concrete examples of this. Here's a concrete example. Suppose that f 7 00:00:33,590 --> 00:00:38,010 of x is the number of widgets produced with an investment of x dollars, right? 8 00:00:38,111 --> 00:00:42,938 With more money, maybe, you can build more widgets. Suppose g of n is the 9 00:00:42,950 --> 00:00:47,572 income that you get by selling those n widgets. What you're probably really 10 00:00:47,584 --> 00:00:52,299 interested in is not exactly how many widgets you produce. What you'd like to 11 00:00:52,311 --> 00:00:57,194 know is, for a given investment, how much money are you going to make, right? Well, 12 00:00:57,299 --> 00:01:02,219 that's g of f of x minus your initial investment, right? g of f of x is how much 13 00:01:02,231 --> 00:01:06,727 money comes in when you sell the widgets that you produced with your initial 14 00:01:06,739 --> 00:01:11,260 investment of x dollars, right? This quantity is measuring the profit on an 15 00:01:11,272 --> 00:01:15,882 investment of x dollars in widget production. We need some framework, some 16 00:01:15,894 --> 00:01:20,942 general picture that lets us understand how one thing changing affects something 17 00:01:20,954 --> 00:01:26,238 else and how that thing's changing goes on to affect something else. 
Specifically, if 18 00:01:26,250 --> 00:01:30,700 I've got some function h which is a composition of two functions, g of f of x 19 00:01:30,712 --> 00:01:35,650 in this case, I'd like to know something about the derivative of h. I want to know 20 00:01:35,662 --> 00:01:40,370 how changing x affects f and then how changing f goes on to affect g. And I'd 21 00:01:40,382 --> 00:01:45,370 like some sort of formula that gives me that answer, right? I'd like to know the 22 00:01:45,382 --> 00:01:50,365 derivative of h in terms of information about how changing x affects f and how 23 00:01:50,377 --> 00:01:55,805 changing the input to g affects g. I want a formula for the derivative of h in terms 24 00:01:55,817 --> 00:02:02,715 of the derivative of f and the derivative of g. This is exactly what the chain rule 25 00:02:02,727 --> 00:02:09,410 does. What the chain rule says is that the derivative of the composition is the 26 00:02:09,422 --> 00:02:14,961 derivative of g evaluated at f of x times the derivative of f evaluated at x. 27 00:02:15,065 --> 00:02:19,917 Sometimes, people have the idea that the chain rule looks somehow wrong, that you'd 28 00:02:19,929 --> 00:02:24,756 really expect the formula to look very different. I mean, sometimes people think 29 00:02:24,768 --> 00:02:29,737 this formula looks a little bit weird, you know? I'm composing functions, but now 30 00:02:29,749 --> 00:02:34,275 it's the derivative of g composed with just the function f. What's going on? 31 00:02:34,287 --> 00:02:37,924 Given the fact that the derivative of a sum is the sum of the 32 00:02:37,936 --> 00:02:42,093 derivatives, you might be tempted to think that the derivative of a composition 33 00:02:42,105 --> 00:02:46,638 should be the composition of derivatives, but that's not the case. The chain 34 00:02:46,650 --> 00:02:52,269 rule really is capturing what happens when you chain together these changes. 
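In symbols, the rule just stated can be written as follows. The prime notation for the derivative is standard but is an addition here, since the lecture only speaks the formula aloud; h is the composition named above.

```latex
\[
  h(x) = g\bigl(f(x)\bigr)
  \quad\Longrightarrow\quad
  h'(x) = g'\bigl(f(x)\bigr)\cdot f'(x)
\]
% The tempting guess, "the composition of the derivatives",
% would be g'(f'(x)), and in general that is not h'(x).
```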
So let's 35 00:02:52,281 --> 00:02:56,905 think about this chain rule, that the derivative of g of f of x is g prime of f of 36 00:02:56,917 --> 00:03:02,392 x times f prime of x, in terms of chaining together different changes. What I'm trying to 37 00:03:02,404 --> 00:03:07,758 calculate is how changing x changes g of f of x, right? This is the derivative of the 38 00:03:07,770 --> 00:03:13,055 composition. What do I know? Well, I know how changing x will change f of x, right? 39 00:03:13,162 --> 00:03:18,575 This is what the derivative of f is measuring, right? The derivative is the 40 00:03:18,587 --> 00:03:23,763 ratio of output change to input change. Now, in between here, what I have is that the 41 00:03:23,775 --> 00:03:28,831 change in f of x will change g of f of x in some way. This ratio of changes is 42 00:03:28,843 --> 00:03:34,460 really the derivative of g at the point f of x. What is the derivative? You plug 43 00:03:34,460 --> 00:03:39,691 in an input to the derivative to ask how wiggling that input would affect the 44 00:03:39,703 --> 00:03:45,936 output, and that's exactly what this ratio is. I'm asking how changing f of x will 45 00:03:45,948 --> 00:03:51,200 affect g of f of x, right? That's the derivative of g at the point that's 46 00:03:51,212 --> 00:03:54,963 wiggling, f of x. Well, if you think about it, now, if I 47 00:03:54,975 --> 00:03:59,805 just multiply these two things together, then I get the change in g of f of x 48 00:03:59,817 --> 00:04:05,104 divided by the change in x. This is the chain rule, right? If I multiply together 49 00:04:05,116 --> 00:04:09,974 g prime of f of x and f prime of x, what I'm left with is exactly what I want, the 50 00:04:09,986 --> 00:04:15,068 derivative of g of f of x. You can see this pictorially as well. So here, I've 51 00:04:15,080 --> 00:04:19,945 drawn three number lines. On the first number line, I've drawn x and I imagine x 52 00:04:19,957 --> 00:04:24,780 is the input to f. 
And on the second number line, I've drawn f of x and f of x 53 00:04:24,792 --> 00:04:29,615 is now the input to g. And on the last number line, I've drawn g of f of x. The 54 00:04:29,627 --> 00:04:34,651 essential question answered by the derivative is how changing x will affect g 55 00:04:34,663 --> 00:04:39,632 of f of x. But since this is a composition of functions, I'm going to analyze the 56 00:04:39,644 --> 00:04:44,189 effect of changing x on g of f of x in stages, right? I'm first going to see how 57 00:04:44,201 --> 00:04:48,835 changing x affects f of x and then how changing f of x affects g of f of x. So let's 58 00:04:48,847 --> 00:04:53,242 imagine that I change x by a small quantity. I'm calling that small quantity 59 00:04:53,254 --> 00:04:57,902 h here. This h is not a function, just some small number, the amount by which I'm 60 00:04:57,914 --> 00:05:03,266 wiggling the input. Now, how is the output affected? Well, that's exactly what the 61 00:05:03,278 --> 00:05:08,544 derivative measures, right? The derivative of f at x tells me how wiggling the input 62 00:05:08,556 --> 00:05:13,464 x would affect the output. So f prime of x, which is the ratio of output change to 63 00:05:13,476 --> 00:05:18,624 input change, times an actual input change gives me a first-order approximation of 64 00:05:18,636 --> 00:05:23,655 the output change. So I imagine the output is changing by about f prime of x times h. 65 00:05:23,822 --> 00:05:29,469 Now, how does that change in the value of f of x affect g? Well, I have to figure out how 66 00:05:29,481 --> 00:05:35,159 wiggling the input to g will affect the output of g, and that depends on where I'm 67 00:05:35,171 --> 00:05:41,554 calculating the derivative. I need to calculate the derivative of g at the point 68 00:05:41,566 --> 00:05:46,459 f of x, because f of x is the point that's doing the wiggling. 
So, it's the 69 00:05:46,471 --> 00:05:51,581 derivative of g at the point f of x that tells me how wiggling the input around f 70 00:05:51,593 --> 00:05:57,068 of x would affect the output of g. So it's that derivative times the amount by which 71 00:05:57,080 --> 00:06:02,386 the input changed, which is this quantity here, f prime of x times h. And when you 72 00:06:02,398 --> 00:06:07,631 look at it this way, you can see that for an input change to x of some small amount 73 00:06:07,643 --> 00:06:12,953 h, the output changes by about g prime of f of x times f prime of x times as much, which is 74 00:06:12,965 --> 00:06:17,794 exactly what the chain rule is telling me should be the case. So this is the 75 00:06:17,806 --> 00:06:22,585 correct rule: the chain rule really is the derivative of the outside at the 76 00:06:22,597 --> 00:06:27,420 inside times the derivative of the inside function. Let's try to see a numerical example of 77 00:06:27,432 --> 00:06:35,074 this thing in action. So as a numerical example, let's consider the function g of x 78 00:06:35,086 --> 00:06:42,767 equals x to the 4th power and the function f of x equals 1 plus x to the 3rd power. 79 00:06:42,916 --> 00:06:49,098 And maybe what I want to try to estimate is g of f of 1.0001. Now, 80 00:06:49,223 --> 00:06:54,966 approximately what is that equal to? Well, it's not too hard to calculate g of f of 1, 81 00:06:54,966 --> 00:07:01,211 right? What's f of 1? Well, that's 1 plus 1 cubed, well, that's 2. So what's g of 2? 82 00:07:01,701 --> 00:07:06,699 Well, that's 2 to the 4th, well, that's 16. So I know that g of f of 1.0001 is 83 00:07:06,711 --> 00:07:11,365 going to be close to 16. The question is, how is wiggling the input from 1 up to 1.0001 84 00:07:11,377 --> 00:07:15,956 going to affect the output of this composition of functions? Well, I could do 85 00:07:15,968 --> 00:07:20,715 it in stages, right? That's what the chain rule's telling me to do. 
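The "stages" mentioned here can be sketched as two first-order approximations, one fed into the other. The notation is an addition for illustration; as in the number-line picture, h stands for a small change in the input x, not for a function.

```latex
\begin{align*}
  f(x + h) &\approx f(x) + f'(x)\,h
    && \text{(stage 1: wiggle the input of } f\text{)} \\
  g\bigl(f(x + h)\bigr)
    &\approx g\bigl(f(x)\bigr) + g'\bigl(f(x)\bigr)\cdot f'(x)\,h
    && \text{(stage 2: wiggle the input of } g \text{ around } f(x)\text{)}
\end{align*}
```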
So I could 86 00:07:20,727 --> 00:07:27,529 calculate first the derivative of f at 1, right? And the derivative of f is 87 00:07:27,541 --> 00:07:35,939 3x squared, so the derivative of f at 1 is 3. And indeed, if I calculate f of 1.0001, 88 00:07:36,107 --> 00:07:42,918 that's about 2.0003 and a bit more. Now, I want to try to calculate how changing the 89 00:07:42,930 --> 00:07:48,179 input to g will affect the output of g. So I should calculate the derivative of g, and 90 00:07:48,191 --> 00:07:53,133 that's 4x cubed by the power rule, but where should I evaluate the derivative of 91 00:07:53,145 --> 00:07:58,162 g? Your first temptation is to calculate the derivative of g at 1, but that is not 92 00:07:58,174 --> 00:08:02,959 a good idea, because you're not wiggling the input 1 to g. What you really 93 00:08:02,971 --> 00:08:08,274 should be calculating is the derivative of g at 2, because it's this 2 that's going 94 00:08:08,274 --> 00:08:14,199 to be wiggling. When you wiggle the input to f, it's the output of f, f of 1, that's 95 00:08:14,211 --> 00:08:19,332 going to be changing, so you should calculate the derivative of g there, and 96 00:08:19,344 --> 00:08:24,016 what is that? That's 4 times 2 cubed, that's 4 times 8, that's 32. 97 00:08:24,022 --> 00:08:33,600 So what we're trying to calculate is g of f of 1.0001, and we know that that's about 98 00:08:33,612 --> 00:08:42,110 g of, well, what's f of 1.0001? It's about 2.0003. So what happens when I wiggle the 99 00:08:42,122 --> 00:08:48,267 input of g from 2 to 2.0003? Well, that should be about the output of g at 2, which 100 00:08:48,279 --> 00:08:54,374 is 16, plus how much I change the input by, times the derivative of g at the point 101 00:08:54,386 --> 00:09:03,425 where the wiggling is happening, which is 2, and that's 32. And what's 16 plus 0.0003 102 00:09:03,437 --> 00:09:16,184 times 32? That's 16.0096. So g of f of 1.0001 is about 16.0096. 
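The staged estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic from the lecture; the names f_prime, g_prime, x0, dx, df, dg, estimate and exact are chosen here for illustration and do not come from the lecture.

```python
# Functions from the lecture's numerical example.
def f(x):
    return 1 + x**3

def g(u):
    return u**4

# Their derivatives, by the power rule.
def f_prime(x):
    return 3 * x**2

def g_prime(u):
    return 4 * u**3

x0 = 1.0      # the point we start from
dx = 0.0001   # the small wiggle in the input

# Stage 1: how far does the output of f move? About f'(1) * dx.
df = f_prime(x0) * dx

# Stage 2: how far does the output of g move? The derivative of g is
# evaluated at f(1) = 2, because that is the input of g being wiggled.
dg = g_prime(f(x0)) * df

estimate = g(f(x0)) + dg   # first-order estimate of g(f(1.0001))
exact = g(f(x0 + dx))      # direct evaluation, for comparison

print(estimate, exact)
```

The estimate and the direct evaluation agree to several decimal places; the small remaining gap is the "bit more" that a first-order approximation ignores.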
And you can see 103 00:09:16,196 --> 00:09:23,468 this 96 just from the chain rule, right? The relevant thing to calculate is g prime 104 00:09:23,480 --> 00:09:30,292 of f of 1 times f prime of 1, right? This is going to tell me how wiggling the input 105 00:09:30,304 --> 00:09:36,886 1 affects the output, and g prime of f of 1 is 32, f prime of 1 is 3, and 32 times 3 106 00:09:36,898 --> 00:09:42,301 is 96. So, that's the chain rule, and it's going to take some time for the chain rule 107 00:09:42,313 --> 00:09:47,616 to really sink in. But the chain rule is super important for two very different 108 00:09:47,628 --> 00:09:52,145 reasons. On the one hand, you've gotta know the chain rule just to be able to compute 109 00:09:52,157 --> 00:09:56,070 derivatives. A lot of the functions that you'll be asked to differentiate are 110 00:09:56,082 --> 00:10:00,300 actually compositions of differentiable functions, so you'll need to use the chain 111 00:10:00,312 --> 00:10:03,735 rule to finish those derivative calculations. But on the other hand, 112 00:10:03,827 --> 00:10:08,133 you've gotta know the chain rule just to understand how chained-together changes 113 00:10:08,145 --> 00:10:12,536 work. In the real world, a lot of things change, and those changing things affect 114 00:10:12,548 --> 00:10:16,973 other things, and those changing things then go on to affect yet other things. And 115 00:10:16,985 --> 00:10:21,242 you've gotta understand how those changes get composed together in order to 116 00:10:21,254 --> 00:10:23,541 really understand how the real world works.
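Returning to the numerical example, the slope of 96 can also be compared against a numerical derivative of the composition. The sketch below is an illustration rather than part of the lecture: central_difference is a hypothetical helper written here, and its step size of 1e-6 is an arbitrary choice. It also shows what goes wrong if the derivative of g is evaluated at 1 instead of at f of 1.

```python
# Functions from the lecture's numerical example.
def f(x):
    return 1 + x**3

def g(u):
    return u**4

def h(x):
    # The composition g(f(x)).
    return g(f(x))

def central_difference(func, x, step=1e-6):
    # Hypothetical helper: symmetric difference quotient as a
    # numerical stand-in for the derivative of func at x.
    return (func(x + step) - func(x - step)) / (2 * step)

# Chain rule: g'(f(1)) * f'(1) = (4 * 2**3) * (3 * 1**2) = 32 * 3.
chain_rule_slope = (4 * f(1.0)**3) * (3 * 1.0**2)

# Tempting mistake: evaluating g' at 1 instead of at f(1) = 2.
wrong_slope = (4 * 1.0**3) * (3 * 1.0**2)

numerical_slope = central_difference(h, 1.0)

print(chain_rule_slope, wrong_slope, numerical_slope)
```

The numerical slope lands next to the chain-rule value and nowhere near the value obtained by evaluating the derivative of g at the wrong point.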