I often want to differentiate an inverse function. Say I've got a function f. The derivative of f encodes how wiggling the input affects the output. The derivative of the inverse function would encode how changes to the output affect the input. Here's a theorem that I can use to handle this situation: the inverse function theorem. I'm going to suppose that f is some differentiable function, that f prime, the derivative, is continuous, and that the derivative at some point a is nonzero. In that case, I get the following fantastic conclusion. The inverse function at y is defined for values of y near f of a, so the function f is invertible near a. The inverse function is differentiable for inputs near f of a, and that derivative is continuous for inputs near f of a. And I've even got a formula for the derivative. The derivative of the inverse function at y is 1 over the derivative of the original function, evaluated at the inverse function of y. How can I justify a result like that? Why should something like that be true? One way to think about this is geometrically. Here, I've drawn the graph of just some made-up function, y equals f of x. What does the graph of the inverse function look like? Well, one way to think about this is that the inverse function exchanges the roles of the x- and y-axes, which is the same as just flipping the graph over, alright? What was the y-axis is now the x-axis, and what was the x-axis is now the y-axis. And this graph here is y equals f inverse of x. This is how you graph the inverse function. Alright. So, let's go back to the original function. If I put down a tangent line to the curve at some point, let's say that tangent line has slope m. Well, what's the tangent line of the inverse function? Its slope would be the derivative of the inverse function. Well, if I flip over the graph again to look at the graph of the inverse function, I can put down a tangent line to the inverse function.
And that has slope 1 over m. If m was the slope of the tangent line to the original function, 1 over m is the slope of the tangent line to the inverse function. Why 1 over m? Well, that makes sense because I got this graph by exchanging the roles of the x- and y-axes, by flipping the paper over. And that exchanges rise for run, and run for rise. So, the slope becomes the reciprocal of the old slope. This slope business is reflected in the notation, dy dx. So, let's suppose that y is f of x, so x is f inverse of y, supposing that this is an invertible function. If y is f of x, then f prime of x could be written dy dx. And if x is f inverse of y, then the derivative of the inverse function at y, well, that's asking how changing y changes x, so I could write that as dx over dy. Well, if you really take this notation seriously, what it looks like it's saying is that dx dy, which is the derivative of the inverse function, should be 1 over dy dx, right? The derivative of the inverse function is 1 over the derivative of the original function. But you have to think about where these derivatives are being computed. So, maybe you believe that dx dy is 1 over dy dx. It makes sense that if you exchange the roles of x and y, that takes the reciprocal of the slope of the line. But where is this wiggling happening, right? dy dx is measuring how wiggling x affects y. Wiggling around where? Well, let's suppose that I'm wiggling around a. So, I'm really calculating dy dx when x, say, is at a. This is the quantity that records how wiggling x near a will affect y. Well then, where's y wiggling? Well, if x is wiggling around a, y is wiggling around f of a. So, the derivative on the other side, dx dy, is really being calculated at y equals f of a. And it's really necessary to keep track of where this wiggling is happening in order to get a valid formula. It's actually easier to think about what's going on if we just phrase all of this in terms of the Chain rule.
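Before the Chain rule version, here is a small numerical sketch of the "keep track of where the wiggling happens" point. Everything in it is an assumption made for illustration: the function x cubed plus x is just a made-up increasing function, the inverse is found by bisection, and the slopes are estimated with a central difference rather than computed exactly.

```python
def f(x):
    # Hypothetical example function, chosen only because it is strictly increasing.
    return x**3 + x

def f_inverse(y, lo=-100.0, hi=100.0):
    # Invert f by bisection; this works only because f is increasing on [lo, hi].
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def slope(g, t, h=1e-5):
    # Central-difference estimate of the derivative of g at t.
    return (g(t + h) - g(t - h)) / (2 * h)

a = 2.0
m = slope(f, a)                  # dy/dx, wiggling x around a
m_inv = slope(f_inverse, f(a))   # dx/dy, wiggling y around f(a): the right place
m_wrong = slope(f_inverse, a)    # dx/dy, wiggling y around a: the wrong place

print(m, 1 / m, m_inv, m_wrong)
```

In this sketch the slope of the inverse at f of a agrees with 1 over m, while the slope of the inverse at a itself does not, which is the reason for being careful about where each derivative is evaluated.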
So, what do I know about the inverse function? Well, here's f inverse. f of f inverse of x is just x. Alright, what does the inverse function do? Whatever you plug into the inverse function, it outputs whatever you need to plug into f to get out the thing you plugged into the inverse function. Alright. So, this is true. Now, if I differentiate both sides, assuming that f and f inverse are differentiable, then by the Chain rule, what do I get? Well, the derivative of this composition is the derivative of the outside at the inside times the derivative of the inside. And that's equal to the derivative of the other side, and the derivative of x is just 1. Now, I'll divide both sides by f prime of f inverse of x, assuming that's nonzero, and I get that the derivative of the inverse function at x is 1 over f prime of f inverse of x. Is that a proof? Absolutely not. The embarrassing truth is that this argument assumes the differentiability of the inverse function. If this function, f inverse, is differentiable, then the Chain rule can be applied to it. The Chain rule requires that the functions be differentiable. Now, if the inverse function is differentiable, then this Chain rule calculation tells me that the derivative of the inverse function is this quantity. But that's all predicated on knowing that the inverse function is differentiable. How do we know that? Well, that's actually the content of this theorem, right? The content of the inverse function theorem is not really the calculation of the derivative of the inverse function. It's really the fact that the inverse function is differentiable at all. That is a huge deal, and it's not something that we can just get from the Chain rule. Once we know that the inverse function is differentiable, then the Chain rule gives us this calculation. But actually verifying that the inverse function is differentiable is really quite deep. That's why the inverse function theorem is such a big deal.
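Written out in symbols, the Chain rule calculation above looks like this. It is only the computation, not a proof: it assumes that f inverse is differentiable at x and that f prime of f inverse of x is nonzero.

```latex
\begin{align*}
f\bigl(f^{-1}(x)\bigr) &= x \\
f'\bigl(f^{-1}(x)\bigr)\cdot\bigl(f^{-1}\bigr)'(x) &= 1 \\
\bigl(f^{-1}\bigr)'(x) &= \frac{1}{f'\bigl(f^{-1}(x)\bigr)}
\end{align*}
```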
The Chain rule requires that the functions I'm applying the Chain rule to be differentiable. In contrast, the inverse function theorem is asserting the differentiability of the inverse function. It's really saying much more than just a computation of the derivative if the derivative exists. It's actually telling me that the derivative exists. I'm going to have to punt on saying much more about the proof of the inverse function theorem. But nevertheless, we can now apply the inverse function theorem to some concrete examples. For example, think about the function f of x equals x squared. Well, what's the inverse function to this? Let's suppose the domain is just the nonnegative real numbers. Then the function's invertible on that domain, and we know the name of the inverse: it's the square root of x. What's the derivative of the original function? Well, we know that it's 2x, and the derivative is continuous, and the derivative is not 0 provided that x is positive. This is all the stuff that we need to apply the inverse function theorem. Then, we know that the derivative of the inverse function at x is 1 over the original derivative at the inverse of x. Now, the inverse function is the square root of x, so that's 1 over f prime of the square root of x, and what's f prime? f prime is the function that doubles its input. So, that's 1 over 2 times the square root of x. So, the derivative of the inverse function, the derivative of the square root function, is 1 over 2 times the square root of x, provided x is bigger than 0, right? Just like before, this is a calculation of the derivative of the square root function. We can also see this numerically. So, the square root of 10,000 is 100, and you might ask, what do you have to take the square root of to get about 100.1? Just as a numeric example. Well, think now about the functions that are involved here. There's the squaring function and the square root function.
We saw that the derivative of the square root function is 1 over 2 times the square root of x, and the derivative of x squared, we already know, is 2x. Where are we evaluating these derivatives? Well, I'm evaluating the derivative of the square root function at 10,000, right? This is at x equals 10,000. And if I evaluate that at 10,000, that's 1 over 2 times the square root of 10,000, which is 1 over 200. Where am I evaluating the derivative of the other function, the x squared function? Well, there I'm really thinking of 100 as the input, so I'll evaluate that derivative at 100, and 2x when x is 100 is 200. And it's not too surprising, right, that 1 over 200 and 200 are reciprocals of each other, because I'm calculating derivatives of a function and its inverse function at the appropriate places. Now, let's try to answer the original question. I'm trying to figure out, what do I have to take the square root of to get about 100.1? Well, the ratio here is about 200 between the change in the input and the change in the output. So, if I want the output to change by 0.1, I should try to change the input by about 200 times as much, and 200 times 0.1 is 20. So I should try to change the input by about 20, and sure enough, if you take the square root of 10,020, that's awfully close to 100.1. I hope that you'll play around with these numbers. All the conceptual stuff that we're doing, these theorems, I'm not telling you these theorems to make numbers boring, right? I'm telling you all these theorems to heighten your appreciation of the numerical examples.
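If you'd like to play around with these numbers, here is a minimal sketch of the square root example. The variable names are made up for illustration; the only library call is the standard `math.sqrt`.

```python
import math

x0 = 10_000.0
y0 = math.sqrt(x0)                  # the square root of 10,000, which is 100

d_sqrt = 1 / (2 * math.sqrt(x0))    # derivative of the square root at x = 10,000
d_square = 2 * y0                   # derivative of x squared at x = 100

# Linear estimate: to move the output by 0.1, move the input by about 0.1 * 200.
dx = 0.1 * d_square
estimate = x0 + dx                  # about 10,020
exact = 100.1 ** 2                  # the input whose square root is 100.1

print(d_sqrt, d_square, d_sqrt * d_square)
print(estimate, exact, math.sqrt(estimate))
```

The two derivatives come out as reciprocals, and the square root of the estimated input lands within about 0.0001 of 100.1. Trying a bigger change in the output, say 1 instead of 0.1, shows the linear estimate getting less accurate.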