1 00:00:06,010 --> 00:00:09,304 So far in this course we've avoided talking about performance or program 2 00:00:09,304 --> 00:00:13,110 efficiency. And we've done that quite deliberatly. 3 00:00:13,110 --> 00:00:17,190 The reason is that it's very easy for programmers to worry too much about 4 00:00:17,190 --> 00:00:21,338 efficiency, to soon about efficiency or to be honest, to worry just plain 5 00:00:21,338 --> 00:00:28,744 incorrectly about efficiency. As a general rule, it's a better idea to 6 00:00:28,744 --> 00:00:32,774 design a simple program that's easy to understand, and easy to change and then 7 00:00:32,774 --> 00:00:38,050 worry about the efficiency later, once the program's running. 8 00:00:39,100 --> 00:00:43,254 What you can do then is you can run the program against a sample data set, and 9 00:00:43,254 --> 00:00:49,380 measure its performance, and see where the real performance problems are. 10 00:00:49,380 --> 00:00:51,370 What you'll find is you often get surprised. 11 00:00:52,470 --> 00:00:55,200 Some of the things you thought would be a performance problem get taken care of 12 00:00:55,200 --> 00:00:58,270 automatically by the programming language implementation. 13 00:00:59,590 --> 00:01:02,242 Other things you might have thought would be a performance problem turn out not to 14 00:01:02,242 --> 00:01:05,618 be. And then unfortunately there's some other 15 00:01:05,618 --> 00:01:08,413 things that you didn't think would be performance problems that turn out to be 16 00:01:08,413 --> 00:01:12,883 performance problems. That said, there's one category of 17 00:01:12,883 --> 00:01:16,940 performance problem you really need to fix as part of the design. 18 00:01:18,220 --> 00:01:21,294 That has to do with problems of exponential growth in the time it takes 19 00:01:21,294 --> 00:01:24,485 your program to run as the data gets larger. 20 00:01:24,485 --> 00:01:27,015 Alright, exponential growth in performance is not going to be a good 21 00:01:27,015 --> 00:01:32,156 property for your program to have. So what we're going to do in this video 22 00:01:32,156 --> 00:01:35,450 is look at how you can use Local to deal with an important category of these 23 00:01:35,450 --> 00:01:40,268 problems. I'm back in a version of the file system 24 00:01:40,268 --> 00:01:45,514 program, FSV6. And in this version, the definitions are 25 00:01:45,514 --> 00:01:49,580 the same. I've got this function made, skinny. 26 00:01:49,580 --> 00:01:53,155 And what made skinny does is make trees that are very skinny, they're not bushy 27 00:01:53,155 --> 00:01:56,564 at all. Each element just has one child until it 28 00:01:56,564 --> 00:02:00,431 gets down to the leaf. And if I say it makes skinny of two, for 29 00:02:00,431 --> 00:02:04,490 example, I get a tree that looks like this. 30 00:02:04,490 --> 00:02:06,360 There's an element that's called X at the top. 31 00:02:06,360 --> 00:02:11,160 It has one child called X and it has one child called Y, which is the leaf. 32 00:02:12,380 --> 00:02:16,090 So it's X, X, X, X and then Y at the bottom. 33 00:02:16,090 --> 00:02:18,930 So makes skinny of ten is ten X's and a Y. 34 00:02:18,930 --> 00:02:23,410 Make skinny of 100 is 100 x's and a y. That's all make skinny does. 35 00:02:23,410 --> 00:02:29,326 And then find is the function we designed before that just tries to find an element 36 00:02:29,326 --> 00:02:35,473 with a given name. So what I'm going to do now Is I'm going 37 00:02:35,473 --> 00:02:42,160 to call find of y on some trees of different depths. 38 00:02:42,160 --> 00:02:47,040 And I'm using a new BSL primitive here. It's a very special thing. 39 00:02:47,040 --> 00:02:50,822 It's more like an if than a function, because it evaluates its operands in a 40 00:02:50,822 --> 00:02:54,855 special way. But what time does is it evaluates its 41 00:02:54,855 --> 00:03:00,260 operands, it sees how long it takes and gives us back the time. 42 00:03:01,350 --> 00:03:04,831 So, what you can see here I'm timing how long it takes to find y in a tree 10 43 00:03:04,831 --> 00:03:10,085 deep. A tree 11 deep, 12, 13, 14 and so on. 44 00:03:10,085 --> 00:03:13,520 Okay? So, here we go. 45 00:03:13,520 --> 00:03:21,150 If I run that then the times I get Let me scroll this top so you can see. 46 00:03:21,150 --> 00:03:25,796 The time it takes in a tree that's 10 so y is the 11th. 47 00:03:25,796 --> 00:03:28,465 E is 2 milliseconds. This is in milliseconds. 48 00:03:28,465 --> 00:03:34,830 in the next one, it's 3 milliseconds, and the next one it's 7, then it's 20, 35, 49 00:03:34,830 --> 00:03:41,360 57. So now what I'm going to do is plot that. 50 00:03:41,360 --> 00:03:46,780 So let's look at this plot. A bad thing is happening here, isn't it? 51 00:03:48,190 --> 00:03:54,604 Each time, we make skinny one deeper. What's happening to the time it takes for 52 00:03:54,604 --> 00:03:59,180 this to run? It's increasing by what? 53 00:04:01,000 --> 00:04:03,960 It's increasing by pretty close to a factor of 2. 54 00:04:03,960 --> 00:04:17,215 Let me just add a couple more to see. We'll make that 16 and 17. 55 00:04:17,215 --> 00:04:22,880 If we extend the plot to, the numbers changed a little bit. 56 00:04:22,880 --> 00:04:24,344 'Kay? The numbers changed a little bit that's 57 00:04:24,344 --> 00:04:26,720 because it doesn't always compute the exact same time. 58 00:04:26,720 --> 00:04:30,680 It turns out that measuring how long a program takes to run is a very 59 00:04:30,680 --> 00:04:34,484 complicated thing. But these numbers are, are a good 60 00:04:34,484 --> 00:04:40,596 indication of how long it takes to run. So with this now extended plot- scroll it 61 00:04:40,596 --> 00:04:45,720 a bit so we can see all the data- it's really clear that it seems to be taking 62 00:04:45,720 --> 00:04:53,120 twice about as long to run, when we make the tree one deeper. 63 00:04:53,120 --> 00:05:00,290 Now that's a serious performance problem. The tree gets deeper by one. 64 00:05:00,290 --> 00:05:03,760 Find takes twice as long. That's an exponential performance 65 00:05:03,760 --> 00:05:04,465 problem. Okay? 66 00:05:04,465 --> 00:05:09,221 because, as the tree gets deeper and deeper, its basically 2 to the depth of 67 00:05:09,221 --> 00:05:13,620 the tree. That's bad. 68 00:05:13,620 --> 00:05:18,776 So why is that happening? Well here's why its happening right here 69 00:05:18,776 --> 00:05:23,470 in find. Look at what Find does here. 70 00:05:23,470 --> 00:05:27,820 It says, Well, go look for n in the first child. 71 00:05:27,820 --> 00:05:31,370 And if that doesn't produce False, then that's the value you want. 72 00:05:31,370 --> 00:05:35,970 So, produce that value. And from the perspective of producing the 73 00:05:35,970 --> 00:05:41,549 right value, this program is perfectly correct and good and wonderful. 74 00:05:43,920 --> 00:05:47,682 But the problem it has is that in a branch where it finds the value it's 75 00:05:47,682 --> 00:05:51,810 looking for, it searches in that branch twice. 76 00:05:51,810 --> 00:05:57,984 So in this particular degenerate case of the tree, where the tree is one long 77 00:05:57,984 --> 00:06:05,990 first branch, at the very top we search the next level down twice. 78 00:06:07,810 --> 00:06:13,371 And then for each of those, we search the next level down twice, because each time 79 00:06:13,371 --> 00:06:18,185 we first do this to see if it's there, and when we find out it is there, we 80 00:06:18,185 --> 00:06:27,990 search it again to produce the result. So We search this level twice. 81 00:06:27,990 --> 00:06:32,476 We search this level four times. Two times, two, each time it's two times 82 00:06:32,476 --> 00:06:36,430 more. And there's the exponential growth. 83 00:06:36,430 --> 00:06:39,080 So right there, there it is. There's the exponential growth. 84 00:06:39,080 --> 00:06:43,266 We can see why, it's a factor of two each time. 85 00:06:43,266 --> 00:06:48,544 So how do we fix that? Well it's really easy to fix that with 86 00:06:48,544 --> 00:06:53,545 local. Local's exactly what we want to use here. 87 00:06:53,545 --> 00:06:59,962 Basically, we're calling a function here, and then we're calling it here again, to 88 00:06:59,962 --> 00:07:06,651 recompute the same value. And what we can do is we can use Local to 89 00:07:06,651 --> 00:07:12,264 avoid re-computing the same value. Now watch here is the systematic way I do 90 00:07:12,264 --> 00:07:14,990 it. I find the enclosing expression that 91 00:07:14,990 --> 00:07:20,902 contains both computations. So here's one and here's another one, and 92 00:07:20,902 --> 00:07:25,060 I find the nearest enclosing expression that does it. 93 00:07:25,060 --> 00:07:31,908 In that case it's this if I wrap that in local, and I's give, I say define and you 94 00:07:31,908 --> 00:07:39,090 know I've think of some, some name here, right. 95 00:07:39,090 --> 00:07:45,626 I can call it try and I take this value here. 96 00:07:45,626 --> 00:07:52,820 This expression and I name its result, Try. 97 00:07:52,820 --> 00:07:56,660 So now I'm saying, hey, go do this fine and let's name it Try. 98 00:07:56,660 --> 00:08:02,676 And then each place I used to have the expression, I put, Try. 99 00:08:02,676 --> 00:08:09,190 And I need to close that Local, so that's going to be right. 100 00:08:09,190 --> 00:08:14,619 There the if, I add one more [UNKNOWN]. I do a command i to fix the indentation 101 00:08:14,619 --> 00:08:19,675 and now what this says is it says hey go look in the first child and lets call the 102 00:08:19,675 --> 00:08:24,652 result we get from that try and now I can look at the value try as many times as I 103 00:08:24,652 --> 00:08:31,150 want. In this case potentially twice. 104 00:08:31,150 --> 00:08:35,600 Without recomputing the value. I computed it only once here. 105 00:08:35,600 --> 00:08:39,410 I call it try. And then, each time I use try, I don't 106 00:08:39,410 --> 00:08:43,300 recompute it. So if we try this version of the program. 107 00:08:43,300 --> 00:08:48,000 Ha, sorry, pardon the pun. If we run this version of the program, 108 00:08:48,000 --> 00:08:52,070 and look at the performance numbers we get, look. 109 00:08:52,070 --> 00:08:57,320 No exponential growth. So this is a case where we use local to 110 00:08:57,320 --> 00:09:00,950 avoid recomputation. Now, I want to be really clear here. 111 00:09:02,150 --> 00:09:06,764 The reason this recomputation matters is because it's recursive. 112 00:09:06,764 --> 00:09:12,438 This recomputation leads to exponential growth because of the recursion. 113 00:09:13,490 --> 00:09:15,610 Here's a thing I would encourage you not to do. 114 00:09:15,610 --> 00:09:20,730 If you, for example, are going to take a, an argument that you consume and add one 115 00:09:20,730 --> 00:09:26,090 to it and use that twice, you know, using Local to avoid the computation of adding 116 00:09:26,090 --> 00:09:32,366 one twice isn't worth doing. That's going to make your program harder 117 00:09:32,366 --> 00:09:35,681 to read. For little and perhaps to be honest with 118 00:09:35,681 --> 00:09:39,599 you no performance game at all. Kay, because program in language 119 00:09:39,599 --> 00:09:44,330 implementations are good at spoting things like that and doing them for you. 120 00:09:45,950 --> 00:09:50,430 But here because we have a recursive function if the we do the re-computation 121 00:09:50,430 --> 00:09:55,120 at each level we get the expedential growth, exponential Performance problems 122 00:09:55,120 --> 00:09:59,549 are definitely bad. No one's ever going to say that's not a 123 00:09:59,549 --> 00:10:03,680 problem. So here it's worth using local to avoid 124 00:10:03,680 --> 00:10:07,440 the re-computation. So there you go, that's our second 125 00:10:07,440 --> 00:10:11,578 important use of local. We use it to avoid recomputing values. 126 00:10:11,578 --> 00:10:16,198 Especially in recursive functions because in those recursive functions that 127 00:10:16,198 --> 00:10:20,888 recomputation could lead to exponential performance problems and the general 128 00:10:20,888 --> 00:10:25,088 refactoring is you find the nearest expression that encloses all of the 129 00:10:25,088 --> 00:10:31,852 recomputed values. You wrap a local right around that 130 00:10:31,852 --> 00:10:37,150 expression, you give the value some name, like try or something else. 131 00:10:37,150 --> 00:10:43,846 And then you use that name in place of each of the original expressions that 132 00:10:43,846 --> 00:10:47,828 recomputed the value.