1 00:00:00,000 --> 00:00:03,697 So this is the first video of three in which we'll mathematically analyze the 2 00:00:03,697 --> 00:00:06,778 running time of the randomized implementation of quicksort. So in 3 00:00:06,778 --> 00:00:10,618 particular we're gonna prove that the average running time of quicksort is big-O 4 00:00:10,618 --> 00:00:14,084 of n log n. Now this is the first randomized algorithm that we've seen in 5 00:00:14,084 --> 00:00:17,977 the course and therefore, in its analysis will be the first time that we're gonna 6 00:00:17,977 --> 00:00:21,871 need any kind of probability theory. So let me just explain upfront what I?d like, 7 00:00:21,871 --> 00:00:25,718 expect you to know in the following analysis. Basically I need you to know the 8 00:00:25,718 --> 00:00:29,422 first few ingredients of discrete probability theory, so I need you to know 9 00:00:29,422 --> 00:00:33,221 about sample spaces, that is how to model all of the different things that could 10 00:00:33,221 --> 00:00:36,782 happen, all of the ways that random choices could resolve themselves. I need 11 00:00:36,782 --> 00:00:40,391 you to know about random variables, functions on sample spaces which take on 12 00:00:40,391 --> 00:00:44,000 real values, I need you to know about expectations that is average values of 13 00:00:44,000 --> 00:00:48,272 random variables, and Very simple but very key property we're going to need in the 14 00:00:48,272 --> 00:00:52,073 analysis of qucksort is linearity of expectation of expectation. So, if you 15 00:00:52,073 --> 00:00:56,285 haven't seen this before or if you're too rusty, definitely you should review this 16 00:00:56,285 --> 00:01:00,342 stuff before you watch this video. Some places you can go to get that necessary 17 00:01:00,342 --> 00:01:04,502 review, you can look at the probability review part one video that's up on the 18 00:01:04,502 --> 00:01:08,508 course. If you prefer to read something, like I said at the beginning of the 19 00:01:08,508 --> 00:01:12,360 course, I recommend the free online lecture notes by Eric Lehman and Tom 20 00:01:12,360 --> 00:01:16,674 Laten. Mathematics for Computer Science. That covers everything we'll need to know 21 00:01:16,674 --> 00:01:20,734 plus much more. There's also a wiki book on dist. Probability with is a perfectly 22 00:01:20,734 --> 00:01:24,970 fine obviously free-source unless you can learn the necessary material. Okay? So 23 00:01:24,970 --> 00:01:29,277 after you've got that sorta fresh in your minds and you're ready to watch the rest 24 00:01:29,277 --> 00:01:33,083 of this video, and in particular we're ready to prove the following theorems 25 00:01:33,083 --> 00:01:36,989 stated in the previous video. So, the quick sort of algorithm with a randomized 26 00:01:36,989 --> 00:01:40,995 implementation, that is wherein every single recursive sub call you pick a pivot 27 00:01:40,995 --> 00:01:45,328 uniformly at random restated the following assertion. But for every single input so 28 00:01:45,328 --> 00:01:49,860 for a worse case input a ray of length n, the average running time of quick sorts. 29 00:01:49,860 --> 00:01:53,699 With the random pivots. Is O of N login. And again, to be clear where the 30 00:01:53,699 --> 00:01:58,103 randomness is, the randomness is not in the data. We make no assumptions about the 31 00:01:58,103 --> 00:02:02,452 data, as per guiding principles. No matter what the input array is, averaging only 32 00:02:02,452 --> 00:02:06,802 over the randomness in our own code. The randomness internal to the algorithm, we 33 00:02:06,802 --> 00:02:11,151 get a running time of N log N. We saw in the past that the best case behavior of 34 00:02:11,151 --> 00:02:15,392 Quick Sort is N log N. The worst case behavior is N squared. So this theorem is 35 00:02:15,392 --> 00:02:19,687 asserting that, no matter what the input array is, the typical behavior of Quick 36 00:02:19,687 --> 00:02:23,710 Sort is far closer to the best case behavior than it is to the worst case 37 00:02:23,710 --> 00:02:27,204 behavior. Okay so that's what we are going to prove, in the next few videos. So let's 38 00:02:27,204 --> 00:02:32,494 go ahead and get started. So first I'm going to set up the necessary notation and 39 00:02:32,494 --> 00:02:36,626 be clear about exactly what is the sample space what is the random variable that we 40 00:02:36,626 --> 00:02:42,199 care about and so on. So we're gonna fix an arbitrary array of length n, that's 41 00:02:42,199 --> 00:02:49,391 gonna be the input to the quick sort algorithm. And we'll be working with this 42 00:02:49,391 --> 00:02:54,210 fixed but arbitrary input array for the remainder of the analysis. Okay, so just 43 00:02:54,210 --> 00:02:59,300 fix a single input in your mind. Now what's the relevant sample space. Well 44 00:02:59,300 --> 00:03:03,401 recall what a sample space is. It's just all the possible outcomes of the 45 00:03:03,401 --> 00:03:07,782 randomness in the world. So it's all the distinct things that could happen. Now 46 00:03:07,782 --> 00:03:11,601 here the randomness is of our own devising, it's just a random pivot 47 00:03:11,601 --> 00:03:16,039 sequences. The random pivots chosen by quick sort. So omega is just a set of all 48 00:03:16,039 --> 00:03:22,272 possible random pivots that quick sort could choose. Now the whole point of this 49 00:03:22,272 --> 00:03:27,171 theorem proving that the average, average running time it puts is sort of small, 50 00:03:27,171 --> 00:03:32,070 boils down to computing the expectation of a single random variable. So here's the 51 00:03:32,070 --> 00:03:39,066 random variable we're going to care about. Four Given pipit sequence, remember that 52 00:03:39,066 --> 00:03:43,687 random variables are real valued functions defined on the sample space. So for a 53 00:03:43,687 --> 00:03:48,030 given point in the sample space, or pipit sequence sigma, we're going to define 54 00:03:48,030 --> 00:03:52,261 capital C of sigma, as the number of comparisons that Quick Sort makes, where 55 00:03:52,261 --> 00:03:56,882 by comparison I don't mean something like within array index in a four loop, that's 56 00:03:56,882 --> 00:04:01,225 not what I mean by comparison. I mean, comparison between two different entries 57 00:04:01,225 --> 00:04:05,734 of the input array, by comparing the third entry in the array against the seventh 58 00:04:05,734 --> 00:04:09,854 entry in the array to see whether the third entry or the seventh entry is 59 00:04:09,854 --> 00:04:16,426 smaller. Notice that this is indeed a random variable that is given knowledge of 60 00:04:16,426 --> 00:04:21,148 the pivot sequence sigma. The choices of all pivots you can think a quick sort at 61 00:04:21,148 --> 00:04:25,307 that point is just a deterministic algorithm, with all of the pivot choice 62 00:04:25,307 --> 00:04:29,129 predetermined. So the deterministic version of quick sort makes some 63 00:04:29,298 --> 00:04:33,850 deterministic number of comparisons. So, if we're given pivot sequence sigma, we're 64 00:04:33,850 --> 00:04:38,291 just calling C of sigma to be whatever, however many comparisons it makes given 65 00:04:38,291 --> 00:04:43,442 those choices of pivots. Now the, with the theorem I've stated it's not about the 66 00:04:43,442 --> 00:04:47,366 number of comparisons of Quick Sort but rather about the running time of Quick 67 00:04:47,366 --> 00:04:51,291 Sort but really, if you think about it kinda only real work that the Quick Sort 68 00:04:51,291 --> 00:04:55,265 algorithm does is make comparisons between two, between pairs of elements in the 69 00:04:55,265 --> 00:04:59,239 input array. Yes, there's a little bit of other book keeping but that's all noise, 70 00:04:59,239 --> 00:05:03,214 that's all second order stuff. All Quick Sort really does is comparisons between 71 00:05:03,214 --> 00:05:08,585 the pairs of elements in the input array. And if you want to know what I mean by 72 00:05:08,585 --> 00:05:13,520 that a little more formally, dominated by comparisons. I mean, that there exists a 73 00:05:13,520 --> 00:05:18,331 constant c, so that the total number of operations of any type that quick sort 74 00:05:18,331 --> 00:05:23,530 executes is at most a constant factor larger than the number of comparisons. So 75 00:05:23,530 --> 00:05:29,137 let's say that by RT I mean the number of primitive operations of any form that 76 00:05:29,137 --> 00:05:35,115 quick sort uses and for every. Pivot sequence sigma. The total number of 77 00:05:35,115 --> 00:05:40,100 operations is no more than a constant times. The total number of comparisons. 78 00:05:40,100 --> 00:05:43,810 And if you want a proof of this, it's not that interesting, so I'm not gonna talk 79 00:05:43,810 --> 00:05:47,567 about it here. But in the notes posted on the website, there is a sketch, of why 80 00:05:47,567 --> 00:05:51,416 this is true. How you can formally argue that there isn't much work beyond just the 81 00:05:51,416 --> 00:05:55,126 comparisons. But I hope most of you find that, to be pretty intuitive. So, given 82 00:05:55,126 --> 00:05:58,790 this, given the, the running time of Quick Sort boils down just to the number of 83 00:05:58,790 --> 00:06:02,315 comparisons. And we want to prove the average running time is N log N. All we 84 00:06:02,315 --> 00:06:05,886 gotta do, quote unquote, all we have to do, is prove that the average number of 85 00:06:05,886 --> 00:06:09,782 comparisons that Quick Sort makes, is O of N log N. And that's what we're gonna do. 86 00:06:09,782 --> 00:06:14,586 So that's what the, the rest of these lectures are about. So that's what we 87 00:06:14,586 --> 00:06:18,806 gotta prove. We gotta prove that the expectation of this random variable C, 88 00:06:18,806 --> 00:06:23,540 which counts up the number of comparisons quicksort makes is for this arbitrary 89 00:06:23,540 --> 00:06:30,978 input array A of length n, bounded by big-O of n log n. So the high order bit of 90 00:06:30,978 --> 00:06:35,236 this lecture is a decomposition principle. We've identified this random variable C, 91 00:06:35,236 --> 00:06:39,235 the number of comparisons, and it's exactly what we care about. It governs the 92 00:06:39,235 --> 00:06:43,285 average running time of Quick Sort. The problem is, it's quite complicated. It's 93 00:06:43,285 --> 00:06:47,648 very hard to understand what this capital C is. It's fluctuating between n log n and 94 00:06:47,648 --> 00:06:52,854 n squared, and it's hard to know how to get a handle on it. So how are we gonna go 95 00:06:52,854 --> 00:06:57,020 about proving this assertion that the expected number of comparisons that 96 00:06:57,020 --> 00:07:01,245 quicksort makes is on average just O (n log n)? At this point we actually have a 97 00:07:01,245 --> 00:07:04,813 fair amount of experience with divide-and-conquer algorithms. We've seen 98 00:07:04,813 --> 00:07:08,876 a number of examples. And, whenever we had to do a running-time analysis of such an 99 00:07:08,876 --> 00:07:12,890 algorithm, we wrote out a recurrence, we applied the master method or, in the worst 100 00:07:12,890 --> 00:07:16,358 case, we wrote out a recursion tree to figure out the solution of that 101 00:07:16,358 --> 00:07:20,323 recurrence. So you'd be very right to expect something similar to happen here. 102 00:07:20,323 --> 00:07:24,287 But as we probe deeper and we think about quicksort, we quickly realize that the 103 00:07:24,287 --> 00:07:28,251 master method just doesn't apply, or at least not in the form that we're used to. 104 00:07:28,400 --> 00:07:32,712 The problem is twofold. So, first of all, the size of the two subproblems is random. 105 00:07:32,712 --> 00:07:36,704 Right? As we discussed in the last video the quality of the pivot is what 106 00:07:36,704 --> 00:07:41,134 determines how balanced a split we get into the two subproblems. It can be as bad 107 00:07:41,134 --> 00:07:45,181 subproblem size zero and one a size n minus one. Or it can be as good as a 108 00:07:45,181 --> 00:07:49,228 perfectly balanced split into two subproblems of equal sizes. But we don't 109 00:07:49,228 --> 00:07:53,494 know. It's gonna be dependent on our range of choice of the pivot. Moreover the 110 00:07:53,494 --> 00:07:57,760 master at least as we discussed it required solved subproblems to have the 111 00:07:57,760 --> 00:08:02,026 same size. And unless you're extremely lucky that's not gonna happen in the 112 00:08:02,026 --> 00:08:06,762 quicksort algorithm. It is possible to develop a theory of recurrence relations 113 00:08:06,762 --> 00:08:10,824 for randomized algorithms, and to apply it to quick sort in particular. But I'm not 114 00:08:10,824 --> 00:08:14,935 going to go that route for two reasons. The first one is its really quite messy. 115 00:08:14,935 --> 00:08:18,997 It gets pretty technical, to talk about solutions to recurrences for randomized 116 00:08:18,997 --> 00:08:22,811 algorithms, or to think about random recursion trees. Both of those get, get 117 00:08:22,811 --> 00:08:26,824 pretty complicated. The second reason is, I really want to introduce you to what I 118 00:08:26,824 --> 00:08:30,489 call a decomposition principle, by which you take a random variable that's 119 00:08:30,489 --> 00:08:34,211 complicated, but that you care about a lot. And decompose it in to simple random 120 00:08:34,211 --> 00:08:37,627 variables which you don't really care about in their own right but which are 121 00:08:37,627 --> 00:08:41,264 easy to analyze and then you stitch those two things together using linearity and 122 00:08:41,264 --> 00:08:44,591 expectation so that's gonna be the workhorse for our analysis of the quick 123 00:08:44,591 --> 00:08:48,185 store, of the quick-sort algorithm and it's gonna come up again a couple of times 124 00:08:48,185 --> 00:08:51,936 in the rest of the course, for example, when we study hashing. So, to explain how 125 00:08:51,936 --> 00:08:56,293 this decomposition principle applies to quicksort in particular, I'm gonna need to 126 00:08:56,293 --> 00:09:00,543 introduce you to the building blocks, simple random variables which will make up 127 00:09:00,543 --> 00:09:04,794 the complicated random variable that we care about, the number of comparisons. So 128 00:09:04,794 --> 00:09:10,013 here's some notation. Recall that we fixed in the background an arbitrary array of 129 00:09:10,013 --> 00:09:15,456 length N and that's denoted by capital A [sound]. And some notation which is 130 00:09:15,456 --> 00:09:22,054 simple, but also quite important by z sub i. What I mean is the ith smallest element 131 00:09:22,054 --> 00:09:28,348 in the input array capitol A. Also known as the ith order statistic. So let me tell 132 00:09:28,348 --> 00:09:32,998 you what ZI is not, okay. What ZI is not, in general, is the element in the 133 00:09:32,998 --> 00:09:38,164 [inaudible] position of the input unsorted array. What ZI is, is it's the element 134 00:09:38,164 --> 00:09:42,942 which is going to wind up in the [inaudible] element of the array, once we 135 00:09:42,942 --> 00:09:47,786 sort it, okay. So if you fast forward to the end of the sorting algorithm, in 136 00:09:47,786 --> 00:09:53,417 position I, you're gonna find ZI. So let me give you an example. So suppose we have 137 00:09:53,417 --> 00:10:01,440 just a simple array here, unsorted. Were the numbers six, eight, ten and two, then. 138 00:10:02,080 --> 00:10:07,710 Z one, well that's the first smallest. The one smallest or just the minimum. So z one 139 00:10:07,710 --> 00:10:13,408 would be the two z two would be the six z three would be the eight and z four would 140 00:10:13,408 --> 00:10:17,953 be the ten. For this particular input array. Okay? So z I is just the 141 00:10:17,953 --> 00:10:23,312 [inaudible] smallest number, wherever it may lie in the original unsorted array 142 00:10:23,516 --> 00:10:29,026 that's what z I refers to. So we already defined the sample space that's just all 143 00:10:29,026 --> 00:10:33,901 possible choices of pivots, the Quicksort might make. I already described one random 144 00:10:33,901 --> 00:10:38,364 variable with the number of comparison that Quicksort makes on a particular 145 00:10:38,364 --> 00:10:42,887 choice of pivots. Now I am getting introduced a family of much simpler random 146 00:10:42,887 --> 00:10:47,057 variables which count merely the comparisons involving a given pair of 147 00:10:47,057 --> 00:10:53,623 elements in the input way not all elements just a given pair. So for a given choice 148 00:10:53,623 --> 00:11:01,020 of pivots given sigma and given choices of I and J. Both of which are between one and 149 00:11:01,020 --> 00:11:06,500 N. And so we only count things once. I'm going to insist that I is less than J 150 00:11:06,500 --> 00:11:11,839 always. And now here's a definition. My XIJ, and this is a random variable, so 151 00:11:11,839 --> 00:11:17,889 it's a function of the pivots chosen. This is going to be the number of times that ZI 152 00:11:17,889 --> 00:11:23,386 and ZJ are compared in the execution of quick sort. Okay, so this is gonna be an 153 00:11:23,386 --> 00:11:27,638 important definition in our analysis. It's important you understand it. So, for 154 00:11:27,638 --> 00:11:32,278 something like the third smallest element and the seventh smallest element, x I j is 155 00:11:32,278 --> 00:11:36,751 asking, that's when I equals three and j equals seven, x three seven is asking how 156 00:11:36,751 --> 00:11:41,201 many times those two elements get compared as quicksort. Proceeds. And this is a 157 00:11:41,201 --> 00:11:45,499 random variable in the sense that if the pivot choices are all predetermined, if we 158 00:11:45,499 --> 00:11:49,745 think of those being chosen in advance, then there's just some fixed deterministic 159 00:11:49,745 --> 00:11:53,888 number of times, that ZI and ZJ get compared. So it's important you understand 160 00:11:53,888 --> 00:11:58,342 these random variables XIJ. So the next quiz is gonna ask, a basic question about 161 00:11:58,342 --> 00:12:02,225 the range of values that a given XIJ can take on. So for this quiz we're 162 00:12:02,225 --> 00:12:06,321 considering, as usual, so fixed input array and now furthermore, fix two 163 00:12:06,321 --> 00:12:10,767 specific elements of the input array. For example, the third smallest element, 164 00:12:10,767 --> 00:12:15,039 wherever it may lie, and the seventh smallest element, wherever it may lie. 165 00:12:15,039 --> 00:12:19,544 Think about just these pair of two elements. What is the range of values that 166 00:12:19,544 --> 00:12:23,698 the corresponding random variable, Xij, can take on? That is, what are the 167 00:12:23,698 --> 00:12:28,320 different numbers of times that a given pair of elements might conceivably get 168 00:12:28,320 --> 00:12:33,306 compared in the execution of the quick sort algorithm? All right so the correct 169 00:12:33,306 --> 00:12:37,600 answer to this quiz, is the second option. This is not a trivial quiz. This is a 170 00:12:37,600 --> 00:12:41,895 little tricky to see. So the assertion is that a given pair of elements, they might 171 00:12:41,895 --> 00:12:45,735 not be compared at all. They might be compared once. And they're not going to 172 00:12:45,735 --> 00:12:49,571 get compared more than once. Okay? So here, what I'm gonna discuss is why it's 173 00:12:49,571 --> 00:12:53,808 not possible for a given pair of elements to be compared twice during the execution 174 00:12:53,808 --> 00:12:57,995 of quick sort. It'll be clear later on, if it's not already clear now, that both zero 175 00:12:57,995 --> 00:13:02,081 and one are legitimate possibilities. A pair of elements might never get compared 176 00:13:02,081 --> 00:13:05,865 and they might get compared once. And we'll, and again, we'll go into more 177 00:13:05,865 --> 00:13:10,001 detail on that in the next video. So, but why is it impossible to be compared twice? 178 00:13:10,001 --> 00:13:14,037 Well, think about two elements. Say, the third element and the seventh element. And 179 00:13:14,037 --> 00:13:18,022 let's recall how the partition subroutine works. Observe that in quick sort, the 180 00:13:18,022 --> 00:13:22,254 only place in the code where comparisons between pairs of the. Input array elements 181 00:13:22,254 --> 00:13:26,408 happens, it only happens in the partition subroutine. So that's where we have to 182 00:13:26,408 --> 00:13:30,667 drill down. So what are the comparisons that get made in the partition subroutine? 183 00:13:30,667 --> 00:13:35,391 Well. Go back and look at that code. The pivot elements is compared to each other 184 00:13:35,391 --> 00:13:40,088 element in the input array exactly once. Okay, so the pivot just hangs out in the 185 00:13:40,088 --> 00:13:44,961 first entry of the array. We have this for loop, this index j, which marches over the 186 00:13:44,961 --> 00:13:49,658 rest of the array. And for each value of j, the j element of the input array gets 187 00:13:49,658 --> 00:13:53,825 compared to the pivot. Okay. So summarizing in an invocation of partition, 188 00:13:53,825 --> 00:13:58,196 every single comparison involves the pivot element, okay. So two elements get 189 00:13:58,196 --> 00:14:02,912 compared if and only if one is the pivot. Alright, so let's go back to the question. 190 00:14:02,912 --> 00:14:07,456 Why can't a given pair of elements in the [inaudible] get compared two or more 191 00:14:07,456 --> 00:14:11,931 times? Well, think about the first time they ever get compared in Quick Sort. It 192 00:14:11,931 --> 00:14:16,233 must be the case that at that moment, we're in a recursive call where either one 193 00:14:16,233 --> 00:14:20,751 of those two is the pivot element. So it's the third smallest element or the seventh 194 00:14:20,751 --> 00:14:24,946 smallest element. The first time those two elements are compared to each other, 195 00:14:24,946 --> 00:14:29,303 either the third smallest or the seventh smallest is currently the pivot, because 196 00:14:29,303 --> 00:14:33,512 all comparisons involve the pivot element. Therefore. What's gonna happen in the 197 00:14:33,512 --> 00:14:37,647 recursion, well the pitted is excluded from both recursive calls. So for example 198 00:14:37,647 --> 00:14:41,572 if the second smallest element is currently the pitted that's not gonna be 199 00:14:41,572 --> 00:14:45,915 passed on to recursive call which contains the third smallest element. Therefore if 200 00:14:45,915 --> 00:14:49,736 you compared once, one of the elements that's pitted and it will never be 201 00:14:49,736 --> 00:14:53,922 compared again because the pitted will not even show up in any future recursive 202 00:14:53,922 --> 00:14:57,638 calls. Okay, so that's the reason. You compared once, one is the pitted, it 203 00:14:57,638 --> 00:15:02,309 doesn't get passed to the recursion, you're done, never seen again. So a random 204 00:15:02,309 --> 00:15:07,562 variable which can only take on the value zero or one is often called an indicator 205 00:15:07,562 --> 00:15:12,561 random variable, because it's in, just indicating whether or not a certain thing 206 00:15:12,561 --> 00:15:17,645 happens. So, in that terminology, each x I j. Is indicating whether or not the ith 207 00:15:17,645 --> 00:15:21,686 smallest element in the array, and the jth smallest element in the array, ever get 208 00:15:21,686 --> 00:15:25,677 compared. It can't happen more than once. It may or may not happen, and Xij is one 209 00:15:25,677 --> 00:15:30,668 precisely when it happens. So that's the event that it's indicating. Having defined 210 00:15:30,668 --> 00:15:34,625 the building blocks I need, these indicator random variables, these X, I, 211 00:15:34,625 --> 00:15:38,696 Js, now I can introduce you to the decomposition principle as applied to 212 00:15:38,696 --> 00:15:43,389 Quick Sort. So there's a random variable that we really care about which is denoted 213 00:15:43,389 --> 00:15:47,855 capital C, the number of comparisons the Quick Sort makes, that's really hard to 214 00:15:47,855 --> 00:15:52,548 get a handle on in and of itself. But we can express C as a sum of indicator random 215 00:15:52,548 --> 00:15:56,788 variables of these X, I, Js and those, we don't care about in their own right, 216 00:15:56,788 --> 00:16:00,996 they're gonna be much easier to understand. So, let me just rewrite the 217 00:16:00,996 --> 00:16:06,259 definitions of C M E X and J so we're all clear on them. So C Recall counts all of 218 00:16:06,259 --> 00:16:11,918 the comparisons between pairs of input elements that [inaudible] whatever makes. 219 00:16:11,918 --> 00:16:17,860 Whereas an Xij only counts the number and its going to be zero or one which compare 220 00:16:17,860 --> 00:16:23,307 the ith smallest and the jth smallest elements in particular. Now since every 221 00:16:23,307 --> 00:16:29,036 comparison involves precisely one pair of elements some I and some j, with I less 222 00:16:29,036 --> 00:16:34,278 the j. We can write C as the sum of the Xij's. So don't get intimidated by this 223 00:16:34,278 --> 00:16:38,240 fancy double sum. All this is doing is it's iterating over all of the ordered 224 00:16:38,240 --> 00:16:42,460 pairs. So all of the pairs IJ, where I and J are both between one and N, and where I 225 00:16:42,460 --> 00:16:46,372 is strictly less than N. This double sum is just a convenient way to do that 226 00:16:46,372 --> 00:16:50,180 iteration. And, of course, no matter what the pivots chosen are, we have this 227 00:16:50,180 --> 00:16:54,297 equality, okay? The comparisons, are somehow split up amongst the various pairs 228 00:16:54,297 --> 00:16:58,472 of elements, the various I's and J's. Why is it useful to express a complicated 229 00:16:58,472 --> 00:17:03,005 random variable as a sum of simple random variables? Well, because an equation like 230 00:17:03,005 --> 00:17:07,484 this is now right in the wheelhouse of linearity of expectation. So let's just go 231 00:17:07,484 --> 00:17:11,741 ahead and apply that. Remember, and this is super, super important, linearity of 232 00:17:11,741 --> 00:17:16,390 expectation says that the expectation of a sum. Equals, the sum, of the expectations, 233 00:17:16,390 --> 00:17:21,239 and moreover, this is true, whether or not the reign of variables are independent 234 00:17:21,239 --> 00:17:25,883 ?Kay and I'm not gonna prove it here, but you might wanna think about the fact that 235 00:17:25,883 --> 00:17:30,629 the X-I-J's are not in fact, independent. They were using the fact that we need an 236 00:17:30,629 --> 00:17:35,989 expectation towards, even for not independent reign of variables. And why is 237 00:17:35,989 --> 00:17:43,021 this interesting, well the left hand side. This is complicated, right? This is the, 238 00:17:43,021 --> 00:17:48,110 this is some crazy number of comparisons by some algorithm on some arbitrarily long 239 00:17:48,110 --> 00:17:53,016 array and it fluctuates between two pretty far apart numbers and log in and then 240 00:17:53,016 --> 00:17:57,741 squared. On the other hand, this does not seem as intimidating, given X, I, J it's 241 00:17:57,741 --> 00:18:03,480 just zero or one, whether or not these two guys get compared or not. So that is the 242 00:18:03,480 --> 00:18:07,667 power of this decomposition approach, okay. So it reduces understanding of 243 00:18:07,667 --> 00:18:11,796 complicated random variable to understanding simple random variables. In 244 00:18:11,796 --> 00:18:16,155 fact, because these are indicator random variables, we can even clean up this 245 00:18:16,155 --> 00:18:21,264 expression some more. So for any given xij, being a 01 random variable, if we 246 00:18:21,264 --> 00:18:26,426 expand the definition of expectation, just as an average over the various values 247 00:18:26,619 --> 00:18:31,458 it's, what is it. It's, well, it's some probability it takes on the value zero, 248 00:18:31,458 --> 00:18:37,100 that's possible. And there's some possibility it takes on the value one. And 249 00:18:37,100 --> 00:18:43,100 of course this zero part we can very satisfyingly delete, cancel. And so the 250 00:18:43,100 --> 00:18:49,340 expected value of a given XIJ is just the probability that XIJ equals one. And 251 00:18:49,340 --> 00:18:55,180 remember, it's an indicator random variable. It's one precisely when the I 252 00:18:55,180 --> 00:19:02,057 smallest and the J smallest, elements get compared. So putting it all together. We 253 00:19:02,057 --> 00:19:08,153 find that what we care about, the average value of the number of comparisons made by 254 00:19:08,153 --> 00:19:13,524 quick sort on this input array. Is this double sum, which iterates over all 255 00:19:13,524 --> 00:19:21,127 ordered pairs. Where each summand is the probability that the corresponding x I j 256 00:19:21,127 --> 00:19:27,915 equals one. That is the probability that z I and z j get compared. And this is 257 00:19:27,915 --> 00:19:33,351 essentially the stopping point for this video, for the first part of the analysis. 258 00:19:33,351 --> 00:19:38,309 So let's call this star and put a nice circle around it. So whats gonna happen 259 00:19:38,309 --> 00:19:42,615 next is that in the second video, for the analysis, we're going to drill down on 260 00:19:42,615 --> 00:19:46,758 this probability. Probability that a given, pair of [inaudible] gets compared 261 00:19:46,758 --> 00:19:51,282 and we're gonna nail it, we're gonna get an exact expression as a function of I and 262 00:19:51,282 --> 00:19:55,534 J, for exactly what this probability is. Then in the third video we're going to 263 00:19:55,534 --> 00:19:59,841 take that exact expression, plug it into the sum, and then, evaluate this sum and 264 00:19:59,841 --> 00:20:04,293 it turns out the sum will evaluate to the [inaudible] and login. So that's the plan. 265 00:20:04,293 --> 00:20:07,891 That's how you apply decomposition in terms of 0-1 or indicator random 266 00:20:07,891 --> 00:20:12,048 variables. Apply linearity of expectation. In the next video we'll understand these 267 00:20:12,048 --> 00:20:16,153 simple random variables, and then we'll wrap up in the third video. Before we move 268 00:20:16,153 --> 00:20:19,904 on to the next part of the analysis, I do just want to emphasize that this 269 00:20:19,904 --> 00:20:24,010 decomposition principle is relevant not only for quicksort, but it's relevant for 270 00:20:24,010 --> 00:20:27,406 the analysis of lots of randomized algorithms. And we will see more 271 00:20:27,406 --> 00:20:31,005 applications, at least one more application later in the course. So just 272 00:20:31,005 --> 00:20:35,313 to kinda really hammer the point home, let me spell out the key steps for the general 273 00:20:35,313 --> 00:20:39,104 decomposition principle. So first you need to figure out what is it you care about? 274 00:20:39,104 --> 00:20:42,699 So in Quick Sort we cared about the number of comparisons, we had this lemma that 275 00:20:42,699 --> 00:20:45,939 said the running time dominated our comparisons so we understood what we 276 00:20:45,939 --> 00:20:50,429 wanted to know - the average value for the number of comparisons. The second step is 277 00:20:50,429 --> 00:20:56,206 to express this random variable y as a sum of simpler random variables. Ideally 278 00:20:56,206 --> 00:21:02,375 indicator or 01 random variables. [sound] Now you're in the wheel house of linearity 279 00:21:02,375 --> 00:21:08,536 of expectation. You just apply it. And you find that what it is you care about, the 280 00:21:08,536 --> 00:21:15,766 average value. Of the random value, of the random variable, Y. Is just the sum. 281 00:21:15,766 --> 00:21:22,235 [sound] the probabilities of various events. That, given XL random variable is 282 00:21:22,235 --> 00:21:27,069 equal to one. And so the upshot is, to understand this seemingly very complicated 283 00:21:27,069 --> 00:21:31,783 left hand side, all you have to do is understand something which, in many cases, 284 00:21:31,783 --> 00:21:36,376 is much simpler. Which is, understand the probability of these various events. 285 00:21:36,376 --> 00:21:41,149 [sound]. In the next video, I'll show you exactly how that's done in the case of 286 00:21:41,149 --> 00:21:45,803 Quick Sort. Where these, where we care about the XIJs, the probability that two 287 00:21:45,803 --> 00:21:50,274 elements gets compared. So let's move on and get exact expression for that