So this is the first video of three in which we'll mathematically analyze the running time of the randomized implementation of quicksort. In particular, we're going to prove that the average running time of quicksort is big-O of n log n. Now, this is the first randomized algorithm that we've seen in the course, and therefore its analysis will be the first time we need any kind of probability theory. So let me explain up front what I'd expect you to know for the following analysis. Basically, I need you to know the first few ingredients of discrete probability theory: I need you to know about sample spaces, that is, how to model all of the different things that could happen, all of the ways that random choices could resolve themselves; I need you to know about random variables, functions on sample spaces which take on real values; I need you to know about expectations, that is, average values of random variables; and a very simple but very key property we're going to need in the analysis of quicksort is linearity of expectation. So if you haven't seen this before, or if you're too rusty, you should definitely review this stuff before you watch this video. Some places you can go to get that necessary review: you can look at the probability review part one video that's up on the course site. If you prefer to read something, like I said at the beginning of the course, I recommend the free online lecture notes "Mathematics for Computer Science" by Eric Lehman and Tom Leighton; that covers everything we'll need to know, plus much more. There's also a wikibook on discrete probability, which is a perfectly fine, and obviously free, source from which you can learn the necessary material. Okay? So once you've got that fresh in your mind and you're ready to watch the rest of this video, in particular we're ready to prove the theorem stated in the previous video. The quicksort algorithm with a randomized implementation, that is, where in every single recursive call you pick a pivot uniformly at random, satisfies the following assertion: for every single input, so for a worst-case input array of length n, the average running time of quicksort with random pivots is O(n log n). And again, to be clear about where the randomness is: the randomness is not in the data. We make no assumptions about the data, as per our guiding principles. No matter what the input array is, averaging only over the randomness in our own code, the randomness internal to the algorithm, we get a running time of O(n log n). We saw in the past that the best-case behavior of quicksort is n log n and the worst-case behavior is n squared. So this theorem is asserting that, no matter what the input array is, the typical behavior of quicksort is far closer to the best-case behavior than it is to the worst-case behavior. Okay, so that's what we're going to prove in the next few videos, so let's go ahead and get started. First I'm going to set up the necessary notation and be clear about exactly what the sample space is, what the random variable we care about is, and so on. We're going to fix an arbitrary array of length n; that's going to be the input to the quicksort algorithm, and we'll be working with this fixed but arbitrary input array for the remainder of the analysis. Okay, so just fix a single input in your mind. Now, what's the relevant sample space? Well, recall what a sample space is: it's just all the possible outcomes of the randomness in the world.
So it's all the distinct things that could happen. Now, here the randomness is of our own devising; it's just the random pivot sequences, the random pivots chosen by quicksort. So omega is just the set of all possible sequences of pivots that quicksort could choose. Now, the whole point of this theorem, proving that the average running time of quicksort is suitably small, boils down to computing the expectation of a single random variable. So here's the random variable we're going to care about. Remember that random variables are real-valued functions defined on the sample space. So for a given point in the sample space, a given pivot sequence sigma, we're going to define capital C of sigma as the number of comparisons that quicksort makes. By comparison I don't mean something like incrementing an array index in a for loop; that's not what I mean by comparison. I mean a comparison between two different entries of the input array, for example comparing the third entry of the array against the seventh entry to see which of the two is smaller. Notice that this is indeed a random variable: given knowledge of the pivot sequence sigma, the choices of all the pivots, you can think of quicksort at that point as just a deterministic algorithm with all of the pivot choices predetermined. That deterministic version of quicksort makes some deterministic number of comparisons, so if we're given the pivot sequence sigma, we're just defining C of sigma to be however many comparisons it makes given those choices of pivots. Now, the theorem as I stated it is not about the number of comparisons of quicksort but rather about the running time of quicksort. But really, if you think about it, the only real work that the quicksort algorithm does is make comparisons between pairs of elements in the input array. Yes, there's a little bit of other bookkeeping, but that's all noise, that's all second-order stuff. All quicksort really does is comparisons between pairs of elements of the input array. And if you want to know what I mean by that a little more formally, by "dominated by comparisons" I mean that there exists a constant c such that the total number of operations of any type that quicksort executes is at most a constant factor larger than the number of comparisons. So let's say that by RT I mean the number of primitive operations of any form that quicksort uses; then for every pivot sequence sigma, the total number of operations is no more than a constant times the total number of comparisons. And if you want a proof of this, it's not that interesting, so I'm not going to talk about it here, but in the notes posted on the website there is a sketch of why this is true, of how you can formally argue that there isn't much work beyond just the comparisons. I hope most of you find that to be pretty intuitive. So, given this, the running time of quicksort boils down to just the number of comparisons, and since we want to prove the average running time is O(n log n), all we have to do, quote unquote, is prove that the average number of comparisons that quicksort makes is O(n log n). And that's what we're going to do; that's what the rest of these lectures are about. So that's what we've got to prove: that the expectation of this random variable C, which counts up the number of comparisons quicksort makes on this arbitrary input array A of length n, is bounded by big-O of n log n.
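To make the random variable C concrete, here is a minimal Python sketch, not the course's reference implementation, of randomized quicksort instrumented with a comparison counter; the out-of-place partitioning and the function name are my own illustrative choices.

```python
import random

def quicksort_count(a):
    """Return (a sorted copy of a, the number of element comparisons made).

    A sketch of randomized quicksort instrumented to count comparisons
    between entries of the input array -- the random variable C(sigma)
    from the lecture. Assumes distinct elements and, for brevity,
    partitions out of place rather than in place.
    """
    if len(a) <= 1:
        return list(a), 0
    p = random.randrange(len(a))   # pivot position chosen uniformly at random
    pivot = a[p]
    rest = a[:p] + a[p + 1:]
    left, right, comparisons = [], [], 0
    for x in rest:
        comparisons += 1           # each non-pivot element is compared to the pivot once
        if x < pivot:
            left.append(x)
        else:
            right.append(x)
    sorted_left, c_left = quicksort_count(left)
    sorted_right, c_right = quicksort_count(right)
    return sorted_left + [pivot] + sorted_right, comparisons + c_left + c_right
```

Averaging the returned comparison count over many runs on the same fixed input gives a Monte Carlo estimate of E[C], the quantity the theorem bounds by O(n log n).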
So the high-order bit of this lecture is a decomposition principle. We've identified this random variable C, the number of comparisons, and it's exactly what we care about; it governs the average running time of quicksort. The problem is, it's quite complicated. It's very hard to understand what this capital C is: it fluctuates between n log n and n squared, and it's hard to know how to get a handle on it. So how are we going to go about proving this assertion, that the expected number of comparisons quicksort makes is on average just O(n log n)? At this point we actually have a fair amount of experience with divide-and-conquer algorithms; we've seen a number of examples. And whenever we had to do a running-time analysis of such an algorithm, we wrote out a recurrence and we applied the master method or, in the worst case, we wrote out a recursion tree to figure out the solution of that recurrence. So you'd be very right to expect something similar to happen here. But as we probe deeper and think about quicksort, we quickly realize that the master method just doesn't apply, or at least not in the form that we're used to. The problem is twofold. First of all, the size of the two subproblems is random. Right? As we discussed in the last video, the quality of the pivot is what determines how balanced a split we get into the two subproblems. It can be as bad as one subproblem of size zero and another of size n minus one, or it can be as good as a perfectly balanced split into two subproblems of equal size. But we don't know; it's going to depend on our random choice of the pivot. Moreover, the master method, at least as we discussed it, requires all subproblems to have the same size, and unless you're extremely lucky that's not going to happen in the quicksort algorithm. It is possible to develop a theory of recurrence relations for randomized algorithms, and to apply it to quicksort in particular, but I'm not going to go that route, for two reasons. The first is that it's really quite messy: it gets pretty technical to talk about solutions to recurrences for randomized algorithms, or to think about random recursion trees; both of those get pretty complicated. The second reason is that I really want to introduce you to what I call a decomposition principle, by which you take a random variable that's complicated but that you care about a lot, and decompose it into simple random variables which you don't really care about in their own right but which are easy to analyze, and then you stitch those two things together using linearity of expectation. That's going to be the workhorse for our analysis of the quicksort algorithm, and it's going to come up again a couple of times in the rest of the course, for example when we study hashing. So, to explain how this decomposition principle applies to quicksort in particular, I'm going to need to introduce you to the building blocks, the simple random variables which will make up the complicated random variable that we care about, the number of comparisons. So here's some notation. Recall that we've fixed in the background an arbitrary array of length n, denoted by capital A. And now some notation which is simple but also quite important: by z sub i, I mean the ith smallest element in the input array capital A, also known as the ith order statistic. So let me tell you what z_i is not, okay? What z_i is not, in general, is the element in the ith position of the input, unsorted array.
What z_i is, is the element which is going to wind up in the ith position of the array once we sort it, okay? So if you fast-forward to the end of the sorting algorithm, in position i you're going to find z_i. So let me give you an example. Suppose we have just a simple unsorted array with the numbers six, eight, ten, and two. Then z_1, well, that's the first smallest, the one smallest, or just the minimum. So z_1 would be the two, z_2 would be the six, z_3 would be the eight, and z_4 would be the ten, for this particular input array. Okay? So z_i is just the ith smallest number, wherever it may lie in the original unsorted array; that's what z_i refers to. So we've already defined the sample space, that's just all possible choices of pivots that quicksort might make, and I've already described one random variable, the number of comparisons that quicksort makes on a particular choice of pivots. Now I'm going to introduce a family of much simpler random variables, which count merely the comparisons involving a given pair of elements in the input array, not all elements, just a given pair. So consider a given choice of pivots, a given sigma, and given choices of i and j, both of which are between one and n; and, so that we only count things once, I'm going to insist that i is less than j always. And now here's a definition. My X_ij, and this is a random variable, so it's a function of the pivots chosen, is going to be the number of times that z_i and z_j are compared in the execution of quicksort. Okay, so this is going to be an important definition in our analysis; it's important that you understand it. So for something like the third smallest element and the seventh smallest element, that's when i equals three and j equals seven, X_37 is asking how many times those two elements get compared as quicksort proceeds. And this is a random variable in the sense that if the pivot choices are all predetermined, if we think of those as being chosen in advance, then there's just some fixed, deterministic number of times that z_i and z_j get compared. So it's important that you understand these random variables X_ij. The next quiz is going to ask a basic question about the range of values that a given X_ij can take on. For this quiz, we're considering, as usual, a fixed input array, and now, furthermore, fix two specific elements of the input array, for example the third smallest element, wherever it may lie, and the seventh smallest element, wherever it may lie. Think about just this pair of elements. What is the range of values that the corresponding random variable X_ij can take on? That is, what are the different numbers of times that a given pair of elements might conceivably get compared in the execution of the quicksort algorithm? All right, so the correct answer to this quiz is the second option. This is not a trivial quiz; it's a little tricky to see. The assertion is that a given pair of elements might not be compared at all, they might be compared once, and they're not going to get compared more than once. Okay? So here what I'm going to discuss is why it's not possible for a given pair of elements to be compared twice during the execution of quicksort. It'll be clear later on, if it's not already clear now, that both zero and one are legitimate possibilities: a pair of elements might never get compared, and they might get compared once. And again, we'll go into more detail on that in the next video.
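To make the z_i notation concrete, here's a tiny Python illustration of the example above; the helper name `order_statistic` is just my own label for the idea.

```python
def order_statistic(a, i):
    """Return z_i, the ith smallest element of the array a (1-indexed)."""
    return sorted(a)[i - 1]

A = [6, 8, 10, 2]
print([order_statistic(A, i) for i in range(1, 5)])  # [2, 6, 8, 10], i.e. z_1, z_2, z_3, z_4
```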
So why is it impossible for them to be compared twice? Well, think about two elements, say the third smallest and the seventh smallest, and let's recall how the partition subroutine works. Observe that in quicksort, the only place in the code where comparisons between pairs of input-array elements happen is the partition subroutine. So that's where we have to drill down. What are the comparisons that get made in the partition subroutine? Well, go back and look at that code. The pivot element is compared to each other element in the input array exactly once. Okay, so the pivot just hangs out in the first entry of the array, we have this for loop with index j which marches over the rest of the array, and for each value of j, the jth element of the array gets compared to the pivot. So, summarizing: in an invocation of partition, every single comparison involves the pivot element. So two elements get compared if and only if one of them is the pivot. All right, so let's go back to the question. Why can't a given pair of elements in the input array get compared two or more times? Well, think about the first time they ever get compared in quicksort. It must be the case that at that moment, we're in a recursive call where one of those two is the pivot element, so either the third smallest element or the seventh smallest element is currently the pivot, because all comparisons involve the pivot element. Therefore, what's going to happen in the recursion? Well, the pivot is excluded from both recursive calls. So, for example, if the third smallest element is currently the pivot, it's not going to be passed on to the recursive call which contains the seventh smallest element. Therefore, if they're compared once, one of them is the pivot, and they will never be compared again, because the pivot won't even show up in any future recursive calls. Okay, so that's the reason: they're compared once, one of them is the pivot, it doesn't get passed to the recursion, you're done, it's never seen again. Now, a random variable which can only take on the value zero or one is often called an indicator random variable, because it's just indicating whether or not a certain thing happens. So in that terminology, each X_ij is indicating whether or not the ith smallest element in the array and the jth smallest element in the array ever get compared. It can't happen more than once; it may or may not happen, and X_ij is one precisely when it happens. That's the event it's indicating. Having defined the building blocks I need, these indicator random variables, these X_ij's, now I can introduce you to the decomposition principle as applied to quicksort. So there's a random variable that we really care about, denoted capital C, the number of comparisons that quicksort makes, and that's really hard to get a handle on in and of itself. But we can express C as a sum of these indicator random variables, the X_ij's, and those, which we don't care about in their own right, are going to be much easier to understand. So let me just rewrite the definitions of C and the X_ij's so we're all clear on them. C, recall, counts all of the comparisons between pairs of input elements that quicksort makes, whereas an X_ij only counts the comparisons, and it's going to be zero or one, between the ith smallest and the jth smallest elements in particular.
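Here is a hedged sketch, in Python, of an in-place partition in the style the lecture describes (pivot pre-positioned in the first slot of the range); it's my own paraphrase of that subroutine rather than the course's exact code, but it makes visible that the single comparison per loop iteration always involves the pivot.

```python
def partition(a, lo, hi):
    """Partition a[lo:hi+1] around the pivot stored at a[lo].

    A sketch in the style described in the lecture: the pivot sits in the
    first entry, index j scans the rest of the range, and the one comparison
    per iteration is always against the pivot. Returns the pivot's final index.
    """
    pivot = a[lo]
    i = lo + 1                          # boundary: a[lo+1:i] holds elements < pivot
    for j in range(lo + 1, hi + 1):
        if a[j] < pivot:                # the comparison -- it always involves the pivot
            a[i], a[j] = a[j], a[i]
            i += 1
    a[lo], a[i - 1] = a[i - 1], a[lo]   # put the pivot in its rightful place
    return i - 1
```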
Now, since every comparison involves precisely one pair of elements, some i and some j with i less than j, we can write C as the sum of the X_ij's. So don't get intimidated by this fancy double sum. All it's doing is iterating over all of the ordered pairs, all of the pairs (i, j) where i and j are both between one and n and where i is strictly less than j. This double sum is just a convenient way to do that iteration. And, of course, no matter what pivots are chosen, we have this equality, okay? The comparisons are somehow split up amongst the various pairs of elements, the various i's and j's. Why is it useful to express a complicated random variable as a sum of simple random variables? Well, because an equation like this is now right in the wheelhouse of linearity of expectation. So let's just go ahead and apply that. Remember, and this is super, super important, linearity of expectation says that the expectation of a sum equals the sum of the expectations, and moreover, this is true whether or not the random variables are independent. Okay? And I'm not going to prove it here, but you might want to think about the fact that the X_ij's are not, in fact, independent; we're using the fact that linearity of expectation holds even for non-independent random variables. And why is this interesting? Well, the left-hand side is complicated, right? It's some crazy number of comparisons made by some algorithm on some arbitrarily long array, and it fluctuates between two pretty far-apart quantities, n log n and n squared. On the other hand, the right-hand side does not seem as intimidating: a given X_ij is just zero or one, according to whether or not these two elements get compared. So that is the power of this decomposition approach, okay: it reduces understanding a complicated random variable to understanding simple random variables. In fact, because these are indicator random variables, we can clean up this expression even more. For any given X_ij, being a 0-1 random variable, if we expand the definition of expectation, just as an average over the values it can take, it's zero times the probability that it takes on the value zero, plus one times the probability that it takes on the value one. And of course the zero part we can very satisfyingly delete, cancel. So the expected value of a given X_ij is just the probability that X_ij equals one. And remember, it's an indicator random variable: it's one precisely when the ith smallest and the jth smallest elements get compared. So, putting it all together, we find that what we care about, the average number of comparisons made by quicksort on this input array, is this double sum, which iterates over all ordered pairs, where each summand is the probability that the corresponding X_ij equals one, that is, the probability that z_i and z_j get compared. And this is essentially the stopping point for this video, for the first part of the analysis. So let's call this expression star and put a nice circle around it. What's going to happen next is that in the second video of the analysis, we're going to drill down on this probability, the probability that a given pair of elements gets compared, and we're going to nail it: we're going to get an exact expression, as a function of i and j, for exactly what this probability is. Then in the third video we're going to take that exact expression, plug it into this sum, and evaluate the sum, and it turns out the sum will evaluate to O(n log n).
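Written out in symbols, the derivation just described is the following chain, with the last line being the starred expression the lecture circles:

```latex
C = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{ij}
\quad\Longrightarrow\quad
\mathbb{E}[C]
  = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \mathbb{E}[X_{ij}]        % linearity of expectation (no independence needed)
  = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \Pr[X_{ij} = 1]           % \mathbb{E}[X_{ij}] = 0\cdot\Pr[X_{ij}=0] + 1\cdot\Pr[X_{ij}=1]
  = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \Pr[z_i \text{ and } z_j \text{ get compared}]. \qquad (\star)
```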
So that's the plan. That's how you apply decomposition in terms of 0-1, or indicator, random variables and then apply linearity of expectation. In the next video we'll understand these simple random variables, and then we'll wrap up in the third video. Before we move on to the next part of the analysis, I do just want to emphasize that this decomposition principle is relevant not only for quicksort but for the analysis of lots of randomized algorithms, and we will see at least one more application later in the course. So, just to really hammer the point home, let me spell out the key steps of the general decomposition principle. First, you need to figure out what it is you care about. In quicksort we cared about the number of comparisons: we had this lemma that said the running time is dominated by comparisons, so we understood what we wanted to know, the average value of the number of comparisons. The second step is to express this random variable Y as a sum of simpler random variables, ideally indicator or 0-1 random variables. Now you're in the wheelhouse of linearity of expectation: you just apply it, and you find that what you care about, the average value of the random variable Y, is just the sum of the probabilities of various events, namely that a given random variable X_l is equal to one. And so the upshot is, to understand this seemingly very complicated left-hand side, all you have to do is understand something which, in many cases, is much simpler: the probability of these various events. This general recipe is summarized in symbols below. In the next video, I'll show you exactly how that's done in the case of quicksort, where we care about the X_ij's, the probability that two elements get compared. So let's move on and get an exact expression for that.
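For reference, here's the general recipe from this last paragraph written out in symbols; the indicator variables X_1 through X_m are my generic names for whatever building blocks the decomposition uses:

```latex
\mathbb{E}[Y]
  = \mathbb{E}\!\left[\sum_{\ell=1}^{m} X_\ell\right]
  = \sum_{\ell=1}^{m} \mathbb{E}[X_\ell]      % linearity of expectation, no independence needed
  = \sum_{\ell=1}^{m} \Pr[X_\ell = 1].        % each X_\ell is a 0\text{-}1 indicator variable
```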