1 00:00:00,000 --> 00:00:04,367 This optional video will be, more or less, the last word that we have on sorting for 2 00:00:04,367 --> 00:00:08,313 the purposes of this course. And it'll answer the question, can we do better? 3 00:00:08,313 --> 00:00:12,575 Remember, that's the mantra of any good algorithm designer. I've shown you N log N 4 00:00:12,575 --> 00:00:16,890 algorithms for sorting, Merge Short in the worst case, Quick Sort, on average. Can we 5 00:00:16,890 --> 00:00:20,836 do better than N log N? Indeed, for the selection problem, we saw we could do 6 00:00:20,836 --> 00:00:25,256 better than N log N. We could linear time. Maybe we can do linear time for sorting as 7 00:00:25,256 --> 00:00:29,537 well. The purpose of this video is to explain to you why we cannot, do sorting, 8 00:00:29,537 --> 00:00:34,216 in linear time. So this is a rare problem where we understand quite precisely how 9 00:00:34,216 --> 00:00:38,399 well it can be solved at least for a particular class of [inaudible] called 10 00:00:38,399 --> 00:00:42,802 comparison based sorts which I'll explain in a moment. So here's the form of the 11 00:00:42,802 --> 00:00:48,035 theorem, I want to give you the gist of in this video. So in addition to restricting 12 00:00:48,035 --> 00:00:51,525 to comparison based sorts which is necessary as we'll see in a second, I'm 13 00:00:51,525 --> 00:00:55,440 going to make a second assumption which is not necessary but is convenient for the 14 00:00:55,440 --> 00:00:59,261 lecture which is that I'm only going to think about deterministic algorithms for 15 00:00:59,261 --> 00:01:03,352 the moment. I encourage you to think about why the same style of arguments gives an N 16 00:01:03,352 --> 00:01:07,070 log and lower bound on the expected running time of any randomized algorithm. 17 00:01:07,070 --> 00:01:10,835 Maybe I'll put that on the course site as an optional theory problem. So, in 18 00:01:10,835 --> 00:01:14,891 particular, a quick sort is optimal in the randomized sense. It have average and long 19 00:01:14,891 --> 00:01:18,753 end time and then again claims that no comparison based sort can be better than 20 00:01:18,753 --> 00:01:22,470 that, even on average. So, I need to tell you what I mean by a comparison based 21 00:01:22,470 --> 00:01:26,612 sorting algorithm. What it means, it's a sorting algorithm that accesses the 22 00:01:26,612 --> 00:01:30,975 elements of the input array. Only via comparisons, it does not do any kind of 23 00:01:30,975 --> 00:01:35,510 direct manipulation on a single array element. All it does, is it picks pairs of 24 00:01:35,510 --> 00:01:40,259 elements and asks the question is the left one bigger or is the right one bigger. I 25 00:01:40,259 --> 00:01:44,292 like to think of comparison based sorts as general purpose sorting routines. They 26 00:01:44,292 --> 00:01:48,376 make no assumptions about what the data is other than that it's from some totally 27 00:01:48,376 --> 00:01:52,509 ordered set. I like to think of it really as a function that takes as an argument a 28 00:01:52,509 --> 00:01:56,393 function pointer that allows it to do comparisons between abstract data types. 29 00:01:56,393 --> 00:02:00,326 There's no way to access the guts of the elements. All you can do is go through 30 00:02:00,326 --> 00:02:04,210 this API, which allows you to make comparisons. And indeed if you look at the 31 00:02:04,210 --> 00:02:08,144 sorting routine and say the unit's operating system, that's exactly how it's 32 00:02:08,144 --> 00:02:12,056 set up. You just patch in a function pointer to a comparison operator. I know 33 00:02:12,056 --> 00:02:15,746 this sounds super abstract so, I think it becomes clear once we talk about some 34 00:02:15,746 --> 00:02:19,435 examples. There's famous examples of comparison based sort including everything 35 00:02:19,435 --> 00:02:22,797 we've discussed in the class so far. There's also famous examples of non 36 00:02:22,797 --> 00:02:26,487 comparison based sort which we're not gonna cover, but perhaps some of you have 37 00:02:26,487 --> 00:02:30,410 heard of or at the very least they're very easy to look up on Wikipedia or wherever. 38 00:02:30,410 --> 00:02:35,393 So examples include the two sorting algorithms we discussed so far, mergesort. 39 00:02:35,393 --> 00:02:40,634 The only way that mergesort interacts with the elements in the input array is by 40 00:02:40,634 --> 00:02:45,698 comparing them and by copying them. Similarly, the only think Quick Sort does 41 00:02:45,698 --> 00:02:50,456 with the input array elements is compare them and swap them in place. For those of 42 00:02:50,456 --> 00:02:54,807 you that know about the heap data structure which we'll be reviewing later 43 00:02:54,807 --> 00:02:59,506 in the class. Heap sort. Where you just, heapify a bunch of elements, and then 44 00:02:59,506 --> 00:03:04,147 extract the minimum N times. That also uses only comparisons. So what are some 45 00:03:04,147 --> 00:03:08,847 famous non examples? I think this will make it even more clear what we're talking 46 00:03:08,847 --> 00:03:12,694 about. So bucket sort is one very useful one. So, bucket sort's used most 47 00:03:12,694 --> 00:03:16,576 frequently when you have some kind of distributional assumption on the data that 48 00:03:16,576 --> 00:03:20,266 you're sorting. Remember that's exactly what I'm focusing on avoiding in this 49 00:03:20,266 --> 00:03:24,100 class. I'm focusing on general purpose subroutines where you don't know anything 50 00:03:24,100 --> 00:03:28,078 about the data. If you do know stuff about the data, bucket sorting can sometimes be 51 00:03:28,078 --> 00:03:31,720 a really useful method. For example, suppose you can model your data as I-I-D 52 00:03:31,720 --> 00:03:35,650 samples from the uniform distribution on zero one. So they're all rational numbers, 53 00:03:35,650 --> 00:03:39,436 bigger than zero, less than one, and you expect them to be evenly spread through 54 00:03:39,436 --> 00:03:43,151 that interval. Then what you can do in bucket sort is you can just. Preallocate 55 00:03:43,151 --> 00:03:46,870 end buckets where you're gonna collect these elements. Each one is gonna have the 56 00:03:46,870 --> 00:03:50,452 same width, width one over n. The first bucket you just do linear pass with the 57 00:03:50,452 --> 00:03:54,172 input array. Everything that's between zero and one over n you stick in the first 58 00:03:54,172 --> 00:03:57,800 bucket. Everything in between one over n and two over n you stick in the second 59 00:03:57,800 --> 00:04:01,565 bucket. Two over end and three over n you sick in the third bucket and so on. So 60 00:04:01,565 --> 00:04:05,627 with the single pass. You've classified the input elements according to which 61 00:04:05,627 --> 00:04:09,942 bucket they belong in, now because the data is assumed to be uniform at random, 62 00:04:09,942 --> 00:04:14,423 that means you expect each of the buckets to have a very small population, just a 63 00:04:14,423 --> 00:04:18,157 few elements in it. So remember if it. Elements are drawing uniform from the 64 00:04:18,157 --> 00:04:21,532 interval zero one, then it's equally likely to be in each of the N available 65 00:04:21,532 --> 00:04:25,085 buckets. And since there's N elements that means you only expect one element per 66 00:04:25,085 --> 00:04:28,727 bucket. So that each one is gonna have a very small population. Having bucketed the 67 00:04:28,727 --> 00:04:32,325 data, you can now just use, say, insertion sort on each bucket independently. You're 68 00:04:32,325 --> 00:04:35,744 gonna be doing insertion sort on a tiny number of elements, so that'll run in 69 00:04:35,744 --> 00:04:39,297 constant time, and then there's gonna be linear number of buckets, so it's linear 70 00:04:39,297 --> 00:04:43,447 time overall. So the upshot is. If you're willing to make really strong assumptions 71 00:04:43,447 --> 00:04:48,053 about your data like it's drawn uniformly at random from the interval zero one then 72 00:04:48,053 --> 00:04:52,333 there's not an N log in lower bound in fact you can allude the lower bound and 73 00:04:52,333 --> 00:04:56,622 sort them in your time. So, just to be clear. In what sense is bucket sort not 74 00:04:56,622 --> 00:05:01,156 comparison based? In what sense does it look at the guts of its elements and do 75 00:05:01,156 --> 00:05:05,920 something other than access them by pairs of comparisons? Well, it actually looks at 76 00:05:05,920 --> 00:05:10,684 an element at input array and it says what is its value, and it checks if its value 77 00:05:10,684 --> 00:05:15,390 is.17 versus.27 versus.77, and according to what value it sees inside this element, 78 00:05:15,390 --> 00:05:20,039 it makes the decision of which bucket to allocate it to. So, it actually stares at 79 00:05:20,039 --> 00:05:24,688 the guts of an element to decide how, what to do next. Another non-example, which eh, 80 00:05:24,688 --> 00:05:29,678 can be quite useful is count and sort. So this sorting algorithm is good when your 81 00:05:29,678 --> 00:05:34,657 data again we're gonna make an assumption on the data, when their integers, and 82 00:05:34,657 --> 00:05:40,083 their small integers, so they're between zero and K where K is say ideally at most 83 00:05:40,083 --> 00:05:44,423 linear in N. So then what you do, is you do a single pass through the input array. 84 00:05:44,423 --> 00:05:48,260 Again, you just bucket the elements according to what their value is. It's 85 00:05:48,260 --> 00:05:52,307 somewhere between zero and K, and it's an integer by assumption. So you need K 86 00:05:52,307 --> 00:05:56,459 buckets. And then you do a pass, and you sort of depopulate the buckets and copy 87 00:05:56,459 --> 00:06:00,873 them into an output array. And that gives you a, a sorting algorithm which runs in 88 00:06:00,873 --> 00:06:04,762 time, O of N Plus K. Where K is the size of the biggest integer. So the upshot with 89 00:06:04,762 --> 00:06:09,072 counting sort is that, if you're willing to assume that datas are integers bounded 90 00:06:09,072 --> 00:06:12,961 above by some factor linear in N, proportional to N, then you can sort them 91 00:06:12,961 --> 00:06:17,109 in linear time. Again county sort does not access the rail and it's merely through 92 00:06:17,109 --> 00:06:20,921 comparisons. It actually stares at an element, figures out what it's value is, 93 00:06:20,921 --> 00:06:24,832 and uses that value to determine what bucket to put the element in. So in that 94 00:06:24,832 --> 00:06:28,994 sense it's not a comparison case sort and it can under compare it's assumptions to 95 00:06:28,994 --> 00:06:32,806 beat the end log and lower it down. So a final example is the one that would 96 00:06:32,806 --> 00:06:36,918 [inaudible] them rated sort. I think that this is sort of an extension of counting 97 00:06:36,918 --> 00:06:40,879 sort, although you don't have to use counting sort as the interloop you can use 98 00:06:40,879 --> 00:06:44,841 other so called stable sorts as well. It's the stuff you can read about in many 99 00:06:44,841 --> 00:06:48,678 programming books or on the web. And up shot at rated sort. [inaudible]. You, you 100 00:06:48,678 --> 00:06:51,975 again you assume that the date are integers. You think of them in digit 101 00:06:51,975 --> 00:06:55,644 representation, say binary representation. And now you just sort one bit at time, 102 00:06:55,644 --> 00:06:59,313 starting from the least significant bits and going all the way out to the most 103 00:06:59,313 --> 00:07:03,028 significant bits. And so the upside of rating sort, it's an extension of counting 104 00:07:03,028 --> 00:07:06,232 sort is the sense that if your data is integers that are not too big, 105 00:07:06,232 --> 00:07:11,653 polynomially bounded in N. Then it lets you sort in linear time. So, summarizing, 106 00:07:11,653 --> 00:07:15,667 a comparison based sort is one that can only access the input array through this 107 00:07:15,667 --> 00:07:19,879 API, that lets you do comparisons between two elements. You cannot access the value 108 00:07:19,879 --> 00:07:24,042 of an element, so in particular you cannot do any kind of bucketing technique. Bucket 109 00:07:24,042 --> 00:07:27,808 sort, counting sort, and rating sort all fundamentally are doing some kind of 110 00:07:27,808 --> 00:07:31,772 bucketing and that's why when you're willing to make assumptions about what the 111 00:07:31,772 --> 00:07:35,439 data is and how you are permitted to access that data, that's when you can 112 00:07:35,439 --> 00:07:39,503 bypass in all of those cases, this analog and lower value. But if you're stuck with 113 00:07:39,503 --> 00:07:43,606 a comparison based sort, if you wanna have something. General purpose. You're gonna 114 00:07:43,606 --> 00:07:48,479 be doing n log n comparisons in the worst case. Let's see why. So we have to prove a 115 00:07:48,479 --> 00:07:52,691 lower band for every single comparison based sorting method, so a fixed one. And 116 00:07:52,691 --> 00:07:57,634 let's focus on a particular input length. Call it N. Okay, so now, let's simplify 117 00:07:57,634 --> 00:08:01,217 our lives. Now that we're focused on a comparison based sorting method, one that 118 00:08:01,217 --> 00:08:04,801 doesn't look at the values of the array elements just in the relative order. We 119 00:08:04,801 --> 00:08:08,112 may as well think of the array as just containing the elements... One, two, 120 00:08:08,112 --> 00:08:11,423 three, all the way up to N, in some jumbled order. Now, some other algorithm 121 00:08:11,423 --> 00:08:15,233 could make use of the fact that everything is small integers. But a comparison based 122 00:08:15,233 --> 00:08:18,952 sorting method cannot. So there's no loss in just thinking about an unsorted array 123 00:08:18,952 --> 00:08:22,348 containing the integers [inaudible] N inclusive. Now, depsite seemingly 124 00:08:22,348 --> 00:08:27,000 restricting the space of inputs that we're thinking about, even here, there's kind of 125 00:08:27,000 --> 00:08:31,320 a lot of different inputs we've gotta worry about, right? So N elements can, can 126 00:08:31,320 --> 00:08:35,308 show up, and N factorial different orderings, right? There's N choices for 127 00:08:35,308 --> 00:08:39,573 who the first element is, then N-1 choices for the second element, M minus two 128 00:08:39,573 --> 00:08:43,837 choices for the third element, and so on. So, there's N factorial for how these 129 00:08:43,837 --> 00:08:47,995 elements are, are arranged in the input array. So I don't wanna prove this super 130 00:08:47,995 --> 00:08:51,866 formally, but I wanna give you, the gist, I think, the good intuition. Now, we're 131 00:08:51,866 --> 00:08:56,178 interested in lower bounding the number of comparisons that, this method makes in the 132 00:08:56,178 --> 00:09:00,000 worst case. So let's introduce a parameter K, which is its worst case number of 133 00:09:00,000 --> 00:09:03,626 comparisons. That is, for every input, each of these end factorial inputs, by 134 00:09:03,626 --> 00:09:07,664 assumption, this method makes no more than K comparisons. The idea behind the proof 135 00:09:07,664 --> 00:09:11,200 is that, because we have N factorial fundamentally different inputs, the 136 00:09:11,200 --> 00:09:14,884 sorting method has to execute in a fundamentally different way on each of 137 00:09:14,884 --> 00:09:18,818 those inputs. But since the only thing that causes a branch in the execution of 138 00:09:18,818 --> 00:09:22,404 the sorting method is the resolution of the comparison, and we have only 139 00:09:22,404 --> 00:09:26,537 [inaudible] comparisons, it can only have two to the K different execution paths. So 140 00:09:26,537 --> 00:09:30,570 that forces two to the K to be at least N factorial. And a calculation then shows 141 00:09:30,570 --> 00:09:35,030 that, that forces K to be at least Omega N log N. So let me just quickly fill in the 142 00:09:35,030 --> 00:09:38,339 details. So cross all in-factorial possible inputs just as a thought 143 00:09:38,339 --> 00:09:42,212 experiment. We can imagine running this method in factorial times just looking at 144 00:09:42,212 --> 00:09:45,560 the pattern of how the comparison is resolved. Right? For each of these 145 00:09:45,560 --> 00:09:49,289 in-factorial inputs, we run it through this sorting method, it makes comparison 146 00:09:49,289 --> 00:09:52,733 number one, then comparison number two, then comparison number three, then 147 00:09:52,733 --> 00:09:56,510 comparison number four, then comparison number five, and you know it gets back a 148 00:09:56,510 --> 00:10:00,384 zero, then a one, then a one, then a zero. Give in some other input and it gets back 149 00:10:00,384 --> 00:10:04,018 a one, then a one, then a zero, then a zero and so on. The point is, for each of 150 00:10:04,018 --> 00:10:07,796 these in-factorial inputs, it makes at most K comparisons, we can associate that 151 00:10:07,796 --> 00:10:11,916 with a K bit string, and because it. Is there's only K bits we're only going to 152 00:10:11,916 --> 00:10:16,239 see two to the K different K-bit strings two to the K different ways that a 153 00:10:16,239 --> 00:10:21,114 sequence of comparisons results. Now to finish the proof we are gonna apply 154 00:10:21,114 --> 00:10:25,323 something which I don't get to use as much as I'd like in an evident class but it's 155 00:10:25,323 --> 00:10:30,288 always fun when it comes up, which is the pigeon-hole principle. The [inaudible] 156 00:10:30,288 --> 00:10:34,432 principle you recall is the essentially obvious fact that if you try to stuff K 157 00:10:34,432 --> 00:10:38,732 plus one pigeons into just K cubby holes, one of those K cubby holes has got to get 158 00:10:38,732 --> 00:10:42,980 two of the pigeons. Okay at least one of the cubby holes gets at least two pigeons. 159 00:10:43,260 --> 00:10:47,717 So for us what are the pigeons and what are the holes? So our pigeons are these in 160 00:10:47,717 --> 00:10:51,956 factorial different inputs. The different ways you can scramble the images one 161 00:10:51,956 --> 00:10:56,250 through. And, what are our holes? Those are the two indicate different executions 162 00:10:56,250 --> 00:11:04,526 that the sorting method can possibly take on. Now if. The number of comparisons K 163 00:11:04,526 --> 00:11:09,307 used is so small, that two to the K, the number of distinct execution, number of 164 00:11:09,307 --> 00:11:13,966 distinct ways comparisons can resolve themselves, is less than the number of 165 00:11:13,966 --> 00:11:19,139 different inputs that have to be correctly sorted. Then by the pivotal principal. One 166 00:11:19,139 --> 00:11:24,328 Color [inaudible] gets two holes. That is, two different inputs get treated in 167 00:11:24,328 --> 00:11:29,517 exactly the same way, by the sorting method. They are asked, exactly the same k 168 00:11:29,517 --> 00:11:34,837 comparisons and the comparisons resolve identically. [inaudible] one. Jumbling of 169 00:11:34,837 --> 00:11:39,937 one through N, then you get a 01101 then it's a totally different jumbling of N and 170 00:11:39,937 --> 00:11:44,669 then again you get a 01101 and if this happens the algorithm is toast, in the 171 00:11:44,669 --> 00:11:49,339 sense that it's definitely not correct, right, cuz we've fed it two different 172 00:11:49,339 --> 00:11:53,839 inputs. And it is unable to resolve which of the two it is. Right? So, it may be one 173 00:11:53,839 --> 00:11:57,908 premutation of one through N, or this totally different premutation of one 174 00:11:57,908 --> 00:12:02,032 through N. The algorithm has tried to learn about what the input is through 175 00:12:02,032 --> 00:12:06,432 these K comparisons, but it has exactly the same data about the input in two, the 176 00:12:06,432 --> 00:12:10,886 two cases. So, if it outputs the correct sorted version in one case, it's gonna get 177 00:12:10,886 --> 00:12:15,285 the other one wrong. So, you can't have a common execution of a sorting algorithm 178 00:12:15,285 --> 00:12:20,052 unscramble totally different premutations. It can't be done. So what have we learned? 179 00:12:20,052 --> 00:12:24,745 We've learned that by correctness, two to the K is in fact at least in the 180 00:12:24,745 --> 00:12:28,982 factorial. So how does that help us? Well, we wanna lower bound K. K is the number of 181 00:12:28,982 --> 00:12:33,135 comparisons this arbitrary storing method is using. They wanna show that's at least 182 00:12:33,135 --> 00:12:36,737 N log N. So we, to lower bound K, we better lower bound N factorial. So, you 183 00:12:36,737 --> 00:12:40,239 know, you could use Stirling's Approximation or something fancy. But we 184 00:12:40,239 --> 00:12:44,342 don't need anything fancy here. We'll just do something super crude. We'll just say, 185 00:12:44,342 --> 00:12:48,095 well, look. This is the product of N things, right? N times N minus one time N 186 00:12:48,095 --> 00:12:51,997 minus two, blah, blah, blah, blah. And the largest of those, the N over two largest 187 00:12:51,997 --> 00:12:56,216 of those N terms are all at least N over two. The rest we'll just ignore. Pretty 188 00:12:56,216 --> 00:13:03,615 sloppy, but it gives us a lower bound of N divided by two raised to the N divided by 189 00:13:03,615 --> 00:13:10,838 two. Now we'll just take log base two of both sides, and we get the K is at least N 190 00:13:10,838 --> 00:13:17,796 over two, log base two of N over two, also known as omega of N log N. And that my 191 00:13:17,796 --> 00:13:25,019 friends is why a heretics deterministic sorting algorithm that's comparison based 192 00:13:25,019 --> 00:13:29,600 has gotta use N log N comparisons in the worst case.