So just like in the Hopcroft-Ullman analysis, we're going to think of the total amount of work done by our union-find data structure as comprising two pieces: work done visiting good nodes and work done visiting bad nodes. The previous quiz gives us a bound on visits to good nodes, even on a per-operation basis: at most alpha(n) good nodes are visited in every single operation. What remains is a global bound on the number of visits to bad nodes. The argument will be to show that, over an arbitrary sequence of m find and union operations, the total number of visits to bad nodes is bounded above by O(n * alpha(n)).

So here is the crux of the argument, here is why a visit to a bad node means the subsequent path compression massively increases the gap between that object's rank and the rank of its parent. Let's freeze the data structure at the moment a find operation makes a visit to a bad object; call that bad object x. Let's think about what the data structure must look like in this scenario. We've got our bad object x; it may or may not have nodes pointing to it, and we're not going to care. By virtue of it being bad and meeting the third criterion, we know the rank of x is at least 2. Its delta value could be anything; whatever it is, let's call it k. Because x is not a root, it has some parent, call that parent p. And not only does x have ancestors, but because it meets the fourth criterion, we know it has an ancestor y with the same delta value as x, that is, an ancestor y with delta(y) = k. It is possible that p and y are the same object, or they could be different; it's not going to matter in the analysis. And recalling that the statistic delta is only defined for non-roots, we can conclude that y is also not the root. It must then have some parent p'. p' might be a root or it might not; we don't care.

So now let's understand the effect of the path compression that happens as part of this find operation. x's parent pointer is going to be rewired to the root of this tree, and that root is either p' itself or even higher up. Given that fact, let's understand how much bigger the rank of x's new parent (p' or higher) is compared to the rank of its old parent p.
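To keep the object of study concrete, here is a minimal sketch of the data structure being analyzed: union by rank with path compression. The class and method names are my own illustration, not from the lecture, but the two-pass find (locate the root, then rewire every node on the path to point directly to it) is exactly the operation whose node visits we are counting.

```python
class UnionFind:
    """Union-find with union by rank and path compression (minimal sketch)."""

    def __init__(self, n):
        self.parent = list(range(n))  # each object starts as its own root
        self.rank = [0] * n           # rank of each object

    def find(self, x):
        # First pass: walk up the parent pointers to locate the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass (path compression): rewire every visited object
        # directly to the root, shrinking future find paths.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        # Union by rank: the root of smaller rank points to the larger-rank root.
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1

# Example usage: uf = UnionFind(10); uf.union(1, 2); uf.find(1) == uf.find(2)  -> True
```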
So the rank of x's new parent is at least the rank of p'. If p' is in fact the root, then the rank of x's new parent is exactly the rank of p'. Otherwise, the new parent is even higher up than p', and because ranks only increase going up the tree, its rank can only be bigger than the rank of p'. How does the rank of p' compare to that of its child y? Well, here is the key point: because delta(y) = k. Remember what the definition of delta is: it quantifies the gap between the rank of an object and that of its parent. We're going to use that here for y and its parent p'. It means the rank of p' is so big that it's at least the function A_k applied to y's rank. Our third and final inequality is justified the same way as the first one: it could be the case that y actually is the same as p, x's old parent, in which case the rank of y is precisely the rank of x's old parent. Otherwise, y is even higher up than x's old parent p, and therefore, by the monotonicity of ranks, the rank of y is even bigger than the rank of x's old parent.

So now, in this chain of inequalities, I want you to focus on the left-most quantity and the right-most quantity. What does this say at the end of the day? It says that when you apply path compression to x, it acquires a new parent, and the rank of that new parent is at least the A_k function applied to the rank of its old parent. Again: path compression, at the very least, applies the A_k function to the rank of x's parent.

So now let's suppose you visit some bad object x not just once, but the same object x, while it's bad, over and over and over again. This argument says that every such visit to the object x while it's bad applies the function A_k to the rank of its parent. In particular, let's use r to denote the rank of this bad object x; again, by virtue of it being bad, r has to be at least 2. Imagine we do a sequence of r visits to this object x while it's bad. Each of those visits results in it acquiring a new parent, and that new parent's rank is much bigger than the rank of the previous parent: it's at least A_k applied to the rank of the previous parent. So after r visits, that's applying the function A_k r times to the rank of x's original parent, which itself has rank at least that of x, namely at least r.
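In symbols, the chain of inequalities just described reads as follows (p is x's parent before the compression, p' is y's parent, and x's new parent is the root found by the find; this is just a transcription of the argument above):

```latex
\operatorname{rank}(x\text{'s new parent})
  \;\ge\; \operatorname{rank}(p')
  \;\ge\; A_k\!\bigl(\operatorname{rank}(y)\bigr)
  \;\ge\; A_k\!\bigl(\operatorname{rank}(p)\bigr)
  \;=\; A_k\!\bigl(\operatorname{rank}(x\text{'s old parent})\bigr)
```

The first inequality holds because the root x gets rewired to is p' or an ancestor of p'; the second is the definition of delta applied to y, since delta(y) = k; the third holds because y is p or an ancestor of p, combined with the monotonicity of A_k.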
Well, we have another name for applying the function A_k r times to r: by definition of the Ackermann function, that's just A_{k+1} applied to r. So what does this mean? It means that after r visits to a bad object x that has rank r, the rank of x's parent has grown so much that the growth is reflected in our statistic delta, which measures the gap in rank between an object and the object's parent. Remember how we defined delta(x): it's the largest value of k such that the rank of x's parent is at least A_k applied to the rank of x. So what this inequality shows is that every r parent-pointer updates to x allow you to take k even bigger than before: you can bump up k by 1 and still have the inequality be satisfied. That is, with every r visits to a bad object of rank r, delta(x) has to increase by at least 1. But there are only so many distinct values the statistic delta(x) can take on: it's always non-negative, it's always an integer, it only increases, and it's never bigger than the inverse Ackermann function alpha(n). Therefore, the total number of visits to an object x while it's bad, over an arbitrary sequence of operations, cannot be more than r, the number of visits needed to increment delta(x), times the number of distinct values, which is alpha(n).

So now we've done all the hard work; we're almost there, we just need one final slide putting it all together. All right, so to see how all of this comes together, let's first recall that all we need to do is bound the total number of visits to bad objects over the entire sequence of union and find operations. On the previous slide, we showed that for a bad object x with rank r, the total number of times it's visited while it's bad is bounded above by r times the inverse Ackermann function of n. So let's sum that over all of the objects to get our initial bound on the total number of visits to bad objects. Now, we have to be a little careful here, because there are n objects x, and ranks can be as large as log n. So a naive bound would give us something like n times log n times alpha(n), and that's not what we want; we really want n times alpha(n). So we need to use the fact that not many nodes can have big rank, and that's exactly what the rank lemma says.
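Writing A_k^{(r)} for the r-fold composition of A_k, the definition quoted here (A_{k+1}(r) is A_k applied r times to r) turns this step into a one-line display; the notation is mine, but it simply transcribes the argument just given:

```latex
\operatorname{rank}(x\text{'s parent after } r \text{ bad visits})
  \;\ge\; A_k^{(r)}\!\bigl(\operatorname{rank}(x\text{'s original parent})\bigr)
  \;\ge\; A_k^{(r)}(r)
  \;=\; A_{k+1}(r)
  \;=\; A_{k+1}\!\bigl(\operatorname{rank}(x)\bigr)
```

Since k was the value of delta(x) when these visits began, this inequality certifies that delta(x) is now at least k + 1, i.e., it has increased by at least 1.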
So to rewrite this in a way that lets us apply the rank lemma, which bounds the number of objects with a given rank, let's bucket the objects according to their rank. Rather than summing over the objects, we're going to sum over possible ranks r, and then multiply by the number of objects that have that rank r. While I'm at it, let me pull the alpha(n) factor, which doesn't depend on the object, out in front of the sum. For every value of r, the rank lemma tells us there are at most n/2^r objects with that rank. This factor of n is independent of the rank r, so let me pull that out in front of the sum as well. And when the dust settles, we get n, times the inverse Ackermann function of n, times the sum over non-negative integers r of r divided by 2^r. Now, we've seen this kind of sum without the r in the numerator; that's just a geometric sum of 1/2^r, and we know it's bounded by a constant. More generally, throwing an r into the numerator is no match for the exponentially decreasing denominator, so this sum still evaluates to a universal constant; I'll let you check that in the privacy of your own home. And so that gives us a bound of O(n * alpha(n)) on the total number of visits to bad objects. Since we also have a per-operation bound of alpha(n) on the number of visits to good nodes, combining the work of the two cases we get O((m + n) * alpha(n)). And again, the interesting case here is when m is Omega(n); otherwise, you can just do this analysis separately for each tree in the data structure.

And there you have it: you now understand, in full detail, one of the most amazing results in the history of algorithms and data structures. Tarjan's bound is unimaginably close to being linear without actually being linear; it's off by this inverse Ackermann factor. Now, from a practical perspective, for any imaginable value of n, alpha(n) is at most 4, so this gives you a linear-time bound for all imaginable values of n. In fact, even with the Hopcroft-Ullman bound, log* n is at most 5 for any imaginable value of n, so that, too, is in essence a linear-time bound for all practical purposes. But theoretically speaking, we have not proven a linear-time bound on the amount of work done by the union-find data structure.
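Before we ask whether we can do better, here is the whole bad-visit count from this segment written out in one display; the rank lemma supplies the n/2^r factor, and the evaluation of the final sum (to the constant 2) is the at-home check just mentioned:

```latex
\sum_{\text{objects } x} \operatorname{rank}(x)\,\alpha(n)
  \;=\; \alpha(n) \sum_{r \ge 0} r \cdot \#\{x : \operatorname{rank}(x) = r\}
  \;\le\; \alpha(n) \sum_{r \ge 0} r \cdot \frac{n}{2^r}
  \;=\; n\,\alpha(n) \sum_{r \ge 0} \frac{r}{2^r}
  \;=\; 2\,n\,\alpha(n)
  \;=\; O\!\bigl(n\,\alpha(n)\bigr)
```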
So the Pavlovian response of any algorithms researcher worth their salt would be to ask: can we do better? And we could ask that question in a couple of different senses. The first sense would be: well, maybe we could give a sharper analysis of this exact data structure. Maybe union by rank and path compression is sufficient to guarantee a linear amount of work. After all, we didn't need to change the data structure to sharpen the log* bound to an alpha(n) bound; we only needed to change the analysis. This question, remarkably, Tarjan answered in the negative in his original paper. He showed that if you use this exact data structure, union by rank with path compression, then there are arbitrarily long sequences of unions and finds, on arbitrarily large collections of objects, on which this data structure actually performs, asymptotically, m (the number of operations) times the inverse Ackermann function of n (the number of objects) amount of work. That is, keeping the data structure fixed, this analysis cannot be asymptotically improved: this data structure fundamentally has worst-case performance governed by this insane inverse Ackermann function. This is already a mind-boggling fact; indeed, Tarjan, in the conclusion of his paper, notes that it's remarkable to have such a simple algorithm with such a complicated running time.

But you could also ask the question: could we do better, perhaps by improving or changing the data structure? After all, by adding path compression, we got a qualitative improvement in the average operation time; it dropped from log n to alpha(n). Perhaps there's yet another optimization out there, waiting to be discovered, that would drop the amount of work per operation, over a sequence, down to constant time per operation, linear work overall. Tarjan, in his paper, made the bold conjecture that there is no linear-time method, no matter how clever you are. And remarkably, that conjecture has since been proved. It was proved for certain classes of data structures, both by Tarjan in his original paper and in a follow-up paper in '79. But the final version of this conjecture was proven by Fredman and Saks, back in 1989.
No matter how clever you are, no matter what union-find data structure you come up with, there will be arbitrarily long sequences of operations on which the average amount of work you do per operation is the inverse Ackermann function of the number of objects in the data structure. Pretty unbelievable. So let me just wrap up with a historical comment. Full disclosure: I wasn't quite alive when this result came out, but from reading about it and talking to senior researchers, my sense is that it was really a watershed moment for the field of algorithms and data structures. Specifically, it confirmed the belief of people working in algorithms and data structures that the field possessed surprising depth and beauty. There had, of course, been earlier glimpses of this; we mentioned, in the optional material in Part 1, Knuth's analysis of linear probing back in the early 1960s. But this was really something for the worst-case analysis of algorithms and data structures. The fact that such a practical and naturally arising problem in algorithms and data structures necessarily requires understanding the Ackermann function and its inverse, a function, mind you, first proposed and defined back in the 1920s in the service of recursion theory and models of computation, decades before Tarjan was doing his work, was a real eye-opener. And it showed that this field is something that will keep many generations of scientists quite busy for many years to come.