So just like in the Hopcroft-Ullman analysis, we're going to think of the total amount of work done by our union-find data structure as comprising two pieces: work done visiting good nodes and work done visiting bad nodes. The previous quiz gives us a bound on visits to good nodes, even on a per-operation basis: at most alpha(n) good nodes are visited in every single operation. What remains is a global bound on the number of visits to bad nodes. The argument will be to show that, over an arbitrary sequence of m find and union operations, the total number of visits to bad nodes is bounded above by O(n * alpha(n)).

So here is the crux of the argument, here is why a visit to a bad node means the subsequent path compression massively increases the gap between that object's rank and the rank of its parent. Let's freeze the data structure at the moment a find operation makes a visit to a bad object; call that bad object x. Let's think about what the data structure must look like in this scenario. We've got our bad object x; it may or may not have nodes pointing to it, and we're not going to care. By virtue of it being bad and meeting the third criterion, we know the rank of x is at least 2. Its delta value could be anything; whatever it is, let's call it k. Because x is not a root, it has some parent, call that parent p. And not only does x have ancestors, but because it meets the fourth criterion, we know it has an ancestor y with the same delta value as x, that is, an ancestor y with delta(y) = k. It is possible that p and y are the same object, or they could be different; it's not going to matter in the analysis. And recalling that the statistic delta is only defined for non-roots, we can conclude that y is also not the root. It must then have some parent p'. p' might be a root or it might not; we don't care.

So now let's understand the effect of the path compression that happens as part of this find operation. x's parent pointer is going to be rewired to the root of this tree, and that root is either p' itself or even higher up. Given that fact, let's understand how much bigger the rank of x's new parent (p' or higher) is compared to the rank of its old parent p.
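To keep the object of study concrete, here is a minimal sketch of the data structure being analyzed: union by rank with path compression. The class and method names are my own illustration, not from the lecture, but the two-pass find (locate the root, then rewire every node on the path to point directly to it) is exactly the operation whose node visits we are counting.

```python
class UnionFind:
    """Union-find with union by rank and path compression (minimal sketch)."""

    def __init__(self, n):
        self.parent = list(range(n))  # each object starts as its own root
        self.rank = [0] * n           # rank of each object

    def find(self, x):
        # First pass: walk up the parent pointers to locate the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass (path compression): rewire every visited object
        # directly to the root, shrinking future find paths.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        # Union by rank: the root of smaller rank points to the larger-rank root.
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1

# Example usage: uf = UnionFind(10); uf.union(1, 2); uf.find(1) == uf.find(2)  -> True
```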
So the rank of x's new parent is at least the rank of p'. If p' is in fact the root, then the rank of x's new parent is exactly the rank of p'. Otherwise, the new parent is even higher up than p', and because ranks only increase going up the tree, its rank can only be bigger than the rank of p'. How does the rank of p' compare to that of its child y? Well, here is the key point: because delta(y) = k. Remember what the definition of delta is: it quantifies the gap between the rank of an object and that of its parent. We're going to use that here for y and its parent p'. It means the rank of p' is so big that it's at least the function A_k applied to y's rank. Our third and final inequality is justified the same way as the first one: it could be the case that y actually is the same as p, x's old parent, in which case the rank of y is precisely the rank of x's old parent. Otherwise, y is even higher up than x's old parent p, and therefore, by the monotonicity of ranks, the rank of y is even bigger than the rank of x's old parent.

So now, in this chain of inequalities, I want you to focus on the left-most quantity and the right-most quantity. What does this say at the end of the day? It says that when you apply path compression to x, it acquires a new parent, and the rank of that new parent is at least the A_k function applied to the rank of its old parent. Again: path compression, at the very least, applies the A_k function to the rank of x's parent.

So now let's suppose you visit some bad object x not just once, but the same object x, while it's bad, over and over and over again. This argument says that every such visit to the object x while it's bad applies the function A_k to the rank of its parent. In particular, let's use r to denote the rank of this bad object x; again, by virtue of it being bad, r has to be at least 2. Imagine we do a sequence of r visits to this object x while it's bad. Each of those visits results in it acquiring a new parent, and that new parent's rank is much bigger than the rank of the previous parent: it's at least A_k applied to the rank of the previous parent. So after r visits, that's applying the function A_k r times to the rank of x's original parent, which itself has rank at least that of x, namely at least r.
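In symbols, the chain of inequalities just described reads as follows (p is x's parent before the compression, p' is y's parent, and x's new parent is the root found by the find; this is just a transcription of the argument above):

```latex
\operatorname{rank}(x\text{'s new parent})
  \;\ge\; \operatorname{rank}(p')
  \;\ge\; A_k\!\bigl(\operatorname{rank}(y)\bigr)
  \;\ge\; A_k\!\bigl(\operatorname{rank}(p)\bigr)
  \;=\; A_k\!\bigl(\operatorname{rank}(x\text{'s old parent})\bigr)
```

The first inequality holds because the root x gets rewired to is p' or an ancestor of p'; the second is the definition of delta applied to y, since delta(y) = k; the third holds because y is p or an ancestor of p, combined with the monotonicity of A_k.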
Well, we have another name for applying the function A_k r times to r: by definition of the Ackermann function, that's just A_{k+1} applied to r. So what does this mean? It means that after r visits to a bad object x that has rank r, the rank of x's parent has grown so much that the growth is reflected in our statistic delta, which measures the gap in rank between an object and the object's parent. Remember how we defined delta(x): it's the largest value of k such that the rank of x's parent is at least A_k applied to the rank of x. So what this inequality shows is that every r parent-pointer updates to x allow you to take k even bigger than before: you can bump up k by 1 and still have the inequality be satisfied. That is, with every r visits to a bad object of rank r, delta(x) has to increase by at least 1. But there are only so many distinct values the statistic delta(x) can take on: it's always non-negative, it's always an integer, it only increases, and it's never bigger than the inverse Ackermann function alpha(n). Therefore, the total number of visits to an object x while it's bad, over an arbitrary sequence of operations, cannot be more than r, the number of visits needed to increment delta(x), times the number of distinct values, which is alpha(n).

So now we've done all the hard work; we're almost there, we just need one final slide putting it all together. All right, so to see how all of this comes together, let's first recall that all we need to do is bound the total number of visits to bad objects over the entire sequence of union and find operations. On the previous slide, we showed that for a bad object x with rank r, the total number of times it's visited while it's bad is bounded above by r times the inverse Ackermann function of n. So let's sum that over all of the objects to get our initial bound on the total number of visits to bad objects. Now, we have to be a little careful here, because there are n objects x, and ranks can be as large as log n. So a naive bound would give us something like n times log n times alpha(n), and that's not what we want; we really want n times alpha(n). So we need to use the fact that not many nodes can have big rank, and that's exactly what the rank lemma says.
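Writing A_k^{(r)} for the r-fold composition of A_k, the definition quoted here (A_{k+1}(r) is A_k applied r times to r) turns this step into a one-line display; the notation is mine, but it simply transcribes the argument just given:

```latex
\operatorname{rank}(x\text{'s parent after } r \text{ bad visits})
  \;\ge\; A_k^{(r)}\!\bigl(\operatorname{rank}(x\text{'s original parent})\bigr)
  \;\ge\; A_k^{(r)}(r)
  \;=\; A_{k+1}(r)
  \;=\; A_{k+1}\!\bigl(\operatorname{rank}(x)\bigr)
```

Since k was the value of delta(x) when these visits began, this inequality certifies that delta(x) is now at least k + 1, i.e., it has increased by at least 1.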
So to rewrite this in a way that lets us apply the rank lemma, which bounds the number of objects with a given rank, let's bucket the objects according to their rank. Rather than summing over the objects, we're going to sum over possible ranks r, and then multiply by the number of objects that have that rank r. While I'm at it, let me pull the alpha(n) factor, which doesn't depend on the object, out in front of the sum. For every value of r, the rank lemma tells us there are at most n/2^r objects with that rank. This factor of n is independent of the rank r, so let me pull that out in front of the sum as well. And when the dust settles, we get n, times the inverse Ackermann function of n, times the sum over non-negative integers r of r divided by 2^r. Now, we've seen this kind of sum without the r in the numerator; that's just a geometric sum of 1/2^r, and we know it's bounded by a constant. More generally, throwing an r into the numerator is no match for the exponentially decreasing denominator, so this sum still evaluates to a universal constant; I'll let you check that in the privacy of your own home. And so that gives us a bound of O(n * alpha(n)) on the total number of visits to bad objects. Since we also have a per-operation bound of alpha(n) on the number of visits to good nodes, combining the work of the two cases we get O((m + n) * alpha(n)). And again, the interesting case here is when m is Omega(n); otherwise, you can just do this analysis separately for each tree in the data structure.

And there you have it: you now understand, in full detail, one of the most amazing results in the history of algorithms and data structures. Tarjan's bound is unimaginably close to being linear without actually being linear; it's off by this inverse Ackermann factor. Now, from a practical perspective, for any imaginable value of n, alpha(n) is at most 4, so this gives you a linear-time bound for all imaginable values of n. In fact, even with the Hopcroft-Ullman bound, log* n is at most 5 for any imaginable value of n, so that, too, is in essence a linear-time bound for all practical purposes. But theoretically speaking, we have not proven a linear-time bound on the amount of work done by the union-find data structure.
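Before we ask whether we can do better, here is the whole bad-visit count from this segment written out in one display; the rank lemma supplies the n/2^r factor, and the evaluation of the final sum (to the constant 2) is the at-home check just mentioned:

```latex
\sum_{\text{objects } x} \operatorname{rank}(x)\,\alpha(n)
  \;=\; \alpha(n) \sum_{r \ge 0} r \cdot \#\{x : \operatorname{rank}(x) = r\}
  \;\le\; \alpha(n) \sum_{r \ge 0} r \cdot \frac{n}{2^r}
  \;=\; n\,\alpha(n) \sum_{r \ge 0} \frac{r}{2^r}
  \;=\; 2\,n\,\alpha(n)
  \;=\; O\!\bigl(n\,\alpha(n)\bigr)
```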
So the Pavlovian response of any algorithms researcher worth their salt would be to ask: can we do better? And we could ask that question in a couple of different senses. The first sense would be: well, maybe we could give a sharper analysis of this exact data structure. Maybe union by rank and path compression is sufficient to guarantee a linear amount of work. After all, we didn't need to change the data structure to sharpen the log* bound to an alpha(n) bound; we only needed to change the analysis. This question, remarkably, Tarjan answered in the negative in his original paper. He showed that if you use this exact data structure, union by rank with path compression, then there are arbitrarily long sequences of unions and finds, on arbitrarily large collections of objects, on which this data structure actually performs, asymptotically, m (the number of operations) times the inverse Ackermann function of n (the number of objects) amount of work. That is, keeping the data structure fixed, this analysis cannot be asymptotically improved: this data structure fundamentally has worst-case performance governed by this insane inverse Ackermann function. This is already a mind-boggling fact; indeed, Tarjan, in the conclusion of his paper, notes that it's remarkable to have such a simple algorithm with such a complicated running time.

But you could also ask the question: could we do better, perhaps by improving or changing the data structure? After all, by adding path compression, we got a qualitative improvement in the average operation time; it dropped from log n to alpha(n). Perhaps there's yet another optimization out there, waiting to be discovered, that would drop the amount of work per operation, over a sequence, down to constant time per operation, linear work overall. Tarjan, in his paper, made the bold conjecture that there is no linear-time method, no matter how clever you are. And remarkably, that conjecture has since been proved. It was proved for certain classes of data structures, both by Tarjan in his original paper and in a follow-up paper in '79. But the final version of this conjecture was proven by Fredman and Saks, back in 1989.
No matter how clever you are, no matter what union-find data structure you come up with, there will be arbitrarily long sequences of operations on which the average amount of work you do per operation is the inverse Ackermann function of the number of objects in the data structure. Pretty unbelievable. So let me just wrap up with a historical comment. Full disclosure: I wasn't quite alive when this result came out, but from reading about it and talking to senior researchers, my sense is that it was really a watershed moment for the field of algorithms and data structures. Specifically, it confirmed the belief of people working in algorithms and data structures that the field possessed surprising depth and beauty. There had, of course, been earlier glimpses of this; we mentioned, in the optional material in Part 1, Knuth's analysis of linear probing back in the early 1960s. But this was really something for the worst-case analysis of algorithms and data structures. The fact that such a practical and naturally arising problem in algorithms and data structures necessarily requires understanding the Ackermann function and its inverse, a function, mind you, first proposed and defined back in the 1920s in the service of recursion theory and models of computation, decades before Tarjan was doing his work, was a real eye-opener. And it showed that this field is something that will keep many generations of scientists quite busy for many years to come.