So let's try thinking about this 
alternative lay, lazy union based 
approach of union-find data structure in 
some detail. 
so get this to work as quickly as we want 
to, we're going to need two 
optimizations. 
This video will introduce the first of 
those optimizations, union by rank. 
So let's have a quick slide making 
precise what we learned in the previous 
video. 
So, in our new implementation, it's still 
the case that each object has one extra 
field. 
You can think of it logically as a 
pointer to some other object,. Really, 
it's just recording the name of some 
other object, 
but the semantics have changed. 
In our first implementation of union-find 
with eager unions, this field we called a 
leader pointer, 
and the invariant was that every object's 
pointer points directly to the leader of 
its group. 
In this Lazy Union Implementation, we are 
relaxing that invariant, 
so, I'm going to change the name. 
I'm not going to call it a leader 
pointer, I'm just going to call it a 
parent pointer. 
And the new relaxed invariant is that the 
collection of everybody's parent pointers 
induces a collection of directed trees. 
Each tree in this collection represents 
one of the groups in the partition that 
we're maintaining. 
Each of these trees has a root vertex, 
that is, a vertex that points back to 
itself. 
It is these root vertices, these root 
objects, that we think of as the leaders 
of these componenets. 
So as before, we're going to refer to a 
group by the name of its leader vertex 
and that leader is by definition the 
roots of the corresponding directed tree. 
One way to think about things is that 
when we had eager unions, we again 
associated a directed tree with each of 
the groups, but we also insisted that it 
was a super shallow bushy tree, that it 
had depth only one, that is all you had 
was a root, the leader and then everybody 
else who pointed directly to the leader. 
In this lazy union implementation, we're 
allowing the trees to grow deeper. 
We don't insist that they have only one 
level below the root. 
So as far as the initialization, at 
birth, in a, a union-find data structure 
just consists of n degenerate trees, each 
object just initially points to itself 
and is a root at the beginning. 
In general, how do you implement find 
from some object? 
Well, you just have to return the leader 
of its group, and we know you can get to 
the leader just by traversing parent 
pointers from x until you get to the 
root, until you get to some object that 
points back to itself, that's where you 
return at the end of the find call. 
How do you do union? 
Well, given two objects, x and y, you 
have to find its respective roots, so you 
just invoke the find operation twice. 
Once for x to get to its root s1, once 
from y, to get its root s2, and then you 
install either s1 or s2 as a child of the 
other. 
So you do one pointer update after the 
two finds. 
So for now, I'm going to leave union 
underdetermined. 
I'm not going to give you a rule to 
decide which of s1 and s2 becomes the 
child of the other. 
On the next quiz, I want you to think 
through, you know, what would be the 
worst case running time of these 
operations, if we make that choice 
arbitrarily? If we just pick s1 or s2 
arbitrarily to become the child of the 
other. 
Alright. So the correct answer 
unfortunately is the last one, 
both find and union can actually blow up 
to linear time operations, if you're not 
careful about how you implement the 
union. 
So why is this true? Well, first let me 
just point out that B is definitely not 
the correct answer. 
Whatever, finding the unions worst case 
running time is, it's definitely the same 
for both of them. 
Remember, union reduces to two calls to 
find plus constant work, 
so that's going to tell you that they're 
both going to have the same operation 
time in the worst case. 
Now, why is it linear? The reason you're 
stuck with linear time is because the 
trees that you build through a sequence 
of unions can in the worst case become 
very scraggly. 
And to see what I mean, imagine you just 
do a sequence of unions, and in each time 
you're unioning a group with just one 
object into the group that you built up 
so far. 
So if you union together a group of size 
two and a group of size one, well if you 
choose arbitrarily, maybe you make the 
group fo size one the new root. 
So now you just got a chain of the three 
objects. 
Maybe you union in another singleton 
group and arbitrarily you choose the 
singleton group to be the new leader, 
that gives you a chain of four objects, 
and so on. 
So if you do a linear number of unions 
and you're not careful about who you make 
a child of the other, you might well end 
up with a chain of linear length, 
and then if you do finds for the objects 
toward the bottom of that chain, they're 
going to require linear work. 
And again, union has a subroutine of 
finds, so it's going to then take linear 
work as well. 
So when you see this example you know, I 
hope your immediate reaction is, well, 
geez, we can certainly do better than 
that. 
We can just be somewhat intelligent with 
when we do a union, which of the old 
roots do we install as the child of the 
other one? There's a rough analogy with 
how we made a natural optimization with 
our eager union implementation and union 
find. 
There, we had to make some kind of choice 
between which of the two old leaders do 
we retain. 
We retain the one for the bigger root to 
minimize the number of leader updates. 
So here, the natural thing to do is, 
well, you know, look at which of the two 
trees is already deeper and we want to 
keep the root of that deeper tree, 
that's to avoid it from getting deeper 
still. 
So we install the root of the shallower 
tree as a child of the deeper one. 
Let's make that precise on the next slide 
with the notion of the rank of an object. 
So the rank is just going to be a second 
field along with the parent pointer that 
we maintain for each object. 
It's just going to be some integer. 
Let me tell you how I want you to think 
about the ranks, at least for now. 
Okay? So I'm going to give you some 
semantics, pouring out to semantics, 
but I want to warn you, they're only 
going to hold valid for the next couple 
of videos. 
When we introduce something else called 
path compression, these semantics are 
going to break down, 
but initially, this is how I want you to 
think about ranks. 
It's going to be the maximum number of 
hops required to get from a leaf of xs 
tree up to x itself. 
So for example if, X is a root, this is 
going to be the worst case length, the 
maximum length of any leaf to root path 
in the tree. 
So just to make sure this is clear, let 
me draw a quick example. 
So here in blue, I've shown you one 
particular union-find data structure that 
has two groups. 
Now, it's very easy to compute the rank 
of leaves, that's 0, because the only 
path from a leaf to that node is the 
empty path. 
There are two nodes that have rank one, 
then the root of the bigger tree has rank 
two. 
And, if you think about it, in general, 
the rank of a node is going to be one 
plus the largest rank of any of its 
children. 
For example, if you're in node x, you 
have a child y and there's some path from 
a leaf z up to y that requires five 
pointer traversals, while going from z 
all the way up to u at x is going to 
require six pointer traversals. 
And of course, when you initialize the 
data structure, you want, you want to set 
everybody's rank to 0, because everybody 
is just is in a tree by themselves at the 
beginning. 
This notion of rank is going to be a 
super crucial concept throughout this 
entire sequence of lectures on 
union-find. 
So make sure you understand this 
definition, 
I encourage you to draw out some of your 
own examples, do some computations to get 
a better feel for the concept. 
For the moment, we're just going to use 
rank to avoid creating scraggly trees. 
So let's go back to the bad example in 
the quiz. 
Let's remember what the intuition was. 
We wound up with this long chain and 
therefore really bad, worst case running 
time for our operations, because we were 
stupidly taking a tree that was already 
pretty deep and installing it as a child 
under a single node, under a super 
shallow tree, 
thereby producing a tree which is even 
deeper. 
So the obvious way to avoid that is when 
faced with merging together a deep tree 
and a shallow tree, you install the 
shallow tree as a child under the deep. 
one. 
That's going to slow down this deepening 
process as much as possible. 
So, we can make that precise, just using 
this rank notion. 
Notice that the rank is exactly 
quantifying the depth of the tree. 
So the rank of the roots of a tree by 
this invariant is equal to the depth of 
the tree. 
So if you want to make the shallower 
tree, the child of the other one, that 
means you want to make it the smaller 
rank tree, the child of the bigger rank 
tree. 
This optimization is what is meant by 
union by rank. 
There still remains the case where the 
two roots have equal rank that is, with a 
two trees that were fusing, have exactly 
the same maximum path length, and in that 
case, we do just arbitrarily choose one 
of them. 
So in the pseudocode I've written here, 
I've arbitrarily chosen y's root to be 
the one that remains the root, 
x's root, S1 is going to be installed as 
a child of S2. 
But again, you could do it either way, 
it's not going to matter. 
Now, we're not off the hook yet. 
Remember, this is a data structure where 
we're intending for an invariant to, to 
hold to the semantics for the ranks as 
quantifying worst case path length to the 
node. 
And, we made a modification to the data 
structure. 
We rewired somebody's parent pointer. 
So now it's our responsibility to check, 
does the invariance still hold? And if it 
doesn't still hold, then we have to 
restore it, hopefully, using minimal 
extra work. 
So in this quiz, we're going to think 
through how ranks change, when we do a 
union. 
So remember the node notation, we're in 
the middle of a union of the objects x 
and y, 
s1 and s2 refer the ranks, that is the 
leaders, of the groups of the trees that 
contain x and y respectively. 
The first thing I want you to think 
through and convince yourself of, is that 
for objects other than s1 and s2, other 
than the pair of objects they're involved 
in the actual point of rewiring, nobody's 
rank changes. 
So make sure you think that through and 
convince yourself ranks are invariant, 
except possibly for the objects, s1 and 
s2. 
In this quiz, we're going to drill down 
on how did the ranks of s1 and s2 change, 
given that we took one of them and 
rewired its parent pointer to point to 
the other. 
Okay. So the correct answer to this quiz 
is the fourth one. 
So remarkably, in the, in many cases, 
there's no change at all to anybody's 
ranks, including s1 or s2. 
If the ranks were different before the 
union, both of the ranks stay the same. 
The one exception is when you take the 
union of two ranks which are equal, then, 
whichever one is the new root, and then 
the pseudocode that I showed you s2, is 
chosen to be the new root. 
It's rank is incremented, it goes up by 
1. 
I think the easiest way to see this is 
just with a couple pictures, a couple 
examples. 
So let me just draw two directed trees in 
the corresponding ranks. 
And in this one, S1 has strictly a bigger 
rank. 
Its rank is 2, so we install s2 as a 
child under s1. 
So recomputing the ranks in the fused 
together tree, we see that, again, the 
rank of the root at s1 is 2. 
Same thing as it was before, it didn't go 
up. 
So what's the general principle here? 
Well, in general, suppose you have these 
two trees and suppose the first ones a 
root, s1 has a bigger rank, say rank r. 
What does that mean? So that means that 
there exists a path from the leaf to the 
root s1, that requires a full r hops. 
And in the second tree, by virtue of its 
rank being less than r, [INAUDIBLE] minus 
1, 
it says that every single path from a 
leaf to the root s2 of that tree uses 
only r minus 1 hops. 
So when you install s2 as a child under 
s1, well, you know the length of every 
leaf to root path, now the root is s1, 
it's only gone up one. 
You just have to get to s2 and then one 
more hop, you make it all the way to s1. 
So if all those paths took at most r - 1 
before, paths all the way up to s1, take 
at most r hops now. 
So, the worst case path link has not gone 
up. 
That's exactly what happens in this 
example so it's true that when we install 
s2 under s1, we have yet another path 
that requires 2 hops, that goes through 
s2, but we had one previously, so the 
rank doesn't go up, the worst case path 
links exactly the same as before. 
And this is same kind of reasoning 
explains why the rank has to get bumped 
up by one when you fuse together two 
trees that have the same rank. 
So suppose s1 and s2 both have rank r, 
well then, both of them have a leaf to 
root path that requires a full or halves. 
And now, when you install s1 is child 
under s2, that path where you needed r 
hops from the leaf to get to s1, now to 
get all the way up to s2 you need one 
more hop. So now, suddenly there exists a 
path from a leaf to the new root s2 that 
requires a full r + 1 hops, that's why 
the ranks gotta go up.