So welcome to this optional sequence of videos on state-of-the-art implementations of the union-find data structure. Now, as this is advanced, optional material, let me make a few comments before we get started. The first comment is, I'm going to assume that you're in a particularly motivated mood. No one's forcing you to take these algorithms courses, so I'm always assuming that you're a very motivated individual. But that's going to hold even more true than usual in these advanced videos. The way that plays out is, while I'm going to hold myself to the same exacting standards of clarity that I usually do, I'm going to ask a little bit more of you, the viewer. I'm going to perhaps dot a few fewer i's and cross a few fewer t's than I usually do, and so you may find yourself periodically needing to pause and think through some of the details that I'm glossing over.

The second comment is that I'm going to give somewhat short shrift to applications of the union-find data structure. That's not because there aren't any; there are plenty of applications of this data structure. But for these videos, we're really just going to immerse ourselves in the beauty and the depth of the ideas behind the design of the data structures, and especially the analysis of their performance. The final comment is to keep in mind that this is some seriously next-level material. It is totally normal that the first time you see this stuff, you find it confusing, you find it difficult to understand. So confusion should not discourage you. It does not represent any intellectual failing on your part. Rather, keep in mind, it represents an opportunity to get even smarter. So, with those comments in mind, let's go ahead and proceed to a different approach to the union-find data structure, via lazy unions.

Let's have one slide with a quick review of what we already know about the union-find data structure. If you recall, we discussed this in the context of a fast implementation of Kruskal's minimum spanning tree algorithm. The raison d'ĂȘtre of a union-find data structure is to maintain a partition of a universe of objects. In the context of Kruskal's algorithm, the objects were the vertices, and the groups we wanted to maintain were the connected components with respect to the edges we had committed to so far. The data structure should support two operations, and no prizes for guessing the names of those two operations. First is the find operation: given an object from the universe, return the name of that object's group. So, for example, if x lies in the group C3 in the square that I've put on the right, representing the universe X, then find(x) would return C3, the name of the group that contains x. In the context of Kruskal's algorithm, we used the find operation to check for cycles. How do we know whether adding a new edge would create a cycle with respect to the edges we've already chosen? It will if and only if the endpoints of that edge are already in the same connected component, that is, if and only if the two find operations return the same answer. In the union operation, you're given not one object but two, call them x and y, and the responsibility of the operation is to merge the groups that contain x and y. So, for example, in the picture, if y is in C4 and x is, as before, in C3, then your job is to fuse the groups C3 and C4 into a single group. In the context of Kruskal's algorithm, we needed this operation when we added a new edge; this fused together two of the existing connected components into one.
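To pin down this two-operation interface, here is a minimal sketch in Python. The class and function names are my own illustrative choices, not from the lecture; the lecture specifies only the contract of the two operations, with concrete implementations to come.

```python
# The union-find contract described above, plus the Kruskal-style cycle
# check. Names (UnionFind, creates_cycle) are illustrative assumptions.

class UnionFind:
    def find(self, x):
        """Return the name of the group containing object x."""
        raise NotImplementedError

    def union(self, x, y):
        """Fuse the groups containing objects x and y into one group."""
        raise NotImplementedError

def creates_cycle(uf: UnionFind, u, v) -> bool:
    # Adding edge (u, v) creates a cycle if and only if its endpoints
    # are already in the same connected component.
    return uf.find(u) == uf.find(v)
```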
So, we already went through one implementation of union-find; that was in order to get a blazingly fast implementation of Kruskal's algorithm. Let's just review how that worked. With each group, we associated a linked structure, so each object had one pointer associated with it, and the invariant was that in a given group, there was some representative leader object, and everybody in that group pointed to the leader of that group. So, for example, on the right I'm showing you a group with three objects x, y, and z, where x is the leader of the group. All three of the objects point directly to x, and that is just the output of the find operation: the leader of the group, which we use as the group name.

Now, the part which is cool and obvious about this approach is that our find operations take constant time: all you do is return the leader of the given object. The tricky part was analyzing the cost of union operations. The problem here is that, to maintain the invariant that every object of a group points to its leader, when you fuse two groups you have to update a bunch of the objects' leader pointers; there can only be one leader for the one new group. The simple but totally crucial optimization that we discussed was: when two groups merge, you update the leader pointers of all objects in the smaller group to point to the leader of the bigger group. That is, the new fused group inherits the leader from the bigger of its constituent parts. If we do that optimization, it's still the case that a single union might take linear time, Theta(n) time, but a sequence of n unions takes only O(n log n) time. And that's because each object endures at most a logarithmic number of leader updates, because every time its leader pointer gets updated, the population of the group it inhabits at least doubles.
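Here is a minimal sketch of that eager implementation in Python, again with illustrative names of my own (the lecture doesn't give code); objects are assumed to be the integers 0 through n-1, and union-by-size relabels the smaller group.

```python
# A sketch of the eager, leader-pointer implementation reviewed above.
# Objects are 0..n-1; names (leader, members) are illustrative.

class EagerUnionFind:
    def __init__(self, n):
        self.leader = list(range(n))             # leader[i] = leader of i's group
        self.members = [[i] for i in range(n)]   # members[l] = objects whose leader is l

    def find(self, x):
        # Constant time: every object points directly to its leader.
        return self.leader[x]

    def union(self, x, y):
        a, b = self.find(x), self.find(y)
        if a == b:
            return  # already in the same group
        # The crucial optimization: relabel the smaller group, so each
        # object's leader changes at most O(log n) times over n unions.
        if len(self.members[a]) < len(self.members[b]):
            a, b = b, a
        for obj in self.members[b]:
            self.leader[obj] = a
        self.members[a].extend(self.members[b])
        self.members[b] = []
```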
So, in this sequence of videos, we're going to discuss a different approach to implementing the union-find data structure. Let's just go all-in and try to get away with updating only a single pointer in each union. All right, well, how are we going to do that? I think if we look at a picture, it'll be clear what the approach is going to be. Let's look at a simple example: there are just six objects in the world, in two groups. One, two, and three are in one group, with leader one; four, five, and six are in the second group, with leader four. With our previous implementation of union-find, since the two groups have equal size, we pick one arbitrarily to supply the new leader. Let's say four is going to be the new leader, so we update objects one, two, and three to point directly to their new leader, four. The new idea is very simple: just as a shorthand for rewiring all of one, two, and three to point to four, let's rewire only one's pointer to point to four, and then it's understood that two and three, as descendants of one, also now have the new leader four. As a result, we do again get a directed tree afterward, but not one with just depth one. We don't get that shallow, bushy tree; we get a deeper one, with two levels below the root.

Let me give you another way of thinking about this, in terms of an array representation. Here I also want to make the point that, while conceptually it's going to be very useful to think of this union-find data structure in terms of directed trees, when you're actually implementing the data structure, that's not how you'd do it. You wouldn't bother with actual pointers between the different objects. You'd just have a simple array, indexed by the nodes, and in the entry corresponding to a node i, you would store the name of i's parent. For instance, if we reframe this exact same example in the array representation, before we do the union, what have we got? Well, objects 1, 2, and 3 all have parent 1, and objects 4, 5, and 6 all have parent 4. So in the array, we'd have a 1 in the first three entries and a 4 in the second three entries. In the old solution, we'd have to update the parent pointers of objects 1, 2, and 3 so that they all point directly to 4, which gives us an array entirely of 4's. Whereas in the new solution, the only node whose parent pointer gets updated is object 1; it gets updated from itself to 4. And in particular, objects 2 and 3 still point to object 1, not directly to object 4.

So that's how union works in a simple example; how does it work in general? In general, you're given two objects, each of which belongs to some group. You can think of these two groups, conceptually, as directed trees: if you follow parent pointers, they all lead to the root vertex, and we identify those root vertices with the leaders of the two groups. Now, when you have to merge the two trees together, how do you do it? You pick one of the groups and look at its root. Currently, it points to itself. You change its parent pointer to point to the other group's root; that is, you install one of the two roots as a new child of the other tree's root.

So, I hope the pros and cons of this alternative approach to the union-find data structure are intuitively clear. The win, the pro, comes in the union operation, which is now very simple. Now, you might be tempted to just say union takes constant time, because all you do is update one pointer, but it's actually not so simple. Remember, you are given two objects x and y, and in general you aren't linking x and y together; they might be somewhere deep in a tree. You are linking roots together. So what the union operation reduces to is two invocations of find: you have to find x's root r1, you have to find y's root r2, and then you just do constant-time linking, of either r1 to r2 or vice versa. The issue with this lazy-union approach is that it's not at all obvious, and in fact it's not going to be true, that the find operation takes constant time. Remember, previously, we did all this hard work of updating all those leader pointers every time we had a union. That guaranteed that whenever there was a find, boom, we just looked in the relevant field, returned the leader pointer, and we were done. Here, parent pointers do not point directly to roots; rather, from a given object x you have to traverse, in general, a sequence of parent pointers to go all the way up to the root of the corresponding tree. So, because of these trade-offs, these pros and cons, it's really not obvious at all whether this lazy-union approach to the union-find data structure is a good idea, whether it's going to have any payoffs. That's going to take some really quite subtle analysis, which is the main point of the lectures to come. It's also going to take a couple of optimizations, and in the next video, we'll start with the first one.
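Before that, here is a minimal sketch of the lazy approach in Python, once more under my own illustrative naming; the linking step below just picks one root arbitrarily, which is precisely the choice discussed next.

```python
# A sketch of the lazy-union approach: one parent array, find walks up
# to the root, and union updates a single pointer. Objects are 0..n-1.

class LazyUnionFind:
    def __init__(self, n):
        self.parent = list(range(n))  # parent[i] = i's parent; roots point to themselves

    def find(self, x):
        # No longer constant time: traverse parent pointers to the root.
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, x, y):
        # Reduces to two invocations of find plus one pointer update.
        r1, r2 = self.find(x), self.find(y)
        if r1 != r2:
            self.parent[r1] = r2  # install r1 as a child of r2 (arbitrary choice here)
```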
You might be wondering: when you do a union and you have to install one root under the other, how do you make that choice? The right answer is something called union by rank, and that's coming up next.