So let's start thinking about this
alternative lazy union based approach to the Union-Find data
structure in some detail. To get this to work as quickly as we want
to we're going to need two optimizations. This video will introduce the first of
those optimizations, Union by Rank. So let's have a quick slide making precise
what we learned in the previous video. So in our new implementation
it's still the case that each object has one extra field. You can think of it logically as
a pointer to some other object. Really it's just recording
the name of some other object but the semantics have changed. In our first implementation of
union find with eager unions, this field we called it a leader pointer
and the invariate was that every object's pointer points directly
to the leader of its group. In this Lazy Union Implementation,
we are relaxing that variant. So I'm going to change the name. I'm not going to call it a leader pointer. I'm just going to call
it a parent pointer. And the new relaxed invariant is that the
collection of everybody's parent pointers induces a collection of directed trees. Each tree in this collection
represents one of the groups in the partition that we're maintaining. Each of these trees has a root vertex, that is a vertex which
points back to itself. It is these root vertices, these root objects that we think of
as the leaders of these components. So as before, we're going to refer to a
group by the name of its leader vertex and that leader is, by definition, the root
of the corresponding directed tree. One way to think about things is that,
when we had eager unions, we, again, associated a directed tree with each
of the groups, but we also insisted that it was super shallow, bushy tree,
that it had depth only one. That is, all you had was a root,
the leader, and then everybody else who
pointed directly to the leader. In this lazy union implementation we're
allowing the trees to grow deeper. We don't insist that they have
only one level below the root. So as far as the initialization at birth
in a union defined data structure just consists of n degenerate trees each
object initially points to itself and is a root at the beginning. In general how do you implement
find from some object? Well you just have to return
the leader of its group and we know you can get to the leader just by
traversing parent pointers from x until you get to a root until you get to some
object that points back to itself. That's what you return at
the end of the find call. How do you do a union? Well given two objects x and y you
have to find their respective roots so you just invoke the find
operation twice once for x to get its root s1 once from
y to get its root as two. And then you install either s1 or
s2 as a child of the other. So you do one pointer
update after the two finds. So for now I'm going to leave
Union under determined. I'm not going to give you a rule
to decide which of s1 and s2 becomes the child of the other. In the next quiz, I want you to think
through what would be the worst case running time of these operations if
we make that choice arbitrarily, if we just pick s1 s2 arbitrarily
to become the child of the other. All right so the correct answer
unfortunately is the last one, both find in union can actually blow
up to linear time operations if you're not careful about how
you implement the union. So why is this true? Well first just let me point out that B
is definitely not the correct answer. Whatever finding union's
worst case running time is, it's definitely the same for both of them. Remember, union reduces to calls
to fine plus constant work. So it's going to tell you that they're
both going to have the same operation time in the worst case. Now, why is it linear? The reason you're stuck with linear
time is because the trees that you build through a sequence of unions can
in the worst case become very scraggly. And to see what I mean imagine you
just do a sequence of unions and at each time you're unioning
a group with just one object into the group that you built up so far. So if you union together a group
of size two and group of size one, we'll have you choose arbitrarily. Maybe you make the group
of size one the new root. So now you just got a chain
of the three objects. Maybe you union in
another singleton group, and arbitrarily you choose the singleton
group to be the new leader. That gives you a chain of four objects and
so on. So if you do a linear number of unions,
and you're not careful about who you make a child of the other, you might well
end up with a chain of linear length. And then if you fines for the objects
towards the bottom of that chain, they're going to require linear work. And again,
union has a subroutine of fines so its going to then take
linear work as well. So when you see this example, I hope
your immediate reaction is well jeez, we can certainly do better than that. We can just be somewhat intelligent
when we do a union which of the old roots do we install
as a child of the other one? There's a rough analogy with how we
made a natural optimization with our eager union implementation of union find. There we had to make some kind of choice
between which of the two old leaders do we retain? We retain the one for the bigger group to
minimize the number of leader updates. So here, the natural thing to
do is well look at what you did, two trees is already deeper, and we
want to keep the root of that deeper tree. That's to avoid it from
getting deeper still. So we install the root of the shallower
tree as a child of the deeper one. Let's make that precise on the next slide
with the notion of the rank of an object. So the rank is just going to be
a second field along with a parent pointer that we maintain for each object. It's just going to be some integer. Let me tell you how I want you to think
about the ranks at least for now. Okay, so I'm going to give you some
semantics, very natural semantics. But I want to warn you,
they're only going to hold valid for the next couple of videos. When we introduce something else
called path compression these semantics are going to break down. But initially this is how I
want you to think about ranks. It's going to be the maximum
number of hops required to get from a leaf of x's tree up to x itself. So for example if x is a root,
this is going to be the worst case link, the maximum link of any leaf
to root path in the tree. So just to make sure that this is
clear let me draw a quick example. So here in blue I've shown you one
particular union fine data structure that has two groups, now it's very easy to
compute the rank of leafs, that's zero, because the only path from a leaf
to that node is the empty path. There are two nodes that have rank 1. Then the root of the bigger
tree has rank 2. And if you think about it, in general, the rank of a node is going to be 1 plus
the largest rank of any of its children. For example, if you're in node X,
you have a child Y and their sum path from a leaf Z up to Y that
requires five pointed traversals while going from Z all the way up to U and X is
going to require six pointed traversals. And of course,
when you initialize the data structure, you want to set everybody's rank to zero,
because everybody's just in a tree, by themselves, at the beginning. This notion of rank is going to be a super
crucial concept throughout this entire sequence of lectures on union fine so
make sure you understand this definition. I encourage you to draw out
some of your own examples, do some computations to get
a better feel for the concept. For the moment we're just going to use
rank to avoid creating scraggly trees. So let's go back to the bad
example on the quiz. Let's remember what the intuition was. We wound up with this long chain and therefore really bad worse case
running time for our operations because we were stupidly taking a tree
that was already pretty deep and installing it as a child under a single
note under a super shallow tree. Thereby producing a tree
which is even deeper. So the obvious way to avoid that is
when faced with merging together a deep tree and a shallow tree, you install the shallow tree
as a child under the deep one. That's going to slow down this
deepening process as much as possible. So we can make that precise
just to using this rank notion. Notice that the rank is exactly
quantifying the depth of a tree. So the rank of the root of
a tree by this invariant is equal to the depth of the tree. So if you want to make the shallower tree
the child of the other one, that means you want to make the smaller rank tree
the child of the bigger rank tree. This optimization is what
is meant by Union by Rank. This still remains the case where the two
roots have equal rank, that is where the two trees that were fusing have
exactly the same maximum path length. And in that case we do just
arbitrarily choose one of them. So in the pseudocode I've written here
I've arbitrarily chosen y's root to be the one that remains the root, x's root s1
is going to be installed as a child of s2. But again, you can do it either
way it's not going to matter. Now we are not off the hook yet. Remember this is a data structure
where we're intending for a invariant to hold these semantics for
the ranks that's quantifying worst case path link to the node and we
made a modification to the data structure. We rewired somebody's parent pointer. So now it's our responsibility to check
does the invariant still hold and if doesn't still hold, then we have to restore it
hopefully using minimal extra work. So in this quiz we're going to
think through how ranks change when we do a unions. Remember the intonation, we're in the
middle of a union of the objects x and y. s1 and s2 refer the ranks that is the
leaders of the groups of the trees that contain x and y respectively. The first thing I want
you to think through and convince yourself of is that for objects
other than s1 and s2, other than the pair of objects that are involved in the actual
point of re-wiring, nobody's rank changes. So make sure you think that through and
convince yourself. Ranks are invariant, except possibly for
the objects S1 and S2. In this quiz, we're going to drill down on
how do the ranks of S1 and S2 change given that we took one of them and rewired it's
parent pointer to point to the other. Okay so the correct answer to
this quiz is the fourth one. So remarkably in many cases there's no
change at all to anybody's ranks including S1 or S2. If the ranks were different before
the union both of the ranks stay the same. The one exception is when you take
the union of two ranks which are equal. Then whichever one is the new root and in the pseudo code that I showed you
s2 is chosen to be the new root. Its rank gets incremented,
it goes up by one. I think the easiest way to see this
is is just with a couple pictures, couple of examples. So let me just draw two directed
trees in the corresponding ranks. And in this one s1 has
strictly bigger rank. It's rank is 2 so
we install s2 as a child under s1. So re-computing the ranks in the fused
together tree, we see that again the rank at the route at s1 is 2., same thing
as it was before it didn't go up. So, what's the general principle here? Well, in general,
suppose you have these two trees and suppose the first ones route s1
has a bigger rank, say rank r. What does that mean? So, that means that there exists
a path from a leaf to the root s1 that requires a full r hops. And in the second tree by virtue of its
rank being less than r most r minus 1. It says that every single path
from a leaf to the root s2 of that tree uses only r minus 1 hops. So when you install s2 as a child under
s1 the length of every leaf to root path now the root is S1,
it's only gone up by 1. You just have to get to s2 and then 1
more hop you make it all the way to s1. So if all those paths took
almost r minus 1 before, all the paths all the way to s minus
1 to s1 now, taken most are hops now. So the worst case path
link has not gone up. That's exactly what happens in this
example so it's true that when we install s2 under s1 we have yet another path that
requires two hops that goes through s2. But we had one previously so
the rank doesn't go up. The worst case path length is
exactly the same as before. And this same kind of reasoning explains
why the rank has to get bumped up by one when you fuse together two
trees that had the same rank. So suppose S1 and S2 both have rank R, well then both of them have a leaf to
root path that requires a full R hops. And now when you install S1 as a child
under S2 that path where you needed R hops to get from the leaf to S1, now to get all
the way up to S2 you need one more hop. So now suddenly there exists a path
from a leaf to the new root S2 that requires a full R plus 1 hops
that's why the ranks gotta go up.