I need to begin by setting up some notation and definitions. The first one is that of rank blocks. So, given a value of n, the number of objects in the data structure, we define the rank blocks as follows. 0 gets its own block, as does 1. The next block is 2, 3, 4. The next block starts at 5, obviously, and its last entry is 2 raised to the final rank of the previous block, that is, 2^4, also known as 16. The next block starts at 17 and goes up to 2 raised to the last rank of the previous block, so 2^16, also known as 65536. And so on, until we have enough rank blocks to capture n. Actually, if you think about it, because ranks can only be at most log n, we only have to go up to log n, but that doesn't affect anything by more than a constant, so let's just go ahead all the way up to n. So, how many rank blocks are there? Well, focus just on the largest member of each of the rank blocks. The largest member of a given rank block is 2 raised to the largest member of the previous rank block. So, for the t-th rank block, the largest entry is roughly a tower of 2s, 2^2^...^2, of height t. How many times do you need to exponentiate before you reach n? By definition, log*(n) times. So that's how many rank blocks there are, and that's where log*(n) enters the analysis.

So, I realize this must seem like a very inscrutable definition, these weird rank blocks, so let me tell you how you should think about them. They're meant to encode an intuition that we had on the previous slide, where we said we should be happy if the rank of an object's parent is a lot bigger than the rank of the object itself. Why? Well, when we traverse that object's parent pointer, we make a ton of progress through the rank space, and you can only make a ton of progress through the rank space a limited number of times. That means a small number of parent traversals, and that means fast finds. So these rank blocks are just a very clever way of defining sufficient progress, of saying when we're going to be happy with the gap between the rank of an object and the rank of its parent. Specifically, we're happy whenever these two ranks lie in different rank blocks; we're not happy if they lie in the same rank block. So, for example, if you're at an object with rank 8 and its parent has rank 18, then we're happy, because 18 is in the rank block after the one that contains 8. On the other hand, if you go from rank 8 to rank 15, then we're not happy; we're going to call that not a big gap between the rank of the object and the rank of the object's parent.

So let's build on that idea to make another definition. Consider a given snapshot in time; that is, we've done some sequence of finds and some sequence of unions. At this point in time, we're going to call some objects good and some of them bad. Here are the criteria by which you are good. First of all, if you're the root of your tree, you're good. If you're a direct descendant of the root, that is, if your parent is the root of your tree, then you're also good. If not, though, if you're deeper in the tree, then you're good only if your parent's rank is in a strictly larger rank block than your own rank. So, in the previous example, if your rank is 8 and your parent's is 18, then you're good. If your rank is 8, your parent's is 15, and your parent is not a root, then you're bad.
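To make the rank blocks concrete, here is a small Python sketch, entirely my own and not from the video, that computes the block boundaries for a given n and tests whether two ranks fall in the same block; the helper names rank_blocks and block_index are made up for illustration.

```python
def rank_blocks(n):
    """Rank blocks {0}, {1}, {2,...,4}, {5,...,16}, {17,...,65536}, ...
    as (lo, hi) pairs, continuing until the blocks cover rank n."""
    blocks = [(0, 0), (1, 1)]
    lo, hi = 2, 4                      # third block is {2, 3, 4}
    while True:
        blocks.append((lo, hi))
        if hi >= n:
            break
        lo, hi = hi + 1, 2 ** hi       # next block ends at 2^(previous block's top rank)
    return blocks

def block_index(r, blocks):
    """Index of the rank block containing rank r."""
    return next(i for i, (lo, hi) in enumerate(blocks) if lo <= r <= hi)

blocks = rank_blocks(10 ** 6)
print(blocks[:5])    # [(0, 0), (1, 1), (2, 4), (5, 16), (17, 65536)]
print(len(blocks))   # 6 here -- within a constant of log*(10**6)
print(block_index(8, blocks), block_index(18, blocks))   # 3 4: different blocks, a big gap
print(block_index(8, blocks), block_index(15, blocks))   # 3 3: same block, not a big gap
```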
The role of this definition is to split the work that we do into two parts. First, we do some work visiting good nodes; second, we do some work visiting bad nodes. The work we do visiting good nodes will be very easy to bound, no problem. We'll have to do a separate global analysis to bound the total work that we do visiting bad nodes. This dichotomy is exactly the same as something we faced when analyzing Kruskal's algorithm, when we implemented it using the union-find data structure with eager unions. There, you recall, there were some parts of the work in Kruskal's algorithm that were easy to bound just iteration by iteration; for example, every cycle check cost only constant time. But then there was this more complicated type of work, namely all of the leader pointer updates, that we had to bound globally via a separate argument. The exact same thing is going to happen here. The good nodes we'll be able to bound operation by operation, whereas for the bad nodes a global analysis will control the total work done over all of the operations. So, more precisely, I've set up the definitions so that the work done visiting good nodes is bounded by O(log* n) in every single operation. Indeed, how many good nodes could you possibly visit during a find operation, say from some object x? You start at x and traverse parent pointers going all the way up to the root. Well, there's the root, and there's the direct descendant of the root, so that's 2; let's set those aside. What about the other good nodes that you encounter on your path? Well, by definition, when you visit a good node, the rank of its parent is in a bigger rank block than the rank of that node itself. That is, every time you traverse a parent pointer from a good node, you progress to a subsequent rank block. But there are only log* n rank blocks, so you can only progress to a subsequent rank block log* n times. So, the total number of good nodes you're going to see is O(log* n). So now, let's express the total amount of work that we do, over all find and union operations, as the sum of two quantities: the work on the good nodes, which we now know is just O(log* n) in each operation, plus the visits to the bad nodes, which at the moment we have no idea how big it is. So now, let's proceed to the global bound on the total amount of work that we perform on bad nodes, and this is really the crux of the whole theorem. So, let's just review the definitions real quick. What does it mean to be a bad node? First of all, you're not the root. Second of all, you're not a direct descendant of the root; that is, you have a grandparent. And third, it is not the case that your parent's rank is in a later rank block; it is in exactly the same rank block as your own rank. That's what it means to be bad. So, how much work do we spend on bad nodes? Well, let's analyze it one rank block at a time. So, fix an arbitrary rank block; let's say, for some integer k, its smallest rank is k+1 and its biggest rank is 2^k. Now, I told you what our two main building blocks were. The first one is the rank lemma; I'm going to ask you to remember that in a second. But first I want to put to use our other building block, which is that path compression increases the rank gap between objects and their parents. That's what we're going to use right now.
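Since the argument turns on exactly what path compression does to parent pointers, here is a minimal Python sketch of the data structure this analysis assumes, union by rank plus path compression from the earlier videos; the class and method names are my own, not the video's code.

```python
class UnionFind:
    """Union by rank with path compression (illustrative sketch)."""

    def __init__(self, n):
        self.parent = list(range(n))   # each object starts as its own root
        self.rank = [0] * n            # only a root's rank ever changes

    def find(self, x):
        # First pass: walk parent pointers up to the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: path compression rewires every visited object to point
        # directly at the root, a strict ancestor of its previous parent.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        # Union by rank: the root of smaller rank points to the root of larger rank.
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1         # non-root ranks stay frozen forever
```

With this implementation, every find first locates the root and then rewires each object on the path to point directly at it, so a visited object's new parent is a strict ancestor of its old one; that is exactly the rewiring the next step exploits.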
Specifically, consider a find operation and a bad object x that it visits. By virtue of being bad, x is not the root and is not a direct descendant of the root, so the root is an ancestor strictly higher up than x's parent. Therefore, x's parent will be changed in the subsequent path compression: it will be rewired to point directly to the root, a strict ancestor of its previous parent. Therefore, the rank of its new parent will be strictly bigger than the rank of its previous parent. That's going to keep happening every time that x is visited while it's bad: it keeps getting new parents, and those new parents keep having ranks strictly bigger than the previous one. Well, how many times can this happen before the rank of x's parent has grown so big that it lies in a subsequent rank block? Remember, x is a non-root, so its rank is frozen forever; it's always stuck in this rank block, and the biggest value in its rank block is 2^k. So once x's parent's rank has been increased, let's say, at least 2^k times, that rank has to be so big that it lies in the next rank block. At that point, x is no longer bad: its parent pointer has made so much progress that it points into another rank block, so now we're going to call x good. And of course, once x becomes good in this fashion, it will be good forevermore. It is not a root, so it will never be a root again; its rank is frozen forever, and its parent's rank can only go up. So once you're good, once your parent's rank is sufficiently large, it will be sufficiently large for the rest of time.

Alright, so we're almost there; let's just make sure we don't forget anything that we have done. We're bounding the total work in two ways. First of all, we visit O(log* n) good nodes in each operation, so over m operations that's O(m log* n) for the good nodes, plus there's the number of visits, over the m operations, to the bad nodes. We're going to bound that work globally, but we're going to proceed rank block by rank block. So, we're fixing one rank block, with ranks k+1 up to 2^k. What we showed on the last slide is that for every object x whose final frozen rank lies somewhere in this rank block, the number of times it gets visited while it's bad, before it becomes good forevermore, is bounded above by 2^k. So, we've now used one of our key building blocks, that path compression increases the ranks of parents; now let's use the other building block, the rank lemma. The rank lemma, if you recall, says that at any given moment in time, and for any possible rank r, the number of objects that currently have rank r cannot be any bigger than n/2^r. So, let's use the rank lemma to upper bound how many objects could possibly have their final frozen rank somewhere in this rank block, that is, somewhere between k+1 and 2^k. Let's just sum over all of the ranks in the rank block, starting at k+1 and going up to 2^k. By the rank lemma, for a given value of i, there are at most n/2^i objects that eventually wind up with rank i, and by the usual geometric sum, this whole thing can be upper bounded by n/2^k. So, this is not just a magical coincidence. Of course, we made a number of definitions in the analysis; specifically, we structured the rank blocks so that this magic would happen. The number of inhabitants of a rank block, at most n/2^k, times the maximum number of visits to an inhabitant of the rank block while it's bad, 2^k, is actually independent of the rank block. We multiply these two things together, the number of visits per object, 2^k, and the number of objects, n/2^k, and what do we get? We get n.
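Writing out the calculation just described, in the same notation (my own rendering, not a slide from the video):

```latex
\[
\#\{\text{objects with final rank in } \{k+1,\dots,2^k\}\}
\;\le\; \sum_{i=k+1}^{2^k} \frac{n}{2^i}
\;<\; \sum_{i=k+1}^{\infty} \frac{n}{2^i}
\;=\; \frac{n}{2^k},
\]
\[
\underbrace{\frac{n}{2^k}}_{\text{objects in the block}}
\;\times\;
\underbrace{2^k}_{\text{max bad visits per object}}
\;=\; n .
\]
```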
Now, this was only counting the visits to bad objects in a given rank block, but there aren't many rank blocks, remember; there are only log* n of them. So, that means the total amount of work spent visiting the bad nodes, summed over the rank blocks, is O(n log* n). Combining the bounds for the good nodes and the bad nodes, we get O((m+n) log* n). At the beginning, I mentioned that the interesting case is when m is big Omega of n; in that case, this bound is just O(m log* n). Essentially, if you have a very sparse set of union operations, you can just apply this analysis to each of the directed trees separately. So, that's it, that's the full story, the complete analysis of the brilliant Hopcroft-Ullman theorem: O(log* n) operation time on average, with path compression. Brilliant as it is, you can do even better. That's the subject of the next couple of videos.
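For the record, the final accounting in one line, just restating the bound derived above:

```latex
\[
\text{total work}
\;=\; \underbrace{O(m\log^* n)}_{\text{good-node visits}}
\;+\; \underbrace{O(n\log^* n)}_{\text{bad-node visits}}
\;=\; O\big((m+n)\log^* n\big)
\;=\; O(m\log^* n) \quad \text{when } m = \Omega(n).
\]
```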