I need to begin by setting up some notation and definitions. The first one is that of rank blocks. So, given a value of n, the number of objects in the data structure, we define the rank blocks as follows. 0 gets its own block, as does 1. The next block is 2, 3, 4. The next block starts at 5, obviously, and its last entry is 2 raised to the final rank of the previous block, that is, 2^4, also known as 16. The next block starts at 17 and goes up to 2 raised to the last rank of the previous block, so 2^16, also known as 65536. And so on, until we have enough rank blocks to capture n. Actually, if you think about it, because ranks can only be at most log n, we only have to go up to log n, but that doesn't affect anything by more than a constant, so let's just go ahead all the way up to n. So, how many rank blocks are there? Well, focus just on the largest member of each of the rank blocks. The largest member of a given rank block is 2 raised to the largest member of the previous rank block. So, for the t-th rank block, the largest entry is roughly a tower of 2s, 2^2^...^2, of height t. How many times do you need to exponentiate before you reach n? By definition, log*(n) times. So that's how many rank blocks there are, and that's where log*(n) enters the analysis.

So, I realize this must seem like a very inscrutable definition, these weird rank blocks, so let me tell you how you should think about them. They're meant to encode an intuition that we had on the previous slide, where we said we should be happy if the rank of an object's parent is a lot bigger than the rank of the object itself. Why? Well, when we traverse that object's parent pointer, we make a ton of progress through the rank space, and you can only make a ton of progress through the rank space a limited number of times. That means a small number of parent traversals, and that means fast finds. So these rank blocks are just a very clever way of defining sufficient progress, of saying when we're going to be happy with the gap between the rank of an object and the rank of its parent. Specifically, we're happy whenever these two ranks lie in different rank blocks; we're not happy if they lie in the same rank block. So, for example, if you're at an object with rank 8 and its parent has rank 18, then we're happy, because 18 is in the rank block after the one that contains 8. On the other hand, if you go from rank 8 to rank 15, then we're not happy; we're going to call that not a big gap between the rank of the object and the rank of the object's parent.

So let's build on that idea to make another definition. Consider a given snapshot in time; that is, we've done some sequence of finds and some sequence of unions. At this point in time, we're going to call some objects good and some of them bad. Here are the criteria by which you are good. First of all, if you're the root of your tree, you're good. If you're a direct descendant of the root, that is, if your parent is the root of your tree, then you're also good. If not, though, if you're deeper in the tree, then you're good only if your parent's rank is in a strictly larger rank block than your own rank. So, in the previous example, if your rank is 8 and your parent's is 18, then you're good. If your rank is 8, your parent's is 15, and your parent is not a root, then you're bad.
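To make the rank blocks concrete, here is a small Python sketch, entirely my own and not from the video, that computes the block boundaries for a given n and tests whether two ranks fall in the same block; the helper names rank_blocks and block_index are made up for illustration.

```python
def rank_blocks(n):
    """Rank blocks {0}, {1}, {2,...,4}, {5,...,16}, {17,...,65536}, ...
    as (lo, hi) pairs, continuing until the blocks cover rank n."""
    blocks = [(0, 0), (1, 1)]
    lo, hi = 2, 4                      # third block is {2, 3, 4}
    while True:
        blocks.append((lo, hi))
        if hi >= n:
            break
        lo, hi = hi + 1, 2 ** hi       # next block ends at 2^(previous block's top rank)
    return blocks

def block_index(r, blocks):
    """Index of the rank block containing rank r."""
    return next(i for i, (lo, hi) in enumerate(blocks) if lo <= r <= hi)

blocks = rank_blocks(10 ** 6)
print(blocks[:5])    # [(0, 0), (1, 1), (2, 4), (5, 16), (17, 65536)]
print(len(blocks))   # 6 here -- within a constant of log*(10**6)
print(block_index(8, blocks), block_index(18, blocks))   # 3 4: different blocks, a big gap
print(block_index(8, blocks), block_index(15, blocks))   # 3 3: same block, not a big gap
```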
The role of this definition is to split the work that we do into two parts. First, we do some work visiting good nodes; second, we do some work visiting bad nodes. The work we do visiting good nodes will be very easy to bound, no problem. We'll have to do a separate global analysis to bound the total work that we do visiting bad nodes. This dichotomy is exactly the same as something we faced when analyzing Kruskal's algorithm, when we implemented it using the union-find data structure with eager unions. There, you recall, there were some parts of the work in Kruskal's algorithm that were easy to bound just iteration by iteration; for example, every cycle check cost only constant time. But then there was this more complicated type of work, namely all of the leader pointer updates, that we had to bound globally via a separate argument. The exact same thing is going to happen here. The good nodes we'll be able to bound operation by operation, whereas for the bad nodes a global analysis will control the total work done over all of the operations. So, more precisely, I've set up the definitions so that the work done visiting good nodes is bounded by O(log* n) in every single operation. Indeed, how many good nodes could you possibly visit during a find operation, say from some object x? You start at x and traverse parent pointers going all the way up to the root. Well, there's the root, and there's the direct descendant of the root, so that's 2; let's set those aside. What about the other good nodes that you encounter on your path? Well, by definition, when you visit a good node, the rank of its parent is in a bigger rank block than the rank of that node itself. That is, every time you traverse a parent pointer from a good node, you progress to a subsequent rank block. But there are only log* n rank blocks, so you can only progress to a subsequent rank block log* n times. So, the total number of good nodes you're going to see is O(log* n). So now, let's express the total amount of work that we do, over all find and union operations, as the sum of two quantities: the work on the good nodes, which we now know is just O(log* n) in each operation, plus the visits to the bad nodes, which at the moment we have no idea how big it is. So now, let's proceed to the global bound on the total amount of work that we perform on bad nodes, and this is really the crux of the whole theorem. So, let's just review the definitions real quick. What does it mean to be a bad node? First of all, you're not the root. Second of all, you're not a direct descendant of the root; that is, you have a grandparent. And third, it is not the case that your parent's rank is in a later rank block; it is in exactly the same rank block as your own rank. That's what it means to be bad. So, how much work do we spend on bad nodes? Well, let's analyze it one rank block at a time. So, fix an arbitrary rank block; let's say, for some integer k, its smallest rank is k+1 and its biggest rank is 2^k. Now, I told you what our two main building blocks were. The first one is the rank lemma; I'm going to ask you to remember that in a second. But first I want to put to use our other building block, which is that path compression increases the rank gap between objects and their parents. That's what we're going to use right now.
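Since the argument turns on exactly what path compression does to parent pointers, here is a minimal Python sketch of the data structure this analysis assumes, union by rank plus path compression from the earlier videos; the class and method names are my own, not the video's code.

```python
class UnionFind:
    """Union by rank with path compression (illustrative sketch)."""

    def __init__(self, n):
        self.parent = list(range(n))   # each object starts as its own root
        self.rank = [0] * n            # only a root's rank ever changes

    def find(self, x):
        # First pass: walk parent pointers up to the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: path compression rewires every visited object to point
        # directly at the root, a strict ancestor of its previous parent.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        # Union by rank: the root of smaller rank points to the root of larger rank.
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1         # non-root ranks stay frozen forever
```

With this implementation, every find first locates the root and then rewires each object on the path to point directly at it, so a visited object's new parent is a strict ancestor of its old one; that is exactly the rewiring the next step exploits.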
Specifically, consider a find operation and a bad object x that it visits. By virtue of being bad, x is not the root and is not a direct descendant of the root, so the root is an ancestor strictly higher up than x's parent. Therefore, x's parent will be changed in the subsequent path compression: it will be rewired to point directly to the root, a strict ancestor of its previous parent. Therefore, the rank of its new parent will be strictly bigger than the rank of its previous parent. That's going to keep happening every time that x is visited while it's bad: it keeps getting new parents, and those new parents keep having ranks strictly bigger than the previous one. Well, how many times can this happen before the rank of x's parent has grown so big that it lies in a subsequent rank block? Remember, x is a non-root, so its rank is frozen forever; it's always stuck in this rank block, and the biggest value in its rank block is 2^k. So once x's parent's rank has been increased, let's say, at least 2^k times, that rank has to be so big that it lies in the next rank block. At that point, x is no longer bad: its parent pointer has made so much progress that it points into another rank block, so now we're going to call x good. And of course, once x becomes good in this fashion, it will be good forevermore. It is not a root, so it will never be a root again; its rank is frozen forever, and its parent's rank can only go up. So once you're good, once your parent's rank is sufficiently large, it will be sufficiently large for the rest of time.

Alright, so we're almost there; let's just make sure we don't forget anything that we have done. We're bounding the total work in two ways. First of all, we visit O(log* n) good nodes in each operation, so over m operations that's O(m log* n) for the good nodes, plus there's the number of visits, over the m operations, to the bad nodes. We're going to bound that work globally, but we're going to proceed rank block by rank block. So, we're fixing one rank block, with ranks k+1 up to 2^k. What we showed on the last slide is that for every object x whose final frozen rank lies somewhere in this rank block, the number of times it gets visited while it's bad, before it becomes good forevermore, is bounded above by 2^k. So, we've now used one of our key building blocks, that path compression increases the ranks of parents; now let's use the other building block, the rank lemma. The rank lemma, if you recall, says that at any given moment in time, and for any possible rank r, the number of objects that currently have rank r cannot be any bigger than n/2^r. So, let's use the rank lemma to upper bound how many objects could possibly have their final frozen rank somewhere in this rank block, that is, somewhere between k+1 and 2^k. Let's just sum over all of the ranks in the rank block, starting at k+1 and going up to 2^k. By the rank lemma, for a given value of i, there are at most n/2^i objects that eventually wind up with rank i, and by the usual geometric sum, this whole thing can be upper bounded by n/2^k. So, this is not just a magical coincidence. Of course, we made a number of definitions in the analysis; specifically, we structured the rank blocks so that this magic would happen. The number of inhabitants of a rank block, at most n/2^k, times the maximum number of visits to an inhabitant of the rank block while it's bad, 2^k, is actually independent of the rank block. We multiply these two things together, the number of visits per object, 2^k, and the number of objects, n/2^k, and what do we get? We get n.
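Writing out the calculation just described, in the same notation (my own rendering, not a slide from the video):

```latex
\[
\#\{\text{objects with final rank in } \{k+1,\dots,2^k\}\}
\;\le\; \sum_{i=k+1}^{2^k} \frac{n}{2^i}
\;<\; \sum_{i=k+1}^{\infty} \frac{n}{2^i}
\;=\; \frac{n}{2^k},
\]
\[
\underbrace{\frac{n}{2^k}}_{\text{objects in the block}}
\;\times\;
\underbrace{2^k}_{\text{max bad visits per object}}
\;=\; n .
\]
```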
Now, this was only counting the visits to bad objects in a given rank block, but there aren't many rank blocks, remember; there are only log* n of them. So, that means the total amount of work spent visiting the bad nodes, summed over the rank blocks, is O(n log* n). Combining the bounds for the good nodes and the bad nodes, we get O((m+n) log* n). At the beginning, I mentioned that the interesting case is when m is big Omega of n; in that case, this bound is just O(m log* n). Essentially, if you have a very sparse set of union operations, you can just apply this analysis to each of the directed trees separately. So, that's it, that's the full story, the complete analysis of the brilliant Hopcroft-Ullman theorem: O(log* n) operation time on average, with path compression. Brilliant as it is, you can do even better. That's the subject of the next couple of videos.
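For the record, the final accounting in one line, just restating the bound derived above:

```latex
\[
\text{total work}
\;=\; \underbrace{O(m\log^* n)}_{\text{good-node visits}}
\;+\; \underbrace{O(n\log^* n)}_{\text{bad-node visits}}
\;=\; O\big((m+n)\log^* n\big)
\;=\; O(m\log^* n) \quad \text{when } m = \Omega(n).
\]
```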