So let's try thinking about this alternative lay, lazy union based approach of union-find data structure in some detail. so get this to work as quickly as we want to, we're going to need two optimizations. This video will introduce the first of those optimizations, union by rank. So let's have a quick slide making precise what we learned in the previous video. So, in our new implementation, it's still the case that each object has one extra field. You can think of it logically as a pointer to some other object,. Really, it's just recording the name of some other object, but the semantics have changed. In our first implementation of union-find with eager unions, this field we called a leader pointer, and the invariant was that every object's pointer points directly to the leader of its group. In this Lazy Union Implementation, we are relaxing that invariant, so, I'm going to change the name. I'm not going to call it a leader pointer, I'm just going to call it a parent pointer. And the new relaxed invariant is that the collection of everybody's parent pointers induces a collection of directed trees. Each tree in this collection represents one of the groups in the partition that we're maintaining. Each of these trees has a root vertex, that is, a vertex that points back to itself. It is these root vertices, these root objects, that we think of as the leaders of these componenets. So as before, we're going to refer to a group by the name of its leader vertex and that leader is by definition the roots of the corresponding directed tree. One way to think about things is that when we had eager unions, we again associated a directed tree with each of the groups, but we also insisted that it was a super shallow bushy tree, that it had depth only one, that is all you had was a root, the leader and then everybody else who pointed directly to the leader. In this lazy union implementation, we're allowing the trees to grow deeper. We don't insist that they have only one level below the root. So as far as the initialization, at birth, in a, a union-find data structure just consists of n degenerate trees, each object just initially points to itself and is a root at the beginning. In general, how do you implement find from some object? Well, you just have to return the leader of its group, and we know you can get to the leader just by traversing parent pointers from x until you get to the root, until you get to some object that points back to itself, that's where you return at the end of the find call. How do you do union? Well, given two objects, x and y, you have to find its respective roots, so you just invoke the find operation twice. Once for x to get to its root s1, once from y, to get its root s2, and then you install either s1 or s2 as a child of the other. So you do one pointer update after the two finds. So for now, I'm going to leave union underdetermined. I'm not going to give you a rule to decide which of s1 and s2 becomes the child of the other. On the next quiz, I want you to think through, you know, what would be the worst case running time of these operations, if we make that choice arbitrarily? If we just pick s1 or s2 arbitrarily to become the child of the other. Alright. So the correct answer unfortunately is the last one, both find and union can actually blow up to linear time operations, if you're not careful about how you implement the union. So why is this true? Well, first let me just point out that B is definitely not the correct answer. Whatever, finding the unions worst case running time is, it's definitely the same for both of them. Remember, union reduces to two calls to find plus constant work, so that's going to tell you that they're both going to have the same operation time in the worst case. Now, why is it linear? The reason you're stuck with linear time is because the trees that you build through a sequence of unions can in the worst case become very scraggly. And to see what I mean, imagine you just do a sequence of unions, and in each time you're unioning a group with just one object into the group that you built up so far. So if you union together a group of size two and a group of size one, well if you choose arbitrarily, maybe you make the group fo size one the new root. So now you just got a chain of the three objects. Maybe you union in another singleton group and arbitrarily you choose the singleton group to be the new leader, that gives you a chain of four objects, and so on. So if you do a linear number of unions and you're not careful about who you make a child of the other, you might well end up with a chain of linear length, and then if you do finds for the objects toward the bottom of that chain, they're going to require linear work. And again, union has a subroutine of finds, so it's going to then take linear work as well. So when you see this example you know, I hope your immediate reaction is, well, geez, we can certainly do better than that. We can just be somewhat intelligent with when we do a union, which of the old roots do we install as the child of the other one? There's a rough analogy with how we made a natural optimization with our eager union implementation and union find. There, we had to make some kind of choice between which of the two old leaders do we retain. We retain the one for the bigger root to minimize the number of leader updates. So here, the natural thing to do is, well, you know, look at which of the two trees is already deeper and we want to keep the root of that deeper tree, that's to avoid it from getting deeper still. So we install the root of the shallower tree as a child of the deeper one. Let's make that precise on the next slide with the notion of the rank of an object. So the rank is just going to be a second field along with the parent pointer that we maintain for each object. It's just going to be some integer. Let me tell you how I want you to think about the ranks, at least for now. Okay? So I'm going to give you some semantics, pouring out to semantics, but I want to warn you, they're only going to hold valid for the next couple of videos. When we introduce something else called path compression, these semantics are going to break down, but initially, this is how I want you to think about ranks. It's going to be the maximum number of hops required to get from a leaf of xs tree up to x itself. So for example if, X is a root, this is going to be the worst case length, the maximum length of any leaf to root path in the tree. So just to make sure this is clear, let me draw a quick example. So here in blue, I've shown you one particular union-find data structure that has two groups. Now, it's very easy to compute the rank of leaves, that's 0, because the only path from a leaf to that node is the empty path. There are two nodes that have rank one, then the root of the bigger tree has rank two. And, if you think about it, in general, the rank of a node is going to be one plus the largest rank of any of its children. For example, if you're in node x, you have a child y and there's some path from a leaf z up to y that requires five pointer traversals, while going from z all the way up to u at x is going to require six pointer traversals. And of course, when you initialize the data structure, you want, you want to set everybody's rank to 0, because everybody is just is in a tree by themselves at the beginning. This notion of rank is going to be a super crucial concept throughout this entire sequence of lectures on union-find. So make sure you understand this definition, I encourage you to draw out some of your own examples, do some computations to get a better feel for the concept. For the moment, we're just going to use rank to avoid creating scraggly trees. So let's go back to the bad example in the quiz. Let's remember what the intuition was. We wound up with this long chain and therefore really bad, worst case running time for our operations, because we were stupidly taking a tree that was already pretty deep and installing it as a child under a single node, under a super shallow tree, thereby producing a tree which is even deeper. So the obvious way to avoid that is when faced with merging together a deep tree and a shallow tree, you install the shallow tree as a child under the deep. one. That's going to slow down this deepening process as much as possible. So, we can make that precise, just using this rank notion. Notice that the rank is exactly quantifying the depth of the tree. So the rank of the roots of a tree by this invariant is equal to the depth of the tree. So if you want to make the shallower tree, the child of the other one, that means you want to make it the smaller rank tree, the child of the bigger rank tree. This optimization is what is meant by union by rank. There still remains the case where the two roots have equal rank that is, with a two trees that were fusing, have exactly the same maximum path length, and in that case, we do just arbitrarily choose one of them. So in the pseudocode I've written here, I've arbitrarily chosen y's root to be the one that remains the root, x's root, S1 is going to be installed as a child of S2. But again, you could do it either way, it's not going to matter. Now, we're not off the hook yet. Remember, this is a data structure where we're intending for an invariant to, to hold to the semantics for the ranks as quantifying worst case path length to the node. And, we made a modification to the data structure. We rewired somebody's parent pointer. So now it's our responsibility to check, does the invariance still hold? And if it doesn't still hold, then we have to restore it, hopefully, using minimal extra work. So in this quiz, we're going to think through how ranks change, when we do a union. So remember the node notation, we're in the middle of a union of the objects x and y, s1 and s2 refer the ranks, that is the leaders, of the groups of the trees that contain x and y respectively. The first thing I want you to think through and convince yourself of, is that for objects other than s1 and s2, other than the pair of objects they're involved in the actual point of rewiring, nobody's rank changes. So make sure you think that through and convince yourself ranks are invariant, except possibly for the objects, s1 and s2. In this quiz, we're going to drill down on how did the ranks of s1 and s2 change, given that we took one of them and rewired its parent pointer to point to the other. Okay. So the correct answer to this quiz is the fourth one. So remarkably, in the, in many cases, there's no change at all to anybody's ranks, including s1 or s2. If the ranks were different before the union, both of the ranks stay the same. The one exception is when you take the union of two ranks which are equal, then, whichever one is the new root, and then the pseudocode that I showed you s2, is chosen to be the new root. It's rank is incremented, it goes up by 1. I think the easiest way to see this is just with a couple pictures, a couple examples. So let me just draw two directed trees in the corresponding ranks. And in this one, S1 has strictly a bigger rank. Its rank is 2, so we install s2 as a child under s1. So recomputing the ranks in the fused together tree, we see that, again, the rank of the root at s1 is 2. Same thing as it was before, it didn't go up. So what's the general principle here? Well, in general, suppose you have these two trees and suppose the first ones a root, s1 has a bigger rank, say rank r. What does that mean? So that means that there exists a path from the leaf to the root s1, that requires a full r hops. And in the second tree, by virtue of its rank being less than r, [INAUDIBLE] minus 1, it says that every single path from a leaf to the root s2 of that tree uses only r minus 1 hops. So when you install s2 as a child under s1, well, you know the length of every leaf to root path, now the root is s1, it's only gone up one. You just have to get to s2 and then one more hop, you make it all the way to s1. So if all those paths took at most r - 1 before, paths all the way up to s1, take at most r hops now. So, the worst case path link has not gone up. That's exactly what happens in this example so it's true that when we install s2 under s1, we have yet another path that requires 2 hops, that goes through s2, but we had one previously, so the rank doesn't go up, the worst case path links exactly the same as before. And this is same kind of reasoning explains why the rank has to get bumped up by one when you fuse together two trees that have the same rank. So suppose s1 and s2 both have rank r, well then, both of them have a leaf to root path that requires a full or halves. And now, when you install s1 is child under s2, that path where you needed r hops from the leaf to get to s1, now to get all the way up to s2 you need one more hop. So now, suddenly there exists a path from a leaf to the new root s2 that requires a full r + 1 hops, that's why the ranks gotta go up.