So welcome to this optional sequence of videos on state-of-the-art implementations of the union-find data structure. Now, as this is advanced, optional material, let me make a few comments before we get started. The first comment is, I'm going to assume that you're in a particularly motivated mood. No one's forcing you to take these algorithms courses, so I'm always assuming that you're a very motivated individual. But that's going to hold even more true than usual in these advanced videos. The way that plays out is, while I'm going to hold myself to the same exacting standards of clarity that I usually do, I'm going to ask a little bit more of you, the viewer. I'm going to perhaps dot a few fewer i's and cross a few fewer t's than I usually do, and so you may find yourself periodically needing to pause and think through some of the details that I'm glossing over.

The second comment is that I'm going to give somewhat short shrift to applications of the union-find data structure. That's not because there aren't any; there are plenty of applications of this data structure. But for these videos, we're really just going to immerse ourselves in the beauty and the depth of the ideas behind the design of the data structures, and especially the analysis of their performance. The final comment is to keep in mind that this is some seriously next-level material. It is totally normal that the first time you see this stuff, you find it confusing, you find it difficult to understand. So confusion should not discourage you. It does not represent any intellectual failing on your part. Rather, keep in mind, it represents an opportunity to get even smarter. So, with those comments in mind, let's go ahead and proceed to a different approach to the union-find data structure, via lazy unions.

Let's have one slide with a quick review of what we already know about the union-find data structure. If you recall, we discussed this in the context of a fast implementation of Kruskal's minimum spanning tree algorithm. The raison d'ĂȘtre of a union-find data structure is to maintain a partition of a universe of objects. In the context of Kruskal's algorithm, the objects were the vertices, and the groups we wanted to maintain were the connected components with respect to the edges we had committed to so far. The data structure should support two operations, and no prizes for guessing the names of those two operations. First is the find operation: given an object from the universe, return the name of that object's group. So, for example, if x lies in the group C3 in the square that I've put on the right, representing the universe X, then find(x) would return C3, the name of the group that contains x. In the context of Kruskal's algorithm, we used the find operation to check for cycles. How do we know whether adding a new edge would create a cycle with respect to the edges we've already chosen? It will if and only if the endpoints of that edge are already in the same connected component, that is, if and only if the two find operations return the same answer. In the union operation, you're given not one object but two, call them x and y, and the responsibility of the operation is to merge the groups that contain x and y. So, for example, in the picture, if y is in C4 and x is, as before, in C3, then your job is to fuse the groups C3 and C4 into a single group. In the context of Kruskal's algorithm, we needed this operation when we added a new edge; this fused together two of the existing connected components into one.
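To pin down this two-operation interface, here is a minimal sketch in Python. The class and function names are my own illustrative choices, not from the lecture; the lecture specifies only the contract of the two operations, with concrete implementations to come.

```python
# The union-find contract described above, plus the Kruskal-style cycle
# check. Names (UnionFind, creates_cycle) are illustrative assumptions.

class UnionFind:
    def find(self, x):
        """Return the name of the group containing object x."""
        raise NotImplementedError

    def union(self, x, y):
        """Fuse the groups containing objects x and y into one group."""
        raise NotImplementedError

def creates_cycle(uf: UnionFind, u, v) -> bool:
    # Adding edge (u, v) creates a cycle if and only if its endpoints
    # are already in the same connected component.
    return uf.find(u) == uf.find(v)
```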
So, we already went through one implementation of union-find; that was in order to get a blazingly fast implementation of Kruskal's algorithm. Let's just review how that worked. With each group, we associated a linked structure, so each object had one pointer associated with it, and the invariant was that in a given group, there was some representative leader object, and everybody in that group pointed to the leader of that group. So, for example, on the right I'm showing you a group with three objects x, y, and z, where x is the leader of the group. All three of the objects point directly to x, and that is just the output of the find operation: the leader of the group, which we use as the group name.

Now, the part which is cool and obvious about this approach is that our find operations take constant time: all you do is return the leader of the given object. The tricky part was analyzing the cost of union operations. The problem here is that, to maintain the invariant that every object of a group points to its leader, when you fuse two groups you have to update a bunch of the objects' leader pointers; there can only be one leader for the one new group. The simple but totally crucial optimization that we discussed was: when two groups merge, you update the leader pointers of all objects in the smaller group to point to the leader of the bigger group. That is, the new fused group inherits the leader from the bigger of its constituent parts. If we do that optimization, it's still the case that a single union might take linear time, Theta(n) time, but a sequence of n unions takes only O(n log n) time. And that's because each object endures at most a logarithmic number of leader updates, because every time its leader pointer gets updated, the population of the group it inhabits at least doubles.
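Here is a minimal sketch of that eager implementation in Python, again with illustrative names of my own (the lecture doesn't give code); objects are assumed to be the integers 0 through n-1, and union-by-size relabels the smaller group.

```python
# A sketch of the eager, leader-pointer implementation reviewed above.
# Objects are 0..n-1; names (leader, members) are illustrative.

class EagerUnionFind:
    def __init__(self, n):
        self.leader = list(range(n))             # leader[i] = leader of i's group
        self.members = [[i] for i in range(n)]   # members[l] = objects whose leader is l

    def find(self, x):
        # Constant time: every object points directly to its leader.
        return self.leader[x]

    def union(self, x, y):
        a, b = self.find(x), self.find(y)
        if a == b:
            return  # already in the same group
        # The crucial optimization: relabel the smaller group, so each
        # object's leader changes at most O(log n) times over n unions.
        if len(self.members[a]) < len(self.members[b]):
            a, b = b, a
        for obj in self.members[b]:
            self.leader[obj] = a
        self.members[a].extend(self.members[b])
        self.members[b] = []
```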
So, in this sequence of videos, we're going to discuss a different approach to implementing the union-find data structure. Let's just go all-in and try to get away with updating only a single pointer in each union. All right, well, how are we going to do that? I think if we look at a picture, it'll be clear what the approach is going to be. Let's look at a simple example: there are just six objects in the world, in two groups. One, two, and three are in one group, with leader one; four, five, and six are in the second group, with leader four. With our previous implementation of union-find, since the two groups have equal size, we pick one arbitrarily to supply the new leader. Let's say four is going to be the new leader, so we update objects one, two, and three to point directly to their new leader, four. The new idea is very simple: just as a shorthand for rewiring all of one, two, and three to point to four, let's rewire only one's pointer to point to four, and then it's understood that two and three, as descendants of one, also now have the new leader four. As a result, we do again get a directed tree afterward, but not one with just depth one. We don't get that shallow, bushy tree; we get a deeper one, with two levels below the root.

Let me give you another way of thinking about this, in terms of an array representation. Here I also want to make the point that, while conceptually it's going to be very useful to think of this union-find data structure in terms of directed trees, when you're actually implementing the data structure, that's not how you'd do it. You wouldn't bother with actual pointers between the different objects. You'd just have a simple array, indexed by the nodes, and in the entry corresponding to a node i, you would store the name of i's parent. For instance, if we reframe this exact same example in the array representation, before we do the union, what have we got? Well, objects 1, 2, and 3 all have parent 1, and objects 4, 5, and 6 all have parent 4. So in the array, we'd have a 1 in the first three entries and a 4 in the second three entries. In the old solution, we'd have to update the parent pointers of objects 1, 2, and 3 so that they all point directly to 4, which gives us an array entirely of 4's. Whereas in the new solution, the only node whose parent pointer gets updated is object 1; it gets updated from itself to 4. And in particular, objects 2 and 3 still point to object 1, not directly to object 4.

So that's how union works in a simple example; how does it work in general? In general, you're given two objects, each of which belongs to some group. You can think of these two groups, conceptually, as directed trees: if you follow parent pointers, they all lead to the root vertex, and we identify those root vertices with the leaders of the two groups. Now, when you have to merge the two trees together, how do you do it? You pick one of the groups and look at its root. Currently, it points to itself. You change its parent pointer to point to the other group's root; that is, you install one of the two roots as a new child of the other tree's root.

So, I hope the pros and cons of this alternative approach to the union-find data structure are intuitively clear. The win, the pro, comes in the union operation, which is now very simple. Now, you might be tempted to just say union takes constant time, because all you do is update one pointer, but it's actually not so simple. Remember, you are given two objects x and y, and in general you aren't linking x and y together; they might be somewhere deep in a tree. You are linking roots together. So what the union operation reduces to is two invocations of find: you have to find x's root r1, you have to find y's root r2, and then you just do constant-time linking, of either r1 to r2 or vice versa. The issue with this lazy-union approach is that it's not at all obvious, and in fact it's not going to be true, that the find operation takes constant time. Remember, previously, we did all this hard work of updating all those leader pointers every time we had a union. That guaranteed that whenever there was a find, boom, we just looked in the relevant field, returned the leader pointer, and we were done. Here, parent pointers do not point directly to roots; rather, from a given object x you have to traverse, in general, a sequence of parent pointers to go all the way up to the root of the corresponding tree. So, because of these trade-offs, these pros and cons, it's really not obvious at all whether this lazy-union approach to the union-find data structure is a good idea, whether it's going to have any payoffs. That's going to take some really quite subtle analysis, which is the main point of the lectures to come. It's also going to take a couple of optimizations, and in the next video, we'll start with the first one.
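Before that, here is a minimal sketch of the lazy approach in Python, once more under my own illustrative naming; the linking step below just picks one root arbitrarily, which is precisely the choice discussed next.

```python
# A sketch of the lazy-union approach: one parent array, find walks up
# to the root, and union updates a single pointer. Objects are 0..n-1.

class LazyUnionFind:
    def __init__(self, n):
        self.parent = list(range(n))  # parent[i] = i's parent; roots point to themselves

    def find(self, x):
        # No longer constant time: traverse parent pointers to the root.
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, x, y):
        # Reduces to two invocations of find plus one pointer update.
        r1, r2 = self.find(x), self.find(y)
        if r1 != r2:
            self.parent[r1] = r2  # install r1 as a child of r2 (arbitrary choice here)
```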
You might be wondering: when you do a union and you have to install one root under the other, how do you make that choice? The right answer is something called union by rank, and that's coming up next.