So in this video, we'll start talking about the heap data structure. I want to be very clear on what operations a heap supports, what running time guarantees you can expect from canonical implementations, and I want you to get a feel for what kinds of problems heaps are useful for. In a separate video, we'll take a peek under the hood and talk a little bit about how heaps actually get implemented. But for now, let's just focus on how to use them as a client.

So, the number one thing you should remember about a given data structure is what operations it supports, and what running time you can expect from those operations. Basically, a heap supports two operations. There are some bells and whistles you can throw on, but the two things you've got to know are insertion and extract-min.

The first thing to say about a heap is that it's a container for a bunch of objects, and each of these objects should have a key, like a number, so that for any two objects you can compare their keys and say one key is bigger than the other. For example, maybe the objects are employee records and the key is the social security number. Maybe the objects are the edges of a network and the keys are something like the length or the weight of the edge. Maybe each object represents an event and the key is the time that event is scheduled to occur.

So, for a heap, there are essentially two basic operations: insert, and extract the object that has the minimum key value. In our discussion of heaps, we're going to allow ties; heaps are pretty much equally easy with or without them. So when you extract-min from a heap that may have duplicate key values, there's no specification about which object you get. You just get one of the objects tied for the minimum key value.
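To make these two operations concrete, here is a minimal sketch using Python's built-in heapq module. The language choice and the sample records are my own illustration, not anything this lecture prescribes.

```python
import heapq

# With heapq, a plain list is the heap: heappush is insert,
# and heappop is extract-min.
heap = []

# Insert: (key, object) pairs, e.g. employee records keyed by
# made-up social security numbers.
heapq.heappush(heap, (541, "record for Alice"))
heapq.heappush(heap, (113, "record for Bob"))
heapq.heappush(heap, (347, "record for Carol"))

# Extract-min: returns an object with the smallest key. With
# duplicate keys, no promise is made about which tied object you get.
key, record = heapq.heappop(heap)
print(key, record)  # -> 113 record for Bob
```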
Now, of course, there's no special reason I chose to extract the minimum rather than the maximum. You could instead have a second flavor of heap, the max-heap, which always returns the object with the maximum key value. Or, if all you have at your disposal is one of these extract-min heaps, you can just negate the sign of all the key values before you insert them, and then extract-min will actually extract the object with the maximum key value.

Just to be clear, I'm not proposing a data structure that simultaneously supports an extract-min operation and an extract-max operation. If you want both of those operations, there are data structures that will give them to you; a binary search tree is probably the first thing you'd want to consider. I'm just saying you can have a heap of one of two flavors. Either the heap supports extract-min and not extract-max, or it supports extract-max and not extract-min.

I mentioned that you should remember not just the supported operations of a data structure, but the running time of those operations. For a heap, the way it's canonically implemented, the running time you should expect is logarithmic in the number of items in the heap. That's log base two, with quite good constants. So when you think about heaps, you should absolutely remember these two operations.

Optionally, there are a couple of other things about heaps that might be worth remembering: some additional operations they can support. The first is an operation called heapify. Like a lot of other things about heaps, it has a few other names as well, but I'm going to call it heapify; that's one standard name. The point of heapify is to initialize a heap in linear time. If you have n objects and you want to put them all in a heap, obviously you could just invoke insert once per object, and that would take O(n log n) time: log n for each of the n inserts. But there's a slick way to do them in a batch which takes only linear time. That's the heapify operation. Another operation which can be implemented, although there are some subtleties, is deleting not just the minimum but an arbitrary element from the middle of the heap, again in logarithmic time. I mention this here primarily because we're going to use this operation when we use heaps to speed up Dijkstra's algorithm.

So that's the gist of a heap. You maintain objects that have keys, you can insert in logarithmic time, and you can find the one with the minimum key in logarithmic time.
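Both the heapify batch operation and the key-negation trick from a moment ago are easy to demo; here is a short sketch, again using Python's heapq as an assumed implementation.

```python
import heapq

xs = [9, 4, 7, 1, 8, 3]

# heapify: batch-initialize a heap from n items in O(n) time,
# rather than doing n individual O(log n) inserts.
heapq.heapify(xs)          # xs is now a valid min-heap
print(heapq.heappop(xs))   # -> 1

# The negation trick: heapq only gives you extract-min, so to
# simulate extract-max, negate keys on the way in and on the way out.
ys = [-y for y in [9, 4, 7, 1, 8, 3]]
heapq.heapify(ys)
print(-heapq.heappop(ys))  # -> 9, the maximum
```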
So let's turn to applications; I'll give you several. But before I dive into any one application, let me just say what the general principle is. What should trigger you to think that maybe you want to use a heap data structure in some task? The most common reason to use a heap is if you notice that your program is doing repeated minimum computations, especially via exhaustive search. Most of the applications we'll go through have this flavor: there's a naive program which does a bunch of repeated minimums using brute-force search, and we'll see that a very simple application of a heap allows us to speed it up tremendously.

So, let's start by returning to the mother of all computational problems: sorting an unsorted array. Now, a sorting algorithm which is so obvious and suboptimal that I didn't even bother to talk about it at any other point in the course is selection sort. In selection sort, you scan through the unsorted array, find the minimum element, and put it in the first position. You scan through the other n - 1 elements, find the minimum among them, and put it in the second position. You scan through the remaining n - 2 unsorted elements, find the minimum, put it in the third position, and so on. So evidently, this sorting algorithm does a linear number of linear scans through the array. It's definitely a quadratic-time algorithm, which is why I didn't bother to tell you about it earlier.

This certainly fits the bill as a bunch of repeated minimum computations, where each computation is done by exhaustive search. So a light bulb should go off: aha, can we do better using a heap data structure? And we can. The sorting algorithm we get is called heapsort, and given the heap data structure, it's totally trivial. We just insert all of the elements of the array into the heap, then we extract the minimum one by one. The first extraction gives us the minimum of all n elements, the second extraction gives us the minimum of the remaining n - 1 elements, and so on. So as we extract-min one by one, we can just populate a sorted array from left to right. Boom, we're done.

What is the running time of heapsort? Well, we insert each element once and we extract each element once, so that's 2n heap operations. And what I promised you is that you can count on heaps being implemented so that every operation takes logarithmic time. So we have a linear number of logarithmic-time operations, for a running time of O(n log n).
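Here is heapsort written out as those two trivial lines, plus scaffolding; a sketch in Python, with heapq standing in for the heap.

```python
import heapq

def heap_sort(xs):
    # Insert each element once, extract the minimum once per element:
    # 2n heap operations at O(log n) each, so O(n log n) overall.
    heap = []
    for x in xs:
        heapq.heappush(heap, x)
    return [heapq.heappop(heap) for _ in range(len(heap))]

print(heap_sort([5, 2, 8, 1, 9, 3]))  # -> [1, 2, 3, 5, 8, 9]
```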
So, let's take a step back and appreciate what just happened. We took the least imaginative sorting algorithm possible, selection sort, which is evidently quadratic time. We recognized the pattern of repeated minimum computations, we swapped in the heap data structure, and boom, we get an O(n log n) sorting algorithm that's just two trivial lines. And remember, O(n log n) is a pretty good running time for a sorting algorithm. This is exactly the running time we had for mergesort, and exactly the average running time we got from randomized quicksort. Moreover, heapsort is a comparison-based sorting algorithm: we don't use any data about the keys themselves, we just treat them as a totally ordered set. And as some of you may have seen in an optional video, there does not exist a comparison-based sorting algorithm with running time better than O(n log n). So for the question "can we do better?", the answer is no, as long as we use a comparison-based sorting algorithm like heapsort.

So, that's pretty amazing. All we did was swap in a heap, and the running time drops from the really quite unsatisfactory quadratic to the optimal O(n log n). Moreover, heapsort is a pretty practical sorting algorithm; you run it, it's going to go really fast. Is it as good as quicksort? Maybe not quite, but it's close; it's in the same ballpark.

So, let's look at another application which, frankly, in some sense is almost trivial, but which is also a canonical way in which heaps are used. In this application, it will be natural to call a heap by a synonymous name: a priority queue. What I want you to think about for this example is that you've been tasked with writing software that performs a simulation of the physical world. You might pretend, for example, that you're helping write a basketball video game.

Now, why would a heap come up in a simulation context? Well, the objects in this application are going to be event records. An event might be, for example, that the ball will reach the hoop at a particular time, because a player shot it a couple of seconds ago.
If, for example, the ball then hits the rim, that could trigger another event to be scheduled for the near future: a couple of players are going to vie for the rebound. That event, in turn, could trigger the scheduling of another event, which is that one of these players commits an over-the-back foul on the other and knocks him to the ground. That, in turn, could trigger another event, which is that the player who got knocked to the ground gets up and argues the foul call, and so on. So when you have event records like this, there's a very natural key, which is just the timestamp: the time at which this event is scheduled to occur in the future.

Now, a problem which clearly has to get solved over and over and over again in this kind of simulation is figuring out the next event that's going to occur. You have to know what other events to schedule, you have to know how to update the screen, and so on. That's a minimum computation. A very silly thing you could do is maintain an unordered list of all of the events that have ever been scheduled, and do a linear pass through them to compute the minimum. But you're going to be computing these minimums over and over and over again. So again, that light bulb should go on and you should say, maybe a heap is just what I need for this problem. And indeed it is. If you store these event records in a heap, with the keys being the timestamps, then when you extract-min, the heap hands you on a silver platter, using only logarithmic time, exactly which event is going to occur next.
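Here's a toy version of that event loop. The events and timestamps are invented for illustration, and Python's heapq plays the role of the priority queue.

```python
import heapq

# Event records are (timestamp, description) pairs; the heap is
# keyed on the timestamp, so extract-min yields the next event.
events = []
heapq.heappush(events, (2.0, "ball reaches the hoop"))
heapq.heappush(events, (2.3, "players vie for the rebound"))
heapq.heappush(events, (2.5, "over-the-back foul"))

while events:
    t, what = heapq.heappop(events)  # next event, in O(log n) time
    print(f"t={t}: {what}")
    if what == "over-the-back foul":
        # Handling one event can schedule new events in the future.
        heapq.heappush(events, (2.9, "player argues the foul call"))
```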
So let's move on to a less obvious application of heaps, which is a problem I'm going to call median maintenance. The way this works is that you and I are going to play a little game. On my side, I'm going to pass you index cards, one at a time, with a number written on each card. Your responsibility is to tell me, at each time step, the median of the numbers I've passed you so far. So, after I've given you the first eleven numbers, you should tell me as quickly as possible the sixth smallest; after I've given you thirteen numbers, you should tell me the seventh smallest; and so on.

Now, we know how to compute the median in linear time, but the last thing I want is for you to do a linear-time computation at every single time step. After all, I'm only giving you one new number each step. Do you really have to do linear work to recompute the median when I've just handed you a single new number? So, to make sure you don't run a linear-time selection algorithm every time I give you one new number, I'm going to put a budget on the amount of time you can use at each time step to tell me the median, and it's going to be logarithmic in the number of numbers I've passed you so far. I encourage you to pause the video at this point and spend some time thinking about how you would solve this problem.

Alright. So hopefully you've thought about this problem a little bit, so let me give you a hint: what if you use two heaps? Do you see a good way to solve the problem then?

Alright, so let me show you a solution to this problem that makes use of two heaps. The first heap we'll call H_low, and it will support extract-max. Remember, we discussed that you can pick whether a heap supports extract-min or extract-max; you don't get both, but you can get either one, and it doesn't matter which. Then we'll have another heap, H_high, which supports extract-min. The key idea is to maintain the invariant that the smallest half of the numbers you've seen so far are all in the low heap, and the largest half of the numbers you've seen so far are all in the high heap. So, for example, after you've seen the first ten elements, the smallest five of them should reside in H_low and the biggest five in H_high. After you've seen twenty elements, the smallest ten should reside in H_low and the largest ten in H_high. If you've seen an odd number of elements, either heap can be the bigger one; with 21 elements you can have the smallest ten in one and the biggest eleven in the other, or vice versa. It's not important.

Now, given this key idea of splitting the elements in half across the two heaps, you need two realizations, which I'll leave for you to check. First, that you can efficiently maintain this invariant with only O(log i) work at step i. Second, that this invariant allows you to solve the desired problem. Let me just quickly talk through both of these points, and then you can think about them in more detail on your own time. Let's start with the first one.
How can we maintain this invariant using only O(log i) work at time step i? This is a little tricky. Suppose we've already processed the first twenty numbers, and we've worked hard to put the smallest ten of them in H_low and the biggest ten of them in H_high. Now, here's a preliminary observation. What do we know about the maximum element in H_low? Well, H_low holds the ten smallest elements overall, and its maximum is the biggest of those ten, so it's the tenth order statistic: the tenth smallest overall. What about the minimum of the high heap? Those are the ten biggest values, so the minimum of them is the eleventh order statistic. So the maximum of H_low is the tenth order statistic and the minimum of H_high is the eleventh order statistic. They're right next to each other, and in fact these are the two medians right now.

So when the new element comes in, the 21st, we need to know which heap to insert it into. Well, if it's smaller than the tenth order statistic, then it's still in the bottom half of the elements, so it needs to go in the low heap. If it's bigger than the eleventh order statistic, that is, bigger than the minimum of the high heap, then that's where it belongs: in the high heap. And if it's wedged in between the tenth and eleventh order statistics, it doesn't matter; we can put it in either one, since it's the new median anyway.

Now, we're not done yet with this first point, because there's a potential imbalance problem. Imagine the 21st element comes up and it's less than the maximum of the low heap. So we stick it in the low heap, and now that has a population of eleven. Now imagine the 22nd number comes up, and that again is less than the maximum element in the low heap, so again we insert it there. Now we have twelve elements in the low heap but only ten in the high heap, so we don't have a 50-50 split of the numbers. But we can easily rebalance: we just extract the max from the low heap and insert it into the high heap. And boom, now they both have eleven, the low heap has the smallest eleven, and the high heap has the biggest eleven.

So that's how you maintain the invariant, this 50-50 split of the smaller and larger elements across the two heaps. You check where the new element lies with respect to the max of the low heap and the min of the high heap, you put it in the appropriate place, and whenever you need to do some rebalancing, you do some rebalancing. This uses only a constant number of heap operations when a new number shows up, so that's O(log i) work. Now, given this discussion, the second point is easy to see. Given that this invariant holds at each time step, how do we compute the median? Well, it's either the maximum of the low heap or the minimum of the high heap, depending on whether i is even or odd. If i is even, both of those are medians; if i is odd, it's whichever heap has one more element than the other.
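Here's a sketch of the whole two-heap solution in Python. Since heapq only supplies min-heaps, H_low is simulated as a max-heap by negating values; that detail is one of several reasonable ways to set this up, not the only one.

```python
import heapq

class MedianMaintainer:
    """Two-heap median maintenance: the smallest half of the numbers
    live in `low` (a max-heap via negation), the largest half in `high`."""

    def __init__(self):
        self.low = []    # max-heap of the smaller half (values negated)
        self.high = []   # min-heap of the larger half

    def add(self, x):
        # Compare against the max of the low heap to pick a side.
        if self.low and x <= -self.low[0]:
            heapq.heappush(self.low, -x)
        else:
            heapq.heappush(self.high, x)
        # Rebalance so the two halves never differ by more than one.
        if len(self.low) > len(self.high) + 1:
            heapq.heappush(self.high, -heapq.heappop(self.low))
        elif len(self.high) > len(self.low) + 1:
            heapq.heappush(self.low, -heapq.heappop(self.high))

    def median(self):
        # The median sits at the top of whichever heap is larger;
        # either top works when the heaps are the same size.
        if len(self.low) >= len(self.high):
            return -self.low[0]
        return self.high[0]

m = MedianMaintainer()
for x in [5, 2, 8, 1, 9]:
    m.add(x)
    print(m.median())  # -> 5, 2, 5, 2, 5
```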
So, the final application we'll talk about in detail in a different video, the one concerned with the running time of Dijkstra's shortest-path algorithm. But I do want to mention it here as well, just to reiterate the point that careful use of data structures can speed up algorithms, especially when you're doing things like minimum computations in an inner loop. Hopefully many of you have watched the Dijkstra video at this point. But basically, the algorithm has a central while loop, which executes once per vertex of the graph, and at least naively, it seems like what each iteration of the while loop does is an exhaustive search through the edges of the graph, computing a minimum. So if we think about the work performed in this naive implementation, it's exactly in the wheelhouse of a heap, right? What we do in each of these loop iterations is an exhaustive search computing a minimum. You see repeated minimum computations, a light bulb should go off, and you should think that maybe a heap can help. And a heap can help in Dijkstra's algorithm. The details are a bit subtle, and they're discussed in that separate video, but the upshot is that we get a tremendous improvement in the running time.

So, recalling that m denotes the number of edges and n denotes the number of vertices of the graph: with the careful deployment of heaps in Dijkstra's algorithm, the running time drops from a really rather large polynomial, the product of the number of vertices and the number of edges, down to something which is almost linear time, namely O(m log n). Truly linear time here would be O(m), linear in the number of edges, so we're picking up an extra log factor. But still, this is basically as good as sorting, so this is a fantastically fast shortest-path algorithm. Certainly way, way better than what you get if you don't use heaps and just do repeated exhaustive searches for the minimum.
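For a flavor of what that deployment looks like, here's a sketch of heap-based Dijkstra in Python. One caveat: instead of deleting elements from the middle of the heap as mentioned earlier, this version uses the common "lazy deletion" alternative, where out-of-date heap entries are simply skipped; the graph format is my own convention.

```python
import heapq

def dijkstra(graph, source):
    # `graph` maps a vertex to a list of (neighbor, edge_length) pairs.
    dist = {source: 0}
    heap = [(0, source)]             # (distance, vertex) pairs
    while heap:
        d, u = heapq.heappop(heap)   # the minimum computation, via the heap
        if d > dist.get(u, float("inf")):
            continue                 # stale entry, skip it
        for v, length in graph[u]:
            if d + length < dist.get(v, float("inf")):
                dist[v] = d + length
                heapq.heappush(heap, (dist[v], v))
    return dist

g = {"s": [("a", 1), ("b", 4)], "a": [("b", 2)], "b": []}
print(dijkstra(g, "s"))  # -> {'s': 0, 'a': 1, 'b': 3}
```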
So that wraps up our discussion of what I think you really want to know about heaps: namely, what are the key operations it supports, what running time you can expect from those operations, what types of problems this data structure will yield speed-ups for, and a suite of applications. For those of you who want to take it to the next level and see a little bit about the guts of the implementation, there is a separate optional video that talks a little bit about that.