So in this video, we'll start talking about the heap data structure. I want to be very clear on what operations a heap supports, what running time guarantees you can expect from canonical implementations, and I want you to get a feel for what kinds of problems heaps are useful for. In a separate video, we'll take a peek under the hood and talk a little bit about how heaps actually get implemented. But for now, let's just focus on how to use them as a client.

So, the number one thing you should remember about a given data structure is what operations it supports, and what running time you can expect from those operations. Basically, a heap supports two operations. There are some bells and whistles you can throw on, but the two things you've got to know are insertion and extract-min.

The first thing to say about a heap is that it's a container for a bunch of objects, and each of these objects should have a key, like a number, so that for any two objects you can compare their keys and say one key is bigger than the other. For example, maybe the objects are employee records and the key is the social security number. Maybe the objects are the edges of a network and the keys are something like the length or the weight of the edge. Maybe each object represents an event and the key is the time that event is scheduled to occur.

So, for a heap, there are essentially two basic operations: insert, and extract the object that has the minimum key value. In our discussion of heaps, we're going to allow ties; heaps are pretty much equally easy with or without them. So when you extract-min from a heap that may have duplicate key values, there's no specification about which object you get. You just get one of the objects tied for the minimum key value.
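To make these two operations concrete, here is a minimal sketch using Python's built-in heapq module. The language choice and the sample records are my own illustration, not anything this lecture prescribes.

```python
import heapq

# With heapq, a plain list is the heap: heappush is insert,
# and heappop is extract-min.
heap = []

# Insert: (key, object) pairs, e.g. employee records keyed by
# made-up social security numbers.
heapq.heappush(heap, (541, "record for Alice"))
heapq.heappush(heap, (113, "record for Bob"))
heapq.heappush(heap, (347, "record for Carol"))

# Extract-min: returns an object with the smallest key. With
# duplicate keys, no promise is made about which tied object you get.
key, record = heapq.heappop(heap)
print(key, record)  # -> 113 record for Bob
```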
Now, of course, there's no special reason I chose to extract the minimum rather than the maximum. You could instead have a second flavor of heap, the max-heap, which always returns the object with the maximum key value. Or, if all you have at your disposal is one of these extract-min heaps, you can just negate the sign of all the key values before you insert them, and then extract-min will actually extract the object with the maximum key value.

Just to be clear, I'm not proposing a data structure that simultaneously supports an extract-min operation and an extract-max operation. If you want both of those operations, there are data structures that will give them to you; a binary search tree is probably the first thing you'd want to consider. I'm just saying you can have a heap of one of two flavors. Either the heap supports extract-min and not extract-max, or it supports extract-max and not extract-min.

I mentioned that you should remember not just the supported operations of a data structure, but the running time of those operations. For a heap, the way it's canonically implemented, the running time you should expect is logarithmic in the number of items in the heap. That's log base two, with quite good constants. So when you think about heaps, you should absolutely remember these two operations.

Optionally, there are a couple of other things about heaps that might be worth remembering: some additional operations they can support. The first is an operation called heapify. Like a lot of other things about heaps, it has a few other names as well, but I'm going to call it heapify; that's one standard name. The point of heapify is to initialize a heap in linear time. If you have n objects and you want to put them all in a heap, obviously you could just invoke insert once per object, and that would take O(n log n) time: log n for each of the n inserts. But there's a slick way to do them in a batch which takes only linear time. That's the heapify operation. Another operation which can be implemented, although there are some subtleties, is deleting not just the minimum but an arbitrary element from the middle of the heap, again in logarithmic time. I mention this here primarily because we're going to use this operation when we use heaps to speed up Dijkstra's algorithm.

So that's the gist of a heap. You maintain objects that have keys, you can insert in logarithmic time, and you can find the one with the minimum key in logarithmic time.
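Both the heapify batch operation and the key-negation trick from a moment ago are easy to demo; here is a short sketch, again using Python's heapq as an assumed implementation.

```python
import heapq

xs = [9, 4, 7, 1, 8, 3]

# heapify: batch-initialize a heap from n items in O(n) time,
# rather than doing n individual O(log n) inserts.
heapq.heapify(xs)          # xs is now a valid min-heap
print(heapq.heappop(xs))   # -> 1

# The negation trick: heapq only gives you extract-min, so to
# simulate extract-max, negate keys on the way in and on the way out.
ys = [-y for y in [9, 4, 7, 1, 8, 3]]
heapq.heapify(ys)
print(-heapq.heappop(ys))  # -> 9, the maximum
```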
So let's turn to applications; I'll give you several. But before I dive into any one application, let me just say what the general principle is. What should trigger you to think that maybe you want to use a heap data structure in some task? The most common reason to use a heap is if you notice that your program is doing repeated minimum computations, especially via exhaustive search. Most of the applications we'll go through have this flavor: there's a naive program which does a bunch of repeated minimums using brute-force search, and we'll see that a very simple application of a heap allows us to speed it up tremendously.

So, let's start by returning to the mother of all computational problems: sorting an unsorted array. Now, a sorting algorithm which is so obvious and suboptimal that I didn't even bother to talk about it at any other point in the course is selection sort. In selection sort, you scan through the unsorted array, find the minimum element, and put it in the first position. You scan through the other n - 1 elements, find the minimum among them, and put it in the second position. You scan through the remaining n - 2 unsorted elements, find the minimum, put it in the third position, and so on. So evidently, this sorting algorithm does a linear number of linear scans through the array. It's definitely a quadratic-time algorithm, which is why I didn't bother to tell you about it earlier.

This certainly fits the bill as a bunch of repeated minimum computations, where each computation is done by exhaustive search. So a light bulb should go off: aha, can we do better using a heap data structure? And we can. The sorting algorithm we get is called heapsort, and given the heap data structure, it's totally trivial. We just insert all of the elements of the array into the heap, then we extract the minimum one by one. The first extraction gives us the minimum of all n elements, the second extraction gives us the minimum of the remaining n - 1 elements, and so on. So as we extract-min one by one, we can just populate a sorted array from left to right. Boom, we're done.

What is the running time of heapsort? Well, we insert each element once and we extract each element once, so that's 2n heap operations. And what I promised you is that you can count on heaps being implemented so that every operation takes logarithmic time. So we have a linear number of logarithmic-time operations, for a running time of O(n log n).
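Here is heapsort written out as those two trivial lines, plus scaffolding; a sketch in Python, with heapq standing in for the heap.

```python
import heapq

def heap_sort(xs):
    # Insert each element once, extract the minimum once per element:
    # 2n heap operations at O(log n) each, so O(n log n) overall.
    heap = []
    for x in xs:
        heapq.heappush(heap, x)
    return [heapq.heappop(heap) for _ in range(len(heap))]

print(heap_sort([5, 2, 8, 1, 9, 3]))  # -> [1, 2, 3, 5, 8, 9]
```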
So, let's take a step back and appreciate what just happened. We took the least imaginative sorting algorithm possible, selection sort, which is evidently quadratic time. We recognized the pattern of repeated minimum computations, we swapped in the heap data structure, and boom, we get an O(n log n) sorting algorithm that's just two trivial lines. And remember, O(n log n) is a pretty good running time for a sorting algorithm. This is exactly the running time we had for mergesort, and exactly the average running time we got from randomized quicksort. Moreover, heapsort is a comparison-based sorting algorithm: we don't use any data about the keys themselves, we just treat them as a totally ordered set. And as some of you may have seen in an optional video, there does not exist a comparison-based sorting algorithm with running time better than O(n log n). So for the question "can we do better?", the answer is no, as long as we use a comparison-based sorting algorithm like heapsort.

So, that's pretty amazing. All we did was swap in a heap, and the running time drops from the really quite unsatisfactory quadratic to the optimal O(n log n). Moreover, heapsort is a pretty practical sorting algorithm; you run it, it's going to go really fast. Is it as good as quicksort? Maybe not quite, but it's close; it's in the same ballpark.

So, let's look at another application which, frankly, in some sense is almost trivial, but which is also a canonical way in which heaps are used. In this application, it will be natural to call a heap by a synonymous name: a priority queue. What I want you to think about for this example is that you've been tasked with writing software that performs a simulation of the physical world. You might pretend, for example, that you're helping write a basketball video game.

Now, why would a heap come up in a simulation context? Well, the objects in this application are going to be event records. An event might be, for example, that the ball will reach the hoop at a particular time, because a player shot it a couple of seconds ago.
If, for example, the ball then hits the rim, that could trigger another event to be scheduled for the near future: a couple of players are going to vie for the rebound. That event, in turn, could trigger the scheduling of another event, which is that one of these players commits an over-the-back foul on the other and knocks him to the ground. That, in turn, could trigger another event, which is that the player who got knocked to the ground gets up and argues the foul call, and so on. So when you have event records like this, there's a very natural key, which is just the timestamp: the time at which this event is scheduled to occur in the future.

Now, a problem which clearly has to get solved over and over and over again in this kind of simulation is figuring out the next event that's going to occur. You have to know what other events to schedule, you have to know how to update the screen, and so on. That's a minimum computation. A very silly thing you could do is maintain an unordered list of all of the events that have ever been scheduled, and do a linear pass through them to compute the minimum. But you're going to be computing these minimums over and over and over again. So again, that light bulb should go on and you should say, maybe a heap is just what I need for this problem. And indeed it is. If you store these event records in a heap, with the keys being the timestamps, then when you extract-min, the heap hands you on a silver platter, using only logarithmic time, exactly which event is going to occur next.
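Here's a toy version of that event loop. The events and timestamps are invented for illustration, and Python's heapq plays the role of the priority queue.

```python
import heapq

# Event records are (timestamp, description) pairs; the heap is
# keyed on the timestamp, so extract-min yields the next event.
events = []
heapq.heappush(events, (2.0, "ball reaches the hoop"))
heapq.heappush(events, (2.3, "players vie for the rebound"))
heapq.heappush(events, (2.5, "over-the-back foul"))

while events:
    t, what = heapq.heappop(events)  # next event, in O(log n) time
    print(f"t={t}: {what}")
    if what == "over-the-back foul":
        # Handling one event can schedule new events in the future.
        heapq.heappush(events, (2.9, "player argues the foul call"))
```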
So let's move on to a less obvious application of heaps, which is a problem I'm going to call median maintenance. The way this works is that you and I are going to play a little game. On my side, I'm going to pass you index cards, one at a time, with a number written on each card. Your responsibility is to tell me, at each time step, the median of the numbers I've passed you so far. So, after I've given you the first eleven numbers, you should tell me as quickly as possible the sixth smallest; after I've given you thirteen numbers, you should tell me the seventh smallest; and so on.

Now, we know how to compute the median in linear time, but the last thing I want is for you to do a linear-time computation at every single time step. After all, I'm only giving you one new number each step. Do you really have to do linear work to recompute the median when I've just handed you a single new number? So, to make sure you don't run a linear-time selection algorithm every time I give you one new number, I'm going to put a budget on the amount of time you can use at each time step to tell me the median, and it's going to be logarithmic in the number of numbers I've passed you so far. I encourage you to pause the video at this point and spend some time thinking about how you would solve this problem.

Alright. So hopefully you've thought about this problem a little bit, so let me give you a hint: what if you use two heaps? Do you see a good way to solve the problem then?

Alright, so let me show you a solution to this problem that makes use of two heaps. The first heap we'll call H_low, and it will support extract-max. Remember, we discussed that you can pick whether a heap supports extract-min or extract-max; you don't get both, but you can get either one, and it doesn't matter which. Then we'll have another heap, H_high, which supports extract-min. The key idea is to maintain the invariant that the smallest half of the numbers you've seen so far are all in the low heap, and the largest half of the numbers you've seen so far are all in the high heap. So, for example, after you've seen the first ten elements, the smallest five of them should reside in H_low and the biggest five in H_high. After you've seen twenty elements, the smallest ten should reside in H_low and the largest ten in H_high. If you've seen an odd number of elements, either heap can be the bigger one; with 21 elements you can have the smallest ten in one and the biggest eleven in the other, or vice versa. It's not important.

Now, given this key idea of splitting the elements in half across the two heaps, you need two realizations, which I'll leave for you to check. First, that you can efficiently maintain this invariant with only O(log i) work at step i. Second, that this invariant allows you to solve the desired problem. Let me just quickly talk through both of these points, and then you can think about them in more detail on your own time. Let's start with the first one.
How can we maintain this invariant using only O(log i) work at time step i? This is a little tricky. Suppose we've already processed the first twenty numbers, and we've worked hard to put the smallest ten of them in H_low and the biggest ten of them in H_high. Now, here's a preliminary observation. What do we know about the maximum element in H_low? Well, H_low holds the ten smallest elements overall, and its maximum is the biggest of those ten, so it's the tenth order statistic: the tenth smallest overall. What about the minimum of the high heap? Those are the ten biggest values, so the minimum of them is the eleventh order statistic. So the maximum of H_low is the tenth order statistic and the minimum of H_high is the eleventh order statistic. They're right next to each other, and in fact these are the two medians right now.

So when the new element comes in, the 21st, we need to know which heap to insert it into. Well, if it's smaller than the tenth order statistic, then it's still in the bottom half of the elements, so it needs to go in the low heap. If it's bigger than the eleventh order statistic, that is, bigger than the minimum of the high heap, then that's where it belongs: in the high heap. And if it's wedged in between the tenth and eleventh order statistics, it doesn't matter; we can put it in either one, since it's the new median anyway.

Now, we're not done yet with this first point, because there's a potential imbalance problem. Imagine the 21st element comes up and it's less than the maximum of the low heap. So we stick it in the low heap, and now that has a population of eleven. Now imagine the 22nd number comes up, and that again is less than the maximum element in the low heap, so again we insert it there. Now we have twelve elements in the low heap but only ten in the high heap, so we don't have a 50-50 split of the numbers. But we can easily rebalance: we just extract the max from the low heap and insert it into the high heap. And boom, now they both have eleven, the low heap has the smallest eleven, and the high heap has the biggest eleven.

So that's how you maintain the invariant, this 50-50 split of the smaller and larger elements across the two heaps. You check where the new element lies with respect to the max of the low heap and the min of the high heap, you put it in the appropriate place, and whenever you need to do some rebalancing, you do some rebalancing. This uses only a constant number of heap operations when a new number shows up, so that's O(log i) work. Now, given this discussion, the second point is easy to see. Given that this invariant holds at each time step, how do we compute the median? Well, it's either the maximum of the low heap or the minimum of the high heap, depending on whether i is even or odd. If i is even, both of those are medians; if i is odd, it's whichever heap has one more element than the other.
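Here's a sketch of the whole two-heap solution in Python. Since heapq only supplies min-heaps, H_low is simulated as a max-heap by negating values; that detail is one of several reasonable ways to set this up, not the only one.

```python
import heapq

class MedianMaintainer:
    """Two-heap median maintenance: the smallest half of the numbers
    live in `low` (a max-heap via negation), the largest half in `high`."""

    def __init__(self):
        self.low = []    # max-heap of the smaller half (values negated)
        self.high = []   # min-heap of the larger half

    def add(self, x):
        # Compare against the max of the low heap to pick a side.
        if self.low and x <= -self.low[0]:
            heapq.heappush(self.low, -x)
        else:
            heapq.heappush(self.high, x)
        # Rebalance so the two halves never differ by more than one.
        if len(self.low) > len(self.high) + 1:
            heapq.heappush(self.high, -heapq.heappop(self.low))
        elif len(self.high) > len(self.low) + 1:
            heapq.heappush(self.low, -heapq.heappop(self.high))

    def median(self):
        # The median sits at the top of whichever heap is larger;
        # either top works when the heaps are the same size.
        if len(self.low) >= len(self.high):
            return -self.low[0]
        return self.high[0]

m = MedianMaintainer()
for x in [5, 2, 8, 1, 9]:
    m.add(x)
    print(m.median())  # -> 5, 2, 5, 2, 5
```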
So, the final application we'll talk about in detail in a different video, the one concerned with the running time of Dijkstra's shortest-path algorithm. But I do want to mention it here as well, just to reiterate the point that careful use of data structures can speed up algorithms, especially when you're doing things like minimum computations in an inner loop. Hopefully many of you have watched the Dijkstra video at this point. But basically, the algorithm has a central while loop, which executes once per vertex of the graph, and at least naively, it seems like what each iteration of the while loop does is an exhaustive search through the edges of the graph, computing a minimum. So if we think about the work performed in this naive implementation, it's exactly in the wheelhouse of a heap, right? What we do in each of these loop iterations is an exhaustive search computing a minimum. You see repeated minimum computations, a light bulb should go off, and you should think that maybe a heap can help. And a heap can help in Dijkstra's algorithm. The details are a bit subtle, and they're discussed in that separate video, but the upshot is that we get a tremendous improvement in the running time.

So, recalling that m denotes the number of edges and n denotes the number of vertices of the graph: with the careful deployment of heaps in Dijkstra's algorithm, the running time drops from a really rather large polynomial, the product of the number of vertices and the number of edges, down to something which is almost linear time, namely O(m log n). Truly linear time here would be O(m), linear in the number of edges, so we're picking up an extra log factor. But still, this is basically as good as sorting, so this is a fantastically fast shortest-path algorithm. Certainly way, way better than what you get if you don't use heaps and just do repeated exhaustive searches for the minimum.
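For a flavor of what that deployment looks like, here's a sketch of heap-based Dijkstra in Python. One caveat: instead of deleting elements from the middle of the heap as mentioned earlier, this version uses the common "lazy deletion" alternative, where out-of-date heap entries are simply skipped; the graph format is my own convention.

```python
import heapq

def dijkstra(graph, source):
    # `graph` maps a vertex to a list of (neighbor, edge_length) pairs.
    dist = {source: 0}
    heap = [(0, source)]             # (distance, vertex) pairs
    while heap:
        d, u = heapq.heappop(heap)   # the minimum computation, via the heap
        if d > dist.get(u, float("inf")):
            continue                 # stale entry, skip it
        for v, length in graph[u]:
            if d + length < dist.get(v, float("inf")):
                dist[v] = d + length
                heapq.heappush(heap, (dist[v], v))
    return dist

g = {"s": [("a", 1), ("b", 4)], "a": [("b", 2)], "b": []}
print(dijkstra(g, "s"))  # -> {'s': 0, 'a': 1, 'b': 3}
```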
So that wraps up our discussion of what I think you really want to know about heaps: namely, what are the key operations it supports, what running time you can expect from those operations, what types of problems this data structure will yield speed-ups for, and a suite of applications. For those of you who want to take it to the next level and see a little bit about the guts of the implementation, there is a separate optional video that talks a little bit about that.