1
00:00:00,000 --> 00:00:04,024
So in this video, we'll start talking
about the heap data structure. So in this

2
00:00:04,024 --> 00:00:08,060
video I want to be very clear on what are
the operations supported by heap, what

3
00:00:08,060 --> 00:00:12,090
running time guarantees you can expect
from [inaudible] limitations and I want

4
00:00:12,090 --> 00:00:17,015
you to get a feel for what kinds of
problems they're useful for. In a separate

5
00:00:17,015 --> 00:00:21,035
video, we'll take a peek under the hood
and talk a little bit about how heaps

6
00:00:21,035 --> 00:00:25,059
actually get implemented. But for now,
let's just focus on how to use them as a

7
00:00:25,059 --> 00:00:30,082
client. So the number one thing you should
remember about a given data structure is

8
00:00:30,082 --> 00:00:35,043
what operations it supports, and what is
the running time you can expect from those

9
00:00:35,043 --> 00:00:39,075
operations. So basically, a heap supports
two operations. There's some bells and

10
00:00:39,075 --> 00:00:43,097
whistles you can throw on. But the two
things you gotta now is insertion and

11
00:00:43,097 --> 00:00:48,031
extract min. And so the first thing I have
to say about a heap is that it's a

12
00:00:48,031 --> 00:00:53,005
container for a bunch of objects. And each
of these objects should have a key, like a

13
00:00:53,005 --> 00:00:57,074
number so that for any given objects you
can compare their keys and say one key is

14
00:00:57,074 --> 00:01:02,049
bigger than the other key. So for example,
maybe the objects are employee records and

15
00:01:02,049 --> 00:01:07,023
the key is social security numbers, maybe
the objects are the edges of a network and

16
00:01:07,023 --> 00:01:11,087
the keys are something like the length or
the weight of an edge, maybe each object

17
00:01:11,087 --> 00:01:16,071
indicates an event and the key is the time
at which that event is meant to occur. Now

18
00:01:16,071 --> 00:01:22,013
the number one thing you should remember
about a given data structure is, first of

19
00:01:22,013 --> 00:01:27,008
all what are the operations that it
supports? And second of all, what is the

20
00:01:27,008 --> 00:01:31,091
running time you can expect from those
operations? For a heap, essentially

21
00:01:31,091 --> 00:01:37,019
there's two basic operations. Insert and
extract the object that has the minimum

22
00:01:37,019 --> 00:01:42,063
key value. So in our discussion of heaps,
we're going to allow ties that are pretty

23
00:01:42,063 --> 00:01:46,095
much equal to easy with or without ties.
So, when you extract men from a heap they

24
00:01:46,095 --> 00:01:51,042
may have duplicate key values then there's
no specification about which one you get.

25
00:01:51,042 --> 00:01:55,085
You just get one of the objects that has a
tie for the minimum key value. Now, of

26
00:01:55,085 --> 00:02:00,018
course, there's no special reason that I
chose to extract the minimum rather than

27
00:02:00,018 --> 00:02:04,045
the maximum. You either you can have a
second notion of a heap, which is a max

28
00:02:04,045 --> 00:02:08,078
heap, which always returns the object of
the maximum key value. Or if all you have

29
00:02:08,078 --> 00:02:13,022
at your disposal is one of these extract
min-type heaps, you can just, negate the

30
00:02:13,022 --> 00:02:17,044
sign of all of the key values before you
insert them, And then extract min will

31
00:02:17,044 --> 00:02:21,098
actually extract, the max key value. So,
just to be clear, I'm not proposing here a

32
00:02:21,098 --> 00:02:25,083
data structure that supports
simultaneously an extract-min operation

33
00:02:25,083 --> 00:02:30,006
and an extract-max operation. If you
wanted both of those operations, there'd

34
00:02:30,006 --> 00:02:34,063
be data structures that would give it to
you; probably a binary search tree is the

35
00:02:34,063 --> 00:02:38,093
first thing you'd want to consider. So,
I'm just saying, you can have a heap of

36
00:02:38,093 --> 00:02:43,022
one off two flavors. Either the heap
supports extract-min and not extract-max

37
00:02:43,022 --> 00:02:48,081
or the heap will support extract-max and
not extract-min. So I mentioned that you

38
00:02:48,081 --> 00:02:53,041
should remember not just the supported
operations of a data structure, but what

39
00:02:53,041 --> 00:02:57,049
is the running time of those operations.
Now, for the heap, the way it's

40
00:02:57,049 --> 00:03:02,021
canonically implemented, the running time
you should expect is logarithmic in the

41
00:03:02,021 --> 00:03:06,081
number of items in the heap. And its log
base two, with quite good constants. So

42
00:03:06,081 --> 00:03:11,041
when you think about heaps, you should
absolutely remember these two operations.

43
00:03:11,041 --> 00:03:15,090
Optionally, there's a couple other things
about heaps that are, might be worth

44
00:03:15,090 --> 00:03:21,018
remembering Some additional operations
that they can support. So the first is an

45
00:03:21,018 --> 00:03:25,069
operation called heapify. Like a lot of
the other stuff about heaps, it has a few

46
00:03:25,069 --> 00:03:30,014
other names as well. But I'm going to call
it heapify, one standard name. And the

47
00:03:30,014 --> 00:03:34,070
point of heapify is to initialize a heap
in linear time. Now, if you have N things

48
00:03:34,070 --> 00:03:39,009
and you want to put them all in a heap,
obviously you could just invoke insert

49
00:03:39,009 --> 00:03:43,031
once per each object. If you have N
objects, it seems like that would take N

50
00:03:43,031 --> 00:03:47,076
times log N time, log N for each of the N
inserts. But there's a slick way to do

51
00:03:47,076 --> 00:03:52,009
them in a batch, which takes only linear
time. So tha t's the heapify operation,

52
00:03:52,043 --> 00:03:56,043
And another operation which can be
implemented, although there are some

53
00:03:56,043 --> 00:04:00,088
subtleties. Is you can delete not just the
minimum, but you can delete an ar-,

54
00:04:00,088 --> 00:04:04,093
arbitrary element from the middle of a
heap, again, in logarithmic time. I

55
00:04:04,093 --> 00:04:09,060
mention this here primarily cuz we're
gonna use this operation when we use heaps

56
00:04:09,060 --> 00:04:14,027
to speed up Dijkstra's Algorithm. So that's
the gist of a heap. You maintain objects

57
00:04:14,027 --> 00:04:18,066
that have keys you can insert in
logarithmic time, and you can find the one

58
00:04:18,066 --> 00:04:23,039
with the minimum key in logarithmic time.
So let's turn to applications, I'll give

59
00:04:23,039 --> 00:04:28,013
you several. But before I dive into any
one application let me just say; what's

60
00:04:28,013 --> 00:04:32,046
the general principle? What should
[inaudible] you to think that maybe you

61
00:04:32,046 --> 00:04:37,025
want to use a heap data structure in some
task? So the most common reason to use a

62
00:04:37,025 --> 00:04:41,087
heap is if you notice that your program is
doing repeated minimum computations.

63
00:04:41,087 --> 00:04:46,063
Especially via exhaustive search, Most of
the applications that we go through will

64
00:04:46,063 --> 00:04:51,024
have this flavor. It will be, there will
be a naive program which does a bunch of

65
00:04:51,024 --> 00:04:55,090
repeated minimums using just brute force
search and we'll see that a very simple

66
00:04:55,090 --> 00:05:00,045
application of a heap will allow us to
speed it up tremendously. So let's start

67
00:05:00,045 --> 00:05:04,043
by returning to the mother of all
computational problems, sorting and

68
00:05:04,043 --> 00:05:09,074
unsorted array. Now, a sorting algorithm
which is sort of so obvious and suboptimal

69
00:05:09,074 --> 00:05:14,082
that I didn't even really bother to talk
about it at any other point in the course

70
00:05:14,082 --> 00:05:18,084
is selection-sort. What do you do? In
selection sort, you do a scan through the

71
00:05:18,084 --> 00:05:22,031
unsorted array. You find the minimum
element; you put that in the first

72
00:05:22,031 --> 00:05:26,031
position. You scan through the other N-1
elements; you find the minimum among them.

73
00:05:26,031 --> 00:05:30,027
You put that in the second position. You
scan through the remaining N-2 unsorted

74
00:05:30,027 --> 00:05:34,013
elements. You find the minimum; you put
that in the third position, and so on. So

75
00:05:34,013 --> 00:05:37,088
evidently, this [inaudible] sorting
algorithm does a linear number of linear

76
00:05:37,088 --> 00:05:41,055
scans through this array. So this is
definitely a quadratic time algorithm.

77
00:05:41,055 --> 00:05:45,065
That's why I didn't bother to tell you
about it earlier. So this certainly fits

78
00:05:45,065 --> 00:05:49,076
the bill as being a bunch of repeated
minimum computations. Or for each

79
00:05:49,076 --> 00:05:54,000
computation, we're doing exhaustive
search. So this, we should just, a light

80
00:05:54,000 --> 00:05:58,064
bulb should go off, and say, aha! Can we
do better using a heap data structure? And

81
00:05:58,064 --> 00:06:03,038
we can, and the sorting algorithm that we
get is called heap sort. And given a heap

82
00:06:03,038 --> 00:06:08,014
data structure, this sorting algorithm is
totally trivial. We just insert all of the

83
00:06:08,014 --> 00:06:12,056
elements from the array into the heap.
Then we extract the minimum one by one.

84
00:06:12,056 --> 00:06:16,081
From the first extraction, we get the
minimum of all N elements. The second

85
00:06:16,081 --> 00:06:21,041
extraction gives us the minimum of the
remaining N-1 elements, and so on. So when

86
00:06:21,041 --> 00:06:25,071
we extract min one by one, we can just
populate a sorted array from left to

87
00:06:25,071 --> 00:06:31,055
right. Boom, we're done. What is the
running time of heap sort? Well, we insert

88
00:06:31,055 --> 00:06:36,020
each element once and we extract each
element once so that's 2n heap operations

89
00:06:36,020 --> 00:06:40,091
and what I promised you is that you can
count on heaps being implemented so that

90
00:06:40,091 --> 00:06:45,062
every operation takes logarithmic time. So
we have a linear number of logarithmic

91
00:06:45,062 --> 00:06:49,097
time operations for running time of n log
n. So let's take a step back and

92
00:06:49,097 --> 00:06:54,019
appreciate what just happened. We took the
least imaginative sorting algorithm

93
00:06:54,019 --> 00:06:58,004
possible. Selection sort, which is
evidently quadratic Time. We recognize

94
00:06:58,004 --> 00:07:01,088
the pattern of repeated minimum
computations. We swapped in the Heap Data

95
00:07:01,088 --> 00:07:06,016
structure and boom we get an NlogN sorting
algorithm, which is just two trivial

96
00:07:06,016 --> 00:07:10,044
lines. And remember, N log N is a pretty
good running time for a sorting algorithm.

97
00:07:10,044 --> 00:07:14,056
This is exactly the running time we had
for merge sort; this was exactly the

98
00:07:14,056 --> 00:07:18,073
average running time we got for randomized
quick sort. Moreover, Heap Sort is a

99
00:07:18,073 --> 00:07:23,075
comparison based sorting algorithm. We
don't use any data about the key elements

100
00:07:23,075 --> 00:07:28,061
we just used as a totally [inaudible] set.
And, as some of you may have seen in an

101
00:07:28,061 --> 00:07:33,029
optional video, there does not exist a
comparison-based sorting algorithm with

102
00:07:33,029 --> 00:07:38,027
running time better than N log N. So for
the question, can we do better? The answer

103
00:07:38,027 --> 00:07:43,023
is no, if we use a comparison based
sorting algorithm like heap sort. So

104
00:07:43,023 --> 00:07:48,023
that's pretty amazing, all we do is swap
in a Heap and a running time drops from really quite unsatisfactory quadratic

105
00:07:48,023 --> 00:07:53,029
to the optimal N log N. Moreover, HeapSort is a pretty practical sorting algorithm:

106
00:07:53,029 --> 00:07:58,005
when you run this it's gonna go really fast. Is it as good as quick

107
00:07:58,005 --> 00:08:02,094
sort? Hm, maybe not quite but its close
it's getting into the same [inaudible]. So

108
00:08:02,094 --> 00:08:07,082
let's talk of another application which frankly in some sense is almost trivial

109
00:08:08,000 --> 00:08:13,020
but this is also a canonical way in which heaps are used. And in this application it

110
00:08:13,020 --> 00:08:18,022
will be natural to call a heap by a
synonymous name, a priority queue. So what

111
00:08:18,022 --> 00:08:22,085
I want you to think about for this example
is that you've been tasked with writing

112
00:08:22,085 --> 00:08:27,048
software that performs a simulation of the
physical world. So you might pretend, for

113
00:08:27,048 --> 00:08:32,008
example, that you're helping write a video
game which is for basketball. Now why

114
00:08:32,008 --> 00:08:36,020
would a heap come up in a simulation
context? Well, the objects in this

115
00:08:36,020 --> 00:08:40,086
application are going to be events
records. So an event might be for example

116
00:08:40,086 --> 00:08:45,063
that the ball will reach the hoop at a
particular time and that would be because

117
00:08:45,063 --> 00:08:50,070
a player shot it a couple of seconds ago.
When if for example the ball hits the rim,

118
00:08:50,070 --> 00:08:55,029
that could trigger another event to be
scheduled for the near future which is

119
00:08:55,029 --> 00:09:00,019
that a couple players are going to vie for
the rebound. That event in turn could

120
00:09:00,019 --> 00:09:04,052
trigger the scheduling of another event,
which is one of these players? commits, an

121
00:09:04,052 --> 00:09:08,075
over the back foul on the other one and
knocks them to the ground. That in turn

122
00:09:08,075 --> 00:09:13,024
could trigger another event which is the
player that got knocked on the ground gets

123
00:09:13,024 --> 00:09:17,062
up and argues that a foul call, and so on.
So when you have event records like this,

124
00:09:17,062 --> 00:09:21,079
there's a very natural key, which is just
the timestamp, the time at which this

125
00:09:21,079 --> 00:09:26,050
event in the future is scheduled to occur.
Now clearly a problem which has to get

126
00:09:26,050 --> 00:09:30,034
solved over and over and over again in
this kind of simulation is you have to

127
00:09:30,034 --> 00:09:34,014
figure out what's the next event that's
going to occur. You have to know what

128
00:09:34,014 --> 00:09:37,094
other events to schedule; you have to
update the screen and so on. So that's a

129
00:09:37,094 --> 00:09:42,013
minimum computation. So a very silly thing
you could do is just maintain an unordered

130
00:09:42,013 --> 00:09:45,092
list of all of the events that have ever
been scheduled and do a linear path

131
00:09:45,092 --> 00:09:49,092
through them and compute the minimum. But
you're gonna be computing minimums over

132
00:09:49,092 --> 00:09:53,081
and over and over again, so again that
light bulb should go on. And you could say

133
00:09:53,096 --> 00:09:57,089
maybe a heap is just what I need for this
problem. And indeed it is. So, if you're

134
00:09:57,089 --> 00:10:01,090
storing these event records in a heap.
With the key being the time stamps then

135
00:10:01,090 --> 00:10:06,053
when you extract them in the hands for you
on a silver platter using logarithmic time

136
00:10:06,053 --> 00:10:11,047
exactly which algorithm is going to occur
next. So let's move on to a less obvious

137
00:10:11,047 --> 00:10:16,096
application of heaps, which is a problem
I'm going to call median maintenance. The

138
00:10:16,096 --> 00:10:21,052
way this is gonna work is that you and I
are gonna play a little game. So on my

139
00:10:21,052 --> 00:10:26,031
side, what I'm going to do is I'm going to
pass you index cards, one at a time, where

140
00:10:26,031 --> 00:10:30,077
there's a number written on each index
card. Your responsibility is to tell me at

141
00:10:30,077 --> 00:10:34,077
each time step the median of the number
that I've passed you so far. So, after

142
00:10:34,077 --> 00:10:39,002
I've given you the first eleven numbers
you should tell me as quickly as possible

143
00:10:39,002 --> 00:10:43,017
the sixth smallest after I've given you
thirteen numbers you should tell me the

144
00:10:43,017 --> 00:10:47,027
seventh smallest and so on. Moreover, we
know how to compute the median in linear

145
00:10:47,027 --> 00:10:51,042
time but the last thing I want is for you
to be doing a linear time computation

146
00:10:51,042 --> 00:10:55,057
every single time step. [inaudible] I only
give you one new number? Do you really

147
00:10:55,057 --> 00:10:59,072
have to do linear time just to re-compute
the median? If I just gave you one new

148
00:10:59,072 --> 00:11:03,074
number. So to make sure that you don't run
a linear time selection algorithm every

149
00:11:03,074 --> 00:11:07,083
time I give you one new number, I'm going
to put a budget on the amount of time that

150
00:11:07,083 --> 00:11:11,039
you can use on each time step to tell me
the median. And it's going to be

151
00:11:11,039 --> 00:11:15,038
logarithmic in the number of numbers I've
passed you so far. So I encourage you to

152
00:11:15,038 --> 00:11:19,023
pause the video at this point and spend
some time thinking about how you would

153
00:11:19,023 --> 00:11:26,011
solve this problem. Alright, so hopefully
you've thoug ht about this problem a

154
00:11:26,011 --> 00:11:30,033
little bit. So let me give you a hint.
What if you use two heaps, do you see a

155
00:11:30,033 --> 00:11:36,078
good way to solve this problem then.
Alright, so let me show you a solution to

156
00:11:36,078 --> 00:11:42,071
this problem that makes use of two heaps.
The first heap we'll call H low. This

157
00:11:42,071 --> 00:11:47,027
equal supports extract max. Remember we
discussed that a heap, you could pick

158
00:11:47,027 --> 00:11:52,020
whether it supports extract min or extract
max. You don't get both, but you can get

159
00:11:52,020 --> 00:11:56,070
either one, it doesn't matter. And then
we'll have another heap H high which

160
00:11:56,070 --> 00:12:01,026
supports extract min. And the key idea is
to maintain the invariant that the

161
00:12:01,026 --> 00:12:06,013
smallest half of the numbers that you've
seen so far are all in the low heap. And

162
00:12:06,013 --> 00:12:10,069
the largest half of the numbers that
you've seen so far are all in the high

163
00:12:10,069 --> 00:12:14,083
heap. So, for example, after you've seen
the first ten elements, the smallest five

164
00:12:14,083 --> 00:12:18,093
of them should reside in H low, and the
biggest five of them should reside in H

165
00:12:18,093 --> 00:12:22,087
high. After you've seen twenty elements
then the bottom ten, the smallest ten,

166
00:12:22,087 --> 00:12:26,087
should, should reside in H low, and the
largest ten should reside in H high. If

167
00:12:26,087 --> 00:12:31,012
you've seen an odd number, either one can
be bigger, it doesn't matter. So if you

168
00:12:31,012 --> 00:12:35,033
have 21 you have the smallest ten in the
one and the biggest eleven in the other,

169
00:12:35,033 --> 00:12:39,069
or vice-versa. It's not, not important.
Now given this key idea of splitting the

170
00:12:39,069 --> 00:12:44,022
elements in half, according to the two
heaps. You need two realizations, which

171
00:12:44,022 --> 00:12:49,090
I'll leave for you to check. So first of
all, you have to prove you can actually

172
00:12:49,090 --> 00:12:54,089
maintain this invariant with only O of log
I work in step I. Second of all, you have

173
00:12:54,089 --> 00:12:59,068
to realize this invariant allows you to
solve the desired problem. So let me just

174
00:12:59,068 --> 00:13:04,014
quickly talk through both of these points,
and then you can think about it in more

175
00:13:04,014 --> 00:13:08,054
detail, on your own time. So let's start
with the first one. How can we maintain

176
00:13:08,054 --> 00:13:13,010
this invariant, using only log I work and
time step I, and this is a little tricky.

177
00:13:13,010 --> 00:13:17,062
So let's suppose we've already processed
the first twenty numbers, and the smallest

178
00:13:17,062 --> 00:13:22,002
ten of them we've all worked hard to, to
put only in H low. And the biggest ten of

179
00:13:22,002 --> 00:13:25,082
th ''em we've worked hard to put only in H
high. Now, here's a preliminary

180
00:13:25,082 --> 00:13:30,011
observation. What's true, so what do we
know about the maximum element in h low?

181
00:13:30,011 --> 00:13:34,080
Well these are the tenth smallest overall
and the maximum then is the biggest of the

182
00:13:34,080 --> 00:13:38,093
tenth smallest. So that would be a tenth
order statistic, so the tenth order

183
00:13:38,093 --> 00:13:43,023
overall. Now what about in the, the hi
key? What s its minimum value? Well those

184
00:13:43,023 --> 00:13:47,086
are the biggest ten values. So the minimum
of, of the ten biggest values would be the

185
00:13:47,086 --> 00:13:52,077
eleventh order statistic. Okay, so the
maximum of H low is the tenth order

186
00:13:52,077 --> 00:13:57,022
statistic. The minimum of H high Is the
[inaudible] statistic, they're right next

187
00:13:57,022 --> 00:14:02,017
to each other; these are in fact the two
medians Right now So When this new element

188
00:14:02,017 --> 00:14:05,087
comes in, the twenty-first element comes
in, we need to know which heap to insert

189
00:14:05,087 --> 00:14:09,065
it into and well it just, if it's smaller
than the tenth order statistic then it's

190
00:14:09,065 --> 00:14:13,053
still gonna be in the bottom, then it's in
the bottom half of the elements and needs

191
00:14:13,053 --> 00:14:17,018
to go in the low heap. If it's bigger than
the eleventh order statistic, if it's

192
00:14:17,018 --> 00:14:21,015
bigger than the minimum value of the high
heap then that's where it belongs, in the

193
00:14:21,015 --> 00:14:25,024
high heap. If it's wedged in between the
tenth and eleventh order of statistics, it

194
00:14:25,024 --> 00:14:29,017
doesn't matter. We can put it in either
one. This is the new median anyways. Now,

195
00:14:29,017 --> 00:14:33,030
we're not done yet with this first point,
because there's a problem with potential

196
00:14:33,030 --> 00:14:36,078
imbalance. So imagine that the
twenty-first element comes up and it's

197
00:14:36,078 --> 00:14:40,091
less than the maximum of the low heap, so
we stick it in the low heap and now that

198
00:14:40,091 --> 00:14:44,078
has a population of eleven. And now
imagine the twenty-second number comes up

199
00:14:44,078 --> 00:14:48,087
and that again is less than the maximum
element in the low heap, so again we have

200
00:14:48,087 --> 00:14:52,089
to insert it in the low heap. Now we have
twelve elements in the low heap, but we

201
00:14:52,089 --> 00:14:56,082
only have ten in the right heap. So we
don't have a 50. 50, 50 split of the

202
00:14:56,082 --> 00:15:02,020
numbers but we could easily re-balance we
just extract the max from the low heap and

203
00:15:02,020 --> 00:15:06,093
we insert it into the high heap. And boom.
Now they both have eleven, and the low

204
00:15:06,093 --> 00:15:11,013
heap has the smallest el even, and the
high heap has the biggest eleven. So

205
00:15:11,013 --> 00:15:15,096
that's how you maintain, the invariant
that you have this 50/50 split in terms of

206
00:15:15,096 --> 00:15:20,019
the small and the high, and between the
two heaps. You check Where it lies with

207
00:15:20,019 --> 00:15:23,081
respect to the max of the low heap and the
mid of the high heap. You put it in the

208
00:15:23,081 --> 00:15:27,016
appropriate place. And whenever you need
to do some re-balancing, you do some

209
00:15:27,016 --> 00:15:30,052
re-balancing. Now, this uses only a
constant number of heap operations when a

210
00:15:30,052 --> 00:15:34,064
new number shows up. So that's log I work.
So now given this discussion, it's easy to

211
00:15:34,064 --> 00:15:38,089
see the second point given that this
invariant is true at each time step. How

212
00:15:38,089 --> 00:15:43,041
do we compute the median? Well, it's going
to be either the maximum of the low heap

213
00:15:43,041 --> 00:15:47,083
and/or the minimum of the high heap
depending on whether I is even or odd. If

214
00:15:47,083 --> 00:15:52,035
it's even, both of those are medians. If I
is odd, then it's just whichever heap has

215
00:15:52,035 --> 00:15:56,067
one more element than the other one. So
the final application we'll talk about in

216
00:15:56,067 --> 00:16:00,037
detail in a different video. A video
concerned with the running time of

217
00:16:00,037 --> 00:16:04,058
Dijkstra's shortest path algorithm. But I
do wanna mention it here as well just to

218
00:16:04,058 --> 00:16:08,085
reiterate the point of how careful use of
data structures can speed up algorithms.

219
00:16:09,000 --> 00:16:13,027
Especially when you're doing things like
minimum computations in an inner loop. So

220
00:16:13,027 --> 00:16:17,010
Dijkstra's shortest path algorithm,
hopefully, many of you have watched that

221
00:16:17,010 --> 00:16:21,019
video at this point. But basically, what
it does is it has a central wild loop. And

222
00:16:21,019 --> 00:16:25,017
so it operates once per vertex of the
graph. And at least naively, it seems like

223
00:16:25,017 --> 00:16:29,041
what each iteration of the wild loop does
is an exhaustive search through the edges

224
00:16:29,041 --> 00:16:33,061
of the graph, computing a minimum. So if
we think about the work performed in this

225
00:16:33,061 --> 00:16:38,016
naive implementation, it's exactly in the
wheel-house of a heap, right. So what we

226
00:16:38,016 --> 00:16:42,055
do in each of these loop iterations is do
an exhaustive search computing a minimum.

227
00:16:42,055 --> 00:16:46,078
You see repeated minimum computations, a
light bulb should go off and you should

228
00:16:46,078 --> 00:16:50,096
think maybe a heap can help. And a heap
can help in Dijkstra's algorithm. The

229
00:16:50,096 --> 00:16:54,093
details are a bit subtle, and they're
discussed i n a separate video, but the

230
00:16:54,093 --> 00:16:59,037
upshot is, we get a tremendous improvement
in the running time. So we're calling that

231
00:16:59,037 --> 00:17:04,033
M denotes the number of edges. And N
denotes the number of vertices of a graph.

232
00:17:04,033 --> 00:17:09,074
With a careful deployment of heaps in
Dijkstra's algorithm, the run time drops

233
00:17:09,074 --> 00:17:15,057
from this really rather large polynomial.
The product of the number of vertices and

234
00:17:15,057 --> 00:17:22,098
the number of edges. Down to something
which is almost linear time. Anyway, o of

235
00:17:22,098 --> 00:17:27,053
m log n. Where m is the number of edges
and n is the number of vertices. So the

236
00:17:27,053 --> 00:17:32,021
linear time here would be o of m. The
liner of the number of edges we're picking

237
00:17:32,021 --> 00:17:37,005
up an extra log factor but still this is
basically as good as sorting. So this is a

238
00:17:37,005 --> 00:17:41,049
fantastically fast shortest path
algorithm. Certainly, way, way better that

239
00:17:41,049 --> 00:17:46,046
what you get if you don't use heaps and do
just repeated exhaustive searches for the

240
00:17:46,046 --> 00:17:50,054
minimum. So that, that's wraps up our
discussion of what I think you really want

241
00:17:50,054 --> 00:17:54,045
to know about heaps. Namely, what are the
key operations that it supports? What is

242
00:17:54,045 --> 00:17:58,060
the running time you can expect from those
operations? What are the types of problems

243
00:17:58,060 --> 00:18:02,056
that the data structure will yield speed
ups for? And a suite of applications. For

244
00:18:02,056 --> 00:18:06,061
those of you that want to take it to the
next level and see a little bit about the

245
00:18:06,061 --> 00:18:10,062
guts of the implementation, there is a
separate optional video that talks a bit

246
00:18:10,062 --> 00:18:11,016
about that.