1
00:00:00,000 --> 00:00:04,045
The goal of this video is to provide more
details about the implementation of the

2
00:00:04,045 --> 00:00:08,085
QuickSort algorithm and, in particular,
if you're ever going to drill down on the

3
00:00:08,085 --> 00:00:13,008
key Partition subroutine, just let me
remind you what the job of the Partition

4
00:00:13,008 --> 00:00:17,036
subroutine is in the context of sorting an
array. So recall that key idea in QuickSort

5
00:00:17,036 --> 00:00:21,021
is to partition the input array around a
pivot element. So this has two steps.

6
00:00:21,021 --> 00:00:25,046
First, you somehow choose a pivot element,
and in this video, we're not going to worry

7
00:00:25,046 --> 00:00:29,046
about how you choose the pivot element.
For concreteness, you might just want to

8
00:00:29,046 --> 00:00:33,045
think about you pick the first element
in the array to serve as

9
00:00:33,045 --> 00:00:37,050
your pivot. So in this example array, the
first element happens to be 3, so we

10
00:00:37,050 --> 00:00:41,045
can choose 3 as the pivot element.
Now, there's a key rearrangement step. So

11
00:00:41,045 --> 00:00:45,070
you rearrange the array so that it has the
following properties. Any entries

12
00:00:45,070 --> 00:00:49,075
that are to the left of the pivot element
should be less than the pivot element.

13
00:00:49,075 --> 00:00:53,014
Whereas any entries, which are to the
right of the pivot element, should be

14
00:00:53,014 --> 00:00:56,075
greater than the pivot element. So, for
example, in this, version of, the second

15
00:00:56,075 --> 00:01:00,024
version of the array, we see to the left
of the 3 is the 2 and the 1.

16
00:01:00,024 --> 00:01:04,000
They're in reverse order, but that's okay.
Both the 2 and the 1 are to the left

17
00:01:04,000 --> 00:01:07,071
of the 3, and they're both less than 3.
And the five elements to the right

18
00:01:07,071 --> 00:01:11,028
of the 3, they're jumbled up, but
they're all bigger than the pivot element.

19
00:01:11,028 --> 00:01:14,090
So, this is a legitimate rearrangement
that satisfies the partitioning property.

20
00:01:14,090 --> 00:01:18,043
And, again, recall that this definitely
makes partial progress toward having a

21
00:01:18,043 --> 00:01:21,096
sorted array. The pivot element winds up
in its rightful position. It winds up

22
00:01:21,096 --> 00:01:25,085
where it's supposed to be in the final
sorted array, to the right of everything

23
00:01:25,085 --> 00:01:30,006
less than it, to the left of everything
bigger than it. Moreover, we've correctly

24
00:01:30,006 --> 00:01:34,027
bucketed the other N-1 elements to
the left and to the right of the pivot

25
00:01:34,027 --> 00:01:38,069
according to where they should wind up in
the final sorted array. So that's the job,

26
00:01:38,069 --> 00:01:42,064
that the Partition subroutine is
responsible for. Now what's cool is we'll

27
00:01:42,064 --> 00:01:46,024
be able to implement this Partition
subroutine in linear time. Even better,

28
00:01:46,024 --> 00:01:50,004
we'll be able to implement it so that all
it does, really, is swaps in the array.

29
00:01:50,004 --> 00:01:53,056
That is, it works in-place. It needs no
additional, essentially constant

30
00:01:53,056 --> 00:01:57,056
additional memory, to rearrange the array
according to those properties. And then,

31
00:01:57,056 --> 00:02:01,065
as we saw on the high-level description of
the QuickSort algorithm, what partitioning

32
00:02:01,065 --> 00:02:05,036
does is, it enables a divide-and-conquer
approach. It reduces the problem size.

33
00:02:05,036 --> 00:02:09,036
After you've partitioned the array around
the pivot, all you gotta do is recurse on

34
00:02:09,036 --> 00:02:13,021
the left side, recurse on the right side,
and you're done. So, what I owe you is

35
00:02:13,021 --> 00:02:17,054
this implementation. How do you actually
satisfy the partitioning property, stuff

36
00:02:17,054 --> 00:02:21,076
to the left of the pivot is smaller than
it, stuff to the right of the pivot is

37
00:02:21,076 --> 00:02:25,092
bigger than it, in linear time, and in-
place. Well, first, let's observe that, if

38
00:02:25,092 --> 00:02:29,081
we didn't care about the in-place
requirement, if we were happy to just

39
00:02:29,081 --> 00:02:34,020
allocate a second array and copy stuff
over, it would actually be pretty easy to

40
00:02:34,020 --> 00:02:39,077
implement a Partition subroutine in
linear time. That is, using O(N) extra

41
00:02:39,077 --> 00:02:43,082
memory, it's easy to partition around a
pivot element in O(N) time. And as

42
00:02:43,082 --> 00:02:48,007
usual, you know, probably I should be more precise
and write theta of N, are used in cases that would

43
00:02:48,007 --> 00:02:52,027
be the more accurate stronger statement,
but I'm going to be sloppy and I'm just

44
00:02:52,027 --> 00:02:56,031
going to write the weaker but still
correct statement, using Big-Oh, okay? So

45
00:02:56,031 --> 00:03:00,046
O(N) time using linear extra memory.
So how would you do this? Well let

46
00:03:00,046 --> 00:03:04,056
me just sort of illustrate by example. I
think you'll get the idea. So let's go

47
00:03:04,056 --> 00:03:12,017
back to our running example of an input
array. Well, if we're allowed to use

48
00:03:12,017 --> 00:03:16,036
linear extra space, we can just
preallocate another array of length N.

49
00:03:18,018 --> 00:03:22,017
Then we can just do a simple scan through
the input array, bucketing elements

50
00:03:22,017 --> 00:03:25,096
according to whether they are bigger than
or less than the pivot. And, so for

51
00:03:25,096 --> 00:03:29,080
example, we can fill in the additional
array both from the left and the right,

52
00:03:29,080 --> 00:03:33,090
using elements that are less than or
bigger than the pivot respectively. So for

53
00:03:33,090 --> 00:03:37,089
example we start with the 8, we know
that the 8 is bigger than the pivot,

54
00:03:37,089 --> 00:03:41,093
so you put that at the end of the output
array. Then we get to the 2. The 2 is

55
00:03:41,093 --> 00:03:45,093
less than the pivot, so that should go on
the left hand side of the output array.

56
00:03:45,093 --> 00:03:50,075
When you get to the 5, it should go on the
right-hand side, and the 1 should go on

57
00:03:50,075 --> 00:03:57,009
the left-hand side, and so on. When we
complete our scan through the input array,

58
00:03:57,009 --> 00:04:01,017
there'll be one hole left, and that's
exactly where the pivot belongs, to the

59
00:04:01,017 --> 00:04:05,045
right of everything less than it, to the
left of everything bigger than it. So,

60
00:04:05,045 --> 00:04:09,068
what's really interesting, then, is to
have an implementation of Partition, which

61
00:04:09,068 --> 00:04:13,087
is not merely linear time, but also uses
essentially no additional space. It

62
00:04:13,087 --> 00:04:17,062
doesn't re-sort to this cop-out of
pre-allocating an extra array of

63
00:04:17,062 --> 00:04:21,063
length N. So, let's turn to how that
works. First, starting at a high-level,

64
00:04:21,063 --> 00:04:27,050
then filling in the details. So I'm
gonna describe the Partition subroutine

65
00:04:27,050 --> 00:04:33,095
only for the case where the pivot is in
fact the first element. But really this is

66
00:04:33,095 --> 00:04:38,005
without loss of generality. If, instead,
you want to use some pivot from the middle

67
00:04:38,005 --> 00:04:41,074
of the array, you can just have a
preprocessing step that swaps the first

68
00:04:41,074 --> 00:04:46,003
element of the array with the given pivot,
and then run the subroutine that I'm about

69
00:04:46,003 --> 00:04:50,012
to describe, okay. So with constant time
preprocessing, the case of a general pivot

70
00:04:50,012 --> 00:04:56,089
reduces to the case of when the pivot is
the first element. So here's the high-level

71
00:04:56,089 --> 00:05:01,021
idea, and it's very cool. The idea is,
we're gonna be able to able to get

72
00:05:01,021 --> 00:05:10,022
away with just a single linear scan of the
input array. So in any given moment in

73
00:05:10,022 --> 00:05:14,016
this scan, there's just gonna be a single
for-Loop, we'll be keeping track of both

74
00:05:14,016 --> 00:05:18,019
the part of the array we've looked at so
far, and the part that we haven't looked at

75
00:05:18,019 --> 00:05:21,079
so far. So there's gonna be two groups,
what we've seen, what we haven't seen.

76
00:05:21,079 --> 00:05:25,073
Then within the group we've seen, we're
gonna have definitely split further, according

77
00:05:25,073 --> 00:05:29,057
to the elements that are less than the
pivot and those that are bigger than the

78
00:05:29,057 --> 00:05:35,006
pivot. So we're gonna leave the pivot
element just hanging out in the first

79
00:05:35,006 --> 00:05:38,078
element of the array until the very end of
the algorithm, when we correct its

80
00:05:38,078 --> 00:05:44,030
position with a swap. And at any given
snapshot of this algorithm, we will have

81
00:05:44,030 --> 00:05:50,091
some stuff that we've already looked at,
and some stuff that we haven't yet looked

82
00:05:50,091 --> 00:05:56,060
at in our linear scan. Of course, we have
no idea what's up with the elements that

83
00:05:56,060 --> 00:06:00,038
we haven't looked at yet, who knows
what they are, and whether they're bigger

84
00:06:00,038 --> 00:06:03,097
or less than the pivot. But, we're gonna
implement the algorithm, so, among the

85
00:06:03,097 --> 00:06:07,051
stuff that we've already seen, it will be
partitioned, in the sense that all

86
00:06:07,051 --> 00:06:11,029
elements less than the pivot come first,
all elements bigger than the pivot come

87
00:06:11,029 --> 00:06:15,002
last. And, as usual, we don't care about
the relative order, amongst elements less

88
00:06:15,002 --> 00:06:20,016
than the pivot, or amongst elements bigger
than the pivot. So summarizing, we do a

89
00:06:20,016 --> 00:06:25,063
single scan through the input array. And
the trick will be to maintain the

90
00:06:25,063 --> 00:06:29,083
following invariant throughout the linear
scan. But basically, everything we have

91
00:06:29,083 --> 00:06:33,093
looked at the input array is partitioned.
Everything less than the pivot comes

92
00:06:33,093 --> 00:06:37,097
before everything bigger than the pivot.
And, we wanna maintain that invariant,

93
00:06:37,097 --> 00:06:41,081
doing only constant work, and no
additional storage, with each step of our

94
00:06:41,081 --> 00:06:47,081
linear scan.
So, here's what I'm gonna do next. I'm

95
00:06:47,081 --> 00:06:51,067
gonna go through an example, and execute
the Partition subroutine on a concrete

96
00:06:51,067 --> 00:06:55,044
array, the same input array we've been
using as an example, thus far. Now, maybe

97
00:06:55,044 --> 00:06:59,030
it seems weird to give an example before
I've actually given you the algorithm,

98
00:06:59,030 --> 00:07:03,027
before I've given you the code. But, doing
it this way, I think you'll see the gist

99
00:07:03,027 --> 00:07:06,099
of what's going on in the example, and
then when I present the code, it'll be

100
00:07:06,099 --> 00:07:10,085
very clear what's going on. Whereas, if I
presented the code first, it may seem a

101
00:07:10,085 --> 00:07:14,042
little opaque when I first show you the
algorithm. So, let's start with an

102
00:07:14,042 --> 00:07:21,016
example. Throughout the example, we wanna
keep in mind the high-level picture that

103
00:07:21,016 --> 00:07:25,079
we discussed in the previous slide. The
goal is that, at any time in the Partition

104
00:07:25,079 --> 00:07:30,018
subroutine, we've got the pivot hanging
out in the first entry. Then, we've got

105
00:07:30,018 --> 00:07:34,075
stuff that we haven't looked at. So, of
course, who knows whether

106
00:07:34,075 --> 00:07:39,015
those elements are bigger than or less
than the pivot? And then, for the stuff

107
00:07:39,015 --> 00:07:43,060
we've looked at so far, everything less
than the pivot comes before everything

108
00:07:43,060 --> 00:07:48,011
bigger than the pivot. This is the picture
we wanna retain, as we go through the

109
00:07:48,011 --> 00:07:53,014
linear scan. As this high-level picture
would suggest, there is two boundaries

110
00:07:53,014 --> 00:07:57,064
that we're gonna need to keep track of
throughout the algorithm. We're gonna need

111
00:07:57,064 --> 00:08:01,098
to keep track of the boundary between what
we've looked at so far, and what we

112
00:08:01,098 --> 00:08:06,059
haven't looked at yet. So, that's going to
be, we're going to use the index  "j" to keep

113
00:08:06,059 --> 00:08:10,098
track of that boundary. And then, we also
need a second boundary, for amongst the

114
00:08:10,098 --> 00:08:15,037
stuff that we've seen, where is the split
between those less than the pivot and

115
00:08:15,037 --> 00:08:19,048
those bigger than the pivot. So, that's
gonna be "i". So, let's use our running

116
00:08:19,048 --> 00:08:26,004
example array. >> So stuff is pretty
simple when we're starting out. We haven't

117
00:08:26,004 --> 00:08:32,008
looked at anything. So all of this stuff
is unpartitioned. And "i" and  "j" both point

118
00:08:32,008 --> 00:08:38,049
to the boundary between the pivot and all
the stuff that we haven't seen yet. Now to

119
00:08:38,049 --> 00:08:42,017
get a running time reaches linear, we
want to make sure that at each step we

120
00:08:42,017 --> 00:08:46,000
advance  "j", we look at one new element.
That way in a linear number of steps, we'll

121
00:08:46,000 --> 00:08:49,097
have looked at everything, and hopefully
we'll be done, and we'll have a partitioned

122
00:08:49,097 --> 00:08:55,035
array. So, in the next step, we're going to
advance  "j". So the region of the array

123
00:08:55,035 --> 00:08:59,054
which is, which we haven't looked at,
which is unpartitioned, is one smaller

124
00:08:59,054 --> 00:09:03,086
than before. We've now looked at the
8, the first element after the pivot.

125
00:09:04,094 --> 00:09:08,073
Now the 8 itself is indeed a
partitioned array. Everything less than

126
00:09:08,073 --> 00:09:13,000
the pivot comes before, everything after
the pivot turns out there's nothing less

127
00:09:13,000 --> 00:09:17,007
than the pivot. So vacuously this is
indeed partitioned. So  "j" records

128
00:09:17,007 --> 00:09:21,079
delineates the boundary between what we've
looked at and what we haven't looked at, "i"

129
00:09:21,079 --> 00:09:26,018
delineates amongst the stuff we've looked
at, where is the boundary between what's

130
00:09:26,018 --> 00:09:30,031
bigger than and what's less than the
pivot. So the 8 is bigger than the pivot,

131
00:09:30,031 --> 00:09:35,014
so "i" should be right here. Okay, because we
want "i" to be just to the left of all the

132
00:09:35,014 --> 00:09:39,098
stuff bigger than the pivot. Now, what's
gonna happen in the next iteration? This

133
00:09:39,098 --> 00:09:44,095
is where things get interesting. Suppose
we advance  "j" one further. Now the part of

134
00:09:44,095 --> 00:09:49,054
the array that we've seen is an 8
followed by a 2. Now an 8 and a 2

135
00:09:49,054 --> 00:09:53,023
is not a partitioned subarray. Remember
what it means to be a partitioned

136
00:09:53,023 --> 00:09:57,021
subarray? All the stuff less
than the pivot, all the stuff less than

137
00:09:57,021 --> 00:10:01,010
3, should come before everything
bigger than 3. So (8, 2) obviously

138
00:10:01,010 --> 00:10:05,023
fails that property. 2 is less than the
pivot, but it comes after the 8, which

139
00:10:05,023 --> 00:10:09,012
is bigger than the pivot. So, to correct
this, we're going to need to do a swap.

140
00:10:09,012 --> 00:10:15,060
We're going to swap the 2 and the 8.
That gives us the following version of the

141
00:10:15,060 --> 00:10:21,088
original array. So now the stuff that we
have not yet looked at is one smaller than

142
00:10:21,088 --> 00:10:28,084
before. We've advanced  "j". So all other
stuff is unpartitioned. Who knows what's

143
00:10:28,084 --> 00:10:34,001
going on with that stuff?  "j" is one further
entry to the right than it was before, and

144
00:10:34,001 --> 00:10:39,081
at least after we have done this swap, we
do indeed have a partitioned array. So

145
00:10:39,081 --> 00:10:43,098
post-swap, the 2 and the 8, are
indeed partitioned. Now remember, "I"

146
00:10:43,098 --> 00:10:48,062
delineates the boundary between amongst
what we've seen so far, the stuff less

147
00:10:48,062 --> 00:10:53,032
than the pivot, less than 3 in this
case, and that bigger than 3, so "I" is

148
00:10:53,032 --> 00:10:59,060
going to be wedged in between the 2 and
the 8. In the next iteration, our life

149
00:10:59,060 --> 00:11:04,076
is pretty easy. So, in this case, in
advancing  "j", we uncover an element which

150
00:11:04,076 --> 00:11:08,087
is bigger than the pivot. So, this is what
happened in the first iteration, when we

151
00:11:08,087 --> 00:11:12,098
uncovered the 8. It's different than
what happened in the last iteration when

152
00:11:12,098 --> 00:11:16,083
we uncovered the 2. And so, this case,
this third iteration is gonna be more

153
00:11:16,083 --> 00:11:20,088
similar to the first iteration than the
second iteration. In particular, we won't

154
00:11:20,088 --> 00:11:24,074
need to swap. We won't need to advance "i".
We just advance  "j", and we're done. So,

155
00:11:24,074 --> 00:11:30,024
let's see why that's true. So, we've
advanced  "j". We've done one more iteration.

156
00:11:30,024 --> 00:11:35,006
So, now the stuff we haven't seen yet is
only the last four elements. So, who knows

157
00:11:35,006 --> 00:11:40,010
what's up with, the stuff we haven't seen
yet? But if you look at the stuff we have

158
00:11:40,010 --> 00:11:44,050
seen, the 2, the 8, and the 5,
this is, in fact, partitioned, right? All

159
00:11:44,050 --> 00:11:48,089
the numbers that are bigger than 3
succeed, come after, all the numbers

160
00:11:48,089 --> 00:11:54,096
smaller than three. So the "j",
the boundary between what we've seen and

161
00:11:54,096 --> 00:11:58,094
what we haven't is between the 5 and
the 1; and the "i", the boundary between

162
00:11:58,094 --> 00:12:03,028
the stuff less than the pivot and bigger
than the pivot is between the 2 and the

163
00:12:03,028 --> 00:12:10,005
8, just like it was before. Adding a
5 to the end didn't change anything. So

164
00:12:10,005 --> 00:12:16,063
let's wrap up this example in the next
slide. So first, let's just remember where

165
00:12:16,063 --> 00:12:20,069
we left off from the previous slide. So
I'm just gonna redraw that same step after

166
00:12:20,069 --> 00:12:28,080
three iterations of the algorithm. And
notice, in the next generation, we're going

167
00:12:28,080 --> 00:12:33,046
to, again, have to make some modifications
to the array, if we want preserve our

168
00:12:33,046 --> 00:12:37,069
variant. The reason is that when we
advance  "j", when we scan this 1, now

169
00:12:37,069 --> 00:12:42,023
again we're scanning in a new element
which is less than the pivot, and what

170
00:12:42,023 --> 00:12:46,050
that means is that, the partitioned
region, or the region that we've looked at

171
00:12:46,050 --> 00:12:50,036
so far, will not be partitioned. We'll
have 2851. Remember we need everything

172
00:12:50,036 --> 00:12:54,022
less than 3 to precede everything
bigger than 3, and this 1

173
00:12:54,022 --> 00:12:58,038
at end is not going to cut it. So we're
going to have to make a swap. Now what are

174
00:12:58,038 --> 00:13:05,077
we going to swap? We're going to swap the
1 and the 8. So, why do we swap the

175
00:13:05,077 --> 00:13:10,033
1 and the 8? Well, clearly, we have
to swap the 1 with something. And, what

176
00:13:10,033 --> 00:13:14,078
makes sense? What makes sense is the
left-most array entry, which is currently

177
00:13:14,078 --> 00:13:19,000
bigger than the pivot. And, that's exactly
the 8. Okay, that's the first,

178
00:13:19,000 --> 00:13:23,074
left-most entry bigger than 3, so if
we swap the 1 with it, then the 1 will

179
00:13:23,074 --> 00:13:29,092
become the right-most entry smaller than
3. So after the swap, we're gonna have

180
00:13:29,092 --> 00:13:36,037
the following array. The stuff we haven't
seen is the 4, the 7, and the 6.

181
00:13:37,087 --> 00:13:44,048
So the  "j" will be between the 8 and the
4. The stuff we have seen is the 2,

182
00:13:44,048 --> 00:13:49,057
1, 5, and 8. And notice, that
this is indeed partitioned. All the

183
00:13:49,057 --> 00:13:53,033
elements, which are less than 3, the
2 and the 1, precede all of the

184
00:13:53,033 --> 00:13:57,003
entries, which are bigger than 3, the
5 and the 8. "i", remember, is

185
00:13:57,003 --> 00:14:00,094
supposed to split, be the boundary
between those less than 3 and those

186
00:14:00,094 --> 00:14:04,094
bigger than 3. So, that's gonna lie
between the 1 and the 5. That is one

187
00:14:04,094 --> 00:14:11,017
further to the right than it was in the
previous iteration. Okay, so the, because

188
00:14:11,017 --> 00:14:15,050
the rest of the unseen elements, the 4,
the 7, and the 6, are all bigger

189
00:14:15,050 --> 00:14:20,000
than the pivot, the last three iterations
are easy. No further swaps are necessary.

190
00:14:20,000 --> 00:14:24,073
No increments to "i" are necessary.  "j" is just
going to get incremented until we fall off

191
00:14:24,073 --> 00:14:29,017
the array. And then, fast forwarding, the
Partition subroutine, or this main linear

192
00:14:29,017 --> 00:14:35,047
scan, will terminate with the following
situation. So at this point, all of the

193
00:14:35,047 --> 00:14:40,047
elements have been seen, all the elements
are partitioned.  "j" in effect has fallen

194
00:14:40,047 --> 00:14:45,041
off the end of the array, and "i", the boundary
between those less than and bigger than

195
00:14:45,041 --> 00:14:50,085
the pivot, still lies between the 1 and
the 5. Now, we're not quite done,

196
00:14:50,085 --> 00:14:54,080
because the pivot element 3 is not in
the correct place. Remember, what we're

197
00:14:54,080 --> 00:14:58,067
aiming for is an array where everything
less than the pivot is to the left of it,

198
00:14:58,067 --> 00:15:02,062
and everything bigger than the pivot is to
the right. But right now, the pivot still

199
00:15:02,062 --> 00:15:06,053
is hanging out in the first element. So,
we just have to swap that into the correct

200
00:15:06,053 --> 00:15:10,020
place. Where's the correct place? Well,
it's going to be the right-most element,

201
00:15:10,020 --> 00:15:16,051
which is smaller than the pivot. So, in
this case, the 1. So the subroutine will

202
00:15:16,051 --> 00:15:20,099
terminate with the following array,
12358476. And, indeed, as desired,

203
00:15:20,099 --> 00:15:26,056
everything to the left of the pivot
is less than the pivot, and everything to

204
00:15:26,056 --> 00:15:31,099
the right of the pivot is bigger than the
pivot. The 1 and 2 happen to be in

205
00:15:31,099 --> 00:15:37,008
sorted order, but that was just sorta an
accident. And the 4, 5, 6 and

206
00:15:37,008 --> 00:15:44,087
7 and 8, you'll notice, are jumbled
up. They're not in sorted order. So

207
00:15:44,087 --> 00:15:49,044
hopefully from this example you have a
gist of how the Partition subroutine is

208
00:15:49,044 --> 00:15:53,084
going to work in general. But, just to
make sure the details are clear, let me

209
00:15:53,084 --> 00:16:01,005
now describe the pseudocode for the
Partition subroutine. So the way I'm going

210
00:16:01,005 --> 00:16:05,051
to denote it is, there's going to be an
input array A. But rather than being told

211
00:16:05,051 --> 00:16:09,081
some explicit link, what's going to be
passed to the subroutine are two array

212
00:16:09,081 --> 00:16:13,077
indices. The leftmost index, which
delineates this part of the separator

213
00:16:13,077 --> 00:16:20,017
you're supposed to work on, and the
rightmost index. The reason I'm writing it

214
00:16:20,017 --> 00:16:24,078
this way is because Partition is going to
be called recursively from within a QuickSort

215
00:16:24,078 --> 00:16:28,097
algorithm. So any point in QuickSort,
we're going to be recursing on some

216
00:16:28,097 --> 00:16:33,021
subset, contiguous subset of the original
input array. "l(el)" and "r" meant to denote

217
00:16:33,021 --> 00:16:39,070
what the left boundary and the right
boundary of that subarray are. So, let's

218
00:16:39,070 --> 00:16:44,001
not lose sight of the high-level picture
of the invariant that the algorithm is

219
00:16:44,001 --> 00:16:48,015
meant to maintain. So, as we discussed,
we're assuming the pivot element is the

220
00:16:48,015 --> 00:16:52,030
first element, although that's really
without loss of generality. At any given

221
00:16:52,030 --> 00:16:56,050
time, there's gonna be stuff we haven't
seen yet. Who knows what's up with that?

222
00:16:56,050 --> 00:17:00,091
And, amongst the stuff we've seen, we're
gonna maintain the invariant that all the

223
00:17:00,091 --> 00:17:05,049
stuff less than the pivot comes before all
the stuff bigger than the pivot. And  "j" and

224
00:17:05,049 --> 00:17:09,069
I denote the boundaries, between the seen
and the unseen, and between the small

225
00:17:09,069 --> 00:17:15,012
elements and the large elements,
respectively. So back to the pseudocode,

226
00:17:15,012 --> 00:17:20,041
we initialize the pivot to be the first
entry in the array. And again remember, l

227
00:17:20,041 --> 00:17:26,091
denotes the leftmost index that we're
responsible for looking at. Initial value

228
00:17:26,091 --> 00:17:32,021
of "i", should be just to the right of the
pivot so that's gonna be el+1.

229
00:17:32,021 --> 00:17:40,015
That's also the initial value of  "j", which
will be assigned in the main for-Loop. So this

230
00:17:40,015 --> 00:17:46,074
for-Loop with "j", taking on all values from
el+1 to the rightmost index "r",

231
00:17:46,074 --> 00:17:55,001
denotes the linear scan through the input array. And,
what we saw in the example is that there

232
00:17:55,001 --> 00:17:58,062
were two cases, depending on, for the
newly seen element, whether it's bigger

233
00:17:58,062 --> 00:18:02,050
than the pivot, or less than the pivot.
The easy case is when it's bigger than the

234
00:18:02,050 --> 00:18:06,025
pivot. Then we essentially don't have to
do anything. Remember, we didn't do any

235
00:18:06,025 --> 00:18:10,009
swaps, we didn't change "i", the boundary
didn't change. It was when the new element

236
00:18:10,009 --> 00:18:16,076
was less than the pivot that we had to do
some work. So, we're gonna check that, is

237
00:18:16,076 --> 00:18:26,086
the newly seen element, A[j], less than "p".
And if it's not, we actually don't have to do

238
00:18:26,086 --> 00:18:35,040
anything. So let me just put as a
comment. If the new element is bigger than

239
00:18:35,040 --> 00:18:40,095
the pivot, we do nothing. Of course at the
end of the for-Loop, the value of  "j" will

240
00:18:40,095 --> 00:18:44,084
get in command so that's the only thing
that changes from iteration to iteration,

241
00:18:44,084 --> 00:18:48,073
when we're sucking up new elements that
happen to be bigger than "p". So what do we

242
00:18:48,073 --> 00:18:52,063
do in the example, when we suck up our new
element less than p? Well we have to do

243
00:18:52,063 --> 00:19:01,000
two things. So, in the event that the
newly seen element is less than "p", I'll

244
00:19:01,000 --> 00:19:06,024
circle that here in pink. We need to do a
rearrangement, so we, again, have a

245
00:19:06,024 --> 00:19:11,084
partitioned, sub-array amongst those
elements we've seen so far. And, the best

246
00:19:11,084 --> 00:19:17,079
way to do that is to swap this new element
with the left-most element that's bigger

247
00:19:17,079 --> 00:19:24,031
than the pivot. And because we have an
index "i", which is keeping track of the

248
00:19:24,031 --> 00:19:28,079
boundary between the elements less than
the pivot and bigger than the pivot, we can

249
00:19:28,079 --> 00:19:33,022
immediately access the leftmost element
bigger than the pivot.

250
00:19:33,022 --> 00:19:39,023
That's just the "i"th
entry in the array. Now I am

251
00:19:39,023 --> 00:19:43,098
doing something a little sneaky here,
I should be honest about. Which is there is

252
00:19:43,098 --> 00:19:48,056
the case where you haven't yet seen any
elements bigger than the pivot, and then

253
00:19:48,056 --> 00:19:52,087
you don't actually have a leftmost
element bigger than the pivot to swap

254
00:19:52,087 --> 00:19:56,082
with. Turns out this code still works,
I'll let you verify that, but it does do

255
00:19:56,082 --> 00:20:01,002
some redundant swaps. Really, you don't
need to do any swaps until you first see

256
00:20:01,002 --> 00:20:04,088
some elements bigger than the pivot, and
then see some elements less than the

257
00:20:04,088 --> 00:20:08,036
pivot. So, you can imagine a different
limitation of this, where you

258
00:20:08,036 --> 00:20:11,080
actually keep track of whether or not
that's happened to avoid the redundant

259
00:20:11,080 --> 00:20:14,097
swaps. I'm just gonna give you the
simple pseudocode. And again, for

260
00:20:14,097 --> 00:20:18,063
intuition, you wanna think about the case
just like, in the picture here in blue,

261
00:20:18,063 --> 00:20:22,011
where we've already seen some elements
that are bigger than the pivot, and the

262
00:20:22,011 --> 00:20:25,077
next newly seen element is less than the
pivot. That's really sort of the key case

263
00:20:25,077 --> 00:20:31,048
here. Now the other thing we have to do
after one of these swaps is, now the

264
00:20:31,048 --> 00:20:35,060
boundary, between where the array elements
less than the pivot and

265
00:20:35,060 --> 00:20:39,073
those bigger than the pivot, has moved.
It's moved one to the right, so

266
00:20:39,073 --> 00:20:45,029
we have to increment "i". So, that's the
main linear scan. Once this concludes,  "j"

267
00:20:45,029 --> 00:20:49,093
will have fallen off the end of the array.
And, everything that we've seen the final

268
00:20:50,009 --> 00:20:54,025
elements, except for the pivot, will be
arranged so that those less than "p" are

269
00:20:54,025 --> 00:20:58,067
first, those bigger than "p" will be last.
The final thing we have to do is just swap

270
00:20:58,067 --> 00:21:03,009
the pivot into its rightful position. And,
recall for that, we just swap it with the

271
00:21:03,009 --> 00:21:10,027
right-most element less than it. So, that
is it. That is the Partition subroutine.

272
00:21:10,027 --> 00:21:14,055
There's a number of variants of partition.
This is certainly not the unique

273
00:21:14,055 --> 00:21:18,093
implementation. If you look on the web, or
if you look in certain textbooks, you'll

274
00:21:18,093 --> 00:21:23,021
find some other implementations as well as
discussion of the various merits. But, I

275
00:21:23,021 --> 00:21:27,018
hope this gives you, I mean, this is a
canonical implementation, and I hope it

276
00:21:27,018 --> 00:21:31,025
gives you a clear picture of how you
rearrange the array using in-place swaps

277
00:21:31,025 --> 00:21:35,042
to get the desired property, that all the
stuff before the pivot comes first, all

278
00:21:35,042 --> 00:21:41,048
the stuff after the pivot comes last.
Let me just add a few details about why

279
00:21:41,048 --> 00:21:46,069
this pseudocode I just gave you does,
indeed, have the properties required. The

280
00:21:46,069 --> 00:21:52,016
running time is O(N), really theta of N,
but again, I'll be sloppy and write

281
00:21:52,016 --> 00:21:57,030
O(N). Where N is the number of array
elements that we have to look at. So, N is

282
00:21:57,030 --> 00:22:02,037
r-el+1, which is the length of
the sub-array that this Partition

283
00:22:02,037 --> 00:22:08,002
subroutine is invoked upon. And why is
this true? Well if you just go inspect the

284
00:22:08,002 --> 00:22:12,024
pseudocode, you can just count it up
naively and you'll find that this is true.

285
00:22:12,024 --> 00:22:16,066
We just do a linear scan through the array
and all we do is basically a comparison

286
00:22:16,066 --> 00:22:22,064
and possibly a swap and an increment for
each array entry that we see. Also, if you

287
00:22:22,064 --> 00:22:27,006
inspect the code, it is evident that it
works in-place. We do not allocate some

288
00:22:27,006 --> 00:22:31,026
second copy of an array to populate, like
we did in the naive Partition

289
00:22:31,026 --> 00:22:36,083
subroutine. All we do is repeated swaps.
Correctness of the subroutine follows by

290
00:22:36,083 --> 00:22:41,060
induction, so in particular the best way
to argue it is by invariant. So I'll state

291
00:22:41,060 --> 00:22:45,092
the invariant here, but mostly leave it
for you to check that indeed, every

292
00:22:45,092 --> 00:22:51,071
iteration of the for-Loop maintains this
invariant. So first of all, all of the

293
00:22:51,071 --> 00:22:56,072
stuff to the right of the pivot element,
to the right of the leftmost entry and up

294
00:22:56,072 --> 00:23:01,002
to the index "i", is indeed less than the
pivot element, as suggested by the

295
00:23:01,002 --> 00:23:07,027
picture. And also suggested by the
picture, everything beginning with the "i"th

296
00:23:07,027 --> 00:23:13,018
entry, leading just up before the "j"th entry,
is bigger than the pivot. And I'll

297
00:23:13,018 --> 00:23:17,068
leave it as a good exercise for you to
check that this holds by induction.

298
00:23:18,064 --> 00:23:23,013
The invariant holds initially,
when both "i" and  "j" are equal to el+1,

299
00:23:23,013 --> 00:23:27,002
because both of these sets are vacuous, okay?
So, there are no such elements, so they're

300
00:23:27,002 --> 00:23:30,072
trivially satisfied these properties. And
then, every time we advance  "j", well, in

301
00:23:30,072 --> 00:23:34,037
one case it's very easy, where the new
element is bigger than the pivot. It's

302
00:23:34,037 --> 00:23:38,012
clear that, if the invariant held before,
it also holds at, at the next iteration.

303
00:23:38,012 --> 00:23:42,001
And then, if you think about it carefully,
this swap in this increment of "i" that we

304
00:23:42,001 --> 00:23:46,012
do, in the case where the new element is
less than the pivot. After the swap, once

305
00:23:46,012 --> 00:23:50,099
the fold is complete, again if this
invariant was true at the beginning of it,

306
00:23:50,099 --> 00:23:55,080
it's also true at the end. So what good is
that? Well, by this claim, at the

307
00:23:55,080 --> 00:24:00,068
conclusion of the linear scan at which
point "j" has fallen off the end of the

308
00:24:00,068 --> 00:24:06,048
array, the array must look like this. At
the end of the for-Loop, the question

309
00:24:06,048 --> 00:24:10,052
mark part of the array has vanished, so
everything other than the pivot has been

310
00:24:10,052 --> 00:24:14,071
organized so that all this stuff less than
the pivot comes before everything after

311
00:24:14,071 --> 00:24:18,054
the pivot, and that means once you do the
final swap, once you swap the pivot

312
00:24:18,054 --> 00:24:22,053
element from its first and left most
entry, with the right most entry less than

313
00:24:22,053 --> 00:24:26,046
the pivot, you're done. Okay? You've got
the desired property that everything to

314
00:24:26,046 --> 00:24:30,045
the left of the pivot is less than, and
everything to the right of the pivot is

315
00:24:30,045 --> 00:24:35,099
bigger than. So now that given a pivot
element we understand how to very quickly

316
00:24:35,099 --> 00:24:40,031
rearrange the array so that it's
partitioned around that pivot element,

317
00:24:40,031 --> 00:24:45,012
let's move on to understanding how that
pivot element should be chosen and how,

318
00:24:45,012 --> 00:24:49,081
given suitable choices of that pivot
element, we can implement the QuickSort

319
00:24:49,081 --> 00:24:54,050
algorithm, to run very quickly, in
particular, on average in O(N) log time.