Welcome to part one of our probability review. The first time we need these concepts in the course is for those of you who want to understand the analysis of QuickSort, namely why it runs in O(n log n) time on average. These topics will also come up a couple of other times in the course, for example when we study a randomized algorithm for the minimum cut problem in graphs, and also when we try to understand the performance of hashing.

Here are the topics we're going to cover. We'll start at the beginning with sample spaces, and then we'll discuss events and their probabilities. We'll talk about random variables, which are real-valued functions on a sample space. We'll talk about expectation, which is basically the average value of a random variable. We'll identify and prove a very important property, called linearity of expectation, which will come up over and over again in our analyses of randomized processes. So those are the topics for part one. Then we'll conclude the video with one example tying these concepts together: load balancing.

This video is by no means the only source you can turn to to learn about these concepts. A couple of other sources I recommend are the online lecture notes by Eric Lehman and Tom Leighton; also, there's a Wikibook on discrete probability, which you could check out. And I want to be clear: this is really not meant to be a course or a tutorial on probability concepts, it's only meant to be a refresher. So I'm going to go at a reasonably fast pace, and it's going to be a pretty cursory presentation. If you want a more thorough review, I would check out one of these other sources, or your favorite book on discrete probability. Along those same lines, I'm thinking that many of you have seen some of this material before. Don't feel compelled to watch this video straight from the beginning to the end; feel free to just dip in and review the concepts that you need a refresher on.

So let's start at the beginning with sample spaces. What is a sample space? Well, we're analyzing random processes, so any number of things could happen.
The sample space is just the collection of all of the things that could happen. So this is basically the universe in which we're going to discuss probabilities and average values. I'll often use the notation big omega (Ω) to denote the sample space. One thing we've got going for us in the design of algorithms is that typically we can take Ω to be a finite set. That's why we're dealing only with discrete probability, which is a very happy thing, because it's much more elementary than more general probability.

In addition to defining the outcomes, everything that could possibly happen, we need to define the probability p(i) of each individual outcome i. Of course, the probability of each outcome should be non-negative, and there's also the obvious constraint that the probabilities should sum to 1, since exactly one thing is going to happen.

Now, I realize this is a super abstract concept, and the next few definitions are also a little abstract, so throughout them I'm going to use two really simple, really concrete examples to illustrate what these concepts mean. The first example is just going to be: you take two six-sided dice and you roll them. Of course, the sample space is just the 36 different outcomes you could have of these two dice. And assuming that each of the two dice is well crafted, we expect each of these 36 outcomes to be equally likely, to occur with probability 1/36.

The second running example is more directly related to algorithms, and it's motivated by the QuickSort algorithm. Recall that we're studying the implementation of QuickSort that chooses a pivot uniformly at random in every recursive call. Let's just focus on the very first, outermost call of QuickSort, and think about the random choice of the pivot just in that call. Then the sample space, all of the different things that could happen, is just the n different choices for a pivot, assuming the array has length n. So we can represent the sample space as the integers 1, 2, all the way up to n, corresponding to the array index of the randomly chosen pivot. And again, by the very construction of our code, each of these outcomes is equally likely, with probability 1/n.
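If it helps to make this concrete, here is a minimal Python sketch (my illustration, not part of the original lecture) of these two sample spaces and their probability assignments; the array length n = 10 is an arbitrary choice.

    # The two running sample spaces, with probabilities that sum to 1.
    from fractions import Fraction
    import itertools

    # Example 1: rolling two six-sided dice; 36 outcomes, each 1/36.
    dice_omega = list(itertools.product(range(1, 7), repeat=2))
    dice_prob = {outcome: Fraction(1, 36) for outcome in dice_omega}
    assert sum(dice_prob.values()) == 1

    # Example 2: QuickSort's outermost pivot choice on an array of length n.
    # Omega is {1, ..., n}, each pivot index with probability 1/n.
    n = 10  # arbitrary array length for illustration
    pivot_prob = {i: Fraction(1, n) for i in range(1, n + 1)}
    assert sum(pivot_prob.values()) == 1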
Now let's talk about events. An event is nothing more than a subset of all of the things that could happen, that is, a subset of the sample space Ω. The probability of an event is exactly what you'd think it would be: it's just the sum of the probabilities of all of the outcomes contained in that event. Right, so an event is just a bunch of stuff that might happen; we know the probability of each individual thing that can happen, and we add those up to get the probability of the event.

The next two quizzes are meant to give you some practice with these concepts. In particular, they'll ask you to compute the probability of events in our two running examples.

The first quiz concerns our first running example, where we think about two dice and we have our 36 possible outcomes. Consider the subset of outcomes in which the sum of the two dice equals 7. What is the probability of that event?

The correct answer is the third one: the probability is 1/6. Why is that? Well, first let's be more precise about what this event is. What are the outcomes in which the sum of the dice equals 7? There are exactly six such outcomes: (1,6), (2,5), (3,4), (4,3), (5,2), and (6,1). Each of the 36 outcomes is equally likely, with probability 1/36. So we have six members of the set, each with probability 1/36, and the probability of the event is 6/36 = 1/6.

Let's move on to the next quiz, which considers our second running example, namely the randomly chosen pivot in the outermost call to QuickSort on an input array of length n. Recall that in QuickSort, when you choose a pivot, you then partition the array around the pivot, and this splits the input array into two subarrays: a left one, the elements less than the pivot, and a right one, those bigger than the pivot. The more balanced the split into these two subproblems, the better. So ideally we'd like a 50-50 split.
So what this quiz asks is: what is the probability that a randomly chosen pivot gives you a reasonably good split, meaning both of the subproblems have size at least 25% of the original array? That is, you get a 25-75 split or better. What's the probability that your randomly chosen pivot satisfies that property?

The correct answer to this quiz is again the third option: there's a 50% probability that you get a 25-75 split or better. To see why, let's again be precise about the event we're talking about; then we'll compute its probability. So when does a pivot give you a 25-75 split or better? For concreteness, suppose the array contains just the integers between 1 and 100. What's the property we want? We want both of the subarrays to have at least 25% of the elements, so that neither one has more than 75% of the elements. Well, if we choose an element that's 26 or bigger in value, then the left subproblem will have at least 25 elements, namely the numbers 1 through 25. And if we choose an element that's at most 75, then the right subarray will have at least 25 elements, namely the numbers 76 to 100. So anything between 26 and 75, inclusive, gives us a 25-75 split or better.

More generally, any pivot from the middle 50% of the quantiles gives us the desired split. We do badly if we get something in the first quarter, we do badly if we get something in the last quarter, and anything in the middle works. So, more formally, the event S that we're analyzing consists of the pivot choices that lie neither in the first quarter nor in the last quarter. The cardinality of this set, the number of pivots in it, is essentially half the overall number of pivot choices, roughly n/2; I'm ignoring fractions here for simplicity. The probability of the event is its cardinality times the probability of each individual outcome, and since we choose the pivot uniformly at random, each one has probability 1/n. So you get (n/2)·(1/n), or 1/2.
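Here is a short Python sketch (again my illustration, not from the lecture) that computes both of these event probabilities by enumerating the sample space; n = 100 matches the concrete array used above.

    # Probability of an event = sum of the probabilities of its outcomes.
    from fractions import Fraction
    import itertools

    # Event 1: the sum of the two dice equals 7.
    dice_omega = list(itertools.product(range(1, 7), repeat=2))
    event = [(a, b) for (a, b) in dice_omega if a + b == 7]
    print(Fraction(len(event), len(dice_omega)))  # 1/6

    # Event 2: a random pivot gives a 25-75 split or better.
    # Good pivots avoid the smallest and largest quarters of the array.
    n = 100
    good = [i for i in range(1, n + 1) if n // 4 < i <= 3 * n // 4]
    print(Fraction(len(good), n))  # 1/2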
So now that we've explored the concept of events in our two running examples, we know that the probability that the sum of two dice equals 7 is 1/6, a useful fact to know if you're ever playing craps. And we know that a pivot gives us a 25-75 split or better in randomized QuickSort with 50% probability, a useful fact if you want to develop intuition for why QuickSort is, in fact, quick. So that's events. Let's move on to random variables.

Random variables are basically statistics measuring what happens in the random outcome. Formally, a random variable is a real-valued function defined on the sample space Ω. Given an outcome, a realization of the randomness, it gives you back a number.

The random variable that we most often care about in algorithm design is the running time of a randomized algorithm. That's the case, for example, with the QuickSort algorithm. Notice that the running time is, in fact, a random variable: if we know the state of the world, if we know the outcome of all the coin flips that our code is going to make, then there's just some running time of our algorithm. So in that sense it's a random variable: given the outcomes of the coin flips, out pops a number, the running time, say in milliseconds, of the algorithm.

Here I'm going to give you a couple of more modest examples of random variables in our two running examples. If we're rolling two dice, one very simple random variable takes as input the outcome, the result of the two dice, and spits out the sum. That's certainly a random variable: on any given outcome, it takes on some integer value between 2, at the minimum, and 12, at the maximum. Our second running example is the randomly chosen pivot made by the outermost call to QuickSort. Let's think about the random variable which is the size, meaning the subarray length, passed to the first recursive call. Equivalently, this random variable is the number of elements of the input array smaller than the randomly chosen pivot.
This is a random variable that takes on some integer value between 0, at the smallest (that's if we happen to pick the minimum of the array as the pivot), and n-1, at the largest (that's if we happen to pick the maximum element as the pivot).
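To emphasize that a random variable is literally just a function on the sample space, here is a minimal sketch (my illustration, not from the lecture):

    # Random variables as plain functions from outcomes to numbers.
    import itertools

    # On the dice sample space: the sum of the two dice.
    def dice_sum(outcome):
        a, b = outcome
        return a + b

    # On the pivot sample space {1, ..., n}, where the outcome is the rank
    # of the chosen pivot: the left subarray gets rank - 1 elements.
    def left_subarray_size(pivot_rank):
        return pivot_rank - 1

    dice_omega = list(itertools.product(range(1, 7), repeat=2))
    sums = [dice_sum(o) for o in dice_omega]
    print(min(sums), max(sums))  # 2 12
    n = 10  # arbitrary array length
    print(left_subarray_size(1), left_subarray_size(n))  # 0 and n-1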
Next let's talk about the expectation of a random variable. This is really nothing more than the average. Of course, when you take the average of some statistic, you want to weight it by the probabilities of its various values. Let's just make that precise real quick. Consider some random variable X. The expectation, also called the expected value, is written with a capital E and square brackets around the random variable: E[X]. In English, the expectation is just the average value, naturally weighted by the probabilities of the various possible outcomes. More mathematically, we sum over everything that could happen: letting i denote one possible outcome, we look at the value X(i) of the random variable when that outcome occurs, and we weight it by the probability p(i) of that outcome occurring. That is, E[X] = Σ_{i in Ω} X(i)·p(i).

The next two quizzes ask you to compute the expectations of the two random variables that we identified on the previous slide. The first quiz is about two dice and the random variable which is the sum of the values of those two dice. What is the average of that random variable, that is, its expectation?

The answer to this question is the second option: the average value is 7. There are a bunch of different ways to see that. In my opinion, the best way to compute this is using linearity of expectation, which is the next concept we're going to cover. If you wanted to, you could just compute it by brute force, by which I mean you could iterate over all 36 possible outcomes, look at the value of the two dice in each, and evaluate the sum we had in the definition on the last slide. A slightly sneakier way to do it, if you don't know linearity of expectation, is to pair up the various outcomes: it's equally likely that the sum of the two dice is 2 or 12, equally likely to be 3 or 11, 4 or 10, and so on. Each such pair of values sums to 14, so when you average, you get 7. But, again, the right way to do this is linearity of expectation, which we'll cover next.

The second quiz covers the second random variable we identified. Now we're back to QuickSort and the random pivot chosen in the outermost call. The question is: how big, on average, that is, in expectation, is the subarray passed to the first recursive call? Equivalently, on average, how many elements are going to be less than the randomly chosen pivot?

The correct answer to this quiz is the third option. In fact, the exact quantity is (n-1)/2, not n/2, but basically half the elements. Again, there's a sneaky way to see this if you want, which is that the two recursive calls are clearly symmetric: the expected size of the left recursive call is the same as the expected size of the right recursive call. The two recursive calls always comprise n-1 of the elements between them, and because they're symmetric, you expect half in each, so (n-1)/2 in each. Though for this problem, I think it's perfectly fine to just compute the answer using the definition of expectation. If we let X denote the random variable that we care about, the subarray size, then we can compute directly by summing over all of the possible outcomes, that is, all of the possible choices of the pivot. With probability 1/n, we choose the minimum element as the pivot, resulting in 0 elements being passed to the first recursive call. With probability 1/n, we pick the second-smallest element, resulting in 1 element being passed to the first recursive call. With probability 1/n, we pick the third-smallest, giving a subarray size of 2, and so on, until, with probability 1/n, we pick the maximum element, giving a subarray size of n-1. If you compute this sum out, E[X] = (1/n)·(0 + 1 + ... + (n-1)), you get, as expected, (n-1)/2.
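Both expectations are small enough to check by direct enumeration; here is a sketch (my illustration, not from the lecture):

    # Expectation = sum over outcomes of value times probability.
    from fractions import Fraction
    import itertools

    # E[sum of two dice]: 36 equally likely outcomes.
    dice_omega = list(itertools.product(range(1, 7), repeat=2))
    e_sum = sum(Fraction(a + b, len(dice_omega)) for (a, b) in dice_omega)
    print(e_sum)  # 7

    # E[left subarray size]: a pivot of rank i sends i - 1 elements left,
    # and each rank occurs with probability 1/n.
    n = 10  # arbitrary array length
    e_left = sum(Fraction(i - 1, n) for i in range(1, n + 1))
    print(e_left == Fraction(n - 1, 2))  # True: (n-1)/2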
Expectation is the last definition that I'm going to give you in this part one of the probability review. Next is our fifth and final concept for this video, which is linearity of expectation. That's not a definition; it's more of a theorem. What is linearity of expectation? It's a very simple property of random variables that is super, super important. It comes up all the time when we analyze randomized algorithms, and random processes more generally.

So what is linearity of expectation? It's the following very simple claim, which I'll sometimes denote by LIN EXP for short. Suppose you've got a bunch of random variables defined on the same sample space. Then, if you want the expected value of the sum of these random variables, it doesn't matter whether you take the sum first and then the expectation, or take expectations first and then sum. That is, the expected value of a sum of random variables is equal to the sum of the expectations of the individual random variables: E[X1 + ... + Xn] = E[X1] + ... + E[Xn].

One of the reasons linearity of expectation is so ubiquitously useful is that it always works, no matter what these random variables are; in particular, even when the random variables are not independent. Now, I haven't defined independent random variables yet; that will come in part two of the probability review. But hopefully you have an intuitive sense of what independence means: things are independent if knowing something about one of the random variables doesn't influence what you expect from the other random variable.

Now, I realize that the first time you see linearity of expectation, it's a little hard to appreciate. As far as applications go, we'll see plenty throughout this course; in pretty much every single application of probability that we'll see, the analysis will involve linearity of expectation. But it may be hard to appreciate why this is not a tautology; just symbolically, it may look like it has to be true. To point out that there is content here: if I replaced the sums by products, then this equation would in general be false when the random variables are not independent. So the same statement is not true about products; it's really about sums.

So let me give you a trivial illustration of linearity of expectation, and point out how easily it allows us to evaluate the sum of two dice. In our first running example, let's introduce the random variables X1 and X2 for the results of the first and second die, respectively. Now, computing the expected value of a single die is easy.
There are only six outcomes to enumerate over; contrast that with the 36 outcomes we enumerated over when we evaluated the sum of the two dice. So the average value of a single die, you won't be surprised to hear, is 3.5: it ranges over the integers between 1 and 6 uniformly, so 3.5 on average. And now, using linearity of expectation, the expected sum of the two dice is simply double the average value of a single one: E[X1 + X2] = E[X1] + E[X2] = 3.5 + 3.5 = 7.

On the next slide I'm going to prove this property, prove linearity of expectation, but frankly the proof is pretty trivial, so if you don't care about the proof, that's fine; you can skip it without loss. I'm including it just for completeness. And I've got to say, I don't know of another mathematical statement which is simultaneously so trivial to prove and so unbelievably useful. It's really something remarkable, linearity of expectation.

So how does the proof go? Honestly, we just write out the sums from the definition of expectation, reverse the order of summation, and we're done. Let me start with the right-hand side of the equation, the sum of the expectations of the random variables. Now let's just apply the definition of expectation: each term is a weighted average over the possible outcomes. Then, instead of summing first over the random variable j and then over the realized outcome i, I'll do it in reverse order: sum first over the outcome i and then over the random variable j. Now, the probability of outcome i doesn't depend on j, so we can yank the p(i) outside of the inner sum. But now what have we got? Inside the parentheses we simply have the value of the sum of the Xj's on the outcome i, and we're averaging that sum with respect to the probabilities p(i). So this is just the definition of the expectation of the sum of the random variables. So that's it: linearity of expectation is really just a reversal of the double sums.
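For reference, here is that chain of equalities written out in symbols, for random variables X_1, ..., X_n on a sample space Ω with outcome probabilities p(i):

    \sum_{j=1}^{n} E[X_j]
      = \sum_{j=1}^{n} \sum_{i \in \Omega} X_j(i)\, p(i)
      = \sum_{i \in \Omega} \sum_{j=1}^{n} X_j(i)\, p(i)
      = \sum_{i \in \Omega} p(i) \sum_{j=1}^{n} X_j(i)
      = E\Big[\sum_{j=1}^{n} X_j\Big]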
Now, for those of you who are rusty on these kinds of manipulations, I just want to point out that there's nothing complicated at all going on in this reversal of the double sum. If you want a really pedestrian way to think about what's happening, just imagine that we take these summands, the terms Xj(i)·p(i), and write them out in a grid, where, say, the columns are indexed by the random variable j and the rows are indexed by the outcome i. In a given cell of this grid, we write the summand Xj(i) times p(i). So if you get lost in the notation with these double sums, the point is that you can interpret each of them in terms of this grid: both double sums are nothing more than the sum of the values in all of the cells of the grid. One order of summation just says you group first according to row sums and then add those up; that's the first summation. In the second summation, you first take column sums and then add those up. But of course it doesn't matter: either way you get the sum of everything in the grid. So there are no tricks up my sleeve when I reverse these sums; it's a totally elementary, trivial thing. Okay, so again: linearity of expectation, trivial to prove, incredibly useful. Don't forget it.
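As a sanity check, here is a small sketch (my illustration, not from the lecture) verifying linearity on the two dice, along with a reminder that the analogous identity for products can fail for dependent random variables:

    # Linearity of expectation on the two-dice sample space.
    from fractions import Fraction
    import itertools

    omega = list(itertools.product(range(1, 7), repeat=2))
    p = Fraction(1, len(omega))  # each outcome has probability 1/36

    def expectation(x):
        # E[X] = sum over outcomes i of X(i) * p(i).
        return sum(x(i) * p for i in omega)

    x1 = lambda i: i[0]
    x2 = lambda i: i[1]
    print(expectation(lambda i: x1(i) + x2(i)))  # 7
    print(expectation(x1) + expectation(x2))     # 7, the same

    # Products are another story: X1 is trivially dependent on itself, and
    # E[X1 * X1] = 91/6 while E[X1] * E[X1] = 49/4.
    print(expectation(lambda i: x1(i) * x1(i)), expectation(x1) ** 2)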
So I want to conclude this video with one final example, in order to tie together all of the concepts that we've just learned, or just reviewed. It's going to be an example about load balancing: assigning processes to servers. This is in fact quite important for the analysis of hashing, which we're going to see toward the end of the course as well. But for now, let's just think about the following simple problem. For some integer n, you have n computer processes that have to be assigned to n servers in some way. Now, you're feeling very lazy, so you're just going to take each of these processes and assign it to a totally random server, with each server equally likely to receive a given process. The question I want to study is: does this laziness cost you, at least on average? If you look at a given server, what's its expected load?

So let's proceed to the solution to this question. Before we start talking about expectations, we have to be clear about the sample space and the probabilities of the various outcomes. Remember, the sample space Ω just denotes everything that could possibly happen. What are we doing? For each process, we're assigning it to a random server, so the things that can happen are all of the different assignments of these n processes to the n servers. If you think about it, there are n raised to the n possible outcomes, because you have n choices for each of the n processes. Moreover, because each process is assigned to one of the servers uniformly at random, each of these n^n assignments is equally likely, with probability 1/n^n.

Now that we have a sample space, we're in a position to define a random variable, and we already know which random variable we care about: the load of a server. All of the servers are exactly the same, so we can just focus on one, say the first server, and look at the number of processes assigned to it; call this random variable Y. If you go back to the problem statement, what we're asked to compute is the expected value of Y, the expected number of processes assigned to a server.

Now, in principle, we could go to the definition of expectation and compute by brute force: the sum, over all possible outcomes, of the value of Y, weighted by the outcome probabilities. Unfortunately, there are n^n different outcomes, and that's a lot. So what can we do other than this brute-force computation? Recall our example of linearity of expectation on the sum of two dice. We observed that instead of computing the expectation by enumerating all 36 outcomes, it was much better to focus on a single die, compute its expectation, and conclude with linearity of expectation. We'll do the same thing here. Instead of working with the sum Y directly, we'll focus on constituent parts of Y, namely whether or not a single process gets assigned to the first server, and then we'll finish the job with linearity of expectation.

More precisely, for a given process j, let's define Xj to be 1 if and only if the jth process gets assigned to the first server, and 0 otherwise. Zero-one random variables like Xj are often called indicator random variables, because they, in effect, indicate whether or not a certain event occurs; in this case, whether or not the jth process gets assigned to the first server. Why did I make this definition?
Well, observe that the total number of processes assigned to the first server is simply the sum over j, from 1 to n, of the Xj's: Y = X1 + X2 + ... + Xn. Each Xj says whether or not a given process, the jth one, is on the first server, so the total number is just the sum of these over all j. The benefit of this maneuver is that we only have to compute the expectation of an extremely simple indicator random variable Xj. This is like the win we got when summing the two dice: instead of computing the expected value of the sum directly, we just had to focus on the expectation of a single die, which was really easy. Similarly, here, the expectation of a single Xj is really easy.

Specifically, let's write it out using the definition of expectation. For the expected value of Xj, let's group together all the outcomes on which it takes the value 0; the contribution to the expectation is zero for all of those. Then there are the remaining outcomes, where Xj takes the value 1, and each of those contributes 1 to the expectation. So E[Xj] = 0·Pr[Xj = 0] + 1·Pr[Xj = 1]. Obviously we get some happy cancellation with the zero part, and all we have to worry about is the probability that Xj takes the value 1. What was Xj again, how did we define it? Remember, it's 1 exactly when the jth process gets assigned to the first server. And how are processes assigned? The proposed solution assigns each process to each of the n servers equally likely, with uniform probability. So the probability that the jth process is assigned to the first server is 1/n, and E[Xj] = 1/n.

This leaves us with just the sum, from j = 1 to n, of 1/n. That is, we sum up 1/n with itself n times, which of course equals 1. So at the end of the day, what we find is that the expected number of processes assigned to a given server, say the first one, is exactly 1. So, at least if we only care about averages, we lose very little from this trivial procedure of randomly spraying the processes across the servers: on average, any given server has just one process on it.
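Here is a final sketch (my illustration, not from the lecture) that checks this answer two ways: exactly, by brute force over all n^n assignments for a tiny n, and approximately, by simulation for a larger n.

    # Expected load of server 0 under uniformly random assignment.
    from fractions import Fraction
    import itertools, random

    def exact_expected_load(n):
        # Brute force over all n^n assignments; feasible only for tiny n.
        total = Fraction(0)
        for assignment in itertools.product(range(n), repeat=n):
            total += Fraction(assignment.count(0), n ** n)
        return total

    def simulated_load(n, trials=100_000):
        hits = sum(1 for _ in range(trials) for j in range(n)
                   if random.randrange(n) == 0)
        return hits / trials

    print(exact_expected_load(4))  # 1: n indicators, each with E[Xj] = 1/n
    print(simulated_load(50))      # close to 1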
This is characteristic of the role that randomization plays in algorithm design, and in computer science more generally: often we can get away with really simple heuristics just by making random choices. QuickSort is of course one example, where we get an extremely practical, prevalently used sorting algorithm just by choosing a random pivot in every recursive call.