1
00:00:00,090 --> 00:00:01,240
In the last video, we talked

2
00:00:01,560 --> 00:00:03,660
about the Gaussian distribution. In

3
00:00:03,810 --> 00:00:05,350
this video, let's apply that

4
00:00:05,440 --> 00:00:07,300
to develop an anomaly detection algorithm.

5
00:00:10,360 --> 00:00:11,690
Let's say that we have an

6
00:00:11,840 --> 00:00:13,390
unlabeled training set of M

7
00:00:13,650 --> 00:00:15,410
examples, and each of

8
00:00:15,470 --> 00:00:16,730
these examples is going to

9
00:00:16,760 --> 00:00:18,350
be a feature vector in R^n, so

10
00:00:18,440 --> 00:00:19,420
your training set could be,

11
00:00:20,540 --> 00:00:21,860
feature vectors from the last

12
00:00:22,730 --> 00:00:24,150
M aircraft engines being manufactured.

13
00:00:24,960 --> 00:00:26,730
Or it could be features from m

14
00:00:27,070 --> 00:00:28,290
users or something else.

15
00:00:29,320 --> 00:00:30,460
The way we are going to address

16
00:00:30,840 --> 00:00:32,310
anomaly detection, is we are

17
00:00:32,350 --> 00:00:33,480
going to model p of x

18
00:00:33,860 --> 00:00:35,640
from the data set.

19
00:00:36,240 --> 00:00:38,530
We're going to try to figure out what are high probability features, what

20
00:00:38,860 --> 00:00:40,620
are lower probability types of features.

21
00:00:41,350 --> 00:00:42,810
So, x is a

22
00:00:43,090 --> 00:00:44,900
vector and what we

23
00:00:45,320 --> 00:00:46,580
are going to do is model p of

24
00:00:47,020 --> 00:00:48,870
x, as probability of x1,

25
00:00:49,440 --> 00:00:50,390
that is of the first component

26
00:00:50,950 --> 00:00:53,180
of x, times the probability

27
00:00:53,990 --> 00:00:54,960
of x2, that is the probability

28
00:00:55,510 --> 00:00:57,350
of the second feature, times the

29
00:00:57,450 --> 00:00:58,860
probability of the third

30
00:00:59,090 --> 00:01:01,230
feature, and so on up

31
00:01:01,410 --> 00:01:03,290
to the probability of the final feature

32
00:01:03,760 --> 00:01:03,930
xn.

33
00:01:04,200 --> 00:01:06,320
Now I'm leaving space here because I'll fill in something in a minute.

34
00:01:08,780 --> 00:01:09,720
So, how do we

35
00:01:09,830 --> 00:01:10,960
model each of these terms,

36
00:01:11,460 --> 00:01:13,020
p of X1, p of X2, and so on?

37
00:01:14,080 --> 00:01:15,380
What we're going to do,

38
00:01:15,680 --> 00:01:16,860
is assume that the feature,

39
00:01:17,480 --> 00:01:19,830
X1, is distributed according

40
00:01:20,340 --> 00:01:22,950
to a Gaussian distribution, with

41
00:01:23,160 --> 00:01:25,140
some mean, which you

42
00:01:25,340 --> 00:01:25,850
want to write as mu1 and

43
00:01:25,920 --> 00:01:26,900
some variance, which I'm going

44
00:01:26,990 --> 00:01:28,560
to write as sigma squared 1,

45
00:01:29,890 --> 00:01:30,690
and so p of X1 is

46
00:01:30,820 --> 00:01:32,020
going to be a Gaussian

47
00:01:32,350 --> 00:01:34,410
probability distribution, with mean

48
00:01:34,610 --> 00:01:37,580
mu1 and variance sigma squared 1.

49
00:01:38,230 --> 00:01:39,660
And similarly I'm

50
00:01:39,720 --> 00:01:40,570
going to assume that X2

51
00:01:40,760 --> 00:01:42,220
is distributed, Gaussian,

52
00:01:42,870 --> 00:01:44,620
that's what this little tilde stands for,

53
00:01:44,800 --> 00:01:47,220
that means distributed Gaussian

54
00:01:47,740 --> 00:01:49,490
with mean mu2 and Sigma

55
00:01:49,830 --> 00:01:51,780
squared 2, so it's distributed according

56
00:01:52,170 --> 00:01:54,230
to a different Gaussian, which has

57
00:01:54,460 --> 00:01:58,010
a different set of parameters, mu2 and sigma squared 2.

58
00:01:58,120 --> 00:02:00,160
And similarly, you know,

59
00:02:00,360 --> 00:02:04,020
X3 is yet another

60
00:02:04,480 --> 00:02:06,590
Gaussian, so this

61
00:02:06,780 --> 00:02:09,100
can have a different mean and

62
00:02:09,300 --> 00:02:11,630
a different standard deviation than the

63
00:02:11,830 --> 00:02:15,370
other features, and so on, up to XN.

64
00:02:17,000 --> 00:02:17,740
And so that's my model.

65
00:02:19,010 --> 00:02:20,230
Just as a side comment for

66
00:02:20,370 --> 00:02:21,490
those of you that are experts in

67
00:02:21,890 --> 00:02:22,770
statistics, it turns out that

68
00:02:22,990 --> 00:02:23,850
this equation that I just

69
00:02:24,250 --> 00:02:25,590
wrote out actually corresponds to an

70
00:02:25,750 --> 00:02:27,490
independence assumption on the

71
00:02:28,060 --> 00:02:29,550
values of the features x1 through xn.

72
00:02:30,290 --> 00:02:31,520
But in practice it turns out

73
00:02:32,040 --> 00:02:34,010
that this algorithm works just fine,

74
00:02:34,410 --> 00:02:36,330
whether or not these features are

75
00:02:36,610 --> 00:02:37,780
anywhere close to independent and

76
00:02:38,280 --> 00:02:39,810
even if the independence assumption doesn't

77
00:02:40,240 --> 00:02:41,830
hold true, this algorithm works just fine.

78
00:02:42,650 --> 00:02:45,870
But in case you don't know

79
00:02:45,970 --> 00:02:47,380
those terms I just used, independence assumptions and so on,

80
00:02:47,830 --> 00:02:48,460
don't worry about it.

81
00:02:49,170 --> 00:02:50,840
You'll be able to understand

82
00:02:51,360 --> 00:02:52,690
it and implement this algorithm just fine

83
00:02:53,250 --> 00:02:55,310
and that comment was really meant only for the experts in statistics.

84
00:02:57,790 --> 00:02:58,880
Finally, in order to

85
00:02:59,210 --> 00:03:00,320
wrap this up, let me

86
00:03:00,590 --> 00:03:04,680
take this expression and write it a little bit more compactly.

87
00:03:05,120 --> 00:03:06,200
So, we're going to

88
00:03:06,310 --> 00:03:07,500
write this as a product

89
00:03:07,740 --> 00:03:09,520
from J equals one

90
00:03:10,230 --> 00:03:11,840
through N, of P

91
00:03:12,140 --> 00:03:15,350
of XJ parameterized by

92
00:03:16,020 --> 00:03:17,930
mu j comma sigma squared

93
00:03:19,500 --> 00:03:21,500
j. So this funny

94
00:03:21,790 --> 00:03:23,330
symbol here, this is the

95
00:03:23,780 --> 00:03:25,220
capital Greek letter pi,

96
00:03:25,490 --> 00:03:27,600
that funny symbol there corresponds to

97
00:03:28,190 --> 00:03:29,980
taking the product of a set of values.

98
00:03:30,590 --> 00:03:32,290
And so, you're familiar with

99
00:03:32,400 --> 00:03:33,930
the summation notation, so the

100
00:03:34,520 --> 00:03:36,460
sum from i equals one through

101
00:03:36,930 --> 00:03:39,070
n, of i. This

102
00:03:39,960 --> 00:03:41,820
means 1 + 2 + 3 plus

103
00:03:42,230 --> 00:03:43,730
dot dot dot, up to

104
00:03:43,910 --> 00:03:45,350
n. Whereas this

105
00:03:45,660 --> 00:03:46,910
funny symbol here, this product

106
00:03:47,390 --> 00:03:48,630
symbol, right, the product from i

107
00:03:48,840 --> 00:03:50,310
equals 1 through n

108
00:03:50,620 --> 00:03:52,210
of i, then this

109
00:03:52,520 --> 00:03:54,530
means that, it's just like summation except that we're now multiplying.

110
00:03:55,200 --> 00:03:56,680
This becomes 1 times

111
00:03:56,880 --> 00:03:58,700
2 times 3 times up

112
00:03:59,910 --> 00:04:01,330
to N. And so using

113
00:04:01,860 --> 00:04:03,430
this product notation, this

114
00:04:03,570 --> 00:04:05,880
product from j equals 1 through n of this expression.

115
00:04:06,620 --> 00:04:08,440
It's just more compact, it's

116
00:04:08,820 --> 00:04:09,960
just a shorter way of writing

117
00:04:10,330 --> 00:04:12,810
out this product of

118
00:04:13,120 --> 00:04:14,400
all of these terms up there.

119
00:04:15,200 --> 00:04:16,200
Since we're taking these p

120
00:04:16,430 --> 00:04:17,510
of x j given mu j

121
00:04:17,730 --> 00:04:18,740
comma sigma squared j terms

122
00:04:19,130 --> 00:04:20,290
and multiplying them together.
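
Written out as a formula, the model described so far would look roughly like the following. This is a sketch reconstructed from the spoken description; the semicolon for "parameterized by" and the calligraphic N for the Gaussian are notational assumptions, since the slide itself is not part of this transcript.

```latex
% Each feature is modeled as its own Gaussian:
%   x_j \sim \mathcal{N}(\mu_j, \sigma_j^2), \quad j = 1, \dots, n
\begin{align*}
p(x) &= p(x_1; \mu_1, \sigma_1^2)\, p(x_2; \mu_2, \sigma_2^2) \cdots p(x_n; \mu_n, \sigma_n^2) \\
     &= \prod_{j=1}^{n} p(x_j; \mu_j, \sigma_j^2)
\end{align*}
% The product symbol works like the summation symbol, but multiplies:
%   \sum_{i=1}^{n} i = 1 + 2 + \cdots + n, \qquad
%   \prod_{i=1}^{n} i = 1 \times 2 \times \cdots \times n
```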

123
00:04:21,540 --> 00:04:22,830
And, by the way the problem

124
00:04:23,250 --> 00:04:25,370
of estimating this distribution

125
00:04:25,990 --> 00:04:27,130
p of x, is sometimes called

126
00:04:28,280 --> 00:04:29,540
the problem of density estimation.

127
00:04:30,420 --> 00:04:31,270
Hence the title of the slide.

128
00:04:33,800 --> 00:04:35,310
So putting everything together, here

129
00:04:35,500 --> 00:04:36,920
is our anomaly detection algorithm.

130
00:04:38,120 --> 00:04:40,290
The first step is to choose

131
00:04:40,650 --> 00:04:41,600
features, or come up with

132
00:04:41,700 --> 00:04:42,740
features xi that we think

133
00:04:43,040 --> 00:04:45,340
might be indicative of anomalous examples.

134
00:04:46,050 --> 00:04:47,020
So what I mean by that,

135
00:04:47,240 --> 00:04:48,490
is, try to come

136
00:04:48,680 --> 00:04:49,990
up with features, so that when there's

137
00:04:50,280 --> 00:04:51,630
an unusual user in your

138
00:04:52,190 --> 00:04:53,000
system that may be doing

139
00:04:53,190 --> 00:04:54,790
fraudulent things, or in the

140
00:04:55,020 --> 00:04:56,670
aircraft engine examples, you know

141
00:04:56,760 --> 00:04:59,500
there's something funny, something strange about one of the aircraft engines.

142
00:05:00,280 --> 00:05:01,230
Choose features X I, that

143
00:05:02,000 --> 00:05:03,330
you think might take on unusually

144
00:05:04,410 --> 00:05:05,860
large values, or unusually

145
00:05:06,020 --> 00:05:08,750
small values, for what an

146
00:05:08,880 --> 00:05:10,160
anomalous example might look like.

147
00:05:10,910 --> 00:05:12,440
But more generally, just try

148
00:05:12,690 --> 00:05:14,340
to choose features that describe general

149
00:05:16,160 --> 00:05:19,380
properties of the things that you're collecting data on.

150
00:05:20,030 --> 00:05:21,360
Next, given a training set,

151
00:05:22,020 --> 00:05:23,980
of M unlabeled examples,

152
00:05:25,000 --> 00:05:26,980
X1 through X M, we

153
00:05:27,170 --> 00:05:28,580
then fit the parameters,

154
00:05:29,090 --> 00:05:30,170
mu 1 through mu n, and

155
00:05:30,340 --> 00:05:31,480
sigma squared 1 through sigma

156
00:05:31,690 --> 00:05:33,460
squared n, and so these

157
00:05:33,840 --> 00:05:34,810
are the formulas, similar to

158
00:05:34,840 --> 00:05:36,420
the formulas we have

159
00:05:36,680 --> 00:05:37,610
in the previous video, that we're

160
00:05:37,740 --> 00:05:39,180
going to use to estimate

161
00:05:39,310 --> 00:05:41,120
each of these parameters, and just to give

162
00:05:42,030 --> 00:05:43,670
some interpretation, mu J,

163
00:05:44,060 --> 00:05:47,830
that's my average value of the j-th feature.

164
00:05:48,720 --> 00:05:51,580
Mu j goes in this term p of xj,

165
00:05:52,440 --> 00:05:53,870
which is parametrized by mu J

166
00:05:54,220 --> 00:05:55,590
and sigma squared J. And

167
00:05:55,920 --> 00:05:57,890
so this says for the

168
00:05:58,360 --> 00:05:59,620
mu J just take the

169
00:05:59,700 --> 00:06:00,720
mean over my training

170
00:06:01,070 --> 00:06:02,930
set of the values of the j-th feature.

171
00:06:03,860 --> 00:06:05,100
And, just to mention, that you

172
00:06:05,220 --> 00:06:07,410
do this, you compute these

173
00:06:07,620 --> 00:06:08,830
formulas for j equals

174
00:06:09,420 --> 00:06:10,360
one through n. So use

175
00:06:10,700 --> 00:06:11,960
these formulas to estimate mu

176
00:06:12,230 --> 00:06:14,020
1, to estimate mu

177
00:06:14,070 --> 00:06:15,620
2, and so on up to

178
00:06:16,170 --> 00:06:17,460
mu n, and similarly for sigma

179
00:06:17,770 --> 00:06:19,060
squared, and it's also

180
00:06:19,390 --> 00:06:21,530
possible to come up with vectorized versions of these.

181
00:06:21,830 --> 00:06:22,900
So if you think of

182
00:06:23,000 --> 00:06:25,220
mu as a vector, so mu

183
00:06:25,920 --> 00:06:27,430
is a vector, there's mu 1,

184
00:06:27,760 --> 00:06:29,230
mu 2, down to mu

185
00:06:29,570 --> 00:06:31,180
n, then a vectorized

186
00:06:31,660 --> 00:06:33,510
version of that set

187
00:06:33,910 --> 00:06:35,530
of parameters can be written

188
00:06:36,440 --> 00:06:37,830
like so: mu equals 1 over m times the sum from i

189
00:06:37,880 --> 00:06:39,610
equals 1 through m of x(i).

190
00:06:40,290 --> 00:06:41,290
So, this formula that I

191
00:06:41,410 --> 00:06:43,530
just wrote out, where the

192
00:06:43,990 --> 00:06:45,160
x(i) are the feature vectors,

193
00:06:45,660 --> 00:06:48,140
estimates mu for all n features simultaneously.
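
The parameter-fitting step can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: it uses the 1 over m form of the mean and variance estimates, and the function name `estimate_gaussian_params` and the tiny `X_train` matrix are made up for the example.

```python
import numpy as np

def estimate_gaussian_params(X):
    """Return per-feature mean and variance for an (m, n) data matrix X."""
    X = np.asarray(X, dtype=float)
    # mu_j = (1/m) * sum over i of x_j^(i), for all n features at once
    mu = X.mean(axis=0)
    # sigma_j^2 = (1/m) * sum over i of (x_j^(i) - mu_j)^2
    sigma2 = ((X - mu) ** 2).mean(axis=0)
    return mu, sigma2

# Hypothetical training set: m = 3 examples, n = 2 features.
X_train = np.array([[1.0, 2.0],
                    [3.0, 4.0],
                    [5.0, 9.0]])
mu, sigma2 = estimate_gaussian_params(X_train)
print("mu =", mu)
print("sigma squared =", sigma2)
```

Averaging along `axis=0` is what makes this the vectorized version: one call estimates mu 1 through mu n together instead of looping over j.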

194
00:06:49,140 --> 00:06:50,070
And it's also possible to come

195
00:06:50,430 --> 00:06:52,130
up with a vectorized formula for

196
00:06:52,290 --> 00:06:55,110
estimating sigma squared j. Finally,

197
00:06:56,500 --> 00:06:57,890
when you're given a new example, so

198
00:06:58,100 --> 00:06:59,270
when you have a new aircraft engine

199
00:06:59,740 --> 00:07:01,420
and you want to know is this aircraft engine anomalous.

200
00:07:02,470 --> 00:07:03,430
What we need to do is then

201
00:07:03,570 --> 00:07:05,610
compute p of x, what's the probability of this new example?

202
00:07:06,790 --> 00:07:07,670
So, p of x is equal

203
00:07:07,990 --> 00:07:09,990
to this product, and

204
00:07:10,100 --> 00:07:11,140
what you implement, what you compute,

205
00:07:11,750 --> 00:07:14,040
is this formula and

206
00:07:15,000 --> 00:07:16,610
where over here, this thing

207
00:07:16,840 --> 00:07:17,900
here this is just the

208
00:07:18,260 --> 00:07:19,250
formula for the Gaussian

209
00:07:19,800 --> 00:07:21,000
probability, so you compute

210
00:07:21,240 --> 00:07:22,880
this thing, and finally if

211
00:07:22,940 --> 00:07:24,420
this probability is very small,

212
00:07:24,860 --> 00:07:26,370
then you flag this thing as an anomaly.
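
A minimal sketch of this last step, assuming mu and sigma squared have already been fitted. The function names are hypothetical, and epsilon is just a threshold passed in by the caller; how to choose it is not covered here.

```python
import numpy as np

def gaussian_density(x, mu, sigma2):
    """Univariate Gaussian density, applied elementwise to each feature."""
    return np.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)

def p_of_x(x, mu, sigma2):
    """p(x) as the product of the n per-feature Gaussian densities."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    return float(np.prod(gaussian_density(x, mu, sigma2)))

def is_anomaly(x, mu, sigma2, epsilon):
    """Flag x as anomalous when p(x) falls below the threshold epsilon."""
    return p_of_x(x, mu, sigma2) < epsilon
```

Note that the second argument pair is the variance sigma squared, not the standard deviation sigma.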

213
00:07:27,570 --> 00:07:29,380
Here's an example of an application of this method.

214
00:07:30,870 --> 00:07:31,860
Let's say we have this data

215
00:07:32,210 --> 00:07:35,430
set plotted on the upper left of this slide.

216
00:07:36,670 --> 00:07:38,860
If you look at this, well, let's look at the feature x1.

217
00:07:39,610 --> 00:07:40,640
If you look at this data set, it

218
00:07:40,750 --> 00:07:42,600
looks like, on average, the feature

219
00:07:42,950 --> 00:07:44,330
x1 has a mean of about 5

220
00:07:45,540 --> 00:07:47,420
and the standard deviation, if

221
00:07:47,590 --> 00:07:48,660
you only look at just the x1

222
00:07:49,010 --> 00:07:50,030
values of this data set

223
00:07:50,310 --> 00:07:51,720
is maybe 2.

224
00:07:52,370 --> 00:07:55,110
So that's sigma 1. And it

225
00:07:55,460 --> 00:07:57,380
looks like x2, the

226
00:07:57,670 --> 00:07:59,070
values of the feature as

227
00:07:59,250 --> 00:08:00,370
measured on the vertical axis,

228
00:08:00,840 --> 00:08:01,730
looks like it has an average

229
00:08:02,010 --> 00:08:03,110
value of about 3, and

230
00:08:03,380 --> 00:08:05,750
a standard deviation of about 1. So if

231
00:08:05,880 --> 00:08:06,940
you take this data set and if

232
00:08:07,010 --> 00:08:08,690
you estimate mu1, mu2, sigma1,

233
00:08:09,030 --> 00:08:11,410
sigma2, this is what you get.

234
00:08:11,610 --> 00:08:12,930
And again, I'm writing sigma here,

235
00:08:13,140 --> 00:08:14,620
I'm thinking about standard deviations, but

236
00:08:15,100 --> 00:08:16,240
the formulas on the previous slide

237
00:08:16,280 --> 00:08:17,640
actually gave the estimates of the squares

238
00:08:18,120 --> 00:08:20,670
of these things, so sigma squared 1 and sigma squared 2.

239
00:08:20,940 --> 00:08:21,920
So, just be careful whether

240
00:08:22,090 --> 00:08:23,260
you are using sigma 1, sigma

241
00:08:23,380 --> 00:08:25,490
2, or sigma squared 1 or sigma squared 2.

242
00:08:25,960 --> 00:08:26,700
So, sigma squared 1 of course

243
00:08:26,820 --> 00:08:28,500
would be equal to 4, for

244
00:08:31,130 --> 00:08:32,260
example, as the square of 2.

245
00:08:32,310 --> 00:08:34,010
And in pictures what p of

246
00:08:34,180 --> 00:08:35,550
x1 parametrized by mu1 and

247
00:08:35,660 --> 00:08:36,830
sigma squared 1 and p

248
00:08:37,120 --> 00:08:38,130
of x2, parametrized by mu

249
00:08:38,230 --> 00:08:39,050
2 and sigma squared 2, that

250
00:08:39,190 --> 00:08:41,360
would look like these two distributions over here.

251
00:08:42,650 --> 00:08:44,280
And, turns out that

252
00:08:44,480 --> 00:08:45,960
if we were to plot p

253
00:08:46,210 --> 00:08:47,540
of x, right, which

254
00:08:47,710 --> 00:08:49,000
is the product of these two

255
00:08:49,210 --> 00:08:50,450
things, you can actually get

256
00:08:50,800 --> 00:08:52,770
a surface plot that looks like this.

257
00:08:53,360 --> 00:08:54,370
This is a plot of p

258
00:08:54,640 --> 00:08:55,920
of x, where the height

259
00:08:56,390 --> 00:08:57,730
above this, where the

260
00:08:57,830 --> 00:08:58,950
height of this surface at

261
00:08:58,990 --> 00:09:01,360
a particular point, so given a

262
00:09:01,470 --> 00:09:03,670
particular x1 x2

263
00:09:03,930 --> 00:09:05,640
values, say if

264
00:09:05,800 --> 00:09:07,830
x1 equals 2, x2 equals 2, that's this point.

265
00:09:08,510 --> 00:09:09,450
And the height of this 3-D

266
00:09:09,710 --> 00:09:11,280
surface here, that's p

267
00:09:13,020 --> 00:09:14,420
of x. So p of x, that is the height

268
00:09:14,710 --> 00:09:16,220
of this plot, is

269
00:09:16,340 --> 00:09:17,520
literally just p of x1

270
00:09:18,640 --> 00:09:20,010
parametrized by mu 1 sigma

271
00:09:20,290 --> 00:09:22,540
squared 1, times p

272
00:09:23,200 --> 00:09:25,050
of x2 parametrized by

273
00:09:25,120 --> 00:09:27,530
mu 2 sigma squared 2.

274
00:09:27,720 --> 00:09:29,180
Now, so this is

275
00:09:29,320 --> 00:09:31,400
how we fit the parameters to this data.

276
00:09:31,930 --> 00:09:32,950
Let's say we have a couple of new examples.

277
00:09:33,530 --> 00:09:35,090
Maybe I have a new example there.

278
00:09:36,700 --> 00:09:38,340
Is this an anomaly or not?

279
00:09:38,550 --> 00:09:39,220
Or, maybe I have a different

280
00:09:39,570 --> 00:09:41,860
example, maybe I have a different second example over there.

281
00:09:42,140 --> 00:09:43,400
So, is that an anomaly or not?

282
00:09:44,360 --> 00:09:47,050
The way we do that is, we

283
00:09:47,190 --> 00:09:48,470
would set some value for

284
00:09:48,620 --> 00:09:49,490
Epsilon, let's say I've chosen

285
00:09:50,020 --> 00:09:51,220
Epsilon equals 0.02.

286
00:09:51,980 --> 00:09:54,110
I'll say later how we choose Epsilon.

287
00:09:55,180 --> 00:09:56,110
But let's take this first

288
00:09:56,540 --> 00:09:57,360
example, let me call this

289
00:09:57,500 --> 00:09:59,500
example X1 test.

290
00:10:00,200 --> 00:10:01,010
And let me call the second example

291
00:10:02,800 --> 00:10:03,900
X2 test.

292
00:10:04,780 --> 00:10:05,670
What we do is, we

293
00:10:05,820 --> 00:10:07,380
then compute p of

294
00:10:07,540 --> 00:10:08,740
X1 test, so we use

295
00:10:08,990 --> 00:10:10,400
this formula to compute it and

296
00:10:11,140 --> 00:10:12,760
this looks like a pretty large value.

297
00:10:13,250 --> 00:10:15,560
In particular, this is greater

298
00:10:15,920 --> 00:10:18,480
than, or greater than or equal to epsilon.

299
00:10:18,670 --> 00:10:19,670
And so this is a pretty

300
00:10:19,810 --> 00:10:21,290
high probability at least bigger

301
00:10:21,490 --> 00:10:22,510
than epsilon, so we'll say that

302
00:10:22,970 --> 00:10:24,490
X1 test is not an anomaly.

303
00:10:25,650 --> 00:10:27,370
Whereas, if you compute p of

304
00:10:27,440 --> 00:10:29,810
X2 test, well that is just a much smaller value.

305
00:10:30,170 --> 00:10:31,340
So this is less than

306
00:10:31,490 --> 00:10:32,490
epsilon and so we'll say

307
00:10:32,720 --> 00:10:34,400
that that is indeed an anomaly,

308
00:10:34,860 --> 00:10:37,350
because it is much smaller than that epsilon that we then chose.
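
To make the comparison with epsilon concrete, here is a small sketch using the values read off the plot: mu 1 of about 5, sigma 1 of about 2, mu 2 of about 3, sigma 2 of about 1, and epsilon equal to 0.02. The coordinates of the two test points are not stated in the video, so the ones below are hypothetical, chosen only so that one lands near the center of the data and one far from it.

```python
import math

def gaussian(x, mu, sigma):
    """Gaussian density; sigma here is the standard deviation, not the variance."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

MU = (5.0, 3.0)      # approximate means of x1 and x2 from the example
SIGMA = (2.0, 1.0)   # approximate standard deviations of x1 and x2
EPSILON = 0.02       # threshold used in the example

def p(x):
    """p(x) = p(x1; mu1, sigma1^2) * p(x2; mu2, sigma2^2)."""
    return gaussian(x[0], MU[0], SIGMA[0]) * gaussian(x[1], MU[1], SIGMA[1])

x1_test = (5.0, 3.0)   # hypothetical point near the center of the data
x2_test = (9.0, 0.5)   # hypothetical point far from the bulk of the data

for name, x in (("x1_test", x1_test), ("x2_test", x2_test)):
    label = "anomaly" if p(x) < EPSILON else "not an anomaly"
    print(f"{name}: p(x) = {p(x):.4f} -> {label}")
```

With these made-up coordinates, the first point comes out above epsilon and the second well below it, which mirrors the two cases in the example.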

309
00:10:38,450 --> 00:10:39,950
And in fact, I'd improve it here.

310
00:10:40,460 --> 00:10:43,340
What this is really saying is that, if you look at the 3D surface plot,

311
00:10:44,660 --> 00:10:46,270
It's saying that all the

312
00:10:46,350 --> 00:10:47,940
values of x1 and x2

313
00:10:48,210 --> 00:10:50,570
that have a high height

314
00:10:50,810 --> 00:10:52,770
above the surface, correspond to

315
00:10:52,910 --> 00:10:55,160
a non-anomalous example, an OK or normal example.

316
00:10:55,970 --> 00:10:57,450
Whereas all the points far out

317
00:10:57,640 --> 00:10:58,940
here, all the points out

318
00:10:59,150 --> 00:11:00,460
here, all of those

319
00:11:00,580 --> 00:11:01,740
points have very low

320
00:11:01,910 --> 00:11:02,940
probability, so we are

321
00:11:03,020 --> 00:11:04,310
going to flag those points

322
00:11:04,620 --> 00:11:06,350
as anomalous, and so it's gonna define

323
00:11:06,760 --> 00:11:07,790
some region, that maybe looks

324
00:11:08,000 --> 00:11:09,480
like this, so that everything

325
00:11:09,810 --> 00:11:12,160
outside this, it flags

326
00:11:12,380 --> 00:11:12,580
as anomalous,

327
00:11:14,940 --> 00:11:16,260
whereas the things inside this

328
00:11:16,770 --> 00:11:18,340
ellipse I just drew, it

329
00:11:18,570 --> 00:11:21,320
considers okay, or non-anomalous, examples.

330
00:11:22,110 --> 00:11:24,040
And so this example x2

331
00:11:24,250 --> 00:11:26,260
test lies outside

332
00:11:26,650 --> 00:11:27,510
that region, and so it

333
00:11:27,620 --> 00:11:30,280
has very small probability, and so we consider it an anomalous example.

334
00:11:31,400 --> 00:11:32,990
In this video we talked about how to

335
00:11:33,460 --> 00:11:35,440
estimate p of x, the probability of

336
00:11:35,590 --> 00:11:36,840
x, for the purpose of

337
00:11:36,930 --> 00:11:38,740
developing an anomaly detection algorithm.

338
00:11:39,880 --> 00:11:40,890
And in this video, we also

339
00:11:41,260 --> 00:11:42,970
stepped through an entire process

340
00:11:43,830 --> 00:11:45,090
of, given the data set we

341
00:11:45,340 --> 00:11:47,740
have, fitting the parameters, doing parameter estimation.

342
00:11:48,370 --> 00:11:50,570
We get mu and sigma parameters, and

343
00:11:50,700 --> 00:11:52,180
then taking new examples and deciding

344
00:11:52,740 --> 00:11:54,110
if the new examples are anomalous or not.

345
00:11:55,490 --> 00:11:56,800
In the next few videos we

346
00:11:56,880 --> 00:11:58,580
will delve deeper into this algorithm,

347
00:11:58,980 --> 00:11:59,930
and talk a bit more

348
00:12:00,230 --> 00:12:02,310
about how to actually get this to work well.