In the last video, we developed an anomaly detection algorithm. In this video, I'd like to talk about the process of developing a specific application of anomaly detection, and in particular about the problem of how to evaluate an anomaly detection algorithm.

In previous videos, we've already talked about the importance of real-number evaluation. This captures the idea that when you're trying to develop a learning algorithm for a specific application, you often need to make a lot of choices, like choosing what features to use, and so on. Making decisions about all of these choices is much easier if you have a way to evaluate your learning algorithm that just gives you back a number. So if you're trying to decide, say, whether to include one extra feature, you can run the algorithm with the feature and without it, and get back a number that tells you whether adding the feature improved or worsened performance. That gives you a much simpler way to decide whether or not to include that feature.

So in order to be able to develop an anomaly detection system quickly, it's really helpful to have a way of evaluating it. To do this, we're actually going to assume we have some labeled data. So far, we've been treating anomaly detection as an unsupervised learning problem, using unlabeled data. But if you have some labeled data that specifies which examples are anomalous and which are non-anomalous, then that is what we think of as the standard way of evaluating an anomaly detection algorithm.
So, taking the aircraft engine example again: let's say we have some labeled data consisting of a few anomalous examples, some aircraft engines manufactured in the past that turned out to be flawed or strange in some way, and let's say we also have some non-anomalous examples, some perfectly okay engines. I'm going to use y equals 0 to denote the normal, non-anomalous examples, and y equals 1 to denote the anomalous ones.

The process of developing and evaluating an anomaly detection algorithm is as follows. We're going to have a training set, and I'll talk about the cross-validation and test sets shortly; the training set we usually still think of as an unlabeled training set. This is our large collection of normal, non-anomalous examples. Usually we think of all of it as non-anomalous, but it's actually okay even if a few anomalies slip into your unlabeled training set. Next, we define a cross-validation set and a test set with which to evaluate a particular anomaly detection algorithm. Specifically, for both the cross-validation and test sets, we're going to include a few examples that are known to be anomalous: say, a few examples with y equals 1 that correspond to anomalous aircraft engines.

So here's a specific example. Let's say that, altogether, this is the data we have: we've manufactured 10,000 examples of engines that, as far as we know, are perfectly normal, perfectly good aircraft engines.
And again, it turns out to be okay even if a few flawed engines slip into that set of 10,000; we just assume that the vast majority of these 10,000 examples are good, normal, non-anomalous engines. And let's say that, historically, over however long we've been running the manufacturing plant, we end up with a small number, say 20, of anomalous engines as well. For a pretty typical application of anomaly detection, the number of anomalous examples, that is, examples with y equals 1, might be anywhere from 20 to 50. That would be a pretty typical number of examples with y equals 1, and usually we have a much larger number of good examples.

So, given this data set, a fairly typical way to split it into a training set, cross-validation set, and test set would be as follows. Let's take the 10,000 good aircraft engines and put 6,000 of them into the unlabeled training set. I'm calling this an unlabeled training set, but all of these examples really correspond to y equals 0, as far as we know. We will use this to fit p of x. So we use these 6,000 engines to fit p of x, which is the product of p of x1, parameterized by mu 1 and sigma squared 1, up through p of xn, parameterized by mu n and sigma squared n. And so it's these 6,000 examples that we use to estimate the parameters mu 1, sigma squared 1, up to mu n, sigma squared n. That's our training set of all good, or the vast majority good, examples.
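As a concrete sketch of that fitting step (my own minimal illustration in Python with NumPy, not code from the lecture; the function and array names are assumptions), estimating the per-feature Gaussian parameters from the training matrix and evaluating p of x might look like this:

```python
import numpy as np

def estimate_gaussian(X_train):
    """Estimate the per-feature parameters mu_j and sigma_j^2.

    X_train: (m, n) array of examples assumed to be normal,
    e.g. the 6,000 good engines in the lecture's split.
    """
    mu = X_train.mean(axis=0)      # mu_1, ..., mu_n
    sigma2 = X_train.var(axis=0)   # sigma^2_1, ..., sigma^2_n
    return mu, sigma2

def p_of_x(X, mu, sigma2):
    """Evaluate p(x) as the product over features j of
    the Gaussian density N(x_j; mu_j, sigma_j^2)."""
    coef = 1.0 / np.sqrt(2.0 * np.pi * sigma2)
    exponent = -((X - mu) ** 2) / (2.0 * sigma2)
    return np.prod(coef * np.exp(exponent), axis=1)
```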
Next, we take our remaining good aircraft engines and put some number of them in the cross-validation set and some number in the test set: 6,000 plus 2,000 plus 2,000 is how we split up our 10,000 good aircraft engines. And then we also have 20 flawed aircraft engines; we'll take those and split them up, putting ten in the cross-validation set and ten in the test set. On the next slide, we'll talk about how to actually use this to evaluate the anomaly detection algorithm.

So what I've just described here is probably the recommended, good way of splitting the labeled and unlabeled examples, the good and the flawed aircraft engines: we use a 60/20/20 percent split for the good engines, and we put the flawed engines only in the cross-validation set and the test set. We'll see on the next slide why that's the case.

Just as an aside, if you look at how people apply anomaly detection algorithms, sometimes you see the data split differently as well. One alternative, and this is really not a recommended alternative, is to take your 10,000 good engines, put 6,000 of them in your training set, and then put the same 4,000 in both the cross-validation set and the test set. We like to think of the cross-validation set and the test set as completely different data sets from each other.
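To make that 60/20/20 split concrete, here is one way it could be coded (again just a sketch; I'm assuming the good and flawed engines are stored in two NumPy arrays named X_good and X_flawed, which are hypothetical names, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed inputs: X_good is a (10000, n) array of good engines,
# X_flawed is a (20, n) array of flawed ones.
idx = rng.permutation(len(X_good))

X_train = X_good[idx[:6000]]        # unlabeled training set; y = 0 as far as we know

X_cv = np.vstack([X_good[idx[6000:8000]], X_flawed[:10]])
y_cv = np.concatenate([np.zeros(2000), np.ones(10)])   # y = 1 marks an anomaly

X_test = np.vstack([X_good[idx[8000:]], X_flawed[10:]])
y_test = np.concatenate([np.zeros(2000), np.ones(10)])
```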
But in anomaly detection, you sometimes see people use the same set of good engines in the cross-validation set and the test set, and sometimes you see people use exactly the same set of anomalous engines in both. All of these are considered less good practices and are definitely less recommended. Certainly, using the same data in the cross-validation set and the test set is not considered good machine learning practice. But sometimes you see people do this too.

So, given the training, cross-validation, and test sets, here's how you develop and evaluate an algorithm. First, we take the training set and fit the model p of x. So we fit all those Gaussians to our m unlabeled examples of aircraft engines. I'm calling them unlabeled examples, but these are really examples that we're assuming are good, normal aircraft engines. Then, on the cross-validation or test set, think of the anomaly detection algorithm as making a prediction: given a test example x, the algorithm predicts y equals 1 if p of x is less than epsilon, and predicts y equals 0 if p of x is greater than or equal to epsilon. So, given x, it's trying to predict the label: y equals 1, corresponding to an anomaly, or y equals 0, corresponding to a normal example.

So, given the training, cross-validation, and test sets, how do you develop an algorithm, and more specifically, how do you evaluate an anomaly detection algorithm? Well, the first step is to take the unlabeled training set and fit the model p of x to the training data.
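In code, the prediction rule described above is just a threshold on p of x (a small sketch reusing the hypothetical p_of_x helper from the earlier snippet):

```python
def predict(X, mu, sigma2, epsilon):
    """Predict y = 1 (anomaly) when p(x) < epsilon, else y = 0."""
    return (p_of_x(X, mu, sigma2) < epsilon).astype(int)
```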
So you take this unlabeled training set, really a set of examples the vast majority of which we're assuming are normal, non-anomalous aircraft engines, and fit the model p of x; that is, fit all of the parameters of all the Gaussians to this data.

Next, on the cross-validation or test set, we're going to think of the anomaly detection algorithm as trying to predict the value of y. So for each of, say, the test examples, we have pairs (x_test^(i), y_test^(i)), where y is equal to 1 or 0 depending on whether that was an anomalous example. Given an input x in my test set, my anomaly detection algorithm predicts y equals 1 if p of x is less than epsilon, predicting an anomaly because the probability is very low, and predicts y equals 0 if p of x is greater than or equal to epsilon, predicting a normal example because p of x is reasonably large.

And so we can now think of the anomaly detection algorithm as making predictions for the values of the y labels in the test set or on the cross-validation set. This makes things somewhat more similar to the supervised learning setting, right? We have a labeled test set, and our algorithm makes predictions on those labels, so we can evaluate it by seeing how often it gets those labels right. Of course, these labels will be very skewed, because y equals 0, the normal examples, will usually be much more common than y equals 1, the anomalous examples. But this is much closer to the sorts of evaluation metrics we can use in supervised learning.

So what's a good evaluation metric to use?
Well, because the data is very skewed, because y equals 0 is much more common, classification accuracy would not be a good evaluation metric; we talked about this in an earlier video. If you have a very skewed data set, then predicting y equals 0 all the time will have very high classification accuracy. Instead, we should use evaluation metrics like the counts of true positives, false positives, false negatives, and true negatives, or compute the precision and recall of the algorithm, or compute the F1 score, which is a single real-number way of summarizing the precision and recall numbers. These would be ways to evaluate an anomaly detection algorithm on your cross-validation set or on your test set.

Finally, earlier in the anomaly detection algorithm, we also had this parameter epsilon: the threshold we use to decide when to flag something as an anomaly. And so, if you have a cross-validation set, one way to choose this parameter epsilon is to try many different values of epsilon, and then pick the value that, say, maximizes the F1 score, or that otherwise does well on your cross-validation set.
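As a sketch of that selection procedure (my own illustration, assuming p_cv holds the values of p of x on the cross-validation examples and y_cv their labels, as in the earlier snippets), sweeping epsilon and keeping the value with the best F1 score might look like this:

```python
import numpy as np

def f1_score(y_true, y_pred):
    """F1 computed from true positives, false positives, and false negatives."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    if tp == 0:
        return 0.0                      # guard against division by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def select_epsilon(p_cv, y_cv, steps=1000):
    """Try many candidate thresholds across the range of p(x) values
    on the cross-validation set; keep the one with the highest F1."""
    best_eps, best_f1 = 0.0, 0.0
    for eps in np.linspace(p_cv.min(), p_cv.max(), steps):
        f1 = f1_score(y_cv, (p_cv < eps).astype(int))
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return best_eps, best_f1
```

Because the positive class is so rare here, F1 rewards a threshold that balances precision and recall, rather than the trivial strategy of always predicting y equals 0.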
And more generally, the way to use the training, cross-validation, and test sets is this: when we're trying to make decisions, like what features to include, or tuning the parameter epsilon, we continually evaluate the algorithm on the cross-validation set and make all of those decisions there, what features to use, how to set epsilon, and so on. Then, once we've picked the set of features and found the value of epsilon that we're happy with, we take the final model and do the final evaluation of the algorithm on the test set.

So, in this video, we talked about the process of how to evaluate an anomaly detection algorithm. Again, being able to evaluate an algorithm with a single real-number evaluation, with a number like the F1 score, often allows you to make much more efficient use of your time when you're trying to develop an anomaly detection system and make these sorts of decisions: how to choose epsilon, what features to include, and so on.

In this video, we started to use a bit of labeled data in order to evaluate the anomaly detection algorithm, and this takes us a little bit closer to a supervised learning setting. In the next video, I'm going to say a bit more about that. In particular, we'll talk about when you should use an anomaly detection algorithm and when you should think about using supervised learning instead, and what the differences between these two formalisms are.