By now you've seen the anomaly detection algorithm, and we've also talked about how to evaluate an anomaly detection algorithm. It turns out that when you're applying anomaly detection, one of the things that has a huge effect on how well it does is which features you use, and which features you choose to give the anomaly detection algorithm. So in this video, what I'd like to do is say a few words and give some suggestions and guidelines for how to go about designing or selecting features to give to an anomaly detection algorithm.

In our anomaly detection algorithm, one of the things we did was model each feature using this sort of Gaussian distribution, with x_i distributed as Gaussian with mean mu_i and variance sigma_i squared, let's say. And so one thing that I often do is plot the data, or a histogram of the data, to make sure that the data looks vaguely Gaussian before feeding it to my anomaly detection algorithm. The algorithm will usually work okay even if your data isn't Gaussian, but this is a nice sanity check to run. Concretely, if I plot the data and the histogram looks like this (the way to plot a histogram is to use the hist command in Octave), then it looks vaguely Gaussian, and if my features look like this, I would be pretty happy feeding them into my algorithm. But if I were to plot a histogram of my data and it were to look like this, well, this doesn't look at all like a bell-shaped curve. This is a very asymmetric distribution; it has a peak way off to one side.
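As a minimal sketch of that sanity check in Octave (the synthetic data here is there only to make the snippet self-contained and runnable):

    % A deliberately skewed synthetic feature: the kind of histogram
    % that fails the eyeball test described above.
    x = exp(randn(1000, 1));
    hist(x, 50);   % 50-bin histogram; check whether it looks vaguely Gaussian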
If my data has that kind of asymmetric histogram, what I'll often do is play with different transformations of the data in order to make it look more Gaussian. And again, the algorithm will usually work okay even if you don't, but if you use these transformations to make your data more Gaussian, it might work a bit better.

So given a data set that looks like this, what I might do is take a log transformation of the data, and if I do that and re-plot the histogram, what I end up with in this particular example is a histogram that looks like this. And this looks much more Gaussian, right? This looks much more like the classic bell-shaped curve that we can fit with some mean mu and variance parameter sigma squared. So what I mean by taking a log transform is really that if I have some feature x1, and the histogram of x1 looks like this, then I might take my feature x1 and replace it with log of x1, and this is my new x1 that I'll plot in the histogram over on the right, and this looks much more Gaussian.

Rather than just a log transform, there are other things you can do. Let's say I have a different feature x2; maybe I'll replace that with log of x2 plus 1, or more generally with log of x2 plus some constant c, and this constant could be something that I play with to try to make the data look as Gaussian as possible. Or for a different feature x3, maybe I'll replace it with the square root of x3; the square root is just x3 to the power of one half, right? And this one half is another example of a parameter I can play with. So I might have a feature x4, and maybe I'll instead replace that with x4 to the power of something else, maybe to the power of 1/3.
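As a sketch, those transformations might look like this in Octave; the feature vectors here are hypothetical placeholders, and c and the exponents are exactly the knobs to tune by eye:

    % Hypothetical skewed features, defined only so the sketch runs on its own.
    x1 = exp(randn(1000, 1));  x2 = exp(randn(1000, 1));
    x3 = exp(randn(1000, 1));  x4 = exp(randn(1000, 1));

    x1new = log(x1);        % plain log transform
    c = 1;                  % constant to play with
    x2new = log(x2 + c);    % log with an offset
    x3new = x3 .^ 0.5;      % square root, i.e. x3 to the power 1/2
    x4new = x4 .^ (1/3);    % cube root: another exponent to try
    hist(x1new, 50);        % re-plot and eyeball each transformed feature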
All of these, the exponent parameter or the constant c, are examples of parameters that you can play with in order to make your data look a little bit more Gaussian.

So let me show you a live demo of how I actually go about playing with my data to make it look more Gaussian. I have already loaded into Octave a set of features x; I have a thousand examples loaded over there. So let's pull up the histogram of my data, using the hist(x) command. So there's my histogram. By default, I think this uses 10 histogram bins, but I want to see a finer-grained histogram, so we do hist(x, 50), and this plots it with 50 different bins. Okay, that looks better. Now, this doesn't look very Gaussian, does it? So let's start playing around with the data. Let's try hist(x .^ 0.5), so we take the square root of the data and plot that histogram. And, okay, it looks a little bit more Gaussian, but not quite there, so let's play with the 0.5 parameter. Let's see; set this to 0.2. Looks a little bit more Gaussian. Let's reduce it a little bit more, to 0.1. Yeah, that looks pretty good; I could actually just use 0.1. Well, let's reduce it to 0.05. And, you know, okay, this looks pretty Gaussian, so I can define a new feature, xNew = x .^ 0.05, and now my new feature xNew looks more Gaussian than my previous one, and I might use this new feature instead to feed into my anomaly detection algorithm. And of course, there is more than one way to do this. You could also try hist(log(x)); that's another example of a transformation you can use. And, you know, that also looks pretty Gaussian.
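For reference, here's roughly that sequence of commands as a self-contained Octave sketch; the synthetic data simply stands in for the thousand examples I had loaded, so the exact shapes will differ:

    % Synthetic stand-in for the loaded feature vector (1000 examples).
    x = exp(5 * randn(1000, 1));

    hist(x, 50);            % raw data: not very Gaussian
    hist(x .^ 0.5, 50);     % square root: a little more Gaussian
    hist(x .^ 0.2, 50);     % closer
    hist(x .^ 0.1, 50);     % pretty good
    hist(x .^ 0.05, 50);    % looks pretty Gaussian

    xNew = x .^ 0.05;       % transformed feature to feed the algorithm
    hist(log(x), 50);       % an alternative that also looks Gaussian here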
So I can also define xNew = log(x), and that would be another pretty good choice of a feature to use.

So to summarize: if you plot a histogram of your data and find that it looks pretty non-Gaussian, it's worth playing around a little bit with different transformations like these, to see if you can make your data look a little bit more Gaussian before you feed it to your learning algorithm, although even if you don't, it might work okay. But I usually do take this step.

Now, the second thing I want to talk about is how you come up with features for an anomaly detection algorithm. The way I often do so is via an error analysis procedure. What I mean by that is that this is really similar to the error analysis procedure we have for supervised learning, where we would train a complete algorithm, run the algorithm on a cross-validation set, look at the examples it gets wrong, and see if we can come up with extra features to help the algorithm do better on the examples that it got wrong in the cross-validation set.

So let's try to reason through an example of this process. In anomaly detection, we are hoping that p(x) will be large for the normal examples and small for the anomalous examples. And so a pretty common problem is that p(x) is comparable, maybe large, for both the normal and the anomalous examples. Let's look at a specific example of that. Let's say that this is my unlabeled data. Here I have just one feature, x1, and I'm going to fit a Gaussian to it. And maybe the Gaussian that I fit to my data looks like that. Now let's say I have an anomalous example, and let's say that it takes on an x1 value of 2.5; so I plot my anomalous example there.
And you know, it's kind of buried in the middle of a bunch of normal examples, and so this anomalous example that I've drawn in green gets a pretty high probability, namely the height of the blue curve, and the algorithm fails to flag it as anomalous.

Now, if this were, say, aircraft engine manufacturing, what I would do is actually look at my training examples, look at what went wrong with that particular aircraft engine, and see if looking at that example can inspire me to come up with a new feature, x2, that helps to distinguish this bad example from the rest of my red examples, all of my normal aircraft engines. If I manage to do so, the hope is that when I create this new feature x2 and re-plot my data, all the normal examples in my training set are these red crosses here, while for my anomalous example the feature x2 takes on an unusual value. So for my green example here, this anomaly, my x1 value is still 2.5, but maybe the x2 value, hopefully, takes on a very large value like 3.5 over there, or a very small value. Now, if I model my data, I'll find that my anomaly detection algorithm gives high probability to data in the central region, slightly lower probability to that, slightly lower probability to that, and to an example that's all the way out there, my algorithm will now give very low probability. And so the process here is really to look at the mistakes the algorithm is making.
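To make that concrete, here's a minimal numeric sketch of the effect, with made-up numbers chosen to match the picture (2.5 for x1, 3.5 for x2); everything else here is assumed purely for illustration:

    % Gaussian density helper (core Octave, no extra packages needed).
    gauss = @(v, mu, sigma2) exp(-(v - mu) .^ 2 ./ (2 * sigma2)) ./ sqrt(2 * pi * sigma2);

    % Synthetic normal examples: x1 clusters around 2.5, x2 around 1.0.
    x1 = 2.5 + 0.7 * randn(1000, 1);
    x2 = 1.0 + 0.5 * randn(1000, 1);

    % The anomaly's x1 value of 2.5 looks perfectly normal on its own...
    p_old = gauss(2.5, mean(x1), var(x1))    % high, so the anomaly is not flagged
    % ...but with the unusual x2 value of 3.5 included, p(x1) * p(x2) is tiny.
    p_new = p_old * gauss(3.5, mean(x2), var(x2))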
Look at the anomaly that the algorithm is failing to flag, and see if that inspires you to create some new feature. Find something unusual about that aircraft engine and use it to create a new feature, so that with this new feature it becomes easier to distinguish the anomalies from your good examples. So that's the process of error analysis, and of using it to create new features for anomaly detection.

Finally, let me share my thinking on how I usually go about choosing features for anomaly detection. Usually, the way I think about it is that I want to choose features that will take on either very, very large values or very, very small values for examples that I think might turn out to be anomalies.

So let's use our example again of monitoring the computers in a data center. You have lots of machines, maybe thousands or tens of thousands of machines, in a data center, and we want to know if one of the machines, one of our computers, is acting up, doing something strange. So here are examples of features you might choose: memory used, number of disk accesses, CPU load, network traffic. But now let's say that I suspect one of the failure cases. Let's say that in my data set, CPU load and network traffic tend to grow linearly with each other. Maybe I'm running a bunch of web servers, and so if one of my servers is serving a lot of users, I have a very high CPU load and very high network traffic. But let's say I have a suspicion that one of the failure cases is that one of my computers has a job that gets stuck in some infinite loop.
So if I think one of the failure cases is that one of my machines, one of my web servers, has server code that gets stuck in some infinite loop, then the CPU load grows but the network traffic doesn't, because the machine is just spinning its wheels, doing a lot of CPU work while stuck in that infinite loop. In that case, to detect that type of anomaly, I might create a new feature, x5, which is CPU load divided by network traffic. So x5 will take on an unusually large value if one of the machines has a very large CPU load but not much network traffic, and this will be a feature that helps your anomaly detection capture that type of anomaly. And you can also get creative and come up with other features as well. Maybe I have a feature x6 that's CPU load squared divided by network traffic. This would be another variant of a feature like x5, to try to capture anomalies where one of your machines has a very high CPU load but doesn't have a commensurately large network traffic. By creating features like these, you can start to capture anomalies that correspond to unusual combinations of values of the features (a short sketch of these two features follows below).

So in this video we talked about how to take a feature and maybe transform it a little bit, so that it becomes a bit more Gaussian before you feed it into an anomaly detection algorithm, and also about the error analysis procedure for creating features to try to capture different types of anomalies. With these sorts of guidelines, hopefully you'll be able to choose good features to give to your anomaly detection algorithm, to help it capture all sorts of anomalies.
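Here's that short sketch of the two ratio features; all variable names and data are hypothetical stand-ins for real monitoring measurements:

    % Hypothetical raw measurements for m machines.
    m = 1000;
    cpu_load    = rand(m, 1);           % stand-in for measured CPU load
    net_traffic = rand(m, 1) + 0.01;    % stand-in; small offset avoids division by zero

    x5 = cpu_load ./ net_traffic;       % unusually large when CPU is busy but traffic is low
    x6 = cpu_load .^ 2 ./ net_traffic;  % variant that emphasizes very high CPU load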