In the last video we talked about the multivariate Gaussian distribution and saw some examples of the sorts of distributions you can model as you vary the parameters mu and Sigma. In this video, let's take those ideas and apply them to develop a different anomaly detection algorithm.

To recap, the multivariate Gaussian distribution, or multivariate normal distribution, has two parameters, mu and Sigma, where mu is an n-dimensional vector and Sigma, the covariance matrix, is an n by n matrix. Here's the formula for the probability of x, as parameterized by mu and Sigma, and as you vary mu and Sigma you can get a range of different distributions; these are three examples of the ones we saw in the previous video.

So let's talk about the parameter fitting, or parameter estimation, problem. The question, as usual, is: if I have a set of examples x(1) through x(m), where each example is an n-dimensional vector, and I think my examples come from a multivariate Gaussian distribution, how do I estimate my parameters mu and Sigma? The standard formulas are to set mu to be the average of your training examples, and to set Sigma equal to this, which is just like the Sigma we wrote out when we were using PCA, the principal components analysis algorithm. You just plug in these two formulas, and that gives you your estimated parameters mu and Sigma.

So given the data set, that is how you estimate mu and Sigma. Let's take this method and plug it into an anomaly detection algorithm. How do we put all of this together to develop an anomaly detection algorithm?
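For reference, the density and the two estimation formulas being referred to here are:

\[
p(x;\mu,\Sigma) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right),
\qquad
\mu = \frac{1}{m}\sum_{i=1}^{m} x^{(i)},
\qquad
\Sigma = \frac{1}{m}\sum_{i=1}^{m} \left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^{T}.
\]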
Here's what we do. First we take our training set, and we fit the model, we fit p(x), by setting mu and Sigma as described on the previous slide. Next, when you are given a new example x, so if you are given a test example, let's take our earlier example and say we have a new example out here, and that is my test example. Given the new example x, what we are going to do is compute p(x) using this formula for the multivariate Gaussian distribution. Then, if p(x) is less than the parameter epsilon, we flag it as an anomaly, whereas if p(x) is greater than or equal to epsilon, we don't flag it as an anomaly.

So it turns out that if we fit a multivariate Gaussian distribution to this data set, so just the red crosses, not the green example, we end up with a Gaussian distribution that places lots of probability in the central region, slightly less probability here, slightly less here, slightly less here, and very low probability at the point that is way out here. And so, if you apply the multivariate Gaussian distribution to this example, it will correctly flag that example as an anomaly.
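Here is a minimal sketch of this procedure in NumPy; the toy data, the test point, and the value of epsilon are made up for illustration, and in practice epsilon would be chosen on a labeled cross-validation set:

```python
import numpy as np

def fit_multivariate_gaussian(X):
    """Estimate mu and Sigma from an m-by-n matrix of training examples."""
    mu = X.mean(axis=0)                    # n-dimensional mean vector
    Xc = X - mu
    Sigma = (Xc.T @ Xc) / X.shape[0]       # n-by-n covariance matrix, 1/m normalization
    return mu, Sigma

def multivariate_gaussian_pdf(x, mu, Sigma):
    """Density p(x; mu, Sigma) of the multivariate Gaussian."""
    n = mu.shape[0]
    diff = x - mu
    norm_const = 1.0 / (np.power(2 * np.pi, n / 2) * np.sqrt(np.linalg.det(Sigma)))
    return norm_const * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

# Toy usage: fit on unlabeled training data, then score a test point.
rng = np.random.default_rng(0)
X_train = rng.multivariate_normal([5.0, 5.0], [[1.0, 0.8], [0.8, 1.0]], size=500)
mu, Sigma = fit_multivariate_gaussian(X_train)

epsilon = 1e-3                             # in practice chosen on a cross-validation set
x_test = np.array([7.5, 2.0])              # an unusual combination of values
p = multivariate_gaussian_pdf(x_test, mu, Sigma)
print("anomaly" if p < epsilon else "not an anomaly")
```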
Finally, it's worth saying a few words about the relationship between the multivariate Gaussian model and the original model, where we were modeling p(x) as the product of p(x1) times p(x2) up to p(xn). It turns out that you can prove mathematically, I'm not going to do the proof here, a relationship between the multivariate Gaussian model and this original one. In particular, the original model corresponds to multivariate Gaussians where the contours of the Gaussian are always axis-aligned.

So all three of these are examples of Gaussian distributions that you can fit using the original model, and it turns out that corresponds to multivariate Gaussians where the ellipses here, the contours of the distribution, are constrained. This model actually corresponds to a special case of a multivariate Gaussian distribution, and in particular this special case is defined by constraining the multivariate Gaussian distribution p(x) so that the contours of the probability density function are axis-aligned. So you can get a p(x) with a multivariate Gaussian that looks like this, or like this, or like this, and you'll notice that in all three of these examples, the ellipses, or ovals, that I'm drawing have their axes aligned with the x1 and x2 axes. What we do not have is a set of contours that are at an angle, which corresponded to examples where Sigma equals [1, 0.8; 0.8, 1], say, with non-zero elements on the off-diagonals. So it turns out it's possible to show mathematically that this model is the same as a multivariate Gaussian distribution but with a constraint, and the constraint is that the covariance matrix Sigma must have zeros on the off-diagonal elements.
In particular, the covariance matrix Sigma, this thing here, would have sigma squared 1, sigma squared 2, down to sigma squared n on its diagonal, and everything on the off-diagonal entries, all of the elements above and below the diagonal of the matrix, would be zero. In fact, if you take these values sigma squared 1, sigma squared 2, down to sigma squared n and plug them into this covariance matrix, then the two models are actually identical. That is, this new model, using a multivariate Gaussian distribution, corresponds exactly to the old model if the covariance matrix Sigma has only zero elements off the diagonal, and in pictures that corresponds to Gaussian distributions where the contours of the distribution are axis-aligned. So you aren't allowed to model correlations between the different features, and in that sense the original model is actually a special case of the multivariate Gaussian model.
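Written out, the equivalence being described is:

\[
\prod_{j=1}^{n} p\!\left(x_j;\ \mu_j,\ \sigma_j^{2}\right)
\;=\;
p(x;\ \mu,\ \Sigma)
\qquad\text{when}\qquad
\Sigma =
\begin{pmatrix}
\sigma_1^{2} & & 0\\
& \ddots & \\
0 & & \sigma_n^{2}
\end{pmatrix}.
\]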
So when would you use each of these two models? When would you use the original model, and when would you use the multivariate Gaussian model? The original model is probably used somewhat more often, whereas the multivariate Gaussian distribution is used somewhat less, but it has the advantage of being able to capture correlations between features.

Suppose you want to capture anomalies where your features, say x1 and x2, take on unusual combinations of values; in the earlier example, the anomaly was the CPU load and the memory use taking on an unusual combination of values. If you want to use the original model to capture that, what you need to do is create an extra feature, such as x3 equals x1 divided by x2, say the CPU load divided by the memory used, or something like that. You need to create extra features like this when there are unusual combinations of values where x1 and x2 together look anomalous even though x1 by itself and x2 by itself each look like perfectly normal values. If you're willing to spend the time to manually create an extra feature like this, then the original model will work fine; whereas, in contrast, the multivariate Gaussian model can automatically capture correlations between different features.

But the original model has some other, more significant advantages too. One huge advantage of the original model is that it is computationally cheaper; another view on this is that it scales better to very large values of n, very large numbers of features. So even if n were ten thousand, or even a hundred thousand, the original model will usually work just fine. Whereas, in contrast, for the multivariate Gaussian model, notice, for example, that we need to compute the inverse of the matrix Sigma, where Sigma is an n by n matrix, and computing that inverse when Sigma is a hundred thousand by a hundred thousand matrix is going to be very computationally expensive. So the multivariate Gaussian model scales less well to large values of n.
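A rough sketch of that feature-engineering workaround for the original model; the ratio feature, the toy data, and the test point are illustrative assumptions, not part of the lecture:

```python
import numpy as np
from scipy.stats import norm

def fit_original_model(X):
    """Original model: an independent univariate Gaussian per feature."""
    return X.mean(axis=0), X.var(axis=0)   # per-feature means and variances

def p_original(x, mu, sigma2):
    """p(x) = product over features of p(x_j; mu_j, sigma_j^2)."""
    return np.prod(norm.pdf(x, loc=mu, scale=np.sqrt(sigma2)))

def add_ratio_feature(X):
    """Hypothetical engineered feature: x3 = x1 / x2 (e.g. CPU load / memory use)."""
    return np.column_stack([X, X[:, 0] / X[:, 1]])

# Illustrative data: CPU load and memory use tend to rise and fall together.
rng = np.random.default_rng(1)
X_train = rng.multivariate_normal([4.0, 4.0], [[1.0, 0.9], [0.9, 1.0]], size=1000)
mu, sigma2 = fit_original_model(add_ratio_feature(X_train))

# High CPU load with low memory use: each value alone is only mildly unusual,
# but the engineered ratio feature makes the combination score as very unlikely.
x_test = np.array([[6.0, 2.0]])
print(p_original(add_ratio_feature(x_test)[0], mu, sigma2))
```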
And finally, the original model turns out to work OK even if you have a relatively small training set, that is, a small number of unlabeled examples that we use to model p(x); it works fine even if m is, say, 50 or 100. Whereas for the multivariate Gaussian, it is sort of a mathematical property of the algorithm that you must have m greater than n, so that the number of examples is greater than the number of features. There's a mathematical property of the way we estimate the parameters such that if this is not true, so if m is less than or equal to n, then this matrix isn't even invertible, that is, the matrix is singular, and you can't even use the multivariate Gaussian model unless you make some changes to it. But a typical rule of thumb I use is to apply the multivariate Gaussian model only if m is much greater than n. So m greater than n is the narrow mathematical requirement, but in practice I would use the multivariate Gaussian model only if m were quite a bit bigger than n; m greater than or equal to 10 times n, say, might be a reasonable rule of thumb. If you don't satisfy this, remember that the multivariate Gaussian model has a lot of parameters: the covariance matrix Sigma is an n by n matrix, so it has roughly n squared parameters, actually closer to n squared over 2 because it's a symmetric matrix, but that is a lot of parameters, so you need to make sure you have a fairly large value of m, make sure you have enough data to fit all these parameters. And m greater than or equal to 10 n would be a reasonable rule of thumb to make sure that you can estimate the covariance matrix Sigma reasonably well.

So in practice the original model, shown on the left, is the one that is used more often, and if you suspect that you need to capture correlations between features, what people will often do is just manually design extra features like these to capture specific unusual combinations of values.
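To make that parameter count precise: a symmetric n by n covariance matrix has n(n+1)/2 free entries, plus the n entries of mu, so for example

\[
n = 10{,}000 \;\Rightarrow\; \frac{n(n+1)}{2} = 50{,}005{,}000 \approx \frac{n^{2}}{2} \text{ parameters in } \Sigma \text{ alone.}
\]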
But in problems where you have a very large training set, so m is very large and n is not too large, the multivariate Gaussian model is well worth considering and may work better, and it can save you from having to spend your time manually creating extra features in case the anomalies turn out to be captured by unusual combinations of values of the features.

Finally, I just want to briefly mention one somewhat technical property. If you're fitting the multivariate Gaussian model and you find that the covariance matrix Sigma is singular, or non-invertible, there are usually two causes. One is failing to satisfy the m greater than n condition, and the second is having redundant features. By redundant features I mean two features that are the same, say you somehow accidentally made two copies of a feature, so your x1 is just equal to x2; or redundant features like, maybe, feature x3 being equal to feature x4 plus feature x5. If you have highly redundant features like these, where x3 equals x4 plus x5, then x3 doesn't contain any extra information, right? You just take these two other features and add them together. And if you have this sort of redundant or duplicated feature, then Sigma may be non-invertible. So there's a debugging step, and this should very rarely happen, so you probably won't run into it and it's very unlikely you'll have to worry about it, but in case you implement a multivariate Gaussian model and find that Sigma is non-invertible, what I would do is first make sure that m is quite a bit bigger than n, and if it is, then the second thing I would do is check for redundant features.
And so if there are two features that are equal, just get rid of one of them, or if you have a redundant feature like x3 equals x4 plus x5, just get rid of the redundant feature, and then it should work fine again. As an aside, for those of you who are experts in linear algebra, the formal term for what I'm calling redundant features is features that are linearly dependent. But in practice, what that really means is that if one of these problems is tripping up the algorithm, just making your features non-redundant should solve the problem of Sigma being non-invertible. Once again, though, the odds of your running into this at all are pretty low, so chances are you can just apply the multivariate Gaussian model without having to worry about Sigma being non-invertible, so long as m is quite a bit greater than n.

So that's it for anomaly detection with the multivariate Gaussian distribution. If you apply this method, you'll have an anomaly detection algorithm that automatically captures positive and negative correlations between your different features, and flags an anomaly if it sees an unusual combination of the values of the features.
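As a small diagnostic sketch for this debugging step, one could check both causes before fitting; the rank tolerance and toy data below are illustrative:

```python
import numpy as np

def diagnose_singular_sigma(X, rank_tol=1e-8):
    """Rough checks for the two usual causes of a singular covariance matrix:
    too few examples (m <= n) and linearly dependent (redundant) features."""
    m, n = X.shape
    if m <= n:
        print(f"m = {m} <= n = {n}: too few examples to estimate Sigma "
              "(rule of thumb: m >= 10 n).")
    Xc = X - X.mean(axis=0)
    rank = np.linalg.matrix_rank(Xc, tol=rank_tol)
    if rank < n:
        print(f"feature matrix has rank {rank} < n = {n}: some features are "
              "linearly dependent (duplicated, or e.g. x3 = x4 + x5); drop the redundant ones.")
    return rank

# Example: duplicating a column forces Sigma to be singular.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))
X = np.column_stack([X, X[:, 0]])   # the 5th feature is an exact copy of the 1st
diagnose_singular_sigma(X)
```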