In the last video we talked about the multivariate Gaussian distribution and saw some examples of the sorts of distributions you can model as you vary the parameters mu and Sigma. In this video, let's take those ideas and apply them to develop a different anomaly detection algorithm.

To recap, the multivariate Gaussian distribution, or multivariate normal distribution, has two parameters, mu and Sigma, where mu is an n-dimensional vector and Sigma, the covariance matrix, is an n by n matrix. Here's the formula for the probability of x, as parameterized by mu and Sigma, and as you vary mu and Sigma you can get a range of different distributions; these are three examples of the ones we saw in the previous video.

So let's talk about the parameter fitting, or parameter estimation, problem. The question, as usual, is: if I have a set of examples x(1) through x(m), where each example is an n-dimensional vector, and I think my examples come from a multivariate Gaussian distribution, how do I estimate my parameters mu and Sigma? The standard formulas are to set mu to be the average of your training examples, and to set Sigma equal to this, which is just like the Sigma we wrote out when we were using PCA, the principal components analysis algorithm. You just plug in these two formulas, and that gives you your estimated parameters mu and Sigma.

So given the data set, that is how you estimate mu and Sigma. Let's take this method and plug it into an anomaly detection algorithm. How do we put all of this together to develop an anomaly detection algorithm?
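For reference, the density and the two estimation formulas being referred to here are:

\[
p(x;\mu,\Sigma) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right),
\qquad
\mu = \frac{1}{m}\sum_{i=1}^{m} x^{(i)},
\qquad
\Sigma = \frac{1}{m}\sum_{i=1}^{m} \left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^{T}.
\]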
Here's what we do. First we take our training set, and we fit the model, we fit p(x), by setting mu and Sigma as described on the previous slide. Next, when you are given a new example x, so if you are given a test example, let's take our earlier example and say we have a new example out here, and that is my test example. Given the new example x, what we are going to do is compute p(x) using this formula for the multivariate Gaussian distribution. Then, if p(x) is less than the parameter epsilon, we flag it as an anomaly, whereas if p(x) is greater than or equal to epsilon, we don't flag it as an anomaly.

So it turns out that if we fit a multivariate Gaussian distribution to this data set, so just the red crosses, not the green example, we end up with a Gaussian distribution that places lots of probability in the central region, slightly less probability here, slightly less here, slightly less here, and very low probability at the point that is way out here. And so, if you apply the multivariate Gaussian distribution to this example, it will correctly flag that example as an anomaly.
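Here is a minimal sketch of this procedure in NumPy; the toy data, the test point, and the value of epsilon are made up for illustration, and in practice epsilon would be chosen on a labeled cross-validation set:

```python
import numpy as np

def fit_multivariate_gaussian(X):
    """Estimate mu and Sigma from an m-by-n matrix of training examples."""
    mu = X.mean(axis=0)                    # n-dimensional mean vector
    Xc = X - mu
    Sigma = (Xc.T @ Xc) / X.shape[0]       # n-by-n covariance matrix, 1/m normalization
    return mu, Sigma

def multivariate_gaussian_pdf(x, mu, Sigma):
    """Density p(x; mu, Sigma) of the multivariate Gaussian."""
    n = mu.shape[0]
    diff = x - mu
    norm_const = 1.0 / (np.power(2 * np.pi, n / 2) * np.sqrt(np.linalg.det(Sigma)))
    return norm_const * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

# Toy usage: fit on unlabeled training data, then score a test point.
rng = np.random.default_rng(0)
X_train = rng.multivariate_normal([5.0, 5.0], [[1.0, 0.8], [0.8, 1.0]], size=500)
mu, Sigma = fit_multivariate_gaussian(X_train)

epsilon = 1e-3                             # in practice chosen on a cross-validation set
x_test = np.array([7.5, 2.0])              # an unusual combination of values
p = multivariate_gaussian_pdf(x_test, mu, Sigma)
print("anomaly" if p < epsilon else "not an anomaly")
```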
Finally, it's worth saying a few words about the relationship between the multivariate Gaussian model and the original model, where we were modeling p(x) as the product of p(x1) times p(x2) up to p(xn). It turns out that you can prove mathematically, I'm not going to do the proof here, a relationship between the multivariate Gaussian model and this original one. In particular, the original model corresponds to multivariate Gaussians where the contours of the Gaussian are always axis-aligned.

So all three of these are examples of Gaussian distributions that you can fit using the original model, and it turns out that corresponds to multivariate Gaussians where the ellipses here, the contours of the distribution, are constrained. This model actually corresponds to a special case of a multivariate Gaussian distribution, and in particular this special case is defined by constraining the multivariate Gaussian distribution p(x) so that the contours of the probability density function are axis-aligned. So you can get a p(x) with a multivariate Gaussian that looks like this, or like this, or like this, and you'll notice that in all three of these examples, the ellipses, or ovals, that I'm drawing have their axes aligned with the x1 and x2 axes. What we do not have is a set of contours that are at an angle, which corresponded to examples where Sigma equals [1, 0.8; 0.8, 1], say, with non-zero elements on the off-diagonals. So it turns out it's possible to show mathematically that this model is the same as a multivariate Gaussian distribution but with a constraint, and the constraint is that the covariance matrix Sigma must have zeros on the off-diagonal elements.
In particular, the covariance matrix Sigma, this thing here, would have sigma squared 1, sigma squared 2, down to sigma squared n on its diagonal, and everything on the off-diagonal entries, all of the elements above and below the diagonal of the matrix, would be zero. In fact, if you take these values sigma squared 1, sigma squared 2, down to sigma squared n and plug them into this covariance matrix, then the two models are actually identical. That is, this new model, using a multivariate Gaussian distribution, corresponds exactly to the old model if the covariance matrix Sigma has only zero elements off the diagonal, and in pictures that corresponds to Gaussian distributions where the contours of the distribution are axis-aligned. So you aren't allowed to model correlations between the different features, and in that sense the original model is actually a special case of the multivariate Gaussian model.
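Written out, the equivalence being described is:

\[
\prod_{j=1}^{n} p\!\left(x_j;\ \mu_j,\ \sigma_j^{2}\right)
\;=\;
p(x;\ \mu,\ \Sigma)
\qquad\text{when}\qquad
\Sigma =
\begin{pmatrix}
\sigma_1^{2} & & 0\\
& \ddots & \\
0 & & \sigma_n^{2}
\end{pmatrix}.
\]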
So when would you use each of these two models? When would you use the original model, and when would you use the multivariate Gaussian model? The original model is probably used somewhat more often, whereas the multivariate Gaussian distribution is used somewhat less, but it has the advantage of being able to capture correlations between features.

Suppose you want to capture anomalies where your features, say x1 and x2, take on unusual combinations of values; in the earlier example, the anomaly was the CPU load and the memory use taking on an unusual combination of values. If you want to use the original model to capture that, what you need to do is create an extra feature, such as x3 equals x1 divided by x2, say the CPU load divided by the memory used, or something like that. You need to create extra features like this when there are unusual combinations of values where x1 and x2 together look anomalous even though x1 by itself and x2 by itself each look like perfectly normal values. If you're willing to spend the time to manually create an extra feature like this, then the original model will work fine; whereas, in contrast, the multivariate Gaussian model can automatically capture correlations between different features.

But the original model has some other, more significant advantages too. One huge advantage of the original model is that it is computationally cheaper; another view on this is that it scales better to very large values of n, very large numbers of features. So even if n were ten thousand, or even a hundred thousand, the original model will usually work just fine. Whereas, in contrast, for the multivariate Gaussian model, notice, for example, that we need to compute the inverse of the matrix Sigma, where Sigma is an n by n matrix, and computing that inverse when Sigma is a hundred thousand by a hundred thousand matrix is going to be very computationally expensive. So the multivariate Gaussian model scales less well to large values of n.
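A rough sketch of that feature-engineering workaround for the original model; the ratio feature, the toy data, and the test point are illustrative assumptions, not part of the lecture:

```python
import numpy as np
from scipy.stats import norm

def fit_original_model(X):
    """Original model: an independent univariate Gaussian per feature."""
    return X.mean(axis=0), X.var(axis=0)   # per-feature means and variances

def p_original(x, mu, sigma2):
    """p(x) = product over features of p(x_j; mu_j, sigma_j^2)."""
    return np.prod(norm.pdf(x, loc=mu, scale=np.sqrt(sigma2)))

def add_ratio_feature(X):
    """Hypothetical engineered feature: x3 = x1 / x2 (e.g. CPU load / memory use)."""
    return np.column_stack([X, X[:, 0] / X[:, 1]])

# Illustrative data: CPU load and memory use tend to rise and fall together.
rng = np.random.default_rng(1)
X_train = rng.multivariate_normal([4.0, 4.0], [[1.0, 0.9], [0.9, 1.0]], size=1000)
mu, sigma2 = fit_original_model(add_ratio_feature(X_train))

# High CPU load with low memory use: each value alone is only mildly unusual,
# but the engineered ratio feature makes the combination score as very unlikely.
x_test = np.array([[6.0, 2.0]])
print(p_original(add_ratio_feature(x_test)[0], mu, sigma2))
```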
And finally, the original model turns out to work OK even if you have a relatively small training set, that is, a small number of unlabeled examples that we use to model p(x); it works fine even if m is, say, 50 or 100. Whereas for the multivariate Gaussian, it is sort of a mathematical property of the algorithm that you must have m greater than n, so that the number of examples is greater than the number of features. There's a mathematical property of the way we estimate the parameters such that if this is not true, so if m is less than or equal to n, then this matrix isn't even invertible, that is, the matrix is singular, and you can't even use the multivariate Gaussian model unless you make some changes to it. But a typical rule of thumb I use is to apply the multivariate Gaussian model only if m is much greater than n. So m greater than n is the narrow mathematical requirement, but in practice I would use the multivariate Gaussian model only if m were quite a bit bigger than n; m greater than or equal to 10 times n, say, might be a reasonable rule of thumb. If you don't satisfy this, remember that the multivariate Gaussian model has a lot of parameters: the covariance matrix Sigma is an n by n matrix, so it has roughly n squared parameters, actually closer to n squared over 2 because it's a symmetric matrix, but that is a lot of parameters, so you need to make sure you have a fairly large value of m, make sure you have enough data to fit all these parameters. And m greater than or equal to 10 n would be a reasonable rule of thumb to make sure that you can estimate the covariance matrix Sigma reasonably well.

So in practice the original model, shown on the left, is the one that is used more often, and if you suspect that you need to capture correlations between features, what people will often do is just manually design extra features like these to capture specific unusual combinations of values.
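To make that parameter count precise: a symmetric n by n covariance matrix has n(n+1)/2 free entries, plus the n entries of mu, so for example

\[
n = 10{,}000 \;\Rightarrow\; \frac{n(n+1)}{2} = 50{,}005{,}000 \approx \frac{n^{2}}{2} \text{ parameters in } \Sigma \text{ alone.}
\]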
But in problems where you have a very large training set, so m is very large and n is not too large, the multivariate Gaussian model is well worth considering and may work better, and it can save you from having to spend your time manually creating extra features in case the anomalies turn out to be captured by unusual combinations of values of the features.

Finally, I just want to briefly mention one somewhat technical property. If you're fitting the multivariate Gaussian model and you find that the covariance matrix Sigma is singular, or non-invertible, there are usually two causes. One is failing to satisfy the m greater than n condition, and the second is having redundant features. By redundant features I mean two features that are the same, say you somehow accidentally made two copies of a feature, so your x1 is just equal to x2; or redundant features like, maybe, feature x3 being equal to feature x4 plus feature x5. If you have highly redundant features like these, where x3 equals x4 plus x5, then x3 doesn't contain any extra information, right? You just take these two other features and add them together. And if you have this sort of redundant or duplicated feature, then Sigma may be non-invertible. So there's a debugging step, and this should very rarely happen, so you probably won't run into it and it's very unlikely you'll have to worry about it, but in case you implement a multivariate Gaussian model and find that Sigma is non-invertible, what I would do is first make sure that m is quite a bit bigger than n, and if it is, then the second thing I would do is check for redundant features.
And so if there are two features that are equal, just get rid of one of them, or if you have a redundant feature like x3 equals x4 plus x5, just get rid of the redundant feature, and then it should work fine again. As an aside, for those of you who are experts in linear algebra, the formal term for what I'm calling redundant features is features that are linearly dependent. But in practice, what that really means is that if one of these problems is tripping up the algorithm, just making your features non-redundant should solve the problem of Sigma being non-invertible. Once again, though, the odds of your running into this at all are pretty low, so chances are you can just apply the multivariate Gaussian model without having to worry about Sigma being non-invertible, so long as m is quite a bit greater than n.

So that's it for anomaly detection with the multivariate Gaussian distribution. If you apply this method, you'll have an anomaly detection algorithm that automatically captures positive and negative correlations between your different features, and flags an anomaly if it sees an unusual combination of the values of the features.
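As a small diagnostic sketch for this debugging step, one could check both causes before fitting; the rank tolerance and toy data below are illustrative:

```python
import numpy as np

def diagnose_singular_sigma(X, rank_tol=1e-8):
    """Rough checks for the two usual causes of a singular covariance matrix:
    too few examples (m <= n) and linearly dependent (redundant) features."""
    m, n = X.shape
    if m <= n:
        print(f"m = {m} <= n = {n}: too few examples to estimate Sigma "
              "(rule of thumb: m >= 10 n).")
    Xc = X - X.mean(axis=0)
    rank = np.linalg.matrix_rank(Xc, tol=rank_tol)
    if rank < n:
        print(f"feature matrix has rank {rank} < n = {n}: some features are "
              "linearly dependent (duplicated, or e.g. x3 = x4 + x5); drop the redundant ones.")
    return rank

# Example: duplicating a column forces Sigma to be singular.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))
X = np.column_stack([X, X[:, 0]])   # the 5th feature is an exact copy of the 1st
diagnose_singular_sigma(X)
```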