In the last video, we talked about the Gaussian distribution. In this video, let's apply that to develop an anomaly detection algorithm. Let's say that we have an unlabeled training set of m examples, and each of these examples is going to be a feature vector in ℝⁿ. So your training set could be feature vectors from the last m aircraft engines manufactured, or it could be features from m users, or something else. The way we are going to address anomaly detection is to model p(x) from the data set: we're going to try to figure out which feature values have high probability and which have lower probability. So x is a vector, and what we are going to do is model p(x) as the probability of x_1, the first component of x, times the probability of x_2, the second feature, times the probability of the third feature, and so on up to the probability of the final feature, x_n.
Now I'm leaving space here because I'll fill in something in a minute. So how do we model each of these terms, p(x_1), p(x_2), and so on? What we're going to do is assume that the feature x_1 is distributed according to a Gaussian distribution with some mean, which I'll write as μ_1, and some variance, which I'm going to write as σ_1². So p(x_1) is going to be a Gaussian probability distribution with mean μ_1 and variance σ_1². Similarly, I'm going to assume that x_2 is distributed Gaussian (that's what this little tilde stands for: it means "distributed as a Gaussian") with mean μ_2 and variance σ_2². So it's distributed according to a different Gaussian, which has a different set of parameters, μ_2 and σ_2². And similarly, x_3 is yet another Gaussian, which can have a different mean and a different standard deviation from the other features, and so on, up to x_n.
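To make the per-feature model concrete, here is a minimal sketch of the one-dimensional Gaussian density that each term p(x_j; μ_j, σ_j²) refers to, in plain Python with only the standard library. The name `gaussian_pdf` is a hypothetical helper, not something from the lecture, and it takes the variance σ² rather than the standard deviation, matching the parameterization used here.

```python
import math


def gaussian_pdf(x: float, mu: float, sigma2: float) -> float:
    """Density of N(mu, sigma2) evaluated at x; sigma2 is the variance."""
    if sigma2 <= 0:
        raise ValueError("variance must be positive")
    # 1 / sqrt(2*pi*sigma^2) * exp(-(x - mu)^2 / (2*sigma^2))
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


if __name__ == "__main__":
    # The density peaks at the mean; for a standard normal the peak is about 0.3989.
    print(round(gaussian_pdf(0.0, 0.0, 1.0), 4))
```

Each feature x_j gets its own call with its own μ_j and σ_j², which is what lets different features have different means and spreads.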
And so that's my model. Just as a side comment for those of you who are experts in statistics, it turns out that the equation I just wrote out corresponds to an independence assumption on the values of the features x_1 through x_n. But in practice, it turns out that this algorithm works just fine whether or not these features are anywhere close to independent; even if the independence assumption doesn't hold, the algorithm works fine. If you don't know the terms I just used, like independence assumption, don't worry about it. You'll be able to understand and implement this algorithm just fine; that comment was really meant only for the experts in statistics. Finally, to wrap this up, let me take this expression and write it a little more compactly. We're going to write it as the product from j = 1 through n of p(x_j; μ_j, σ_j²), that is, p of x_j parameterized by μ_j and σ_j².
This funny symbol here is the capital Greek letter pi, and it corresponds to taking the product of a set of values. You're familiar with summation notation: the sum from i = 1 through n of i means 1 + 2 + 3 + ... + n. This product symbol, the product from i = 1 through n of i, is just like summation except that we're now multiplying, so it becomes 1 × 2 × 3 × ... × n. So the product from j = 1 through n of this expression is just a more compact, shorter way of writing out the product of all of these terms up there, since we're taking these p(x_j; μ_j, σ_j²) terms and multiplying them together.
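For reference, here are the two notations being compared and the compact form of the model, written as equations. The identity with n! is a standard fact added for illustration; the lecture only spells out the products.

```latex
\sum_{i=1}^{n} i = 1 + 2 + \cdots + n, \qquad
\prod_{i=1}^{n} i = 1 \cdot 2 \cdot \cdots \cdot n = n!

p(x) = p(x_1; \mu_1, \sigma_1^2)\, p(x_2; \mu_2, \sigma_2^2) \cdots p(x_n; \mu_n, \sigma_n^2)
     = \prod_{j=1}^{n} p(x_j; \mu_j, \sigma_j^2)
```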
By the way, the problem of estimating this distribution p(x) is sometimes called the problem of density estimation, hence the title of the slide. So, putting everything together, here is our anomaly detection algorithm. The first step is to choose features, or come up with features x_i, that we think might be indicative of anomalous examples. What I mean by that is: try to come up with features so that when there's an unusual user in your system who may be doing fraudulent things, or when, in the aircraft engine example, there's something funny or strange about one of the engines, you choose features x_i that you think might take on unusually large or unusually small values for what an anomalous example might look like. But more generally, just try to choose features that describe general properties of the things that you're collecting data on.
Next, given a training set of m unlabeled examples, x^(1) through x^(m), we fit the parameters μ_1 through μ_n and σ_1² through σ_n². These are the formulas, similar to the ones we had in the previous video, that we're going to use to estimate each of these parameters. Just to give some interpretation: μ_j is the average value of the j-th feature. μ_j goes into the term p(x_j), which is parameterized by μ_j and σ_j², and this formula says that for μ_j you just take the mean, over the training set, of the values of the j-th feature. You compute these formulas for j = 1 through n: use them to estimate μ_1, μ_2, and so on up to μ_n, and similarly for σ². It's also possible to come up with vectorized versions of these.
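A minimal sketch of this fitting step in plain Python, assuming the training set is given as a list of m feature vectors, each of length n. The function name `fit_gaussian_params` is hypothetical. The variance here uses the 1/m normalization; dividing by m - 1 instead would make little practical difference for large m.

```python
from typing import List, Sequence, Tuple


def fit_gaussian_params(X: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Estimate mu_j and sigma2_j for every feature j from m training examples.

    X holds m examples x^(1), ..., x^(m), each a feature vector of length n.
    """
    m = len(X)
    if m == 0:
        raise ValueError("need at least one training example")
    n = len(X[0])
    # mu_j: average value of the j-th feature over the training set.
    mu = [sum(x[j] for x in X) / m for j in range(n)]
    # sigma2_j: average squared deviation of the j-th feature from mu_j (1/m form).
    sigma2 = [sum((x[j] - mu[j]) ** 2 for x in X) / m for j in range(n)]
    return mu, sigma2


if __name__ == "__main__":
    mu, sigma2 = fit_gaussian_params([[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]])
    print(mu, sigma2)
```

The loop over j mirrors "compute these formulas for j = 1 through n"; each feature is fitted independently of the others.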
If you think of μ as a vector, with entries μ_1, μ_2, down to μ_n, then a vectorized version of that set of parameters can be written as μ = (1/m) times the sum from i = 1 through m of x^(i). This formula, where the x^(i) are the feature vectors, estimates μ for all n features simultaneously. It's also possible to come up with a vectorized formula for estimating the σ_j². Finally, when you're given a new example, say a new aircraft engine, and you want to know whether this aircraft engine is anomalous, what we need to do is compute p(x): what's the probability of this new example?
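Written out, assuming the 1/m normalization, the vectorized estimates look like this. Here x^{(i)} ∈ ℝⁿ is the i-th training example, μ = [μ_1, …, μ_n]^T, and the square in the second formula is taken element by element, so σ² stands for the vector [σ_1², …, σ_n²]^T.

```latex
\mu = \frac{1}{m} \sum_{i=1}^{m} x^{(i)}, \qquad
\sigma^2 = \frac{1}{m} \sum_{i=1}^{m} \left(x^{(i)} - \mu\right)^{2}
```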
p(x) is equal to this product, and what you implement and compute is this formula, where this term here is just the formula for the Gaussian probability density. So you compute this, and finally, if this probability is very small (below some threshold ε), you flag the example as an anomaly. Here's an example of an application of this method. Let's say we have the data set plotted on the upper left of this slide. Let's look at the feature x_1 first. If you look at this data set, it looks like the feature x_1 has a mean of about 5, and if you look only at the x_1 values, the standard deviation is maybe about 2. So that's σ_1. The feature x_2, measured on the vertical axis, looks like it has an average value of about 3 and a standard deviation of about 1.
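Here is a minimal sketch of this scoring step. It assumes μ and σ² have already been fitted, evaluates p(x) as the product of the n per-feature Gaussian densities, and flags the example when that value falls below a threshold ε (called `epsilon` here). The names `p_of_x` and `is_anomaly` are hypothetical helpers.

```python
import math
from typing import Sequence


def p_of_x(x: Sequence[float], mu: Sequence[float], sigma2: Sequence[float]) -> float:
    """Density of x under the model p(x) = prod_j N(x_j; mu_j, sigma2_j)."""
    if not (len(x) == len(mu) == len(sigma2)):
        raise ValueError("x, mu and sigma2 must all have length n")
    # Multiply the n one-dimensional Gaussian densities together.
    return math.prod(
        math.exp(-((xj - mj) ** 2) / (2.0 * s2)) / math.sqrt(2.0 * math.pi * s2)
        for xj, mj, s2 in zip(x, mu, sigma2)
    )


def is_anomaly(x: Sequence[float], mu: Sequence[float], sigma2: Sequence[float],
               epsilon: float) -> bool:
    """Flag x as anomalous when p(x) falls below the threshold epsilon."""
    return p_of_x(x, mu, sigma2) < epsilon
```

With many features, multiplying many small densities can underflow to zero. Summing log-densities and comparing against log ε is a common alternative, although the video does not discuss it.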
So if you take this data set and estimate μ_1, μ_2, σ_1, and σ_2, this is what you get. Again, I'm writing σ here because I'm thinking about standard deviations, but the formulas on the previous slide actually gave estimates of the squares of these quantities, σ_1² and σ_2². So just be careful whether you are using σ_1 and σ_2 or σ_1² and σ_2². For example, σ_1² would of course be equal to 4, the square of 2. In pictures, p(x_1; μ_1, σ_1²) and p(x_2; μ_2, σ_2²) would look like these two distributions over here. And it turns out that if we were to plot p(x), which is the product of these two things, we would get a surface plot that looks like this.
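As a quick arithmetic check, treat the rough estimates read off the plot (μ_1 ≈ 5, σ_1 ≈ 2, μ_2 ≈ 3, σ_2 ≈ 1) as exact. The variances, and the height of the p(x) surface at its peak (which sits at x = μ, where each factor's exponential equals 1), then work out as:

```latex
\sigma_1^2 = 2^2 = 4, \qquad \sigma_2^2 = 1^2 = 1

p(\mu) = \frac{1}{\sqrt{2\pi}\,\sigma_1} \cdot \frac{1}{\sqrt{2\pi}\,\sigma_2}
       = \frac{1}{2\pi \cdot 2 \cdot 1} = \frac{1}{4\pi} \approx 0.0796
```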
This is a plot of p(x), where the height of the surface at a particular point is p(x). For example, given particular values x_1 = 2 and x_2 = 2, that's this point, and the height of the 3D surface above it is p(x). So p(x), the height of this plot, is literally just p(x_1; μ_1, σ_1²) times p(x_2; μ_2, σ_2²). So this is how we fit the parameters to this data. Now let's say we have a couple of new examples. Maybe I have a new example there: is this an anomaly or not? Or maybe I have a different, second example over there: is that an anomaly or not? The way we answer that is to set some value for ε; let's say I've chosen ε = 0.02. I'll say later how we choose ε.
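The surface height is just the product of the two one-dimensional densities, and the anomaly check compares that height against ε. Below is a minimal sketch of both, treating this example's rough values as the fitted parameters (μ = (5, 3), σ² = (4, 1)) with ε = 0.02. The lecture's actual test points appear only in the figure, so `x_near` and `x_far` are hypothetical points chosen for illustration.

```python
import math

# Rough values read off the example plot, treated as fitted parameters (assumed).
MU = (5.0, 3.0)       # mu_1, mu_2
SIGMA2 = (4.0, 1.0)   # sigma_1^2 = 2^2, sigma_2^2 = 1^2
EPSILON = 0.02        # threshold chosen in the video


def p_of_x(x):
    """Surface height p(x) = p(x_1; mu_1, sigma_1^2) * p(x_2; mu_2, sigma_2^2)."""
    return math.prod(
        math.exp(-((xj - m) ** 2) / (2.0 * s2)) / math.sqrt(2.0 * math.pi * s2)
        for xj, m, s2 in zip(x, MU, SIGMA2)
    )


x_near = (5.5, 3.2)   # hypothetical test point near the center of the data
x_far = (11.0, 6.0)   # hypothetical test point far from the data

for name, x in (("x_near", x_near), ("x_far", x_far)):
    p = p_of_x(x)
    verdict = "anomaly" if p < EPSILON else "not an anomaly"
    print(f"{name}: p(x) = {p:.3g} -> {verdict}")
```

With these assumed values, the point near the center comes out with p(x) of roughly 0.076, above ε. The far point comes out on the order of 10⁻⁵, well below ε.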
Let's take this first example and call it x_test^(1), and call the second example x_test^(2). What we do is compute p(x_test^(1)) using this formula, and this looks like a pretty large value. In particular, it is greater than or equal to ε. So this is a pretty high probability, at least bigger than ε, and we'll say that x_test^(1) is not an anomaly. Whereas if you compute p(x_test^(2)), that is a much smaller value. It is less than ε, so we'll say that it is indeed an anomaly, because it is much smaller than the ε we chose. In fact, if you look at the 3D surface plot, what this is really saying is that all the values of x_1 and x_2 where the surface is high correspond to non-anomalous, that is, OK or normal, examples.
Whereas all the points far out here have very low probability, so we are going to flag those points as anomalous. So the algorithm defines some region that maybe looks like this: everything outside it is flagged as anomalous, whereas the things inside this ellipse I just drew are considered OK, or non-anomalous, examples. This example x_test^(2) lies outside that region, so it has very small probability, and we consider it an anomalous example. In this video we talked about how to estimate p(x), the probability of x, for the purpose of developing an anomaly detection algorithm. We also stepped through the entire process: given a data set, fitting the parameters, that is, doing parameter estimation to get the μ and σ parameters, and then taking new examples and deciding whether they are anomalous or not.
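Why the boundary is an ellipse: here is a short derivation, assuming the same product-of-Gaussians model. Collecting the normalizing constants and the exponents of p(x), then taking logs, turns the "not anomalous" condition p(x) ≥ ε into a bound on a weighted sum of squares. For n = 2, that bound describes an axis-aligned ellipse centered at μ, with semi-axes proportional to σ_1 and σ_2.

```latex
p(x) = \frac{1}{(2\pi)^{n/2} \prod_{j=1}^{n} \sigma_j}
       \exp\!\left(-\frac{1}{2} \sum_{j=1}^{n} \frac{(x_j - \mu_j)^2}{\sigma_j^2}\right)

p(x) \ge \varepsilon
\iff
\sum_{j=1}^{n} \frac{(x_j - \mu_j)^2}{\sigma_j^2}
\le 2 \ln \frac{1}{\varepsilon\,(2\pi)^{n/2} \prod_{j=1}^{n} \sigma_j}
```

Using the example's rough values (μ = (5, 3), σ = (2, 1)) and ε = 0.02, the right-hand side is 2 ln(1/(0.08π)) ≈ 2.76. The region is therefore roughly (x_1 - 5)²/4 + (x_2 - 3)² ≤ 2.76. If the right-hand side is negative, ε exceeds the peak density and every point gets flagged.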
In the next few videos, we will delve deeper into this algorithm and talk a bit more about how to actually get it to work well.