In the last video, we developed an anomaly detection algorithm. In this video, I'd like to talk about the process of developing a specific application of anomaly detection, and in particular about the problem of how to evaluate an anomaly detection algorithm.

In previous videos, we've already talked about the importance of real-number evaluation. This captures the idea that when you're trying to develop a learning algorithm for a specific application, you often need to make a lot of choices, like choosing what features to use, and so on. Making decisions about all of these choices is much easier if you have a way to evaluate your learning algorithm that just gives you back a number. So if you're trying to decide, say, whether to include one extra feature, you can run the algorithm with the feature and without it, and get back a number that tells you whether adding the feature improved or worsened performance. That gives you a much simpler way to decide whether or not to include that feature.

So in order to be able to develop an anomaly detection system quickly, it's really helpful to have a way of evaluating it. To do this, we're actually going to assume we have some labeled data. So far, we've been treating anomaly detection as an unsupervised learning problem, using unlabeled data. But if you have some labeled data that specifies which examples are anomalous and which are non-anomalous, then that is what we think of as the standard way of evaluating an anomaly detection algorithm.
So, taking the aircraft engine example again: let's say we have some labeled data consisting of a few anomalous examples, some aircraft engines manufactured in the past that turned out to be flawed or strange in some way, and let's say we also have some non-anomalous examples, some perfectly okay engines. I'm going to use y equals 0 to denote the normal, non-anomalous examples, and y equals 1 to denote the anomalous ones.

The process of developing and evaluating an anomaly detection algorithm is as follows. We're going to have a training set, and I'll talk about the cross-validation and test sets shortly; the training set we usually still think of as an unlabeled training set. This is our large collection of normal, non-anomalous examples. Usually we think of all of it as non-anomalous, but it's actually okay even if a few anomalies slip into your unlabeled training set. Next, we define a cross-validation set and a test set with which to evaluate a particular anomaly detection algorithm. Specifically, for both the cross-validation and test sets, we're going to include a few examples that are known to be anomalous: say, a few examples with y equals 1 that correspond to anomalous aircraft engines.

So here's a specific example. Let's say that, altogether, this is the data we have: we've manufactured 10,000 examples of engines that, as far as we know, are perfectly normal, perfectly good aircraft engines.
And again, it turns out to be okay even if a few flawed engines slip into that set of 10,000; we just assume that the vast majority of these 10,000 examples are good, normal, non-anomalous engines. And let's say that, historically, over however long we've been running the manufacturing plant, we end up with a small number, say 20, of anomalous engines as well. For a pretty typical application of anomaly detection, the number of anomalous examples, that is, examples with y equals 1, might be anywhere from 20 to 50. That would be a pretty typical number of examples with y equals 1, and usually we have a much larger number of good examples.

So, given this data set, a fairly typical way to split it into a training set, cross-validation set, and test set would be as follows. Let's take the 10,000 good aircraft engines and put 6,000 of them into the unlabeled training set. I'm calling this an unlabeled training set, but all of these examples really correspond to y equals 0, as far as we know. We will use this to fit p of x. So we use these 6,000 engines to fit p of x, which is the product of p of x1, parameterized by mu 1 and sigma squared 1, up through p of xn, parameterized by mu n and sigma squared n. And so it's these 6,000 examples that we use to estimate the parameters mu 1, sigma squared 1, up to mu n, sigma squared n. That's our training set of all good, or the vast majority good, examples.
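As a concrete sketch of that fitting step (my own minimal illustration in Python with NumPy, not code from the lecture; the function and array names are assumptions), estimating the per-feature Gaussian parameters from the training matrix and evaluating p of x might look like this:

```python
import numpy as np

def estimate_gaussian(X_train):
    """Estimate the per-feature parameters mu_j and sigma_j^2.

    X_train: (m, n) array of examples assumed to be normal,
    e.g. the 6,000 good engines in the lecture's split.
    """
    mu = X_train.mean(axis=0)      # mu_1, ..., mu_n
    sigma2 = X_train.var(axis=0)   # sigma^2_1, ..., sigma^2_n
    return mu, sigma2

def p_of_x(X, mu, sigma2):
    """Evaluate p(x) as the product over features j of
    the Gaussian density N(x_j; mu_j, sigma_j^2)."""
    coef = 1.0 / np.sqrt(2.0 * np.pi * sigma2)
    exponent = -((X - mu) ** 2) / (2.0 * sigma2)
    return np.prod(coef * np.exp(exponent), axis=1)
```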
Next, we take our remaining good aircraft engines and put some number of them in the cross-validation set and some number in the test set: 6,000 plus 2,000 plus 2,000 is how we split up our 10,000 good aircraft engines. And then we also have 20 flawed aircraft engines; we'll take those and split them up, putting ten in the cross-validation set and ten in the test set. On the next slide, we'll talk about how to actually use this to evaluate the anomaly detection algorithm.

So what I've just described here is probably the recommended, good way of splitting the labeled and unlabeled examples, the good and the flawed aircraft engines: we use a 60/20/20 percent split for the good engines, and we put the flawed engines only in the cross-validation set and the test set. We'll see on the next slide why that's the case.

Just as an aside, if you look at how people apply anomaly detection algorithms, sometimes you see the data split differently as well. One alternative, and this is really not a recommended alternative, is to take your 10,000 good engines, put 6,000 of them in your training set, and then put the same 4,000 in both the cross-validation set and the test set. We like to think of the cross-validation set and the test set as completely different data sets from each other.
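To make that 60/20/20 split concrete, here is one way it could be coded (again just a sketch; I'm assuming the good and flawed engines are stored in two NumPy arrays named X_good and X_flawed, which are hypothetical names, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed inputs: X_good is a (10000, n) array of good engines,
# X_flawed is a (20, n) array of flawed ones.
idx = rng.permutation(len(X_good))

X_train = X_good[idx[:6000]]        # unlabeled training set; y = 0 as far as we know

X_cv = np.vstack([X_good[idx[6000:8000]], X_flawed[:10]])
y_cv = np.concatenate([np.zeros(2000), np.ones(10)])   # y = 1 marks an anomaly

X_test = np.vstack([X_good[idx[8000:]], X_flawed[10:]])
y_test = np.concatenate([np.zeros(2000), np.ones(10)])
```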
But in anomaly detection, you sometimes see people use the same set of good engines in the cross-validation set and the test set, and sometimes you see people use exactly the same set of anomalous engines in both. All of these are considered less good practices and are definitely less recommended. Certainly, using the same data in the cross-validation set and the test set is not considered good machine learning practice. But sometimes you see people do this too.

So, given the training, cross-validation, and test sets, here's how you develop and evaluate an algorithm. First, we take the training set and fit the model p of x. So we fit all those Gaussians to our m unlabeled examples of aircraft engines. I'm calling them unlabeled examples, but these are really examples that we're assuming are good, normal aircraft engines. Then, on the cross-validation or test set, think of the anomaly detection algorithm as making a prediction: given a test example x, the algorithm predicts y equals 1 if p of x is less than epsilon, and predicts y equals 0 if p of x is greater than or equal to epsilon. So, given x, it's trying to predict the label: y equals 1, corresponding to an anomaly, or y equals 0, corresponding to a normal example.

So, given the training, cross-validation, and test sets, how do you develop an algorithm, and more specifically, how do you evaluate an anomaly detection algorithm? Well, the first step is to take the unlabeled training set and fit the model p of x to the training data.
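In code, the prediction rule described above is just a threshold on p of x (a small sketch reusing the hypothetical p_of_x helper from the earlier snippet):

```python
def predict(X, mu, sigma2, epsilon):
    """Predict y = 1 (anomaly) when p(x) < epsilon, else y = 0."""
    return (p_of_x(X, mu, sigma2) < epsilon).astype(int)
```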
So you take this unlabeled training set, really a set of examples the vast majority of which we're assuming are normal, non-anomalous aircraft engines, and fit the model p of x; that is, fit all of the parameters of all the Gaussians to this data.

Next, on the cross-validation or test set, we're going to think of the anomaly detection algorithm as trying to predict the value of y. So for each of, say, the test examples, we have pairs (x_test^(i), y_test^(i)), where y is equal to 1 or 0 depending on whether that was an anomalous example. Given an input x in my test set, my anomaly detection algorithm predicts y equals 1 if p of x is less than epsilon, predicting an anomaly because the probability is very low, and predicts y equals 0 if p of x is greater than or equal to epsilon, predicting a normal example because p of x is reasonably large.

And so we can now think of the anomaly detection algorithm as making predictions for the values of the y labels in the test set or on the cross-validation set. This makes things somewhat more similar to the supervised learning setting, right? We have a labeled test set, and our algorithm makes predictions on those labels, so we can evaluate it by seeing how often it gets those labels right. Of course, these labels will be very skewed, because y equals 0, the normal examples, will usually be much more common than y equals 1, the anomalous examples. But this is much closer to the sorts of evaluation metrics we can use in supervised learning.

So what's a good evaluation metric to use?
Well, because the data is very skewed, because y equals 0 is much more common, classification accuracy would not be a good evaluation metric; we talked about this in an earlier video. If you have a very skewed data set, then predicting y equals 0 all the time will have very high classification accuracy. Instead, we should use evaluation metrics like the counts of true positives, false positives, false negatives, and true negatives, or compute the precision and recall of the algorithm, or compute the F1 score, which is a single real-number way of summarizing the precision and recall numbers. These would be ways to evaluate an anomaly detection algorithm on your cross-validation set or on your test set.

Finally, earlier in the anomaly detection algorithm, we also had this parameter epsilon: the threshold we use to decide when to flag something as an anomaly. And so, if you have a cross-validation set, one way to choose this parameter epsilon is to try many different values of epsilon, and then pick the value that, say, maximizes the F1 score, or that otherwise does well on your cross-validation set.
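As a sketch of that selection procedure (my own illustration, assuming p_cv holds the values of p of x on the cross-validation examples and y_cv their labels, as in the earlier snippets), sweeping epsilon and keeping the value with the best F1 score might look like this:

```python
import numpy as np

def f1_score(y_true, y_pred):
    """F1 computed from true positives, false positives, and false negatives."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    if tp == 0:
        return 0.0                      # guard against division by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def select_epsilon(p_cv, y_cv, steps=1000):
    """Try many candidate thresholds across the range of p(x) values
    on the cross-validation set; keep the one with the highest F1."""
    best_eps, best_f1 = 0.0, 0.0
    for eps in np.linspace(p_cv.min(), p_cv.max(), steps):
        f1 = f1_score(y_cv, (p_cv < eps).astype(int))
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return best_eps, best_f1
```

Because the positive class is so rare here, F1 rewards a threshold that balances precision and recall, rather than the trivial strategy of always predicting y equals 0.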
And more generally, the way to use the training, cross-validation, and test sets is this: when we're trying to make decisions, like what features to include, or tuning the parameter epsilon, we continually evaluate the algorithm on the cross-validation set and make all of those decisions there, what features to use, how to set epsilon, and so on. Then, once we've picked the set of features and found the value of epsilon that we're happy with, we take the final model and do the final evaluation of the algorithm on the test set.

So, in this video, we talked about the process of how to evaluate an anomaly detection algorithm. Again, being able to evaluate an algorithm with a single real-number evaluation, with a number like the F1 score, often allows you to make much more efficient use of your time when you're trying to develop an anomaly detection system and make these sorts of decisions: how to choose epsilon, what features to include, and so on.

In this video, we started to use a bit of labeled data in order to evaluate the anomaly detection algorithm, and this takes us a little bit closer to a supervised learning setting. In the next video, I'm going to say a bit more about that. In particular, we'll talk about when you should use an anomaly detection algorithm and when you should think about using supervised learning instead, and what the differences between these two formalisms are.