By now you've seen the anomaly detection algorithm, and we've also talked about how to evaluate an anomaly detection algorithm. It turns out that when you're applying anomaly detection, one of the things that has a huge effect on how well it does is which features you use, and which features you choose to give the anomaly detection algorithm. So in this video, what I'd like to do is say a few words and give some suggestions and guidelines for how to go about designing or selecting features to give to an anomaly detection algorithm.

In our anomaly detection algorithm, one of the things we did was model each feature using this sort of Gaussian distribution, with x_i distributed as Gaussian with mean mu_i and variance sigma_i squared, let's say. And so one thing that I often do is plot the data, or a histogram of the data, to make sure that the data looks vaguely Gaussian before feeding it to my anomaly detection algorithm. The algorithm will usually work okay even if your data isn't Gaussian, but this is a nice sanity check to run. Concretely, if I plot the data and the histogram looks like this (the way to plot a histogram is to use the hist command in Octave), then it looks vaguely Gaussian, and if my features look like this, I would be pretty happy feeding them into my algorithm. But if I were to plot a histogram of my data and it were to look like this, well, this doesn't look at all like a bell-shaped curve. This is a very asymmetric distribution; it has a peak way off to one side.
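As a minimal sketch of that sanity check in Octave (the synthetic data here is there only to make the snippet self-contained and runnable):

    % A deliberately skewed synthetic feature: the kind of histogram
    % that fails the eyeball test described above.
    x = exp(randn(1000, 1));
    hist(x, 50);   % 50-bin histogram; check whether it looks vaguely Gaussian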
If my data has that kind of asymmetric histogram, what I'll often do is play with different transformations of the data in order to make it look more Gaussian. And again, the algorithm will usually work okay even if you don't, but if you use these transformations to make your data more Gaussian, it might work a bit better.

So given a data set that looks like this, what I might do is take a log transformation of the data, and if I do that and re-plot the histogram, what I end up with in this particular example is a histogram that looks like this. And this looks much more Gaussian, right? This looks much more like the classic bell-shaped curve that we can fit with some mean mu and variance parameter sigma squared. So what I mean by taking a log transform is really that if I have some feature x1, and the histogram of x1 looks like this, then I might take my feature x1 and replace it with log of x1, and this is my new x1 that I'll plot in the histogram over on the right, and this looks much more Gaussian.

Rather than just a log transform, there are other things you can do. Let's say I have a different feature x2; maybe I'll replace that with log of x2 plus 1, or more generally with log of x2 plus some constant c, and this constant could be something that I play with to try to make the data look as Gaussian as possible. Or for a different feature x3, maybe I'll replace it with the square root of x3; the square root is just x3 to the power of one half, right? And this one half is another example of a parameter I can play with. So I might have a feature x4, and maybe I'll instead replace that with x4 to the power of something else, maybe to the power of 1/3.
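As a sketch, those transformations might look like this in Octave; the feature vectors here are hypothetical placeholders, and c and the exponents are exactly the knobs to tune by eye:

    % Hypothetical skewed features, defined only so the sketch runs on its own.
    x1 = exp(randn(1000, 1));  x2 = exp(randn(1000, 1));
    x3 = exp(randn(1000, 1));  x4 = exp(randn(1000, 1));

    x1new = log(x1);        % plain log transform
    c = 1;                  % constant to play with
    x2new = log(x2 + c);    % log with an offset
    x3new = x3 .^ 0.5;      % square root, i.e. x3 to the power 1/2
    x4new = x4 .^ (1/3);    % cube root: another exponent to try
    hist(x1new, 50);        % re-plot and eyeball each transformed feature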
All of these, the exponent parameter or the constant c, are examples of parameters that you can play with in order to make your data look a little bit more Gaussian.

So let me show you a live demo of how I actually go about playing with my data to make it look more Gaussian. I have already loaded into Octave a set of features x; I have a thousand examples loaded over there. So let's pull up the histogram of my data, using the hist(x) command. So there's my histogram. By default, I think this uses 10 histogram bins, but I want to see a finer-grained histogram, so we do hist(x, 50), and this plots it with 50 different bins. Okay, that looks better. Now, this doesn't look very Gaussian, does it? So let's start playing around with the data. Let's try hist(x .^ 0.5), so we take the square root of the data and plot that histogram. And, okay, it looks a little bit more Gaussian, but not quite there, so let's play with the 0.5 parameter. Let's see; set this to 0.2. Looks a little bit more Gaussian. Let's reduce it a little bit more, to 0.1. Yeah, that looks pretty good; I could actually just use 0.1. Well, let's reduce it to 0.05. And, you know, okay, this looks pretty Gaussian, so I can define a new feature, xNew = x .^ 0.05, and now my new feature xNew looks more Gaussian than my previous one, and I might use this new feature instead to feed into my anomaly detection algorithm. And of course, there is more than one way to do this. You could also try hist(log(x)); that's another example of a transformation you can use. And, you know, that also looks pretty Gaussian.
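For reference, here's roughly that sequence of commands as a self-contained Octave sketch; the synthetic data simply stands in for the thousand examples I had loaded, so the exact shapes will differ:

    % Synthetic stand-in for the loaded feature vector (1000 examples).
    x = exp(5 * randn(1000, 1));

    hist(x, 50);            % raw data: not very Gaussian
    hist(x .^ 0.5, 50);     % square root: a little more Gaussian
    hist(x .^ 0.2, 50);     % closer
    hist(x .^ 0.1, 50);     % pretty good
    hist(x .^ 0.05, 50);    % looks pretty Gaussian

    xNew = x .^ 0.05;       % transformed feature to feed the algorithm
    hist(log(x), 50);       % an alternative that also looks Gaussian here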
So I can also define xNew = log(x), and that would be another pretty good choice of a feature to use.

So to summarize: if you plot a histogram of your data and find that it looks pretty non-Gaussian, it's worth playing around a little bit with different transformations like these, to see if you can make your data look a little bit more Gaussian before you feed it to your learning algorithm, although even if you don't, it might work okay. But I usually do take this step.

Now, the second thing I want to talk about is how you come up with features for an anomaly detection algorithm. The way I often do so is via an error analysis procedure. What I mean by that is that this is really similar to the error analysis procedure we have for supervised learning, where we would train a complete algorithm, run the algorithm on a cross-validation set, look at the examples it gets wrong, and see if we can come up with extra features to help the algorithm do better on the examples that it got wrong in the cross-validation set.

So let's try to reason through an example of this process. In anomaly detection, we are hoping that p(x) will be large for the normal examples and small for the anomalous examples. And so a pretty common problem is that p(x) is comparable, maybe large, for both the normal and the anomalous examples. Let's look at a specific example of that. Let's say that this is my unlabeled data. Here I have just one feature, x1, and I'm going to fit a Gaussian to it. And maybe the Gaussian that I fit to my data looks like that. Now let's say I have an anomalous example, and let's say that it takes on an x1 value of 2.5; so I plot my anomalous example there.
And you know, it's kind of buried in the middle of a bunch of normal examples, and so this anomalous example that I've drawn in green gets a pretty high probability, namely the height of the blue curve, and the algorithm fails to flag it as anomalous.

Now, if this were, say, aircraft engine manufacturing, what I would do is actually look at my training examples, look at what went wrong with that particular aircraft engine, and see if looking at that example can inspire me to come up with a new feature, x2, that helps to distinguish this bad example from the rest of my red examples, all of my normal aircraft engines. If I manage to do so, the hope is that when I create this new feature x2 and re-plot my data, all the normal examples in my training set are these red crosses here, while for my anomalous example the feature x2 takes on an unusual value. So for my green example here, this anomaly, my x1 value is still 2.5, but maybe the x2 value, hopefully, takes on a very large value like 3.5 over there, or a very small value. Now, if I model my data, I'll find that my anomaly detection algorithm gives high probability to data in the central region, slightly lower probability to that, slightly lower probability to that, and to an example that's all the way out there, my algorithm will now give very low probability. And so the process here is really to look at the mistakes the algorithm is making.
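To make that concrete, here's a minimal numeric sketch of the effect, with made-up numbers chosen to match the picture (2.5 for x1, 3.5 for x2); everything else here is assumed purely for illustration:

    % Gaussian density helper (core Octave, no extra packages needed).
    gauss = @(v, mu, sigma2) exp(-(v - mu) .^ 2 ./ (2 * sigma2)) ./ sqrt(2 * pi * sigma2);

    % Synthetic normal examples: x1 clusters around 2.5, x2 around 1.0.
    x1 = 2.5 + 0.7 * randn(1000, 1);
    x2 = 1.0 + 0.5 * randn(1000, 1);

    % The anomaly's x1 value of 2.5 looks perfectly normal on its own...
    p_old = gauss(2.5, mean(x1), var(x1))    % high, so the anomaly is not flagged
    % ...but with the unusual x2 value of 3.5 included, p(x1) * p(x2) is tiny.
    p_new = p_old * gauss(3.5, mean(x2), var(x2))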
Look at the anomaly that the algorithm is failing to flag, and see if that inspires you to create some new feature. Find something unusual about that aircraft engine and use it to create a new feature, so that with this new feature it becomes easier to distinguish the anomalies from your good examples. So that's the process of error analysis, and of using it to create new features for anomaly detection.

Finally, let me share my thinking on how I usually go about choosing features for anomaly detection. Usually, the way I think about it is that I want to choose features that will take on either very, very large values or very, very small values for examples that I think might turn out to be anomalies.

So let's use our example again of monitoring the computers in a data center. You have lots of machines, maybe thousands or tens of thousands of machines, in a data center, and we want to know if one of the machines, one of our computers, is acting up, doing something strange. So here are examples of features you might choose: memory used, number of disk accesses, CPU load, network traffic. But now let's say that I suspect one of the failure cases. Let's say that in my data set, CPU load and network traffic tend to grow linearly with each other. Maybe I'm running a bunch of web servers, and so if one of my servers is serving a lot of users, I have a very high CPU load and very high network traffic. But let's say I have a suspicion that one of the failure cases is that one of my computers has a job that gets stuck in some infinite loop.
So if I think one of the failure cases is that one of my machines, one of my web servers, has server code that gets stuck in some infinite loop, then the CPU load grows but the network traffic doesn't, because the machine is just spinning its wheels, doing a lot of CPU work while stuck in that infinite loop. In that case, to detect that type of anomaly, I might create a new feature, x5, which is CPU load divided by network traffic. So x5 will take on an unusually large value if one of the machines has a very large CPU load but not much network traffic, and this will be a feature that helps your anomaly detection capture that type of anomaly. And you can also get creative and come up with other features as well. Maybe I have a feature x6 that's CPU load squared divided by network traffic. This would be another variant of a feature like x5, to try to capture anomalies where one of your machines has a very high CPU load but doesn't have a commensurately large network traffic. By creating features like these, you can start to capture anomalies that correspond to unusual combinations of values of the features (a short sketch of these two features follows below).

So in this video we talked about how to take a feature and maybe transform it a little bit, so that it becomes a bit more Gaussian before you feed it into an anomaly detection algorithm, and also about the error analysis procedure for creating features to try to capture different types of anomalies. With these sorts of guidelines, hopefully you'll be able to choose good features to give to your anomaly detection algorithm, to help it capture all sorts of anomalies.
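Here's that short sketch of the two ratio features; all variable names and data are hypothetical stand-ins for real monitoring measurements:

    % Hypothetical raw measurements for m machines.
    m = 1000;
    cpu_load    = rand(m, 1);           % stand-in for measured CPU load
    net_traffic = rand(m, 1) + 0.01;    % stand-in; small offset avoids division by zero

    x5 = cpu_load ./ net_traffic;       % unusually large when CPU is busy but traffic is low
    x6 = cpu_load .^ 2 ./ net_traffic;  % variant that emphasizes very high CPU load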