1 00:00:00,170 --> 00:00:01,190 In this next set of videos, 2 00:00:01,720 --> 00:00:02,680 I'd like to tell you about 3 00:00:03,050 --> 00:00:04,560 a problem called Anomaly Detection. 4 00:00:05,710 --> 00:00:07,220 This is a reasonably commonly 5 00:00:07,870 --> 00:00:08,740 use you type machine learning. 6 00:00:09,580 --> 00:00:10,990 And one of the interesting aspects 7 00:00:11,580 --> 00:00:13,250 is that it's mainly for 8 00:00:14,020 --> 00:00:15,860 unsupervised problem, that there's some 9 00:00:16,320 --> 00:00:17,240 aspects of it that are 10 00:00:17,510 --> 00:00:20,000 also very similar to sort of the supervised learning problem. 11 00:00:21,160 --> 00:00:22,440 So, what is anomaly detection? 12 00:00:23,380 --> 00:00:25,000 To explain it. Let me use 13 00:00:25,240 --> 00:00:27,780 the motivating example of: Imagine 14 00:00:28,440 --> 00:00:30,040 that you're a manufacturer of 15 00:00:30,330 --> 00:00:32,370 aircraft engines, and let's 16 00:00:32,600 --> 00:00:33,850 say that as your aircraft 17 00:00:34,280 --> 00:00:35,330 engines roll off the assembly 18 00:00:35,620 --> 00:00:37,580 line, you're doing, you know, QA or 19 00:00:37,820 --> 00:00:39,850 quality assurance testing, and as 20 00:00:40,030 --> 00:00:41,340 part of that testing you 21 00:00:41,410 --> 00:00:43,140 measure features of your 22 00:00:43,510 --> 00:00:44,900 aircraft engine, like maybe, you measure 23 00:00:45,180 --> 00:00:46,820 the heat generated, things like 24 00:00:46,860 --> 00:00:48,340 the vibrations and so on. 25 00:00:48,630 --> 00:00:49,570 I share some friends that worked 26 00:00:49,860 --> 00:00:50,940 on this problem a long time 27 00:00:51,010 --> 00:00:52,610 ago, and these were actually the 28 00:00:52,710 --> 00:00:53,960 sorts of features that they were 29 00:00:54,470 --> 00:00:55,910 collecting off actual aircraft 30 00:00:56,280 --> 00:00:58,540 engines so you 31 00:00:58,630 --> 00:00:59,570 now have a data set of 32 00:00:59,700 --> 00:01:01,000 X1 through Xm, if you have 33 00:01:01,760 --> 00:01:04,490 manufactured m aircraft engines, 34 00:01:05,030 --> 00:01:06,740 and if you plot your data, maybe it looks like this. 35 00:01:07,130 --> 00:01:08,640 So, each point here, each cross 36 00:01:08,770 --> 00:01:10,580 here as one of your unlabeled examples. 37 00:01:11,990 --> 00:01:15,220 So, the anomaly detection problem is the following. 38 00:01:16,450 --> 00:01:17,770 Let's say that on, you 39 00:01:17,880 --> 00:01:18,970 know, the next day, you 40 00:01:19,140 --> 00:01:20,390 have a new aircraft engine 41 00:01:20,810 --> 00:01:21,860 that rolls off the assembly line 42 00:01:22,320 --> 00:01:23,890 and your new aircraft engine has 43 00:01:24,160 --> 00:01:25,440 some set of features x-test. 44 00:01:26,290 --> 00:01:27,680 What the anomaly detection problem is, 45 00:01:27,930 --> 00:01:29,070 we want to know if this 46 00:01:29,420 --> 00:01:31,310 aircraft engine is anomalous in 47 00:01:31,520 --> 00:01:32,480 any way, in other words, we want 48 00:01:32,740 --> 00:01:34,110 to know if, maybe, this engine 49 00:01:34,570 --> 00:01:36,290 should undergo further testing 50 00:01:37,330 --> 00:01:38,370 because, or if it looks 51 00:01:38,710 --> 00:01:40,560 like an okay engine, and 52 00:01:40,740 --> 00:01:41,700 so it's okay to just ship 53 00:01:41,880 --> 00:01:43,260 it to a customer without further testing. 54 00:01:44,560 --> 00:01:45,670 So, if your new 55 00:01:45,840 --> 00:01:47,330 aircraft engine looks like 56 00:01:47,540 --> 00:01:49,150 a point over there, well, you 57 00:01:49,260 --> 00:01:50,200 know, that looks a lot 58 00:01:50,360 --> 00:01:51,440 like the aircraft engines we've seen 59 00:01:51,650 --> 00:01:53,860 before, and so maybe we'll say that it looks okay. 60 00:01:54,750 --> 00:01:55,740 Whereas, if your new aircraft 61 00:01:56,200 --> 00:01:59,390 engine, if x-test, you know, were 62 00:01:59,620 --> 00:02:00,430 a point that were out here, 63 00:02:00,910 --> 00:02:02,270 so that if X1 and 64 00:02:02,410 --> 00:02:04,800 X2 are the features of this new example. 65 00:02:05,360 --> 00:02:06,530 If x-tests were all the 66 00:02:06,590 --> 00:02:08,930 way out there, then we would call that an anomaly. 67 00:02:10,420 --> 00:02:11,640 and maybe send that aircraft engine 68 00:02:12,070 --> 00:02:13,720 for further testing before we 69 00:02:13,870 --> 00:02:15,130 ship it to a customer, since 70 00:02:16,010 --> 00:02:18,340 it looks very different than 71 00:02:18,600 --> 00:02:20,350 the rest of the aircraft engines we've seen before. 72 00:02:21,000 --> 00:02:22,560 More formally in the anomaly 73 00:02:22,960 --> 00:02:24,230 detection problem, we're give 74 00:02:24,900 --> 00:02:26,160 some data sets, x1 through 75 00:02:26,280 --> 00:02:28,340 Xm of examples, and we 76 00:02:28,460 --> 00:02:29,720 usually assume that these end 77 00:02:29,880 --> 00:02:32,250 examples are normal or 78 00:02:33,120 --> 00:02:34,910 non-anomalous examples, and we 79 00:02:34,980 --> 00:02:36,100 want an algorithm to tell us 80 00:02:36,290 --> 00:02:38,300 if some new example x-test is anomalous. 81 00:02:38,850 --> 00:02:40,080 The approach that we're going 82 00:02:40,130 --> 00:02:41,670 to take is that given this training 83 00:02:42,060 --> 00:02:43,300 set, given the unlabeled training 84 00:02:43,690 --> 00:02:45,280 set, we're going to 85 00:02:45,420 --> 00:02:46,920 build a model for p of 86 00:02:47,020 --> 00:02:48,060 x. In other words, we're 87 00:02:48,140 --> 00:02:49,320 going to build a model for the 88 00:02:49,520 --> 00:02:51,230 probability of x, where 89 00:02:51,390 --> 00:02:53,330 x are these features of, say, aircraft engines. 90 00:02:54,620 --> 00:02:56,290 And so, having built a 91 00:02:56,530 --> 00:02:57,350 model of the probability of x 92 00:02:58,070 --> 00:02:59,230 we're then going to say that 93 00:02:59,820 --> 00:03:01,280 for the new aircraft engine, if 94 00:03:01,520 --> 00:03:04,670 p of x-test is less 95 00:03:04,920 --> 00:03:07,180 than some epsilon then 96 00:03:07,930 --> 00:03:09,170 we flag this as an anomaly. 97 00:03:11,410 --> 00:03:12,260 So we see a new engine 98 00:03:12,660 --> 00:03:13,960 that, you know, has very low probability 99 00:03:14,850 --> 00:03:15,900 under a model p of 100 00:03:16,020 --> 00:03:17,130 x that we estimate from the data, 101 00:03:17,790 --> 00:03:19,370 then we flag this anomaly, whereas 102 00:03:19,730 --> 00:03:21,880 if p of x-test is, say, 103 00:03:22,320 --> 00:03:24,110 greater than or equal to some small threshold. 104 00:03:25,120 --> 00:03:26,620 Then we say that, you know, okay, it looks okay. 105 00:03:27,780 --> 00:03:28,740 And so, given the training set, 106 00:03:28,980 --> 00:03:30,890 like that plotted here, if 107 00:03:31,060 --> 00:03:31,940 you build a model, hopefully 108 00:03:32,560 --> 00:03:34,020 you will find that aircraft engines, 109 00:03:34,470 --> 00:03:35,500 or hopefully the model p of 110 00:03:35,560 --> 00:03:37,070 x will say that points that 111 00:03:37,260 --> 00:03:38,540 lie, you know, somewhere in the 112 00:03:38,580 --> 00:03:39,550 middle, that's pretty high probability, 113 00:03:40,720 --> 00:03:42,830 whereas points a little bit further out have lower probability. 114 00:03:43,850 --> 00:03:45,050 Points that are even further out 115 00:03:45,530 --> 00:03:47,220 have somewhat lower probability, and the 116 00:03:47,480 --> 00:03:48,420 point that's way out here, 117 00:03:49,080 --> 00:03:50,400 the point that's way 118 00:03:50,520 --> 00:03:52,100 out there, would be an anomaly. 119 00:03:54,150 --> 00:03:55,280 Whereas the point that's way in 120 00:03:55,470 --> 00:03:56,460 there, right in the 121 00:03:56,520 --> 00:03:57,720 middle, this would be 122 00:03:57,830 --> 00:03:59,080 okay because p of x 123 00:03:59,370 --> 00:04:00,300 right in the middle of that 124 00:04:00,460 --> 00:04:01,320 would be very high cause we've 125 00:04:01,520 --> 00:04:03,320 seen a lot of points in that region. 126 00:04:04,620 --> 00:04:07,580 Here are some examples of applications of anomaly detection. 127 00:04:08,450 --> 00:04:09,990 Perhaps the most common application of 128 00:04:10,080 --> 00:04:11,420 anomaly detection is actually 129 00:04:11,560 --> 00:04:13,260 for detection if you 130 00:04:13,360 --> 00:04:14,820 have many users, and if 131 00:04:15,070 --> 00:04:16,360 each of your users take different 132 00:04:16,670 --> 00:04:17,740 activities, you know maybe 133 00:04:17,920 --> 00:04:18,560 on your website or in the 134 00:04:18,630 --> 00:04:20,180 physical plant or something, you 135 00:04:20,300 --> 00:04:23,670 can compute features of the different users activities. 136 00:04:24,830 --> 00:04:25,730 And what you can do is build 137 00:04:25,940 --> 00:04:27,240 a model to say, you know, 138 00:04:27,310 --> 00:04:28,960 what is the probability of different 139 00:04:29,170 --> 00:04:30,730 users behaving different ways. 140 00:04:30,890 --> 00:04:32,280 What is the probability of a particular vector 141 00:04:32,460 --> 00:04:34,590 of features of a 142 00:04:34,840 --> 00:04:36,750 users behavior so you 143 00:04:36,900 --> 00:04:38,360 know examples of features of 144 00:04:38,450 --> 00:04:40,480 a users activity may be on 145 00:04:40,650 --> 00:04:41,650 the website it'd be things like, 146 00:04:42,710 --> 00:04:44,350 maybe x1 is how often does 147 00:04:44,840 --> 00:04:46,460 this user log in, x2, you know, maybe 148 00:04:46,850 --> 00:04:47,920 the number of what 149 00:04:48,130 --> 00:04:49,330 pages visited, or the 150 00:04:49,730 --> 00:04:51,420 number of transactions, maybe x3 151 00:04:51,440 --> 00:04:52,820 is, you know, the number of 152 00:04:53,120 --> 00:04:53,990 posts of the users on the 153 00:04:54,130 --> 00:04:55,850 forum, feature x4 could 154 00:04:56,000 --> 00:04:56,910 be what is the typing 155 00:04:57,440 --> 00:04:58,660 speed of the user and some 156 00:04:58,920 --> 00:04:59,980 websites can actually track that 157 00:05:00,280 --> 00:05:01,410 was the typing speed of this 158 00:05:01,600 --> 00:05:03,010 user in characters per second. 159 00:05:03,730 --> 00:05:06,610 And so you can model p of x based on this sort of data. 160 00:05:08,150 --> 00:05:09,140 And finally having your model 161 00:05:09,270 --> 00:05:10,530 p of x, you can 162 00:05:10,790 --> 00:05:12,570 try to identify users that 163 00:05:12,760 --> 00:05:14,210 are behaving very strangely on your 164 00:05:14,350 --> 00:05:15,590 website by checking which ones have 165 00:05:16,320 --> 00:05:18,100 probably effects less than epsilon and 166 00:05:18,240 --> 00:05:21,140 maybe send the profiles of those users for further review. 167 00:05:22,330 --> 00:05:24,560 Or demand additional identification from 168 00:05:24,740 --> 00:05:26,190 those users, or some such 169 00:05:26,650 --> 00:05:28,370 to guard against you know, 170 00:05:29,200 --> 00:05:31,650 strange behavior or fraudulent behavior on your website. 171 00:05:33,030 --> 00:05:34,960 This sort of technique will tend 172 00:05:35,160 --> 00:05:36,470 of flag the users that are 173 00:05:36,720 --> 00:05:38,250 behaving unusually, not just 174 00:05:39,480 --> 00:05:41,420 users that maybe behaving fraudulently. 175 00:05:42,190 --> 00:05:44,030 So not just constantly having 176 00:05:44,370 --> 00:05:45,670 stolen or users that are 177 00:05:45,780 --> 00:05:47,780 trying to do funny things, or just find unusual users. 178 00:05:48,560 --> 00:05:49,770 But this is actually the technique 179 00:05:50,040 --> 00:05:51,430 that is used by many online 180 00:05:52,500 --> 00:05:53,570 websites that sell things to 181 00:05:53,750 --> 00:05:55,860 try identify users behaving 182 00:05:56,240 --> 00:05:57,900 strangely that might be 183 00:05:58,040 --> 00:05:59,160 indicative of either fraudulent 184 00:05:59,760 --> 00:06:02,420 behavior or of computer accounts that have been stolen. 185 00:06:03,580 --> 00:06:06,410 Another example of anomaly detection is manufacturing. 186 00:06:07,180 --> 00:06:08,470 So, already talked about the 187 00:06:08,530 --> 00:06:09,770 aircraft engine thing where you can 188 00:06:10,030 --> 00:06:11,460 find unusual, say, aircraft 189 00:06:11,900 --> 00:06:13,600 engines and send those for further review. 190 00:06:15,430 --> 00:06:16,740 A third application would be 191 00:06:17,070 --> 00:06:19,210 monitoring computers in a data center. 192 00:06:19,390 --> 00:06:20,410 I actually have some friends who work on this too. 193 00:06:21,260 --> 00:06:22,280 So if you have a lot 194 00:06:22,580 --> 00:06:23,550 of machines in a computer 195 00:06:23,730 --> 00:06:24,690 cluster or in a 196 00:06:24,780 --> 00:06:25,710 data center, we can do 197 00:06:25,920 --> 00:06:28,560 things like compute features at each machine. 198 00:06:29,020 --> 00:06:30,650 So maybe some features capturing 199 00:06:31,170 --> 00:06:32,730 you know, how much memory used, number of 200 00:06:32,870 --> 00:06:34,280 disc accesses, CPU load. 201 00:06:35,060 --> 00:06:36,050 As well as more complex features 202 00:06:36,440 --> 00:06:37,450 like what is the CPU 203 00:06:37,830 --> 00:06:39,650 load on this machine divided by 204 00:06:39,960 --> 00:06:41,340 the amount of network traffic 205 00:06:41,950 --> 00:06:43,050 on this machine? 206 00:06:43,340 --> 00:06:44,580 Then given the dataset of how 207 00:06:44,820 --> 00:06:45,780 your computers in your data 208 00:06:46,070 --> 00:06:47,230 center usually behave, you can 209 00:06:47,390 --> 00:06:48,460 model the probability of x, 210 00:06:48,590 --> 00:06:49,730 so you can model the probability 211 00:06:50,350 --> 00:06:51,840 of these machines having 212 00:06:52,840 --> 00:06:53,790 different amounts of memory use 213 00:06:54,060 --> 00:06:55,200 or probability of these machines having 214 00:06:55,920 --> 00:06:57,160 different numbers of disc accesses 215 00:06:57,780 --> 00:06:59,880 or different CPU loads and so on. 216 00:07:00,030 --> 00:07:01,100 And if you ever have a machine 217 00:07:02,030 --> 00:07:03,530 whose probability of x, 218 00:07:03,800 --> 00:07:05,330 p of x, is very small then you 219 00:07:05,440 --> 00:07:06,880 know that machine is behaving unusually 220 00:07:07,970 --> 00:07:08,950 and maybe that machine is 221 00:07:09,050 --> 00:07:11,630 about to go down, and you 222 00:07:11,700 --> 00:07:13,620 can flag that for review by a system administrator. 223 00:07:14,690 --> 00:07:15,890 And this is actually being used 224 00:07:16,060 --> 00:07:17,800 today by various data 225 00:07:18,020 --> 00:07:19,550 centers to watch out for unusual 226 00:07:20,040 --> 00:07:21,430 things happening on their machines. 227 00:07:22,920 --> 00:07:24,420 So, that's anomaly detection. 228 00:07:25,540 --> 00:07:26,880 In the next video, I'll 229 00:07:27,120 --> 00:07:29,400 talk a bit about the Gaussian distribution and 230 00:07:29,580 --> 00:07:31,030 review properties of the Gaussian 231 00:07:31,580 --> 00:07:33,540 probability distribution, and in 232 00:07:33,690 --> 00:07:34,650 videos after that, we will 233 00:07:34,790 --> 00:07:37,390 apply it to develop an anomaly detection algorithm.