In the last video, we talked about the process of evaluating an anomaly detection algorithm, and there we started to use some labeled data, with examples that we knew were either anomalous or not anomalous, with y equals 1 or y equals 0. So the question then arises: if we have this labeled data, with some examples known to be anomalies and some known not to be anomalies, why don't we just use a supervised learning algorithm? Why not use logistic regression or a neural network to try to learn directly from our labeled data to predict whether y equals 1 or y equals 0? In this video, I'll try to share with you some of the thinking and some guidelines for when you should probably use an anomaly detection algorithm and when it might be more fruitful to consider using a supervised learning algorithm. This slide shows the settings under which you should maybe use anomaly detection versus when supervised learning might be more fruitful. If you have a problem with a very small number of positive examples (and remember, examples of y equals 1 are the anomalous examples), then you might consider using an anomaly detection algorithm instead. Having 0 to 20, maybe up to 50, positive examples might be pretty typical, and because we usually have such a small set of positive examples, we are going to save the positive examples just for the cross-validation and test sets. In contrast, in a typical anomaly detection setting, we will often have a relatively large number of negative examples, these normal examples of, say, normal aircraft engines. And we can then use this very large number of negative examples to fit the model p of x. So there is this idea that in many anomaly detection applications, you have very few positive examples and lots of negative examples, and when we are doing the process of estimating p of x, of fitting all those Gaussian parameters, we need only negative examples to do that.
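To make this concrete, here is a minimal sketch of fitting p of x from negative examples only, using the per-feature Gaussian model from earlier videos. The data, feature names, and threshold epsilon are all made up for illustration; in practice epsilon would be chosen on the cross-validation set.

```python
import numpy as np

# Hypothetical feature matrix of normal (y = 0) aircraft engine examples only;
# rows are examples, columns are features (e.g. heat generated, vibration).
X_normal = np.array([
    [4.9, 1.1],
    [5.1, 0.9],
    [5.0, 1.0],
    [4.8, 1.2],
])

# Fit the Gaussian parameters using ONLY the negative (normal) examples.
mu = X_normal.mean(axis=0)   # per-feature mean
var = X_normal.var(axis=0)   # per-feature variance

def p(x, mu, var):
    """Product of per-feature Gaussian densities: p(x) = prod_j N(x_j; mu_j, var_j)."""
    coef = 1.0 / np.sqrt(2.0 * np.pi * var)
    return np.prod(coef * np.exp(-((x - mu) ** 2) / (2.0 * var)))

epsilon = 0.05                # illustrative threshold; tune on a CV set in practice

x_new = np.array([7.5, 2.5])  # a made-up, unusual-looking engine
print(p(x_new, mu, var) < epsilon)  # True -> flag as an anomaly
```

Note that no positive examples appear anywhere in the fitting step; they are only needed later, to evaluate how well the chosen epsilon separates anomalies from normal examples.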
So if you have a lot of negative data, you can still fit p of x pretty well. In contrast, for supervised learning, more typically we would have a reasonably large number of both positive and negative examples. And so this is one way to look at your problem and decide if you should use an anomaly detection algorithm or a supervised learning algorithm. Here is another way people often think about anomaly detection algorithms. For anomaly detection applications, there are often many different types of anomalies. Think about aircraft engines: there are so many different ways for an aircraft engine to go wrong, so many things that could break it. And so, if that's the case and you have a pretty small set of positive examples, then it can be difficult for an algorithm to learn from that small set what the anomalies look like. In particular, future anomalies may look nothing like the ones you've seen so far. Maybe in your set of positive examples you had seen 5 or 10 or 20 different ways that an aircraft engine could go wrong, but tomorrow you may need to detect a totally new type of anomaly, a totally new way for an aircraft engine to be broken that you have just never seen before. If that is the case, then it might be more promising to just model the negative examples with a sort of Gaussian model p of x, rather than try too hard to model the positive examples, because tomorrow's anomaly may be nothing like the ones you've seen so far. In contrast, in some other problems you have enough positive examples for an algorithm to get a sense of what the positive examples are like.
And in particular, if you think that future positive examples are likely to be similar to ones in the training set, then in that setting it might be more reasonable to have a supervised learning algorithm that looks at a lot of the positive examples and a lot of the negative examples, and uses that to try to distinguish between positives and negatives. So hopefully this gives you a sense, if you have a specific problem, of whether you should think about using an anomaly detection algorithm or a supervised learning algorithm. And the key difference really is that in anomaly detection, often we have such a small number of positive examples that it is not possible for a learning algorithm to learn that much from them. So what we do instead is take a large set of negative examples and have the algorithm learn p of x from just the negative examples, the normal aircraft engines, say. And we reserve the small number of positive examples for evaluating the algorithm, to use in either the cross-validation set or the test set. And just as a side comment about these many different types of anomalies: in some earlier videos, we talked about the email spam example. There are actually many different types of spam email: spam trying to sell you things, spam trying to steal your passwords (these are called phishing emails), and many other types. But for the spam problem, we usually have enough examples of spam email to see most of these different types, because we have a large set of examples of spam, and that's why we usually think of spam as a supervised learning setting, even though there may be many different types of spam.
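The evaluation setup described above, training on negatives only and reserving the few positives for the cross-validation and test sets, might look like this sketch. All sizes and the random data are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset from the lecture's setting: very few anomalies (y = 1)
# and a relatively large number of normal examples (y = 0).
n_neg, n_pos = 10_000, 20
X_neg = rng.normal(size=(n_neg, 2))       # stand-in normal examples
X_pos = rng.normal(5.0, size=(n_pos, 2))  # stand-in anomalous examples

# Fit p(x) on negatives only; split the remaining negatives, plus ALL of the
# positives, between the cross-validation and test sets.
X_train = X_neg[:6000]                    # 60% of negatives -> training set
neg_cv, neg_test = X_neg[6000:8000], X_neg[8000:]
pos_cv, pos_test = X_pos[:10], X_pos[10:]

X_cv = np.vstack([neg_cv, pos_cv])
y_cv = np.concatenate([np.zeros(len(neg_cv)), np.ones(len(pos_cv))])
X_test = np.vstack([neg_test, pos_test])
y_test = np.concatenate([np.zeros(len(neg_test)), np.ones(len(pos_test))])
```

The training set here carries no labels at all, which is exactly why the algorithm can get by with so few anomalies: the positives are spent entirely on choosing epsilon and measuring performance.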
And so, if we look at some applications of anomaly detection versus supervised learning, we'll find that in fraud detection, if there are many different ways for people to try to commit fraud and you have a relatively small training set, a small number of fraudulent users on your website, then I would use an anomaly detection algorithm. I should say, if you are a major online retailer and have actually had a lot of people try to commit fraud on your website, so that you actually have a lot of examples where y equals 1, then fraud detection could sometimes shift over to the supervised learning column. But if you haven't seen that many examples of users doing strange things on your website, then more frequently fraud detection is treated as an anomaly detection problem rather than as a supervised learning problem. Other examples: we talked about manufacturing already, where hopefully you'll see mostly normal examples and not that many anomalies. But then again, for some manufacturing processes, if you're manufacturing in very large volumes and have seen a lot of bad examples, manufacturing could shift to the supervised learning column as well. If you haven't seen that many bad examples of the product, then I'd use anomaly detection. For monitoring machines in a data center, similar sorts of arguments apply. Whereas for email spam classification, weather prediction, and classifying cancers, if you have comparable numbers of positive and negative examples, many examples of both your positive and your negative classes, then we would tend to treat all of these as supervised learning problems. So hopefully that gives you a sense of what properties of a learning problem would cause you to treat it as an anomaly detection problem versus a supervised learning problem.
And for many of the problems faced by various technology companies and so on, we actually are in these settings where we have very few, or sometimes zero, positive training examples, and where there may be so many different types of anomalies that we've never seen some of them before. For those sorts of problems, very often, the algorithm that is used is an anomaly detection algorithm.