[MUSIC] We've discussed error and accuracy as ways to evaluate a classifier. Now, it's very important to understand the accuracies or errors that you're actually getting from your classifier, and to think deeply about whether those are good errors or good levels of accuracy for your situation.

For example, a common mistake is to never ask how good your classifier really is at all. When you build a classifier, the first baseline comparison you should do is against random guessing. If you have a binary classification problem, like deciding whether a sentence has positive or negative sentiment, then random guessing will give you 50% accuracy on average, so you'd better beat 50%. If you have k classes, say 3 classes, random guessing gives you 33% accuracy. For 4 classes it would be 25%, and for k classes it would be 1 over k. So at the very least you should beat random guessing, and beat it well, because if you don't, your approach is basically pointless.

Now, even beyond beating random guessing, truly think deeply about whether your classifier, even if it looks really good, is really meaningfully good. For example, suppose you have a spam predictor that gets 90% accuracy. Should you go brag about it? Is that awesome? Well, it really depends. In the case of spam, not so good, because data from 2010 shows that about 90% of all emails sent were spam, 90% of emails. So if I just guess that every email is spam, what accuracy do I get? 90%. This is what's called majority class prediction: it just predicts the most common class. And it can have amazing performance in cases where there's what's called class imbalance, where one class has much more representation than the others. Spam is much more represented than regular good email. So you have to be very cautious and really look at whether you have class imbalance when you try to figure out whether your accuracy is good. And of course, this approach also beats random guessing, if you know what the majority class is.

So you should always be digging into your problem, really thinking about the predictions you're getting and whether that accuracy is meaningfully good for your problem. Ask yourself questions like: is there class imbalance? How does my classifier compare against baseline approaches like random guessing, majority class prediction, and even fancier things than that? And most importantly, think about your application and ask yourself, what is a good enough accuracy to make my users really happy? In spam filtering, if your accuracy is not that good, then important messages will go to the spam folder, and that could be a really bad thing. [MUSIC]
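
To make these baseline checks concrete, here is a minimal sketch in Python. The labels, predictions, and the 90%/10% spam split are hypothetical illustration data, not from the lecture; it simply computes a classifier's accuracy alongside the random-guessing baseline (1/k) and the majority-class baseline discussed above.

```python
import numpy as np

# Hypothetical example data: 90% of emails are spam (class imbalance),
# and the "classifier" just predicts spam for everything.
y_true = np.array(["spam"] * 90 + ["not_spam"] * 10)
y_pred = np.array(["spam"] * 100)

# Accuracy of the classifier: fraction of predictions that match the true labels.
accuracy = np.mean(y_pred == y_true)

# Baseline 1: random guessing over k classes gives 1/k accuracy on average.
k = len(np.unique(y_true))
random_baseline = 1.0 / k

# Baseline 2: majority class prediction, i.e. always predict the most common class.
labels, counts = np.unique(y_true, return_counts=True)
majority_baseline = counts.max() / counts.sum()

print(f"classifier accuracy:     {accuracy:.2f}")          # 0.90
print(f"random-guess baseline:   {random_baseline:.2f}")   # 0.50
print(f"majority-class baseline: {majority_baseline:.2f}") # 0.90
```

In this example the classifier's "90% accuracy" exactly matches the majority-class baseline, which is the lecture's point: with heavy class imbalance, a high accuracy number on its own tells you very little.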