1 00:00:00,000 --> 00:00:04,246 [MUSIC] 2 00:00:04,246 --> 00:00:08,890 We've discussed error and accuracy as ways to evaluate a classifier. 3 00:00:08,890 --> 00:00:12,770 Now, it's very important to understand the accuracies or 4 00:00:12,770 --> 00:00:14,800 errors that you're actually getting from your classifier, 5 00:00:14,800 --> 00:00:18,467 and to really think deeply about whether those are good errors or 6 00:00:18,467 --> 00:00:21,760 good levels of accuracy in your situation. 7 00:00:21,760 --> 00:00:26,830 So for example, one of the common mistakes you might make 8 00:00:26,830 --> 00:00:32,340 is to never ask, how good is my classifier at all? 9 00:00:32,340 --> 00:00:33,770 When you build a classifier, 10 00:00:33,770 --> 00:00:38,590 the first baseline comparison you should do is against random guessing. 11 00:00:38,590 --> 00:00:42,926 So for example, if you have a binary classification problem like, is this 12 00:00:42,926 --> 00:00:46,694 sentence of positive or negative sentiment, then just random 13 00:00:46,694 --> 00:00:51,689 guessing is gonna give you 50% accuracy on average, so you better beat 50%. 14 00:00:51,689 --> 00:00:54,765 If you have k classes, so for example, if you have 3 classes, 15 00:00:54,765 --> 00:00:59,525 you're gonna have a random guessing accuracy of 33%. 16 00:00:59,525 --> 00:01:04,307 For 4 classes it would be 25%, for k classes it would be 1 over k. 17 00:01:04,307 --> 00:01:11,669 So at the very least you should beat random guessing really well. 18 00:01:11,669 --> 00:01:15,870 Because if you don't, then your approach is basically pointless. 19 00:01:15,870 --> 00:01:20,197 Now, even beyond beating random guessing, truly think deeply about whether your 20 00:01:20,197 --> 00:01:24,750 classifier, even if it looks really good, is really meaningfully good. 21 00:01:24,750 --> 00:01:29,991 So for example, suppose you have a spam predictor that gets 90% accuracy. 22 00:01:29,991 --> 00:01:31,174 Should you go brag about it?
23 00:01:31,174 --> 00:01:32,776 Is that awesome? 24 00:01:32,776 --> 00:01:34,436 Well, it really depends. 25 00:01:34,436 --> 00:01:41,173 In the case of spam, not so good, because data from 2010 shows that 90% 26 00:01:41,173 --> 00:01:46,250 of the emails sent were spam, 90% of the emails. 27 00:01:46,250 --> 00:01:50,998 So if I just guess that every email is spam, what accuracy do I get? 28 00:01:50,998 --> 00:01:51,784 90%. 29 00:01:53,420 --> 00:01:56,910 This is what's called majority class prediction, 30 00:01:56,910 --> 00:01:59,490 so it just predicts the class that's most common. 31 00:01:59,490 --> 00:02:03,500 And it can have amazing performance in cases where there's what's called class 32 00:02:03,500 --> 00:02:04,470 imbalance. 33 00:02:04,470 --> 00:02:07,920 One class has much more representation than the others. 34 00:02:07,920 --> 00:02:12,025 Spam is much more represented than regular good emails. 35 00:02:13,060 --> 00:02:17,950 And so, you have to be very cautious and really look at whether you have class 36 00:02:17,950 --> 00:02:20,720 imbalance when you try to figure out whether your accuracy is good. 37 00:02:21,940 --> 00:02:23,730 And of course, 38 00:02:23,730 --> 00:02:29,070 this approach also beats random guessing, if you know what the majority class is. 39 00:02:29,070 --> 00:02:31,850 So you should always be digging into your problem and 40 00:02:31,850 --> 00:02:36,390 really thinking about the predictions you're getting and 41 00:02:36,390 --> 00:02:39,940 whether that accuracy is really meaningfully good for your problem. 42 00:02:39,940 --> 00:02:44,543 So ask yourself questions like, is there class imbalance? 43 00:02:44,543 --> 00:02:47,476 How does it compare against baseline approaches like random guessing, 44 00:02:47,476 --> 00:02:50,300 majority class, and even fancier things than that?
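The two baseline comparisons described above, random guessing and majority class prediction, can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the toy label list, and the 0.90 classifier accuracy are hypothetical and were chosen to mirror the 90% spam figure from the lecture.

```python
from collections import Counter


def random_guess_accuracy(num_classes):
    """Expected accuracy of uniform random guessing over k classes: 1 / k."""
    return 1.0 / num_classes


def majority_class_accuracy(labels):
    """Accuracy obtained by always predicting the most common label."""
    counts = Counter(labels)
    _, majority_count = counts.most_common(1)[0]
    return majority_count / len(labels)


# Hypothetical toy data set: 90 spam emails and 10 regular emails.
labels = ["spam"] * 90 + ["not_spam"] * 10

# Hypothetical accuracy reported by some spam classifier.
classifier_accuracy = 0.90

baselines = {
    "random guessing": random_guess_accuracy(len(set(labels))),
    "majority class": majority_class_accuracy(labels),
}

for name, baseline in baselines.items():
    verdict = "beats" if classifier_accuracy > baseline else "does not beat"
    print(f"classifier {verdict} the {name} baseline ({baseline:.2f})")
```

With class imbalance like this, the majority class baseline is the harder one to beat, which is why a 90% accuracy that looks strong against random guessing may not be meaningful for spam.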
45 00:02:50,300 --> 00:02:53,190 And most importantly, think about your application and 46 00:02:53,190 --> 00:02:58,590 ask yourself, what is a good enough accuracy to make my users really happy? 47 00:02:58,590 --> 00:03:03,178 So, in spam filtering, if your accuracy is not that good, then there'll be important 48 00:03:03,178 --> 00:03:06,974 messages going to the spam folder, and that could be a really bad thing. 49 00:03:06,974 --> 00:03:10,819 [MUSIC]