1 00:00:00,000 --> 00:00:04,956 [MUSIC] 2 00:00:04,956 --> 00:00:08,473 When there's a trade-off between precision and recall, it's important for 3 00:00:08,473 --> 00:00:09,890 us to look at the two extremes. 4 00:00:09,890 --> 00:00:12,950 What does it mean to have a classifier that's extremely precise? 5 00:00:12,950 --> 00:00:16,550 And what does it mean to have a classifier that's extremely high recall? 6 00:00:16,550 --> 00:00:19,770 And how can the two go against each other sometimes? 7 00:00:20,782 --> 00:00:24,300 First, let's think about what I call an optimistic classifier. 8 00:00:24,300 --> 00:00:27,170 You might know some of these optimists in your life. 9 00:00:27,170 --> 00:00:28,730 They think everything is good. 10 00:00:28,730 --> 00:00:29,870 How's it going? 11 00:00:29,870 --> 00:00:31,436 Good. Even if bad stuff is happening, 12 00:00:31,436 --> 00:00:32,110 they say good. 13 00:00:33,480 --> 00:00:37,420 Those folks say that all possible experiences are good, so 14 00:00:37,420 --> 00:00:38,540 they're optimists. 15 00:00:38,540 --> 00:00:42,450 That means that pretty much every input, every sentence, 16 00:00:42,450 --> 00:00:45,780 is labeled as positive; very few get labeled as negative. 17 00:00:45,780 --> 00:00:53,050 It's extremely likely that all the truly positive data points get labeled as good. 18 00:00:53,050 --> 00:00:54,270 What does that mean? 19 00:00:54,270 --> 00:00:58,370 That means that I have perfect recall, 20 00:00:58,370 --> 00:01:01,150 because I recall all those positive data points. 21 00:01:01,150 --> 00:01:02,110 Good. 22 00:01:02,110 --> 00:01:05,180 But I might not get 23 00:01:05,180 --> 00:01:08,460 perfect precision because I put a bunch of negatives into that bin. 24 00:01:09,690 --> 00:01:11,290 How can we address that?
25 00:01:11,290 --> 00:01:13,830 We can have the pessimistic classifier. 26 00:01:13,830 --> 00:01:16,940 You might have some friends like that, where you try really hard, 27 00:01:16,940 --> 00:01:22,350 you do everything for them, you go out of your way, and everything sucks. 28 00:01:22,350 --> 00:01:25,460 Every single experience that you have is really bad. 29 00:01:25,460 --> 00:01:29,490 There are very, very, very, very few things that they say are good. 30 00:01:29,490 --> 00:01:34,902 And when they do, they're very likely to be good, but everything else they say is bad, 31 00:01:34,902 --> 00:01:38,912 so everything else in the world is y-hat equals -1. 32 00:01:38,912 --> 00:01:42,370 Being a pessimist means that you're going to miss out on many good things in life. 33 00:01:43,900 --> 00:01:48,940 The pessimists have high precision because the few things they say are good tend to be 34 00:01:48,940 --> 00:01:55,270 good, but very, very, very low recall; they don't inspire great things in life. 35 00:01:56,620 --> 00:02:02,200 It turns out there is a spectrum between a high-precision, low-recall model and 36 00:02:02,200 --> 00:02:04,770 a low-precision, high-recall model, the pessimist and the optimist. 37 00:02:06,350 --> 00:02:10,690 What we'd like to do is somehow balance between the two perspectives in the world 38 00:02:10,690 --> 00:02:13,810 to find something that's just right for us. 39 00:02:13,810 --> 00:02:18,610 So, balance between the pessimistic model and the optimistic model. 40 00:02:20,510 --> 00:02:25,920 In particular, we want to find as many positive reviews or sentences as possible, 41 00:02:25,920 --> 00:02:31,420 as many of those as possible, with as few incorrect predictions as we can. 42 00:02:31,420 --> 00:02:35,272 So, that's the balance we're trying to strike in the case of our restaurant. 43 00:02:35,272 --> 00:02:39,309 [MUSIC]
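To make the two extremes concrete, here is a minimal sketch that computes precision and recall for an optimist and a pessimist on a tiny made-up data set. The helper name `precision_recall`, the ten labels, and the two prediction lists are all assumptions for illustration and do not come from the lecture; labels follow the +1 / -1 convention used above.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for labels in {+1, -1}, with +1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)    # true positives
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)   # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)   # false negatives
    # Guard against division by zero when nothing is predicted (or truly) positive.
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall


# Hypothetical ground truth: 4 truly positive and 6 truly negative sentences.
y_true = [1, 1, 1, 1, -1, -1, -1, -1, -1, -1]

# Optimist: every sentence is labeled positive.
optimist = [1] * len(y_true)

# Pessimist: only one sentence is labeled positive, and it really is positive.
pessimist = [1] + [-1] * (len(y_true) - 1)

opt_precision, opt_recall = precision_recall(y_true, optimist)
pes_precision, pes_recall = precision_recall(y_true, pessimist)

print("optimist:  precision", opt_precision, "recall", opt_recall)
print("pessimist: precision", pes_precision, "recall", pes_recall)
```

On this toy data the optimist finds every positive (recall 1.0) but only 4 of its 10 positive predictions are right (precision 0.4), while the pessimist is always right when it says positive (precision 1.0) but finds only 1 of the 4 positives (recall 0.25).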