Thus far we've talked about precision, recall, optimism, pessimism, all sorts of different aspects. But one of the most surprising things about this whole story is that it's quite easy to navigate from a low precision model to a high precision model, or from a high recall model to a low recall model, and so to explore that whole spectrum. We can have a low precision, high recall model that's very optimistic, or a high precision, low recall model that's very pessimistic, and it turns out that it's easy to find a path in between. The question is, how do we do that?

If you recall from earlier in this course, we assign not just a label, +1 or -1, to every data point, but a probability, say 0.99 of being positive for "The sushi and everything else were awesome," or, say, 0.55 of being positive for "The sushi was good, the service was okay." As I mentioned earlier in the course, these probabilities are going to be fundamentally useful, and now you're going to see a place where they are amazingly useful: the probabilities can be used to trade off precision with recall. So let's figure that out.

Earlier in the course, we just had a fixed threshold to decide whether an input sentence, x_i, was going to be positive or negative. We said it's going to be positive if the probability is greater than 0.5, and negative if the probability is less than or equal to 0.5. Now, how can we create an optimistic or a pessimistic model just by changing that 0.5 threshold? Let's explore that idea.

Think about what would happen if we set the threshold, instead of 0.5, to be 0.999, so a data point is only +1 if its probability is greater than 0.999. Well, here's what happens: very few data points would satisfy this condition, so very few data points will be labeled +1 and the vast majority will be labeled -1. We call this classifier the pessimistic classifier.

Now alternatively, if we change the threshold to be 0.001, then almost every data point is going to be labeled as positive, because almost all of the data points are going to satisfy this condition. So we're going to say that everything is +1, and this is going to be the optimistic classifier. It's going to say, yeah, everything is +1, everything's good. So by varying that threshold from 0.5 to something close to 0 or to something close to 1, we're going to change between optimism and pessimism.
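As a minimal sketch of that idea (not the course's own code), suppose we already have the predicted probabilities P(y = +1 | x, w) from a trained classifier; the array of probabilities below is made up purely for illustration:

```python
import numpy as np

# Hypothetical predicted probabilities P(y = +1 | x, w) for a few reviews;
# in practice these would come from the learned classifier.
probabilities = np.array([0.99, 0.55, 0.80, 0.30, 0.02, 0.60])

def classify(probabilities, threshold):
    """Label a point +1 if its probability exceeds the threshold, else -1."""
    return np.where(probabilities > threshold, +1, -1)

print(classify(probabilities, 0.5))    # usual classifier:           [ 1  1  1 -1 -1  1]
print(classify(probabilities, 0.999))  # pessimistic: almost all -1  [-1 -1 -1 -1 -1 -1]
print(classify(probabilities, 0.001))  # optimistic:  almost all +1  [ 1  1  1  1  1  1]
```

The only thing that changes between the three classifiers is the threshold; the model and the probabilities it outputs stay exactly the same.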
Now, if we go back to this picture of logistic regression, for example, as a concrete case: we have this input, the score of x, and the output is the probability that y is equal to +1 given x and w, P(y = +1 | x, w). This should bring back some memories, maybe some sad, sad memories. The threshold here is a cut where we set y hat equal to +1 if the probability is greater than or equal to this threshold t, so everything above the line will be labeled +1 and everything below the line will be labeled -1.

Concretely, let's see what happens if we set the threshold to be some very, very high number, so t is close to 1. If t is some number close to 1, then everything below that line will be labeled as -1, and very, very few things above the line will be labeled as +1. That's why we end up with a pessimistic classifier. On the flip side, if we set the threshold t to be something very, very small, then everything is going to be above the line, so everything is going to be labeled as +1 and very few data points are going to be labeled as -1, and we end up with the optimistic classifier.

So ranging t from 0 to 1 takes us from optimism to pessimism. In other words, the spectrum that we said we would navigate can now be navigated with a single parameter, t, that goes between 0 and 1.
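Here is a small sketch of that sweep, again with made-up labels and probabilities; the helper precision_recall_at is a hypothetical function written just for this illustration:

```python
import numpy as np

# Made-up true labels and predicted probabilities, just for illustration.
y_true = np.array([+1, +1, +1, -1, -1, +1, -1, +1])
probabilities = np.array([0.95, 0.80, 0.60, 0.55, 0.40, 0.35, 0.20, 0.10])

def precision_recall_at(threshold, y_true, probabilities):
    """Compute precision and recall when predicting +1 above the given threshold."""
    y_hat = np.where(probabilities > threshold, +1, -1)
    true_positives = np.sum((y_hat == +1) & (y_true == +1))
    predicted_positives = np.sum(y_hat == +1)
    actual_positives = np.sum(y_true == +1)
    # Conventionally treat precision as 1.0 when nothing is predicted positive.
    precision = true_positives / predicted_positives if predicted_positives else 1.0
    recall = true_positives / actual_positives
    return precision, recall

# Sweep the single parameter t from optimistic (near 0) to pessimistic (near 1).
for t in [0.01, 0.25, 0.5, 0.75, 0.99]:
    p, r = precision_recall_at(t, y_true, probabilities)
    print(f"t = {t:.2f}  precision = {p:.2f}  recall = {r:.2f}")
```

As t grows from near 0 toward 1, the classifier predicts +1 less and less often, so recall falls while precision tends to rise, which is exactly the optimism-to-pessimism path described above.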