1 00:00:00,020 --> 00:00:04,668 [MUSIC] 2 00:00:04,668 --> 00:00:10,151 Now that we've explored the idea of precision, let's talk about the slightly more subtle 3 00:00:10,151 --> 00:00:15,237 idea of recall. Recall has to do with, out of the possible positive things 4 00:00:15,237 --> 00:00:20,720 in the world, everything out there, did I find every possible positive 5 00:00:20,720 --> 00:00:22,970 sentence about my restaurant? 6 00:00:22,970 --> 00:00:27,730 So, more specifically, I might take all the restaurant reviews and 7 00:00:27,730 --> 00:00:31,290 all the sentences and feed them through my classifier, 8 00:00:31,290 --> 00:00:35,270 and then classify some of those as positive and some of those as negative. 9 00:00:35,270 --> 00:00:37,540 So, some might have y hat = +1. 10 00:00:37,540 --> 00:00:39,530 Some might have y hat = -1. 11 00:00:39,530 --> 00:00:45,342 Now, if I look at the true label, the true class of each one of the sentences, so yi, 12 00:00:45,342 --> 00:00:50,870 and at which ones have +1, you see that four of those were predicted to be positive, 13 00:00:50,870 --> 00:00:53,190 just like we talked about earlier in the module. 14 00:00:53,190 --> 00:00:58,230 But there turned out to be two other sentences that were truly positive 15 00:00:58,230 --> 00:01:02,270 that my algorithm did not find, and that fell into the negative bin. 16 00:01:02,270 --> 00:01:04,300 So those were missed. 17 00:01:04,300 --> 00:01:09,030 So in other words, we found the 4 positive sentences, but 18 00:01:09,030 --> 00:01:14,100 we missed out on 2 positive sentences that could have been shown on my website. 19 00:01:14,100 --> 00:01:15,730 And maybe those two sentences were so 20 00:01:15,730 --> 00:01:22,230 amazing they would change the history of my restaurant, but I missed them, 21 00:01:22,230 --> 00:01:25,100 because I did not have perfect recall.
22 00:01:25,100 --> 00:01:29,890 So, more formally, out of all the truly positive data points, so 23 00:01:29,890 --> 00:01:36,190 the ones where yi is +1, we have a subset of those that we do capture, 24 00:01:36,190 --> 00:01:41,040 where y hat is also +1, but there's a subset which we 25 00:01:41,040 --> 00:01:46,620 don't capture, where we think y hat is -1, so y hat does not agree with y. 26 00:01:46,620 --> 00:01:48,450 And so, that's the part that we missed. 27 00:01:48,450 --> 00:01:52,850 So the recall is the fraction of the ones that we actually get. 28 00:01:52,850 --> 00:01:54,520 We want everything to be in the blue box here. 29 00:01:55,770 --> 00:01:59,650 More formally, we can define recall as the number of true positives, 30 00:01:59,650 --> 00:02:02,340 so these are the data points that were positive and 31 00:02:02,340 --> 00:02:05,370 that we got right, divided by the true positives 32 00:02:06,410 --> 00:02:07,580 plus the false negatives, 33 00:02:07,580 --> 00:02:11,550 so the data points that were truly positive, but that we labeled as negative, 34 00:02:11,550 --> 00:02:13,700 so falsely labeled as negative. 35 00:02:13,700 --> 00:02:21,940 And so, this is going to have value one if the false negatives are zero, 36 00:02:21,940 --> 00:02:26,250 which means we captured everything, we captured all the true positives, 37 00:02:26,250 --> 00:02:30,530 and zero if we did not capture any true positives, so 38 00:02:30,530 --> 00:02:33,570 all the positive data went to the false negative bin. 39 00:02:34,690 --> 00:02:38,620 So if we go back to the example that we have been looking at, 40 00:02:38,620 --> 00:02:42,490 I want to show positive sentences on my website. 41 00:02:42,490 --> 00:02:50,400 I've got four of them, with y hat i equals +1, but I missed out on two sentences.
42 00:02:50,400 --> 00:02:51,110 So for example, 43 00:02:51,110 --> 00:02:57,270 I missed out on the sentence that said my wife tried the ramen and it was delicious, 44 00:02:57,270 --> 00:03:01,460 and so maybe somebody's interested in ramen, they don't see that sentence, 45 00:03:01,460 --> 00:03:05,853 they don't go to my website, and so I missed out on something really good. 46 00:03:05,853 --> 00:03:10,489 So high recall means that you discover everything positive that's being 47 00:03:10,489 --> 00:03:14,388 said about the restaurant, or all of the positive data points. 48 00:03:14,388 --> 00:03:19,669 [MUSIC]
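As a rough illustration of the recall definition from the lecture (true positives divided by true positives plus false negatives), here is a minimal Python sketch. The function name `recall` and the label lists are assumptions made for this example, not something shown in the lecture. The six truly positive sentences (four found, two missed) follow the restaurant example; the three truly negative sentences are hypothetical padding, included only to show that they do not change the result.

```python
def recall(y_true, y_hat):
    """Recall = TP / (TP + FN), for labels in {+1, -1}."""
    pairs = list(zip(y_true, y_hat))
    # True positives: truly positive and predicted positive.
    true_positives = sum(1 for y, p in pairs if y == +1 and p == +1)
    # False negatives: truly positive but predicted negative.
    false_negatives = sum(1 for y, p in pairs if y == +1 and p == -1)
    total_positives = true_positives + false_negatives
    if total_positives == 0:
        raise ValueError("recall is undefined without truly positive points")
    return true_positives / total_positives


# Six truly positive sentences: four found, two missed (lecture example).
# The last three entries are hypothetical negative sentences (an assumption).
y_true = [+1, +1, +1, +1, +1, +1, -1, -1, -1]
y_hat = [+1, +1, +1, +1, -1, -1, -1, +1, -1]

example_recall = recall(y_true, y_hat)
print(f"recall = {example_recall:.3f}")  # prints "recall = 0.667"
```

With four sentences captured and two missed, the recall comes out to 4 / (4 + 2), or about 0.667. Note that the one negative sentence wrongly predicted as positive has no effect on recall; it would only lower precision.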