We saw how we could change the threshold from zero to one for deciding what counts as a positive, navigating between the optimistic classifier and the pessimistic classifier. There's actually a really intuitive visualization of this, called a precision-recall curve. Precision-recall curves are extremely useful for understanding how a classifier is performing.

So in this case, you can imagine plotting two extreme points on that curve. What happens to the precision when the threshold is very close to one? Well, the precision is going to be close to one, because we predict positive on only the very few things we're most confident about, and those are very likely to be correct. But the recall is going to be close to zero, because we're calling almost everything bad. That's the pessimistic classifier. On the other extreme of the precision-recall curve, the point at the bottom, is the optimistic point: very high recall, because you're going to find all the positive data points, but very low precision, because you're going to sweep in all sorts of other stuff and call it good too. That happens when t is very small, close to zero.

Now, if you keep varying t, you get a spectrum of tradeoffs between precision and recall. If you want a model with a little more recall that is still highly precise, maybe you set t = 0.8; if you want very high recall while still keeping precision as good as you can, maybe you set t = 0.2. You can navigate that spectrum to explore the tradeoff between precision and recall.
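To make that threshold sweep concrete, here is a minimal sketch in Python (the labels, scores, and helper function are invented for illustration, not from the lecture) that computes precision and recall at a few values of t:

```python
import numpy as np

def precision_recall_at_threshold(y_true, scores, t):
    """Precision and recall when we predict positive for every score >= t."""
    y_pred = scores >= t
    tp = np.sum(y_pred & (y_true == 1))      # correctly flagged positives
    predicted_pos = np.sum(y_pred)           # everything we flagged positive
    actual_pos = np.sum(y_true == 1)         # everything truly positive
    precision = tp / predicted_pos if predicted_pos else 1.0
    recall = tp / actual_pos if actual_pos else 0.0
    return precision, recall

# Made-up labels and model scores, sorted by score for readability.
y_true = np.array([1, 1, 1, 0, 1, 0, 1, 0, 0, 0])
scores = np.array([0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05])

# Sweep t from pessimistic (near one) to optimistic (near zero).
for t in [0.9, 0.8, 0.5, 0.2]:
    p, r = precision_recall_at_threshold(y_true, scores, t)
    print(f"t={t:.1f}  precision={p:.3f}  recall={r:.1f}")
```

On this toy data, as t falls from 0.9 to 0.2, recall climbs from 0.2 to 1.0 while precision drops from 1.0 to 0.625, which is exactly the spectrum the precision-recall curve traces out.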
Now, there doesn't always have to be a tradeoff. If you had a truly perfect classifier, the curve would be a flat line at the top: perfect precision no matter what the recall level. That line basically never happens, but it's the ideal you're aiming for, and the closer your algorithm's curve gets to that flat line at the top, the better it is.

Precision-recall curves can also be used to compare algorithms, in addition to understanding a single one. For example, say you have two classifiers, A and B, and you see that at every single point, classifier B's curve is higher than classifier A's. In that case, we always prefer classifier B: no matter what the threshold is, B gives you better precision for the same recall. So B is always better.

However, life is not always this simple. If there's one thing you should have learned thus far, it's that practice tends to be a bit messy. Often what you observe is not classifiers A and B like we just saw, but classifiers A and C like we're seeing here, where there are one or more crossover points: classifier A does better in some regions of the precision-recall curve, and classifier C does better in others. So, for example, if you're interested in very high precision but are okay with lower recall, you should pick classifier C, because it does better in that region; it's higher up, closer to the flat line. But if you care about getting high recall, you should choose classifier A, because in the high-recall regime, when you pick smaller values of t, classifier A tends to do better; you can see its curve is higher there. That's the kind of complexity you face when dealing with machine learning in the real world.

Now, if you just had to pick one classifier, how do you decide between A and C? As I was hinting at, the single number you use to decide depends on where you want to be on the precision-recall tradeoff curve. There are many metrics out there that summarize the curve in a single number, such as the F1 measure and the area under the curve.
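As a sketch of how those single-number summaries might be computed, here is one way to do it with scikit-learn (the ground-truth labels and the two score vectors for classifiers A and C are invented for the example):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, auc, f1_score

# Hypothetical ground truth and scores from two classifiers, A and C.
y_true = np.array([1, 1, 0, 1, 0, 1, 0, 0, 1, 0])
scores_a = np.array([0.90, 0.80, 0.70, 0.60, 0.55, 0.50, 0.40, 0.30, 0.25, 0.10])
scores_c = np.array([0.95, 0.90, 0.30, 0.50, 0.20, 0.40, 0.60, 0.10, 0.35, 0.05])

for name, scores in [("A", scores_a), ("C", scores_c)]:
    precision, recall, thresholds = precision_recall_curve(y_true, scores)
    # Area under the PR curve: one single-number summary of the whole curve.
    pr_auc = auc(recall, precision)
    # F1 at a fixed threshold of 0.5: another common single-number summary.
    f1 = f1_score(y_true, scores >= 0.5)
    print(f"classifier {name}: PR-AUC={pr_auc:.2f}  F1@0.5={f1:.2f}")
```

Note that PR-AUC summarizes the whole curve while F1 is tied to one particular threshold; neither tells you which classifier wins in the specific precision or recall region you care about, which is exactly the reservation voiced next.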
For a lot of applications, though, I'm less fond of those measures than I am of one that's much simpler, called precision at k. Let me talk about that, because it's a really simple and really useful measure.

Let's say there are five slots on my website to show sentences. That's all I care about: I want to show five great sentences. I don't have room for ten million or five million, just five. Say I show five sentences there, and four were great but one was terrible; I wanted all five to be great. So I want my precision on the top five sentences to be as good as possible. In this case, the precision was four out of five, or 0.8: I ended up including a sentence that said, "My wife tried the ramen and it was pretty forgettable," which is a disappointing thing to show. For many applications, like recommender systems, where you go to a web page and somebody shows you products you might want to buy, precision at k is a really good metric to be thinking about.
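Precision at k is simple enough to write in a few lines. Here is a minimal sketch (hypothetical labels and scores, arranged so that four of the top five items are good, matching the 0.8 above):

```python
import numpy as np

def precision_at_k(y_true, scores, k):
    """Fraction of the k highest-scored items that are actually positive."""
    top_k = np.argsort(scores)[::-1][:k]   # indices of the k highest scores
    return np.mean(y_true[top_k])

# Hypothetical relevance labels (1 = great sentence) and model scores.
y_true = np.array([1, 1, 0, 1, 1, 0, 1, 0])
scores = np.array([0.9, 0.85, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3])

# Five slots on the website -> precision at k=5.
print(precision_at_k(y_true, scores, k=5))  # 4 of the top 5 are great -> 0.8
```

Notice that only the ranking of the top k items matters here; the threshold t and the rest of the curve drop out entirely, which is why this metric fits slot-limited applications so well.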