We saw how we could change the threshold from zero to one for deciding what counts as a positive, navigating between the optimistic classifier and the pessimistic classifier. There's actually a really intuitive visualization of this, called a precision-recall curve. Precision-recall curves are extremely useful for understanding how a classifier is performing.

So in this case, you can imagine plotting two extreme points on that curve. What happens to the precision when the threshold is very close to one? Well, the precision is going to be close to one, because we predict positive on only the very few things we're most confident about, and those are very likely to be correct. But the recall is going to be close to zero, because we're calling almost everything bad. That's the pessimistic classifier. On the other extreme of the precision-recall curve, the point at the bottom, is the optimistic point: very high recall, because you're going to find all the positive data points, but very low precision, because you're going to sweep in all sorts of other stuff and call it good too. That happens when t is very small, close to zero.

Now, if you keep varying t, you get a spectrum of tradeoffs between precision and recall. If you want a model with a little more recall that is still highly precise, maybe you set t = 0.8; if you want very high recall while still keeping precision as good as you can, maybe you set t = 0.2. You can navigate that spectrum to explore the tradeoff between precision and recall.
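To make that threshold sweep concrete, here is a minimal sketch in Python (the labels, scores, and helper function are invented for illustration, not from the lecture) that computes precision and recall at a few values of t:

```python
import numpy as np

def precision_recall_at_threshold(y_true, scores, t):
    """Precision and recall when we predict positive for every score >= t."""
    y_pred = scores >= t
    tp = np.sum(y_pred & (y_true == 1))      # correctly flagged positives
    predicted_pos = np.sum(y_pred)           # everything we flagged positive
    actual_pos = np.sum(y_true == 1)         # everything truly positive
    precision = tp / predicted_pos if predicted_pos else 1.0
    recall = tp / actual_pos if actual_pos else 0.0
    return precision, recall

# Made-up labels and model scores, sorted by score for readability.
y_true = np.array([1, 1, 1, 0, 1, 0, 1, 0, 0, 0])
scores = np.array([0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05])

# Sweep t from pessimistic (near one) to optimistic (near zero).
for t in [0.9, 0.8, 0.5, 0.2]:
    p, r = precision_recall_at_threshold(y_true, scores, t)
    print(f"t={t:.1f}  precision={p:.3f}  recall={r:.1f}")
```

On this toy data, as t falls from 0.9 to 0.2, recall climbs from 0.2 to 1.0 while precision drops from 1.0 to 0.625, which is exactly the spectrum the precision-recall curve traces out.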
Now, there doesn't always have to be a tradeoff. If you had a truly perfect classifier, the curve would be a flat line at the top: perfect precision no matter what the recall level. That line basically never happens, but it's the ideal you're aiming for, and the closer your algorithm's curve gets to that flat line at the top, the better it is.

Precision-recall curves can also be used to compare algorithms, in addition to understanding a single one. For example, say you have two classifiers, A and B, and you see that at every single point, classifier B's curve is higher than classifier A's. In that case, we always prefer classifier B: no matter what the threshold is, B gives you better precision for the same recall. So B is always better.

However, life is not always this simple. If there's one thing you should have learned thus far, it's that practice tends to be a bit messy. Often what you observe is not classifiers A and B like we just saw, but classifiers A and C like we're seeing here, where there are one or more crossover points: classifier A does better in some regions of the precision-recall curve, and classifier C does better in others. So, for example, if you're interested in very high precision but are okay with lower recall, you should pick classifier C, because it does better in that region; it's higher up, closer to the flat line. But if you care about getting high recall, you should choose classifier A, because in the high-recall regime, when you pick smaller values of t, classifier A tends to do better; you can see its curve is higher there. That's the kind of complexity you face when dealing with machine learning in the real world.

Now, if you just had to pick one classifier, how do you decide between A and C? As I was hinting at, the single number you use to decide depends on where you want to be on the precision-recall tradeoff curve. There are many metrics out there that summarize the curve in a single number, such as the F1 measure and the area under the curve.
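As a sketch of how those single-number summaries might be computed, here is one way to do it with scikit-learn (the ground-truth labels and the two score vectors for classifiers A and C are invented for the example):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, auc, f1_score

# Hypothetical ground truth and scores from two classifiers, A and C.
y_true = np.array([1, 1, 0, 1, 0, 1, 0, 0, 1, 0])
scores_a = np.array([0.90, 0.80, 0.70, 0.60, 0.55, 0.50, 0.40, 0.30, 0.25, 0.10])
scores_c = np.array([0.95, 0.90, 0.30, 0.50, 0.20, 0.40, 0.60, 0.10, 0.35, 0.05])

for name, scores in [("A", scores_a), ("C", scores_c)]:
    precision, recall, thresholds = precision_recall_curve(y_true, scores)
    # Area under the PR curve: one single-number summary of the whole curve.
    pr_auc = auc(recall, precision)
    # F1 at a fixed threshold of 0.5: another common single-number summary.
    f1 = f1_score(y_true, scores >= 0.5)
    print(f"classifier {name}: PR-AUC={pr_auc:.2f}  F1@0.5={f1:.2f}")
```

Note that PR-AUC summarizes the whole curve while F1 is tied to one particular threshold; neither tells you which classifier wins in the specific precision or recall region you care about, which is exactly the reservation voiced next.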
For a lot of applications, though, I'm less fond of those measures than I am of one that's much simpler, called precision at k. Let me talk about that, because it's a really simple and really useful measure.

Let's say there are five slots on my website to show sentences. That's all I care about: I want to show five great sentences. I don't have room for ten million or five million, just five. Say I show five sentences there, and four were great but one was terrible; I wanted all five to be great. So I want my precision on the top five sentences to be as good as possible. In this case, the precision was four out of five, or 0.8: I ended up including a sentence that said, "My wife tried the ramen and it was pretty forgettable," which is a disappointing thing to show. For many applications, like recommender systems, where you go to a web page and somebody shows you products you might want to buy, precision at k is a really good metric to be thinking about.
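Precision at k is simple enough to write in a few lines. Here is a minimal sketch (hypothetical labels and scores, arranged so that four of the top five items are good, matching the 0.8 above):

```python
import numpy as np

def precision_at_k(y_true, scores, k):
    """Fraction of the k highest-scored items that are actually positive."""
    top_k = np.argsort(scores)[::-1][:k]   # indices of the k highest scores
    return np.mean(y_true[top_k])

# Hypothetical relevance labels (1 = great sentence) and model scores.
y_true = np.array([1, 1, 0, 1, 1, 0, 1, 0])
scores = np.array([0.9, 0.85, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3])

# Five slots on the website -> precision at k=5.
print(precision_at_k(y_true, scores, k=5))  # 4 of the top 5 are great -> 0.8
```

Notice that only the ranking of the top k items matters here; the threshold t and the rest of the curve drop out entirely, which is why this metric fits slot-limited applications so well.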