We saw how we could change the threshold from zero to one for deciding what counts as a positive, and navigate between the optimistic classifier and the pessimistic classifier. There's actually a really intuitive visualization of this, called a precision-recall curve. Precision-recall curves are extremely useful for understanding how a classifier is performing.

You can imagine two extreme points on that curve. What happens to the precision when the threshold is very close to one? The precision is going to be one, because we predict positive for very, very few things and we're very sure those are correct. But the recall is going to be zero, because we're going to say everything else is bad; that's the pessimistic extreme. At the other extreme of the precision-recall curve, the point at the bottom, is the optimistic point: you have very high recall, because you're going to find all the positive data points, but very low precision, because you're going to sweep in all sorts of other stuff and call it good too. That happens when t is very small, close to zero.

Now if you keep varying t, you get a spectrum of tradeoffs between precision and recall. If you want a model that has a little more recall but is still highly precise, maybe you set t = 0.8; but if you really want very high recall while still improving precision a little, maybe you set t = 0.2. You can navigate that spectrum to explore the tradeoff between precision and recall. Now, there doesn't always have to be a tradeoff: if you had a perfect classifier, the curve would be a flat line at the top, with perfect precision no matter what the recall level. That line basically never happens in practice, but it's the ideal you're trying to get to, so the closer your algorithm's curve is to that flat line at the top, the better it is.

Precision-recall curves can also be used to compare algorithms, in addition to understanding a single one. For example, say you have two classifiers, classifier A and classifier B, and you see that at every single point classifier B is higher than classifier A. In that case we always prefer classifier B: no matter what the threshold is, classifier B gives you better precision for the same recall, so B is always better. However, life is not always this simple. If there's one thing you should have learned thus far, it's that practice tends to be a bit messy. Often what you observe is not classifiers A and B like we just saw, but classifiers A and C like we're seeing over here, where there are one or more crossover points: classifier A does better in some regions of the precision-recall curve, and classifier C does better in others. So, for example, if you're interested in very high precision and are okay with lower recall, you should pick classifier C, because it does better in that region; it's higher up, closer to that flat line. But if you care about getting high recall, you should choose classifier A, because in the high-recall regime, where the thresholds t are smaller, classifier A tends to do better; you can see its curve is higher over there. That's the kind of complexity of dealing with machine learning in the real world.
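As a rough sketch of how those curve points come about (this is not code from the lecture; the labels, predicted probabilities, and thresholds below are made up for illustration), you can sweep the threshold t over a classifier's predicted probabilities and record precision and recall at each setting:

import numpy as np

def precision_recall_at_thresholds(y_true, y_prob, thresholds):
    """Return (t, precision, recall) for each threshold t in `thresholds`."""
    points = []
    for t in thresholds:
        y_pred = (y_prob >= t).astype(int)           # call it positive only if probability >= t
        tp = np.sum((y_pred == 1) & (y_true == 1))   # true positives
        fp = np.sum((y_pred == 1) & (y_true == 0))   # false positives
        fn = np.sum((y_pred == 0) & (y_true == 1))   # false negatives
        precision = tp / (tp + fp) if (tp + fp) > 0 else 1.0  # pessimistic extreme: nothing predicted positive
        recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
        points.append((t, precision, recall))
    return points

# Made-up labels and predicted probabilities, just to trace the tradeoff
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_prob = np.array([0.95, 0.80, 0.75, 0.65, 0.55, 0.45, 0.35, 0.30, 0.20, 0.10])
for t, p, r in precision_recall_at_thresholds(y_true, y_prob, [0.9, 0.8, 0.5, 0.2, 0.1]):
    print(f"t={t:.1f}  precision={p:.2f}  recall={r:.2f}")

Running this, high thresholds give high precision and low recall (the pessimistic end), and low thresholds give high recall and lower precision (the optimistic end), which is exactly the curve we've been describing.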
Now if you just had to pick one classifier, the question is how do you decide? How do you choose between A and C in this case? As I was hinting at, the single number you use to decide depends on where you want to be on the precision-recall tradeoff. There are many metrics out there that try to boil the curve down to a single number: some are called F1 measures, some area-under-the-curve. For a lot of applications I'm less fond of those measures myself than of one that's much simpler, called precision at k. Let me talk about that, because it's a really simple measure and really useful.

Let's say there are five slots on my website to show sentences. That's all I care about: I want to show five great sentences on my website. I don't have room for ten or for a million, just for five. So I show five sentences there; four were great and one sucked. I want all five to be great, so I want my precision on the top five sentences to be as good as possible. In this case, our precision at five was four out of five, 0.8. I ended up putting in a sentence that said, "My wife tried the ramen and it was pretty forgettable." That's kind of a disappointing thing to put on the page. So for many applications, like recommender systems, where you go to a web page and somebody shows you some products you might want to buy, precision at k is a really good metric to be thinking about.
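To make that concrete, here's a minimal sketch of precision at k (again, not the lecture's code; the sentence labels and classifier scores are invented for the example): rank the candidate sentences by score, keep the top k, and measure what fraction of those are actually great.

import numpy as np

def precision_at_k(y_true, scores, k=5):
    """Fraction of the k highest-scoring items that are truly positive."""
    top_k = np.argsort(scores)[::-1][:k]      # indices of the k highest-scoring sentences
    return float(np.mean(np.asarray(y_true)[top_k]))

# Five website slots: four of the top-5 sentences are great, one is not -> precision at 5 = 0.8
labels = [1, 1, 0, 1, 1, 0, 1, 0]                          # 1 = great sentence, 0 = not great
scores = [0.97, 0.92, 0.90, 0.88, 0.85, 0.60, 0.55, 0.30]  # classifier scores for each sentence
print(precision_at_k(labels, scores, k=5))                 # 0.8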