We've now seen linear classifiers, with logistic regression as a really core example, and how to learn them from data using gradient descent algorithms. So we're now ready to build those classifiers. However, when we go into practical settings we have to think about overfitting, which is a very significant problem in machine learning and which, for logistic regression in particular, can be really troublesome. So let's see how we can avoid overfitting in this setting.

In order to explore the concept of overfitting, we need to better understand how we measure error in classification in general. For a classifier, we typically start with some data that has been labeled, say as positive or negative reviews, and then we split that data into a training set, which we use to train our model, and a validation set, which we use to evaluate the learned classifier. So let's talk a little bit about the evaluation of a classifier in general. As we discussed in the first course, we measure a classifier's performance using what's called classification error.

For example, suppose I have the sentence "The sushi was great," which is labeled positive, and I want to measure my error on that sentence. What I do is feed the sentence into my classifier, but I hide the label, so the classifier doesn't get to see whether the sentence was labeled as positive or negative. Then I compare the output, y hat, of my classifier with the true label. In this case they agree, so the classifier is correct, and I add one to the correct column. However, suppose I take another sentence, "The food was OK," which was labeled as a negative example. "OK" is a bit of an American euphemism: is it positive, is it negative? People often say "OK" when they mean something bad, and the classifier might not be familiar with that kind of cultural jargon. So when we hide the label, the classifier might say y hat is positive, but really the true label is negative. The classifier has then made a mistake, so we put plus one in the mistake column.

In general, we're going to go example by example through the validation set and record which ones we got right and which ones we got wrong. From that we can measure what's called the error, or classification error, on our data. So let me turn my pen on here, and I'm going to use white here. It's really simple: error measures the fraction of data points where we made mistakes. It's the ratio between the number of mistakes we made and the total number of data points. Sometimes we also talk about accuracy, which is one minus the error. Here, instead of the number of mistakes, it's the number of data points we got correct divided by the total number of data points. Very good.
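Written out as formulas (a transcription of the definitions just described, with # denoting a count over the validation set), that is:

\[
\text{error} = \frac{\#\,\text{mistakes}}{\#\,\text{data points}},
\qquad
\text{accuracy} = 1 - \text{error} = \frac{\#\,\text{correct}}{\#\,\text{data points}}
\]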
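And here is a minimal Python sketch of that evaluation loop. The classifier, sentences, and labels below are made up purely for illustration: the "classifier" is just a toy keyword rule (predict positive whenever the word "great" appears), not the course's actual logistic regression model. The point is only to show error and accuracy being computed on a held-out validation set.

```python
# Toy validation-set evaluation for a binary sentiment classifier.
# Everything here is illustrative: the "classifier" is a keyword rule,
# and the validation sentences/labels are made up.

def predict_sentiment(sentence):
    """Toy classifier: predict +1 (positive) if 'great' appears, else -1."""
    return +1 if "great" in sentence.lower() else -1

# Hypothetical validation set: (sentence, true label) pairs.
validation_set = [
    ("The sushi was great", +1),
    ("The food was OK", -1),             # euphemism, labeled negative
    ("Great service and great rolls", +1),
    ("Not so great, to be honest", -1),  # the toy rule gets this one wrong
    ("The ramen was cold", -1),
]

mistakes = 0
for sentence, true_label in validation_set:
    y_hat = predict_sentiment(sentence)  # the true label is hidden from the classifier
    if y_hat != true_label:              # compare prediction with true label
        mistakes += 1                    # +1 in the "mistake" column

error = mistakes / len(validation_set)   # fraction of data points misclassified
accuracy = 1.0 - error                   # equivalently, #correct / #total
print(f"error = {error:.2f}, accuracy = {accuracy:.2f}")
```

For this toy data the loop finds one mistake out of five examples, so it reports an error of 0.2 and an accuracy of 0.8.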