[MUSIC] In the regression module, we talked about predicting house prices by fitting a regression model, and we measured error in terms of the sum of squared errors. Here in classification, our errors are a little different, because we're talking about which inputs we get correct and which inputs we get wrong. So let's talk a little bit about measuring error in classification.

When I learn a classifier, I'm given a set of input data. These are sentences that have been marked as positive or negative sentiment, and, as in regression, we split them into a training set and a test set. I feed the training set to the classification algorithm, and that algorithm learns a weight for each word. For example, it's going to learn that "good" has a weight of 1.0, "awesome" 1.7, "bad" -1.0, and "awful" -3.3. These weights are then used to score every sentence in the test set and to evaluate how well we're doing in terms of classification.

So let's talk about what that evaluation looks like, how we measure classification error. We're given a set of test examples of the form: "Sushi was great" is a positive sentence. And we're trying to figure out how many of these test sentences we get correct and how many we make mistakes on. What we're going to do is take the sentence "Sushi was great" and feed it through the learned classifier. But we don't want the learned classifier to actually see the true label; we want to see whether it gets the true label right. So we hide the true label. The sentence gets fed to the learned classifier while the true label stays hidden, and given the sentence, the classifier predicts y hat as positive. It labels this as a positive sentence, so we've made a correct prediction, and the number of correct predictions goes up by one. Now let's take another test example.
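The scoring step described above can be sketched in a few lines. The word weights are the ones from the lecture's example; the tokenization and the tie-breaking rule (score of exactly zero counts as negative) are assumptions made just for illustration, not part of the lecture.

```python
# Minimal sketch of the word-weight sentiment classifier from the lecture.
# The weights below are the lecture's example values; everything else
# (whitespace tokenization, zero-score handling) is an assumption.

WEIGHTS = {"good": 1.0, "awesome": 1.7, "bad": -1.0, "awful": -3.3}

def score(sentence):
    """Sum the learned weights of the words in the sentence (unknown words get 0)."""
    return sum(WEIGHTS.get(word, 0.0) for word in sentence.lower().split())

def predict(sentence):
    """Predict +1 (positive) if the total score is positive, else -1 (negative)."""
    return +1 if score(sentence) > 0.0 else -1

print(predict("sushi was good and awesome"))   # score 1.0 + 1.7 = 2.7 -> +1
print(predict("awful food and bad service"))   # score -3.3 - 1.0 = -4.3 -> -1
```

At test time we would call `predict` on each test sentence while keeping its true label hidden, then compare the prediction against that label afterwards.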
Let's say the next test example is "The food was okay," labeled as a negative sentence. That's a bit of an ambiguous sentence, but it's been labeled as negative in the test set. Again, I feed the sentence to the classifier and hide the label. And let's see what the classifier does. In this case, because "the food was okay" can be read as positive, maybe it predicts that this is a positive sentence. I've made a mistake, because the true label is negative. So we say, hey, a mistake was made, and we now have one more mistake. So we have one correct classification and one mistake, and we do this for every sentence in the test set.

There are two common measures of quality in classification. One of them is the notion of error. Error measures the fraction of the test examples that we make mistakes on. So what we do is say, out of all of the sentences that were classified, how many mistakes were made: the number of mistakes divided by the total number of test sentences. For example, if there were 100 test sentences and I made ten mistakes, then the error would be 0.1, or 10%. The best possible error I can achieve is zero: I make no mistakes.

Now, instead of talking about error, it's also common to talk about the accuracy of your classifier. Accuracy is exactly the opposite: instead of measuring the number of errors, we measure the number of correct classifications. So the ratio here is the number of correct predictions divided by the total number of test sentences. And unlike error, where the best possible value is zero, for accuracy the best possible value is 1: I got all the sentences right. In fact, there's a really natural relationship between the two. We know that error = 1 - accuracy, and vice versa. [MUSIC]
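The two measures above can be sketched directly from their definitions. The label lists here are made-up example data chosen so that one prediction out of ten is wrong, mirroring the lecture's 10% example.

```python
# Sketch of classification error and accuracy as defined in the lecture.
# +1 means positive sentiment, -1 means negative; the data is hypothetical.

def error_rate(y_true, y_pred):
    """Fraction of test examples the classifier gets wrong."""
    mistakes = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return mistakes / len(y_true)

def accuracy(y_true, y_pred):
    """Fraction of test examples the classifier gets right."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

y_true = [+1, -1, +1, +1, -1, +1, -1, +1, -1, +1]
y_pred = [+1, +1, +1, +1, -1, +1, -1, +1, -1, +1]  # one mistake (2nd example)

print(error_rate(y_true, y_pred))  # 0.1
print(accuracy(y_true, y_pred))    # 0.9
```

Note that the two results always sum to 1, which is exactly the relationship error = 1 - accuracy.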