1 00:00:03,630 --> 00:00:06,840 So far we talked about classification in terms of just predicting, 2 00:00:06,840 --> 00:00:11,744 is this a positive sentence or a negative sentence, is this email spam or not spam, 3 00:00:11,744 --> 00:00:15,210 but in general, you wanna go a little bit beyond that and 4 00:00:15,210 --> 00:00:19,550 ask, what is the probability that this email is spam? 5 00:00:19,550 --> 00:00:22,500 So how confident am I in the prediction? 6 00:00:22,500 --> 00:00:24,260 So if you're just looking at positive and negative, 7 00:00:24,260 --> 00:00:27,200 I want to know how sure I am about the prediction. 8 00:00:27,200 --> 00:00:30,550 So for example, if you take a sentence like, "The sushi and 9 00:00:30,550 --> 00:00:32,110 everything else were awesome." 10 00:00:33,540 --> 00:00:35,920 That's a definite plus. 11 00:00:35,920 --> 00:00:37,730 Definitely positive. 12 00:00:37,730 --> 00:00:41,330 However, "The sushi was good, the service was okay." 13 00:00:41,330 --> 00:00:43,890 It's probably a plus but I'm not so sure. 14 00:00:43,890 --> 00:00:45,810 It's not as definite. 15 00:00:45,810 --> 00:00:51,824 And so what a classifier will often do is not just output positive or 16 00:00:51,824 --> 00:00:56,800 negative, but output how confident, how sure it is. 17 00:00:56,800 --> 00:00:59,440 One way to do that is to talk about probabilities. 18 00:00:59,440 --> 00:01:03,352 So you output the probability of being a positive or 19 00:01:03,352 --> 00:01:07,570 negative sentence, given the input sentence x. 20 00:01:07,570 --> 00:01:08,810 So the output label, 21 00:01:08,810 --> 00:01:12,190 what's the probability of the output label, given the input sentence? 22 00:01:12,190 --> 00:01:19,219 So for example, for the top example there, instead of saying that's a definite +, 23 00:01:19,219 --> 00:01:25,530 we say the probability that it's a + given x is 0.99.
24 00:01:25,530 --> 00:01:29,880 For the second one, the probability of a + given x is only 0.55 because I'm kind of uncertain about that. 25 00:01:29,880 --> 00:01:34,149 Predicting probabilities or level of confidence is extremely important and 26 00:01:34,149 --> 00:01:38,430 as we'll see in the classification course, it allows you to do many things. 27 00:01:38,430 --> 00:01:42,786 So for example, when you know the probability, you can make decisions like, 28 00:01:42,786 --> 00:01:46,667 what is a good decision boundary that trades off false positives and 29 00:01:46,667 --> 00:01:49,480 false negatives, and balances between the two. 30 00:01:49,480 --> 00:01:53,749 [MUSIC]
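To make the last point concrete, here is a minimal sketch of how a probability output lets you pick a cutoff that trades off false positives against false negatives. Everything in it is an assumption for illustration: the probabilities and true labels are made-up values (only 0.99 and 0.55 echo the two sushi sentences), and `count_errors` is a hypothetical helper, not something from the lecture.

```python
# Hypothetical P(y = +1 | x) values from some classifier, paired with
# made-up true labels (+1 = positive sentence, -1 = negative sentence).
probs = [0.99, 0.55, 0.80, 0.30, 0.60, 0.10]
labels = [+1, +1, +1, -1, -1, -1]


def count_errors(probs, labels, threshold):
    """Predict +1 when P(+1 | x) >= threshold.

    Returns a tuple (false_positives, false_negatives).
    """
    false_pos = 0
    false_neg = 0
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else -1
        if pred == 1 and y == -1:
            false_pos += 1
        elif pred == -1 and y == 1:
            false_neg += 1
    return false_pos, false_neg


# A low threshold predicts + more often (more false positives);
# a high threshold predicts + less often (more false negatives).
for t in (0.2, 0.5, 0.7):
    fp, fn = count_errors(probs, labels, t)
    print(f"threshold={t}: false positives={fp}, false negatives={fn}")
```

With only a hard +/- label there is nothing to tune; with a probability, the threshold becomes a knob you can set according to which kind of mistake costs more.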