Now that we've had the chance to review the basics of probability, let's figure out how it can be useful in the classification problem.

So, if we take our two sentences, one that was definitely positive, "The sushi and everything were awesome!", and the other one I was not sure about, "The sushi was good, the service was okay." For the first one, you can say that the probability that it's a positive review is very high. So, the probability that y equals plus one, given the sentence, is 0.99. For the other one, though, the probability that y equals plus one given the sentence, given x equals "The sushi was good, the service was okay," is only 0.55.

And in general, many classifiers output this degree of belief, or this probability: the probability of the output label y given the input x. And it's going to be extremely useful in practice.

So let's go through a little bit of an example of what that means. Let's say we're given an input data set with N data points. They have inputs, the number of "awesome"s and the number of "awful"s, and the labels y.
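As a minimal sketch of how a classifier can turn a sentence into a degree of belief, here is a logistic model over the two word-count features above. The coefficients are made-up numbers for illustration only, not the ones a trained model would learn:

```python
import math

# Hypothetical coefficients w_hat for a logistic model over two
# word-count features (#awesome, #awful); values are illustrative only.
w0, w_awesome, w_awful = 0.0, 1.2, -1.5

def p_positive(n_awesome, n_awful):
    """Degree of belief P(y = +1 | x): sigmoid of a linear score."""
    score = w0 + w_awesome * n_awesome + w_awful * n_awful
    return 1.0 / (1.0 + math.exp(-score))

# A sentence with several "awesome"s gets a probability near 1;
# a mixed sentence lands near 0.5.
print(p_positive(3, 0))  # confidently positive
print(p_positive(1, 1))  # uncertain, close to 0.5
```

Note that the output is a probability, not just a label, which is exactly the extra information the lecture is pointing at.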
And we use that data to train a classifier that outputs these probabilities. The predictions we're going to call P hat, our estimate of the probabilities, which are going to depend on the parameters w hat, the coefficients w hat for our model.

And so P hat is going to be useful for predicting y hat, the predicted class, which in our case is the sentiment of sentences. So, let's see how that works. What we're going to do is learn this P hat estimated from data, and use it to predict the most likely class.

So in particular, if I'm given some input sentence and I compute the probability that y is plus one, that it's a positive review given the sentence, and that's greater than 0.5, I say that y hat is plus one, it's a positive sentence. And if it's less than 0.5, then we say it's a negative sentence, so y hat is minus one.

But we're not just going to get that; P hat is going to give us a more interpretable output. So it's not going to say just plus one or minus one, but it's going to tell us how sure we are that this is a positive review.
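The decision rule described here can be sketched in a few lines: threshold the estimated probability at 0.5 to get a class label, while keeping the probability itself as the measure of confidence.

```python
def predict(p_hat):
    """Turn the estimate P_hat(y = +1 | x) into a class label:
    +1 if the belief in a positive review exceeds 0.5, else -1."""
    return +1 if p_hat > 0.5 else -1

print(predict(0.99))  # -> 1   confidently positive
print(predict(0.55))  # -> 1   positive, but only barely
print(predict(0.10))  # -> -1  negative
```

Both 0.99 and 0.55 map to the same label y hat = +1, which is why the probability itself carries more information than the label alone.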