1 00:00:00,000 --> 00:00:04,595 [MUSIC] 2 00:00:04,595 --> 00:00:08,310 Now we are ready to describe our logistic regression model. 3 00:00:08,310 --> 00:00:14,380 It takes as input a score that ranges from minus infinity to plus infinity, and 4 00:00:14,380 --> 00:00:17,830 which is actually w transpose h(xi). 5 00:00:17,830 --> 00:00:19,780 It pushes that score through the sigmoid function 6 00:00:20,930 --> 00:00:27,250 to estimate the probability that y = +1 given xi and w. 7 00:00:27,250 --> 00:00:32,171 So what that means, more explicitly, is 8 00:00:32,171 --> 00:00:36,223 that this probability is equal to 9 00:00:36,223 --> 00:00:41,151 1 / (1 + e to the minus score(xi)). 10 00:00:42,260 --> 00:00:47,280 Which is the same as saying 1 / 11 00:00:47,280 --> 00:00:53,280 (1 + e to the -w transpose h(xi)). 12 00:00:53,280 --> 00:00:58,740 And we can, just for fun, write out 13 00:00:58,740 --> 00:01:02,330 that w transpose h explicitly. 14 00:01:02,330 --> 00:01:09,511 So it's 1 / (1 + e to the power 15 00:01:09,511 --> 00:01:14,735 of -(w0 h0(xi) + 16 00:01:14,735 --> 00:01:20,611 w1 h1(xi) + ... 17 00:01:20,611 --> 00:01:25,833 + w capital D 18 00:01:25,833 --> 00:01:29,110 h capital D(xi))). 19 00:01:29,110 --> 00:01:31,779 [SOUND] And now we have it: 20 00:01:31,779 --> 00:01:37,640 that's what a logistic regression model looks like. 21 00:01:37,640 --> 00:01:42,560 It predicts the output: the probability of a positive sentiment 22 00:01:42,560 --> 00:01:45,270 given the input x and the parameters w. 23 00:01:45,270 --> 00:01:49,980 Now let's take a moment to understand 24 00:01:49,980 --> 00:01:52,910 the logistic regression model a little bit better. 25 00:01:52,910 --> 00:01:59,260 So as input we have the score of a sentence x, or 26 00:01:59,260 --> 00:02:04,280 any other input that we have, and as output we have the probability 27 00:02:04,280 --> 00:02:11,250 that the label is +1 given the input x and the parameters w.
28 00:02:11,250 --> 00:02:17,870 And that's 1 over 1 plus e to the power of -w transpose h(x). 29 00:02:18,900 --> 00:02:24,430 Now, if the score is zero, and 30 00:02:24,430 --> 00:02:30,573 I'm going to draw it like this, we have that this probability is 0.5. 31 00:02:30,573 --> 00:02:35,330 So, if the score is zero, the probability is 0.5. 32 00:02:35,330 --> 00:02:42,010 Now, what I observe first is that everything to the left of zero 33 00:02:42,010 --> 00:02:46,200 has score less than zero, so we should be predicting 34 00:02:46,200 --> 00:02:51,430 that the points on the left have y hat equal to minus one, and 35 00:02:51,430 --> 00:02:54,900 everything to the right of zero has score greater than zero, 36 00:02:54,900 --> 00:03:00,250 so we should be predicting that y hat on the right side is equal to plus one. 37 00:03:00,250 --> 00:03:01,830 So let's see that in action. 38 00:03:01,830 --> 00:03:07,893 So for example, let's say that we had a score of minus two: 39 00:03:07,893 --> 00:03:11,630 what would happen to our prediction? 40 00:03:11,630 --> 00:03:16,935 So we say the probability of y = +1 is actually 0.12 41 00:03:16,935 --> 00:03:22,710 if you plug that in, so minus two gives you 0.12. 42 00:03:22,710 --> 00:03:25,721 If you have plus two 43 00:03:27,147 --> 00:03:32,781 and you push it through on the right side, you get 0.88. 44 00:03:32,781 --> 00:03:37,178 So if the score is +2, it's 0.88. 45 00:03:37,178 --> 00:03:42,784 Is it a surprise to you that 0.12 + 0.88 adds up to 1? 46 00:03:43,970 --> 00:03:48,824 It's not a surprise, because the probability of y = +1 plus the probability of y 47 00:03:48,824 --> 00:03:53,060 = -1 adds up to 1, and sigmoid is a symmetric function, so 48 00:03:53,060 --> 00:03:55,110 everything is working out exactly the way we hoped for.
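Editor's note: the numbers quoted above can be checked with a few lines of Python. This is only a minimal sketch, not code from the lecture; the function name `sigmoid` and the use of the standard `math` module are assumptions made for illustration. It evaluates 1 / (1 + e^(-score)) at the scores -2, 0 and +2, then checks the symmetry that makes the two probabilities add up to 1.

```python
import math


def sigmoid(score: float) -> float:
    """Map a real-valued score to a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-score))


# Scores discussed in the lecture: -2, 0 and +2.
for score in (-2.0, 0.0, 2.0):
    print(f"score = {score:+.0f}  ->  P(y = +1) = {sigmoid(score):.2f}")

# Symmetry of the sigmoid: sigmoid(-s) + sigmoid(s) equals 1 for any score s,
# which is why the probabilities at -2 and +2 sum to 1.
assert abs(sigmoid(-2.0) + sigmoid(2.0) - 1.0) < 1e-12
```

Rounded to two decimals, the three probabilities match the 0.12, 0.5 and 0.88 mentioned in the lecture.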
49 00:03:56,730 --> 00:04:03,630 Now if the score is bigger, let's say the score is four, 50 00:04:03,630 --> 00:04:10,080 we should still output y = +1, but we should be more sure. 51 00:04:10,080 --> 00:04:14,906 So let's push that through: if the score is four, 52 00:04:14,906 --> 00:04:18,991 look, we're getting really big here, and 53 00:04:18,991 --> 00:04:24,086 the prediction of the probability is 0.98. 54 00:04:24,086 --> 00:04:29,488 In other words, for the points where the score 55 00:04:29,488 --> 00:04:34,196 is less than zero, you see the probability 56 00:04:34,196 --> 00:04:38,628 is less than 0.5 of being y = +1, 57 00:04:38,628 --> 00:04:44,880 which implies that we output a y hat of minus one. 58 00:04:44,880 --> 00:04:49,230 While for the ones where the score is positive, we're 59 00:04:49,230 --> 00:04:52,580 going to output y hat equal to plus one. 60 00:04:55,670 --> 00:04:59,476 And here we see the logistic regression model in action, and 61 00:04:59,476 --> 00:05:03,138 how it has the characteristics that we're hoping for. 62 00:05:03,138 --> 00:05:07,179 [MUSIC]
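Editor's note: putting the pieces together, the whole model can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the lecturer's implementation: the function name `predict`, the weights `w` and the feature values `h_x` are all hypothetical, chosen only so that the score comes out to 4. The lecture does not say what to predict when the score is exactly zero, so the sketch arbitrarily outputs -1 in that case.

```python
import math


def predict(w, h_x):
    """Return (P(y = +1 | x, w), y_hat) for weights w and features h(x)."""
    # Score: w transpose h(x) = w0*h0(x) + w1*h1(x) + ... + wD*hD(x).
    score = sum(w_j * h_j for w_j, h_j in zip(w, h_x))
    # Push the score through the sigmoid to get a probability.
    prob = 1.0 / (1.0 + math.exp(-score))
    # Positive score (probability above 0.5) gives +1, otherwise -1.
    # Assumption: a score of exactly zero is assigned to -1.
    y_hat = 1 if score > 0 else -1
    return prob, y_hat


# Hypothetical weights and features, chosen so that the score is 4.
w = [0.0, 1.0, -1.5]
h_x = [1.0, 4.0, 0.0]  # h0(x) = 1 plays the role of a constant feature

prob, y_hat = predict(w, h_x)
print(f"P(y = +1) = {prob:.2f}, y_hat = {y_hat:+d}")
```

With a score of 4 the probability rounds to 0.98 and the predicted label is +1, consistent with the example in the lecture.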