Now that we've seen the logistic regression model and we understand it in quite a bit of detail, let's give a quick overview of what learning means for this model. In the next module we're going to go into quite a lot of detail on the learning algorithm; this is just a little primer, a teaser for what's coming in the next module.

Now, we're going to start from some data, and that data has inputs x and outputs that are either plus one or minus one. As we said, we're going to split that into a training set and a validation set. From the training set we're going to run a learning algorithm that will output the parameter estimates w hat, and those w hats are going to be plugged into the model to estimate the probability that an input sentence is either positive or negative. And, of course, we can use the learned model on the validation set to estimate how good it is: what the quality metrics are, what the error is.

Now, to find the best classifier, we're going to define the quality metric. In this case the quality metric is going to be called the likelihood function. So for every possible set of parameters, or coefficients, w (for example w0, w1, w2), we will be able to score it according to the likelihood l(w) to figure out how good it is. So, for example, if I take this data set of plusses and minuses and learn the line shown in green, we might get a particular likelihood. So, for example, if the parameter w0 is 0, w1 is 1, and w2 is -1.5, the likelihood might be 10 to the -6, pretty small. These numbers actually tend to be pretty small. For this alternative line, where w0 is now 1, w1 is still 1, and w2 is -1.5, the likelihood function is a little better: 10 to the -5 instead of 10 to the -6. But perhaps for this best line over here, where w0 is 1, w1 is 0.5, and w2 is -1.5, you get the best likelihood, 10 to the -4.

So we'd like an approach that searches over the possible values of w to find the best line. And, as we will see in the next module, we'll use a gradient ascent algorithm to find the set of parameters w that has the highest likelihood, the best quality.
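To make the scoring idea concrete, here is a minimal sketch of computing the likelihood l(w) for the three candidate coefficient vectors mentioned above. The toy data points are hypothetical, invented for illustration (they are not the lecture's dataset), so the likelihood values printed will not match the 10 to the -6 style numbers from the slides; the point is only the mechanics of scoring a coefficient vector against labeled data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def likelihood(w, X, y):
    """Data likelihood l(w) = product over i of P(y_i | x_i, w), labels y in {+1, -1}.

    X carries a leading column of 1s so that w[0] plays the role of w0,
    the intercept. For +/-1 labels, P(y_i | x_i, w) = sigmoid(y_i * score_i).
    """
    scores = X @ w                      # w0 + w1*x1 + w2*x2 for each point
    return np.prod(sigmoid(y * scores))

# Hypothetical toy data: four points with features (x1, x2) and labels +/-1.
X = np.array([[1.0, 2.0, 0.5],
              [1.0, 0.5, 2.5],
              [1.0, 3.0, 1.0],
              [1.0, 0.2, 3.0]])        # first column is the constant 1
y = np.array([+1, -1, +1, -1])

# Score the three candidate coefficient vectors (w0, w1, w2) from the lecture.
for w in (np.array([0.0, 1.0, -1.5]),
          np.array([1.0, 1.0, -1.5]),
          np.array([1.0, 0.5, -1.5])):
    print(w, likelihood(w, X, y))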
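The next module works through gradient ascent properly; as a preview only, here is a minimal sketch of the idea, reusing sigmoid, X, and y from the snippet above. It repeatedly nudges w in the direction of the gradient of the log-likelihood (the log is used for numerical convenience; it has the same maximizer as l(w)). The step size and iteration count are arbitrary choices for this toy example.

```python
def log_likelihood_gradient(w, X, y):
    """Gradient of sum_i log sigmoid(y_i * (w . x_i)) for +/-1 labels."""
    scores = X @ w
    return X.T @ (y * sigmoid(-y * scores))

def gradient_ascent(X, y, step_size=0.1, n_iters=500):
    w = np.zeros(X.shape[1])            # start from all-zero coefficients
    for _ in range(n_iters):
        w += step_size * log_likelihood_gradient(w, X, y)
    return w

w_hat = gradient_ascent(X, y)
print("learned coefficients w hat:", w_hat)
```

Each iteration moves w uphill on the likelihood surface, which is exactly the "search over possible values of w" the lecture describes.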
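Finally, the lecture mentions using the learned model on the validation set to measure quality. A minimal sketch of that step, again with made-up validation points: classify as +1 whenever the estimated probability of a positive label exceeds 0.5 (equivalently, whenever the score w hat . x is positive), then count the fraction of mistakes.

```python
def predict(w, X):
    """Predict +1 when P(y = +1 | x, w) > 0.5, i.e. when the score is positive."""
    return np.where(X @ w > 0.0, 1, -1)

# Hypothetical held-out validation points, same layout as X above.
X_val = np.array([[1.0, 2.5, 0.8],
                  [1.0, 0.4, 2.2]])
y_val = np.array([+1, -1])

error = np.mean(predict(w_hat, X_val) != y_val)
print("validation error:", error)
```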