1 00:00:00,463 --> 00:00:05,303 [MUSIC] 2 00:00:05,303 --> 00:00:11,030 Let's train the sentiment classifier. 3 00:00:11,030 --> 00:00:13,770 And we're gonna do this in two steps. 4 00:00:13,770 --> 00:00:17,520 First, we're gonna do a train-test split of the data. 5 00:00:17,520 --> 00:00:19,380 So I'm gonna compute the training data. 6 00:00:19,380 --> 00:00:21,910 We're gonna split the data into training data 7 00:00:21,910 --> 00:00:26,170 and test data, just like we talked about in the regression class and 8 00:00:26,170 --> 00:00:29,740 just like we did in the regression notebook. 9 00:00:29,740 --> 00:00:32,070 So we'll take this products table. 10 00:00:33,450 --> 00:00:34,080 Products. 11 00:00:35,490 --> 00:00:39,678 And then we're going to do the random split. 12 00:00:39,678 --> 00:00:42,709 So, oops. 13 00:00:42,709 --> 00:00:46,740 Products. 14 00:00:46,740 --> 00:00:50,150 And then we're gonna do that random split, 15 00:00:50,150 --> 00:00:55,240 where we're gonna do 80% for training, 20% for testing. 16 00:00:55,240 --> 00:00:59,473 And just so that you can reproduce this at home, I'm gonna do the seed = 0. 17 00:00:59,473 --> 00:01:02,400 Just like we discussed in regression. 18 00:01:02,400 --> 00:01:05,510 Normally you wouldn't do this, you'd pick another random seed, but 19 00:01:05,510 --> 00:01:07,010 I wanted the random seed to be the same, 20 00:01:07,010 --> 00:01:10,200 so when you do it, you get exactly the same results I did. 21 00:01:11,340 --> 00:01:16,580 So this is my first step, the train-test split on this data set, and now we're ready. 22 00:01:16,580 --> 00:01:21,500 We're gonna build that famous sentiment model. 23 00:01:23,520 --> 00:01:27,070 And here, we're going to use GraphLab and 24 00:01:27,070 --> 00:01:31,640 we're going to use a particular classifier called the logistic classifier.
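[Editor's note] The seeded 80/20 split described above can be sketched in plain Python. In the notebook the step is presumably a single call along the lines of `train_data, test_data = products.random_split(.8, seed=0)`. The version below is only an illustration of the idea, not GraphLab's implementation: the `random_split` helper and the toy `products` list are hypothetical stand-ins, and the row counts it produces will not match the ones in the video.

```python
import random

def random_split(rows, fraction, seed):
    """Randomly assign each row to one of two lists.

    Each row lands in the first list with probability `fraction`, so the
    split is approximately, not exactly, fraction / (1 - fraction).
    """
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    first, second = [], []
    for row in rows:
        (first if rng.random() < fraction else second).append(row)
    return first, second

# Hypothetical stand-in for the products table: 1,000 numbered reviews.
products = [{"review_id": i} for i in range(1000)]

train_data, test_data = random_split(products, 0.8, seed=0)
again_train, again_test = random_split(products, 0.8, seed=0)

print(len(train_data), len(test_data))
print(train_data == again_train and test_data == again_test)  # True: same seed, same split
```

Running the split twice with the same seed gives identical training and test sets, which is the point of fixing `seed = 0`: anyone following along sees the same rows in each set.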
25 00:01:31,640 --> 00:01:33,640 And in the course on classification, 26 00:01:33,640 --> 00:01:36,270 we're going to learn a lot about different kinds of classifiers like logistic 27 00:01:36,270 --> 00:01:40,870 regression, this one, support vector machines, decision trees, and others. 28 00:01:40,870 --> 00:01:43,405 But let's start with just a logistic classifier. 29 00:01:43,405 --> 00:01:48,230 And just like in regression, you can type .create after the name and 30 00:01:48,230 --> 00:01:51,050 it'll actually create the classifier for you. 31 00:01:51,050 --> 00:01:54,340 And as input, it takes a few parameters. 32 00:01:54,340 --> 00:02:00,560 So we're gonna take the train data as one parameter. 33 00:02:00,560 --> 00:02:03,360 Then we're gonna say that the target, 34 00:02:03,360 --> 00:02:08,140 the thing we're trying to classify, is the sentiment column. 35 00:02:10,610 --> 00:02:15,980 And then we're gonna have to tell it what features to use. 36 00:02:15,980 --> 00:02:23,020 So for the features we're going to use just the word count column. 37 00:02:23,020 --> 00:02:27,930 So this is the new column that we've created above for word count. 38 00:02:27,930 --> 00:02:32,410 And I'm going to give it a validation set. 39 00:02:32,410 --> 00:02:39,647 So the validation set is going to be my test_data. 40 00:02:39,647 --> 00:02:47,413 So validation_set=test_data. 41 00:02:47,413 --> 00:02:48,490 Okay. 42 00:02:48,490 --> 00:02:51,520 So now we execute the cell. 43 00:02:51,520 --> 00:02:55,470 And we shall be building a sentiment classifier model. 44 00:02:55,470 --> 00:02:58,708 And it's only gonna take a few seconds, and here we go. 45 00:02:58,708 --> 00:03:01,107 It's done. 46 00:03:01,107 --> 00:03:06,330 And you will see [INAUDIBLE] the data, and 47 00:03:06,330 --> 00:03:09,940 the validation accuracy, as it goes along, seems to be getting better and better. 48 00:03:09,940 --> 00:03:13,390 But let's actually do a proper evaluation.
49 00:03:13,390 --> 00:03:17,419 [MUSIC]
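[Editor's note] In the notebook, the classifier step described in this video is presumably one call along the lines of `graphlab.logistic_classifier.create(train_data, target='sentiment', features=['word_count'], validation_set=test_data)`; the exact column names are an assumption based on what is said aloud. To show what such a call does conceptually, here is a minimal pure-Python sketch of logistic regression on word-count features. It is not GraphLab's solver: the `create` function, its `passes` and `step` parameters, and the six toy reviews are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(weights, bias, word_count):
    """Linear score: bias plus weight * count for every word in the review."""
    return bias + sum(weights.get(w, 0.0) * c for w, c in word_count.items())

def accuracy(weights, bias, rows, target, feature):
    """Fraction of rows whose predicted class (probability >= 0.5) matches the label."""
    correct = 0
    for row in rows:
        predicted = 1 if sigmoid(score(weights, bias, row[feature])) >= 0.5 else 0
        correct += predicted == row[target]
    return correct / len(rows)

def create(train, target, feature, validation_set, passes=200, step=0.1):
    """Fit one weight per word by stochastic gradient ascent on the log-likelihood."""
    weights, bias = {}, 0.0
    for i in range(passes):
        for row in train:
            error = row[target] - sigmoid(score(weights, bias, row[feature]))
            bias += step * error
            for word, count in row[feature].items():
                weights[word] = weights.get(word, 0.0) + step * error * count
        if (i + 1) % 50 == 0:
            # Report progress on both sets, loosely mirroring the training log in the video.
            print(i + 1,
                  accuracy(weights, bias, train, target, feature),
                  accuracy(weights, bias, validation_set, target, feature))
    return weights, bias

# Hypothetical toy reviews: a word-count dictionary plus a 0/1 sentiment label.
train_data = [
    {"word_count": {"love": 1, "it": 1}, "sentiment": 1},
    {"word_count": {"great": 2, "product": 1}, "sentiment": 1},
    {"word_count": {"hate": 1, "it": 1}, "sentiment": 0},
    {"word_count": {"terrible": 1, "product": 1}, "sentiment": 0},
]
test_data = [
    {"word_count": {"love": 1, "product": 1}, "sentiment": 1},
    {"word_count": {"terrible": 1, "it": 1}, "sentiment": 0},
]

weights, bias = create(train_data, target="sentiment", feature="word_count",
                       validation_set=test_data)
```

The learned weights are positive for words that appear in positive reviews and negative for words that appear in negative ones. The validation set is used only for reporting accuracy during training, never for updating the weights.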