1
00:00:00,463 --> 00:00:05,303
[MUSIC]

2
00:00:05,303 --> 00:00:11,030
Let's train the sentiment classifier.

3
00:00:11,030 --> 00:00:13,770
And we're gonna do this in two steps.

4
00:00:13,770 --> 00:00:17,520
First, we're gonna do a
train-test split of the data.

5
00:00:17,520 --> 00:00:19,380
So I'm gonna compute the training data.

6
00:00:19,380 --> 00:00:21,910
We're gonna split the data
into training data.

7
00:00:21,910 --> 00:00:26,170
And test data, just like we talked
about in the regression class and

8
00:00:26,170 --> 00:00:29,740
just like we did in
the regression notebook.

9
00:00:29,740 --> 00:00:32,070
So we're gonna use,
we'll take this product table.

10
00:00:33,450 --> 00:00:34,080
Products.

11
00:00:35,490 --> 00:00:39,678
And then we're going to
do the random split.

12
00:00:39,678 --> 00:00:42,709
So, oops.

13
00:00:42,709 --> 00:00:46,740
Products.

14
00:00:46,740 --> 00:00:50,150
And then we're gonna do that random split,

15
00:00:50,150 --> 00:00:55,240
where we're gonna do 80% for
training, 20% for testing.

16
00:00:55,240 --> 00:00:59,473
And just so that you can reproduce this
at home, I'm gonna do the seed = 0.

17
00:00:59,473 --> 00:01:02,400
Just like we discussed in regression.

18
00:01:02,400 --> 00:01:05,510
Normally you wouldn't do this,
you'd pick another random seed, but

19
00:01:05,510 --> 00:01:07,010
I wanted the random seed to be the same,

20
00:01:07,010 --> 00:01:10,200
so when you do it,
you get exactly the same results I did.

21
00:01:11,340 --> 00:01:16,580
So this is my first step, a train-test split
on this data set, and now we're ready.

22
00:01:16,580 --> 00:01:21,500
We're gonna build that
famous sentiment model.

23
00:01:23,520 --> 00:01:27,070
And here, we're going to use graphlab and

24
00:01:27,070 --> 00:01:31,640
we're going to use a particular classifier
called the logistic classifier.

25
00:01:31,640 --> 00:01:33,640
And in the course on classification,

26
00:01:33,640 --> 00:01:36,270
we're going to learn a lot about different
kinds of classifiers like logistic

27
00:01:36,270 --> 00:01:40,870
regression, this one, support vector
machines, decision trees, and others.

28
00:01:40,870 --> 00:01:43,405
But let's start with just
a logistic classifier.

29
00:01:43,405 --> 00:01:48,230
And just like in regression, you can type
.create after the name and

30
00:01:48,230 --> 00:01:51,050
it'll actually create the classifier for
you.

31
00:01:51,050 --> 00:01:54,340
And as input,
it takes a few parameters.

32
00:01:54,340 --> 00:02:00,560
So we're gonna take the train data,
as one parameter.

33
00:02:00,560 --> 00:02:03,360
Then we're gonna say that the target,

34
00:02:03,360 --> 00:02:08,140
the thing we're trying to classify,
is the sentiment column.

35
00:02:10,610 --> 00:02:15,980
And then we're gonna have to
tell it what features to use.

36
00:02:15,980 --> 00:02:23,020
So for the features we're going to
use just the word count column.

37
00:02:23,020 --> 00:02:27,930
So this is the new column that
we've created above for word count.

38
00:02:27,930 --> 00:02:32,410
And I'm going to give
it a validation set.

39
00:02:32,410 --> 00:02:39,647
So the validation set is
going to be my test_data.

40
00:02:39,647 --> 00:02:47,413
So validation_set=test_data.

41
00:02:47,413 --> 00:02:48,490
Okay.

42
00:02:48,490 --> 00:02:51,520
So now we execute the cell.

43
00:02:51,520 --> 00:02:55,470
And we should be building
a sentiment classifier model.

44
00:02:55,470 --> 00:02:58,708
And it's only gonna take a few seconds,
and here we go.

45
00:02:58,708 --> 00:03:01,107
It's done.

46
00:03:01,107 --> 00:03:06,330
And you will see [INAUDIBLE] the data, and

47
00:03:06,330 --> 00:03:09,940
the validation accuracy, as it goes along,
seems to be getting better and better.

48
00:03:09,940 --> 00:03:13,390
But let's actually do a peer evaluation.

49
00:03:13,390 --> 00:03:17,419
[MUSIC]