1 00:00:00,000 --> 00:00:03,766 [MUSIC] 2 00:00:03,766 --> 00:00:08,734 Now we've seen classification in a wide variety of settings and how we can really 3 00:00:08,734 --> 00:00:14,120 use it to predict a class like a positive or negative sentiment from data. 4 00:00:14,120 --> 00:00:18,100 In the regression section, Amy talked about this block diagram that 5 00:00:18,100 --> 00:00:22,750 really describes how a machine learning algorithm iterates through its data. 6 00:00:22,750 --> 00:00:27,470 So now let's take this same block diagram and work through it and 7 00:00:27,470 --> 00:00:32,810 describe how it works out in the case of classification with sentiment analysis. 8 00:00:32,810 --> 00:00:38,426 So how does it look for classification for 9 00:00:38,426 --> 00:00:42,438 sentiment? In this case, 10 00:00:42,438 --> 00:00:47,572 the data is the text of the reviews, so 11 00:00:47,572 --> 00:00:52,707 for each review, the text of the review is 12 00:00:52,707 --> 00:00:59,299 associated with a particular labeled sentiment. 13 00:01:02,773 --> 00:01:07,598 From that text of the review, we feed it through a feature 14 00:01:07,598 --> 00:01:13,290 extraction phase which gives us x, the inputs to our algorithm. 15 00:01:13,290 --> 00:01:16,800 And this x here is going to be the word counts. 16 00:01:18,110 --> 00:01:23,220 So word counts for every data point, for every review. 17 00:01:24,230 --> 00:01:29,060 Now our machine learning model is going to take that input data, 18 00:01:29,060 --> 00:01:32,252 the word counts, as well as several parameters 19 00:01:32,252 --> 00:01:36,108 which I'm calling here w-hat, which are the weights for each word. 20 00:01:40,335 --> 00:01:42,740 Each word. 21 00:01:43,900 --> 00:01:47,200 And from combining these two, we're gonna output the predictions. 22 00:01:47,200 --> 00:01:49,720 So if the score is greater than zero, it's gonna be positive. 
23 00:01:49,720 --> 00:01:51,550 If the score is less than zero, it's gonna be negative. 24 00:01:51,550 --> 00:01:54,713 So this output here is the predicted sentiment. 25 00:02:01,946 --> 00:02:05,840 And if we're just using the model, we would be done here. 26 00:02:05,840 --> 00:02:09,560 But really, in the machine learning algorithm phase, we're gonna evaluate that 27 00:02:09,560 --> 00:02:14,000 result and then feed it back into the algorithm to improve the parameters. 28 00:02:14,000 --> 00:02:19,130 So we're gonna take the predicted sentiment, y-hat, and 29 00:02:19,130 --> 00:02:23,070 compare it with the true label for the sentiment. 30 00:02:23,070 --> 00:02:29,960 So the sentiment label for each data point. 31 00:02:29,960 --> 00:02:31,620 So that's gonna fit in and 32 00:02:31,620 --> 00:02:34,840 our quality measure here is gonna be classification accuracy. 33 00:02:38,126 --> 00:02:42,639 Classification accuracy. 34 00:02:42,639 --> 00:02:46,510 And the machine learning algorithm, which we're gonna discuss in more detail 35 00:02:46,510 --> 00:02:50,580 in the classification course, is gonna take that accuracy and try to improve it. 36 00:02:50,580 --> 00:02:55,340 And the way the improvement works is by updating the parameters w-hat. 37 00:02:55,340 --> 00:02:57,132 And that's what the cycle for 38 00:02:57,132 --> 00:03:00,060 a machine learning algorithm for classification would look like. 39 00:03:01,350 --> 00:03:04,180 In this module, we've seen how to do classification. 40 00:03:04,180 --> 00:03:07,860 We've looked at various examples of where it can be applied. 41 00:03:07,860 --> 00:03:11,375 We've talked about a few models for building classifiers, 42 00:03:11,375 --> 00:03:14,505 especially in the context of sentiment analysis, and we saw some live demos. 43 00:03:14,505 --> 00:03:19,115 And we even built a notebook where we built a classifier from data and 44 00:03:19,115 --> 00:03:20,435 analyzed it. 
45 00:03:20,435 --> 00:03:24,642 And with this knowledge, you're ready to build an intelligent application 46 00:03:24,642 --> 00:03:26,592 that uses a classifier at its core. 47 00:03:26,592 --> 00:03:30,649 [MUSIC]
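
The block diagram walked through in this video can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions, not the method used in the course notebook: the function names (`word_counts`, `score`, `predict`, `accuracy`, `train`) and the toy reviews are made up for this sketch, a score of exactly zero is treated as negative (the lecture only covers greater than and less than zero), and the update rule is a simple perceptron-style stand-in, since the lecture defers the actual learning algorithm to the classification course.

```python
from collections import Counter


def word_counts(text):
    """Feature extraction: turn review text into word counts (x)."""
    return Counter(text.lower().split())


def score(x, w_hat):
    """Combine word counts with per-word weights; unseen words weigh 0."""
    return sum(w_hat.get(word, 0.0) * count for word, count in x.items())


def predict(x, w_hat):
    """Predicted sentiment y-hat: +1 if score > 0, else -1.

    Assumption: a score of exactly zero is treated as negative.
    """
    return 1 if score(x, w_hat) > 0 else -1


def accuracy(reviews, labels, w_hat):
    """Quality measure: fraction of reviews whose prediction matches the label."""
    correct = sum(
        predict(word_counts(text), w_hat) == y
        for text, y in zip(reviews, labels)
    )
    return correct / len(labels)


def train(reviews, labels, epochs=5):
    """Feedback loop: update w-hat whenever a prediction is wrong.

    Assumption: a perceptron-style update, used here only as a stand-in.
    """
    w_hat = {}
    for _ in range(epochs):
        for text, y in zip(reviews, labels):
            x = word_counts(text)
            if predict(x, w_hat) != y:
                for word, count in x.items():
                    w_hat[word] = w_hat.get(word, 0.0) + y * count
    return w_hat


# Hypothetical toy data: +1 is positive sentiment, -1 is negative.
reviews = [
    "great food great service",
    "awful food terrible service",
    "great sushi",
    "terrible awful sushi",
]
labels = [1, -1, 1, -1]

w_hat = train(reviews, labels)
print(accuracy(reviews, labels, w_hat))
```

Each function corresponds to one box in the diagram: feature extraction produces x, the model combines x with w-hat into a predicted sentiment, the quality measure compares predictions with the true labels, and the training loop feeds mistakes back into w-hat.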