1
00:00:00,343 --> 00:00:04,183
[MUSIC]

2
00:00:04,183 --> 00:00:07,522
And as part of using this data set,

3
00:00:07,522 --> 00:00:13,625
the first thing that we're gonna do, just like in the module,

4
00:00:13,625 --> 00:00:18,013
we're going to build a word count vector.

5
00:00:18,013 --> 00:00:25,513
Build the word count vector for each review.

6
00:00:28,463 --> 00:00:33,357
Now, normally, you'd have to implement this yourself: something that goes

7
00:00:33,357 --> 00:00:38,410
through the review, separates the words (called tokenizing), and builds the count vector.

8
00:00:38,410 --> 00:00:43,230
But one of the nice things about using the tools here in this

9
00:00:43,230 --> 00:00:48,190
course is that with just one command, we can build that word count vector.

10
00:00:50,030 --> 00:00:53,650
So in products, I'm going to add a new column

11
00:00:53,650 --> 00:00:58,910
called word_count, which is gonna store my word count.

12
00:00:58,910 --> 00:01:02,782
And if you just call graphlab.text_analytics,

13
00:01:02,782 --> 00:01:07,102
which is a text analytics toolbox with a bunch of functions,

14
00:01:07,102 --> 00:01:10,080
there is one called count_words.

15
00:01:10,080 --> 00:01:13,182
Notice there's also one called count_ngrams if you wanna use bigrams,

16
00:01:13,182 --> 00:01:14,110
trigrams, and so on.

17
00:01:15,388 --> 00:01:20,530
And as input, I'm going to give the same products SFrame,

18
00:01:20,530 --> 00:01:25,070
but I'm going to ask it to count the words in the review column.

19
00:01:26,600 --> 00:01:29,020
And so we're gonna execute that, and it's done.

20
00:01:29,020 --> 00:01:33,000
And so now, if we take another look

21
00:01:33,000 --> 00:01:37,610
at the products table, at the head of the table, you'll

22
00:01:39,240 --> 00:01:43,910
see that now we have a fourth column with the word_count.

23
00:01:43,910 --> 00:01:46,750
So we're gonna explore that a little bit more soon.

24
00:01:46,750 --> 00:01:51,022
But you see, this first review includes the word "and"

25
00:01:51,022 --> 00:01:53,310
five times, "stink" once.

26
00:01:53,310 --> 00:01:58,470
That's probably why it wasn't a good product review, but there are others.

27
00:02:01,560 --> 00:02:05,429
[MUSIC]
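
Note: the one-line command described in the narration appears to be along the lines of products['word_count'] = graphlab.text_analytics.count_words(products['review']). That line is reconstructed from the spoken description, not copied from the notebook. To show roughly what a word count vector is, here is a minimal plain-Python sketch. It is not GraphLab's implementation: the helper name, the lowercasing, the whitespace-only tokenizing, and the sample reviews are all assumptions made for illustration.

```python
from collections import Counter


def count_words(text):
    """Return a dict mapping each word in `text` to how often it occurs.

    Assumption: tokenizing here is just lowercasing and splitting on
    whitespace; punctuation stays attached to words.
    """
    tokens = text.lower().split()
    return dict(Counter(tokens))


# Hypothetical reviews, invented for this sketch (not from the course data).
reviews = [
    "the bottle leaks and the cap cracked and it stinks",
    "great stroller and great price",
]

# One word count dictionary per review, like the word_count column.
word_counts = [count_words(review) for review in reviews]

for review, counts in zip(reviews, word_counts):
    print(review, "->", counts)
```

Each dictionary plays the role of one row in the word_count column: keys are the words in that review, values are how many times each appears.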