[MUSIC] And so, let's actually compute TF-IDF. Now, I can't just compute TF-IDF for the Obama article in isolation, and this is an important note, by the way: TF-IDF depends on the entire corpus. You need that normalizer, which depends on how many documents each word appears in. So I have to compute it for the entire corpus, so let's go ahead and do just that.

So here we go, I'm gonna compute TF-IDF for the corpus, and I'm gonna do this in two steps. First, I'm gonna compute the word counts for the entire corpus. So I'm gonna add a new column to the people table called word_count. Remember, we just did this only for Barack Obama, so now we're gonna do it for everyone. So I'm gonna call graphlab.text_analytics.count_words, and I'm gonna put in the input, which is gonna be the text column. In other words, I'm gonna count the words in the text column. And just so that we're clear, let's print the SFrame people after we do this, so I'm gonna print the first few lines of that SFrame. So here we are, we've executed. Now we have the URI column, the location, the webpage, the name of the person, the text, and now we have the new column on the right, the dictionary of word counts.

Good, so next we're gonna compute the TF-IDFs. Just like with word counts, you could implement your own TF-IDF system, but it would take you a little while to do. GraphLab Create has one already implemented, and we're just gonna use that to make this whole process pretty quick. So we're gonna call graphlab.text_analytics, and just like with word counts, there's a function here, .tf_idf. And all you need to do is give it an input, so we're gonna give it the word_count column, and it will output the TF-IDF. And let me just again show you what that looks like. Oops, I made a little typo here; that should be word_count, not word_counts. And now it's going through the whole corpus, with 50,000 documents, computing the frequency of words and normalizing. And what we end up with is a table, with a column called docs, where for every document we have a dictionary of TF-IDFs.

And just so that we get all this right, I'm gonna add a new column to the people table. This new column is gonna be called tfidf, and I'm just gonna store in there the TF-IDFs that I just computed, the docs column, so it's all in one table. Here we go, we just added it in. So now we have TF-IDFs for every document computed and stored in there.
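As a rough sketch, here's what those two steps look like in code, assuming GraphLab Create and the people SFrame from earlier in the lesson (the data file name is just an example):

    import graphlab

    # Load the corpus of Wikipedia people pages (example path).
    people = graphlab.SFrame('people_wiki.gl/')

    # Step 1: word counts for every document, stored as a dictionary column.
    people['word_count'] = graphlab.text_analytics.count_words(people['text'])
    people.head()

    # Step 2: TF-IDF over the whole corpus, computed from those word counts.
    # In the version used here, tf_idf returns an SFrame whose 'docs' column
    # holds the per-document TF-IDF dictionaries; newer versions return an
    # SArray directly, in which case the ['docs'] indexing isn't needed.
    tfidf = graphlab.text_analytics.tf_idf(people['word_count'])
    people['tfidf'] = tfidf['docs']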
Let's do some examination. So here's what we're gonna do: we're gonna examine the TF-IDF for the Obama article. Just like we examined and sorted the word counts, we're now gonna examine and sort the TF-IDFs. I'm gonna re-create the variable for Obama, because we've now added these two new columns in the latest version of that table. So I'm gonna take people, and I'm gonna select out of people the one whose name is equal to Barack Obama. Okay, I've done it, I've created this obama variable. And now, just like we did with word counts, I could create an obama_tfidf_table so that we can sort it too. It's a dictionary, and we're gonna sort it exactly the same way as we did before: we will stack it and then sort it. Actually, instead of creating a table, I'm just gonna do it in one line here. So I'm gonna write out what we did earlier: I'm gonna take the obama variable and select only the tfidf column, so it looks a little prettier. Then I'm gonna call the stack method, which takes a dictionary and just stacks it into two columns. So I'm gonna stack tfidf, and I'm gonna give new column names for the output, and those names are gonna be word and tfidf. Oops, I forgot to close the bracket here. And let me show you a neat little trick that you can use with Python in various ways: I'm just gonna chain the sort command at the end of this. So I'm just gonna type .sort, and I'm gonna sort this output on the tfidf column, and I'm gonna say ascending=False. So what I did in multiple lines before, now I'm doing in just one line: I'm taking the obama tfidf column, I'm gonna stack it into a word column and a tfidf column, and then I'm gonna sort it in descending order, so from highest to lowest (the full one-liner is sketched at the end of this section).

And if you remember, just before we run this: when we did it for just word counts, it looked like this. 'the' was the most popular word, then 'in', then 'and', then 'of', then 'to', 'his', then 'Obama', 'act', 'a', and 'he'. So those words are mostly uninformative, except for the word Obama. So let's execute it for TF-IDF. And what we see here, voila, the most informative word is Obama, which makes a lot of sense because the article is about him. But then you have act, Iraq, control, law, ordered, military, involvement, response, democratic, as in the Democratic Party. So you see, there's lots of action going on here around words that are important with respect to Obama. [MUSIC]
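For reference, a sketch of that chained one-liner, again assuming GraphLab Create's SFrame API and the people table built above:

    # Select the Obama article from the updated table.
    obama = people[people['name'] == 'Barack Obama']

    # Stack the tfidf dictionary into (word, tfidf) rows, then chain a sort
    # so the highest-scoring words come first.
    obama[['tfidf']].stack('tfidf', new_column_name=['word', 'tfidf']).sort('tfidf', ascending=False)

The same chaining works on the word_count column, which is how the earlier word-count ranking was produced, just spread over several lines.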