To build up to mixed membership models for documents, though, it's helpful to first present an alternative clustering model to the mixture of Gaussians model we presented in the last module. So, just to emphasize, we're going back to our clustering model, where we're going to assume the simpler structure where every document is assigned to a single topic. So far, when we've looked at our documents, we've represented them with this tf-idf vector, and then we've taken all the tf-idf vectors associated with every document in the corpus and used a mixture of Gaussians to discover some set of clusters in this tf-idf space. But now what we're going to do is use an alternative representation of a document called a bag-of-words representation, where we simply take all of the words that are present in our document, throw them into a bag, and then shake that bag up, so that the order of the words doesn't matter. So our representation of the document is simply going to be an unordered set of words, but I use "set" loosely here, because this set is going to have multiple occurrences of a unique word if that word appears multiple times in the document. So the multiplicity of the unique words matters here, unlike in standard sets; formally, this is called a multiset.

So now let's present a clustering model for this new document representation. To start with, we need to specify the prior probability that a given document is associated with a specific cluster. These topic prevalences are going to be exactly like what we had in the mixture of Gaussians case, where they just represent the corpus-wide prevalence of each topic. But now our likelihood term is going to be different, because instead of scoring every document under a specific Gaussian, like in the mixture of Gaussians case, we're going to take our document, in its bag-of-words representation, and score this set of words under a topic-specific probability vector over words. Specifically, every topic is going to be associated with a probability distribution over words in the vocabulary, and using that, we're able to score the words present in this document, to say how probable they are under this specific topic. Then we do this for every topic, and we choose between topics using both this prior and the likelihood, just like in our mixture of Gaussians example.

So, just to be very clear, for every topic, like ones about science, and technology, and sports, even though, of course, we don't have those labels, they're just going to be cluster one, two, three, we're going to have a probability vector over words in the vocabulary. And the way I'm showing them here on this slide is ordered by how probable those words are in the topic, from most probable to least probable, whereas in the previous slide I was just listing the words that actually appeared in the dataset, or in that specific article.

So now we can compare and contrast our mixture of Gaussians clustering model with the clustering model we just specified. In both of these models, our prior topic probabilities, the probability that a document came from a given cluster before we actually look at its content, are given by these pi k's, and they're specified in exactly the same way in both cases. But in the mixture of Gaussians case, our documents were represented by these tf-idf vectors, or some other vector, such as a word-count vector, and we scored that vector under each one of the Gaussians.
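To make this concrete, here is a minimal sketch in Python of what scoring a document looks like under this bag-of-words clustering model. The vocabulary, topic word probabilities, and topic prevalences below are made-up toy values, not anything from the course data; in practice they would be learned from the corpus.

```python
import numpy as np
from collections import Counter

# Hypothetical toy vocabulary (in practice this is the full corpus vocabulary).
vocab = ["experiment", "data", "model", "game", "score", "team"]
word_index = {w: i for i, w in enumerate(vocab)}

# Each row is one topic's probability vector over the vocabulary; rows sum to 1.
# These numbers are made up purely for illustration.
topic_word_probs = np.array([
    [0.30, 0.30, 0.25, 0.05, 0.05, 0.05],   # cluster 1 ("science-like")
    [0.05, 0.05, 0.05, 0.30, 0.25, 0.30],   # cluster 2 ("sports-like")
])

# Corpus-wide topic prevalences pi_k: the prior probability of each cluster.
pi = np.array([0.6, 0.4])

# Bag-of-words representation: an unordered multiset of the document's words,
# i.e., word counts with the order thrown away.
doc = ["data", "model", "data", "experiment", "score"]
bag = Counter(doc)

# Score the document under each topic: log prior plus, for every word,
# count * log P(word | topic); then normalize to get the posterior over clusters.
log_scores = np.log(pi)
for k in range(len(pi)):
    for word, count in bag.items():
        log_scores[k] += count * np.log(topic_word_probs[k, word_index[word]])

responsibilities = np.exp(log_scores - np.logaddexp.reduce(log_scores))
print(dict(zip(["cluster 1", "cluster 2"], responsibilities.round(3))))
```

Working in log space and normalizing with logaddexp is just a numerical convenience here; it avoids underflow when documents contain many words.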
So remember, each cluster was defined by a Gaussian, and we would compute the score of a given data point under each of these Gaussians, and then weigh the prior and likelihood terms to come up with our assignment for a given document. But now every document is represented with this bag-of-words representation, and when we go to score the document, we simply look at the probability of each of its words under the topic-specific probability vector over words.
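To highlight the contrast between the two likelihood terms, here is a small side-by-side sketch with hypothetical toy numbers (the vectors, parameters, and word probabilities below are invented for illustration only): a Gaussian score for a tf-idf vector versus a word-probability score for a bag of words. In both models, the resulting likelihood is weighed against the same kind of prior term pi_k to assign the document to a cluster.

```python
import numpy as np
from collections import Counter
from scipy.stats import multivariate_normal

# Mixture of Gaussians: the document is a tf-idf (or word-count) vector,
# and cluster k scores it under a Gaussian with mean mu_k and covariance Sigma_k.
tfidf_vec = np.array([0.8, 0.1, 0.3])      # hypothetical 3-dimensional tf-idf vector
mu_k, sigma_k = np.zeros(3), np.eye(3)     # hypothetical cluster-k parameters
gaussian_log_lik = multivariate_normal.logpdf(tfidf_vec, mean=mu_k, cov=sigma_k)

# Bag-of-words clustering model: the document is a multiset of words,
# and cluster k scores each word under its probability vector over the vocabulary.
bag = Counter(["data", "model", "data"])                        # hypothetical document
word_probs_k = {"data": 0.4, "model": 0.3, "experiment": 0.3}   # hypothetical topic k
bow_log_lik = sum(count * np.log(word_probs_k[word]) for word, count in bag.items())

print("Gaussian log-likelihood:", gaussian_log_lik)
print("Bag-of-words log-likelihood:", bow_log_lik)
```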