We're now ready to present our latent Dirichlet allocation (LDA) mixed membership model. Remember, our goal here is to associate with each document the collection of topics present in that document, as well as their relative proportions, that is, how prevalent each of those topics is.

Recall that in the clustering model we just presented for our bag-of-words representation of a document, we introduced a set of topic-specific vocabulary distributions, each assigning a probability to every word in the vocabulary. That clustering model looked to assign an entire document to a single topic, and for this assignment it would score all of the words in the document under that topic's vocabulary distribution. The model also introduced a corpus-wide distribution over the prevalence of topics throughout all the documents in the corpus: that is, for any given document we grab, how likely is it to come from each of these topics?

In latent Dirichlet allocation we introduce the same set of topic-specific vocabulary distributions, but now, when we look at a given document, we assign every word, rather than the entire document, to a single topic. Every word gets an assignment variable z_iw, saying which topic the wth word in document i is assigned to. Then, when we go to score a document, we score each word under its assigned topic: all the orange words under the orange topic, the blue words under the blue topic, and so on. That determines how good those assignments are.

Finally, there's one more difference in LDA. Instead of a corpus-wide distribution over topic prevalences, each document has its own distribution over the prevalence of topics in that document. So instead of a single pi shared globally throughout the corpus, we have pi_i, specific to document i, and this vector represents the topic prevalences in that specific document.
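To make this generative story concrete, here is a minimal sketch in Python/NumPy. The dimensions (K topics, V vocabulary words, n_words per document) and the Dirichlet concentration parameters are illustrative assumptions, not values from the lecture; the structure, though, mirrors what was just described: topic-specific word distributions, a per-document pi_i, a per-word assignment z_iw, and per-word scoring under the assigned topics.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 3        # number of topics (assumed for illustration)
V = 10       # vocabulary size (assumed)
n_words = 8  # number of words in document i (assumed)

# Topic-specific vocabulary distributions: one distribution over all
# V vocabulary words per topic (each row sums to 1).
topic_word = rng.dirichlet(np.ones(V) * 0.5, size=K)

# Document-specific topic prevalences pi_i. This replaces the single
# corpus-wide pi of the clustering model.
pi_i = rng.dirichlet(np.ones(K))

# Per-word assignments z_iw: every word, not the whole document,
# gets its own topic, drawn according to pi_i.
z_i = rng.choice(K, size=n_words, p=pi_i)

# Draw each word from its assigned topic's vocabulary distribution.
words = np.array([rng.choice(V, p=topic_word[z]) for z in z_i])

# Scoring: each word is scored under its assigned topic, so the
# document's log-likelihood is a sum of per-word log-probabilities.
log_score = sum(np.log(pi_i[z]) + np.log(topic_word[z, w])
                for z, w in zip(z_i, words))
print("pi_i:", pi_i)
print("assignments z_i:", z_i)
print("log score:", log_score)
```

Note how the score sums over words individually: changing a single word's assignment z_iw changes only that word's term, which is exactly what lets LDA mix multiple topics within one document, unlike the clustering model's single per-document score.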