[MUSIC] In this module we're going to cover a really popular probabilistic model for document analysis called Latent Dirichlet Allocation, or LDA. LDA is an example of a class of methods called mixed membership models. To start with, let's motivate the use of mixed membership models in the context of our document analysis.

So far we've described clustering models where the goal is to group related articles into disjoint sets, or clusters, where these clusters capture the topics prevalent in the corpus. In this context, every document is assigned to a single topic. A question, though, is: is an article really about just one topic, like science? In the last module, we talked about soft assignments capturing uncertainty in this cluster assignment, but the clustering model still assumes that each document is assigned to a single topic. We might just have uncertainty in what that assignment is from the observed data.

To make this more concrete, let's look at a specific example where we have an article entitled Modeling the Complex Dynamics and Changing Correlations of Epileptic Events. In this article we see words like patients, epilepsy, EEG, and clinical. So based on the content of this article, maybe a clustering model would group it with other articles related to the science topic. And maybe that's cluster 4, shown in orange, in our clustering. What this bar chart represents is simply a one-hot encoding of our assignment of this article to cluster 4: z_i = 4.

But this article also has words like asynchronize, automatic, model, and things like this, which might mean that it's really an article about technology. And so maybe we should group it with other tech articles, which in this case is cluster 2. Well, the soft assignments that we talked about in the last module capture our uncertainty about whether this article should be assigned to cluster 2 or cluster 4, that is, to the technology cluster or the science cluster.
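To make the contrast between the one-hot bar chart and a soft assignment concrete, here is a minimal sketch. The number of clusters (5) and the soft-assignment values are made up for illustration; they are not taken from the lecture.

```python
import numpy as np

NUM_CLUSTERS = 5  # hypothetical number of clusters, chosen only for illustration


def one_hot(cluster, num_clusters=NUM_CLUSTERS):
    """Hard assignment: 1 at the assigned cluster (1-indexed), 0 everywhere else."""
    encoding = np.zeros(num_clusters)
    encoding[cluster - 1] = 1.0
    return encoding


# Hard assignment z_i = 4: all of the mass sits on cluster 4.
hard_assignment = one_hot(4)

# Soft assignment: made-up probabilities expressing uncertainty about
# which SINGLE cluster (2 or 4) the article belongs to.
soft_assignment = np.array([0.02, 0.40, 0.03, 0.50, 0.05])

# Even with soft assignments, the model still assumes one true cluster;
# the most probable one here is cluster 4.
most_likely_cluster = int(np.argmax(soft_assignment)) + 1

print(hard_assignment)
print(soft_assignment, most_likely_cluster)
```

In both cases the underlying model says the article has exactly one cluster; the soft vector only describes how unsure we are about which one it is.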
But maybe what we really want to capture is the fact that the article is about science and technology. That is, "z_i" is really 2 and 4. I put z_i in quotes because there's not going to be a single variable associated with a document to represent this mixed assignment, like what we saw in our clustering model. But in essence, what we're saying is that this document has membership in both of these clusters, 2 and 4. And we're going to go through in this module exactly how we formalize this type of mixed membership assignment.

Importantly, the other thing that we're going to want to capture is not only which topics are present in this document, but also their relative proportions: how prevalent is each of these topics? So this is where mixed membership models come in. Mixed membership models allow us to associate any given data point with a set of different cluster assignments or, in this case, a document with a set of topics, rather than assuming that every document is associated with just a single cluster or topic, or capturing uncertainty in that single assignment like the soft assignments we talked about before. [MUSIC]
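As a rough illustration of what a mixed membership representation of this article might look like, here is a minimal sketch. The proportions, the number of topics, and the helper `topics_present` with its threshold are all hypothetical; this is not how LDA computes anything, only a picture of the quantity we want to capture.

```python
import numpy as np

# Hypothetical topic proportions for one article over 5 topics (1-indexed):
# the article is partly about topic 2 (technology) and partly about topic 4 (science).
topic_proportions = np.array([0.0, 0.35, 0.0, 0.65, 0.0])


def topics_present(proportions, threshold=0.05):
    """Return the 1-indexed topics whose proportion exceeds an (assumed) threshold."""
    return [k + 1 for k, p in enumerate(proportions) if p > threshold]


# Which topics does the document have membership in?
present = topics_present(topic_proportions)

# How prevalent is science (topic 4) relative to technology (topic 2)?
science_to_tech_ratio = topic_proportions[3] / topic_proportions[1]

print(present, science_to_tech_ratio)
```

Unlike a soft assignment, this vector is not read as uncertainty about one true cluster. It says the document genuinely belongs to several topics at once, and the entries describe how much of each.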