[MUSIC] So this is a standard implementation of a Gibbs sampler, but we can actually do things that are a little bit fancier. For example, we can do something called collapsed Gibbs sampling, and we're going to describe this in the context of our LDA model. The idea here is that we can analytically marginalize over all of the uncertainty in our model parameters and just sample the word assignment variables. So we never have to sample our topic vocabulary distributions or the document-specific topic proportions. We just go through and iterate, sampling these word assignment variables. And that seems pretty cool, because what we've done is dramatically reduce the space this Gibbs sampler is exploring. This can actually lead to much better performance in practice because, remember, we typically have very large vocabularies, and so we have a distribution over the entire vocabulary for each one of our topics. That's a lot of parameters to think about learning. And likewise, for every document in our corpus, we have a set of probabilities over topics being present in that document. So being able to collapse these model parameters away and just look at the assignment variables in the documents can be quite helpful.

So, in pictures, what we're saying is that we can completely collapse away all of these model parameters and just iteratively resample our word assignment variables for every word in each document, and thus every word in the corpus. But there is a caveat here, and that's the fact that we now have to sequentially resample each word's assignment given all the other assignments in the corpus. We didn't discuss this previously, but I want to highlight it now. If you look at the uncollapsed sampler, our standard implementation of Gibbs sampling for LDA, when we looked at a given document, we ended up with a data-parallel problem for resampling the word indicator variables. You could do each word indicator variable completely in parallel. All you needed to do was condition on the topic vocabulary distributions and the document-specific topic proportions, and then everything decoupled. There was no dependence on the assignments to other words in the document or in the corpus. But now, when we don't have these model parameters, what informs the assignment of a given word to a given topic? Well, the other words and the assignments that were made to those words. So we can think of all the other assignment variables in the corpus as a surrogate for the model parameters. We'll discuss this in more detail in the coming slides. But the take-home message here is that we never have to sample our topic vocabulary distributions or the document-specific topic proportions. We just sample these word indicator variables, but we do so sequentially, losing the ability to parallelize across that operation. [MUSIC]
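To make this concrete, here is a minimal sketch of a collapsed Gibbs sweep for LDA in Python. This is not the course's implementation; the function name collapsed_gibbs_lda, the symmetric Dirichlet hyperparameter values for alpha and beta, and the count-table names are all illustrative assumptions. The conditional it samples from is the standard collapsed form, where the topic vocabulary distributions and document-specific topic proportions have been integrated out, so each word's new topic depends only on count tables built from all the other current assignments.

```python
import numpy as np

def collapsed_gibbs_lda(docs, K, V, alpha=0.1, beta=0.01, n_iters=100, seed=0):
    """Sketch of a collapsed Gibbs sampler for LDA.

    docs: list of documents, each a list of word ids in [0, V)
    K: number of topics; V: vocabulary size
    alpha, beta: symmetric Dirichlet hyperparameters (illustrative values)
    """
    rng = np.random.default_rng(seed)

    # Count tables summarizing all current assignments -- these act as
    # the surrogate for the collapsed model parameters.
    n_dk = np.zeros((len(docs), K))   # topic counts per document
    n_kw = np.zeros((K, V))           # word counts per topic
    n_k = np.zeros(K)                 # total words assigned to each topic

    # Random initialization of the word assignment variables z.
    z = []
    for d, doc in enumerate(docs):
        z_d = rng.integers(K, size=len(doc))
        z.append(z_d)
        for w, k in zip(doc, z_d):
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1

    for _ in range(n_iters):
        # Sequential sweep: each word's new topic depends on the current
        # assignments of every other word, so this loop is not data-parallel.
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k_old = z[d][i]
                # Remove this word's current assignment from the counts.
                n_dk[d, k_old] -= 1
                n_kw[k_old, w] -= 1
                n_k[k_old] -= 1

                # Collapsed conditional: (document-topic term) x (topic-word term),
                # with the proportions and vocabulary distributions integrated out.
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
                p /= p.sum()

                # Resample the assignment and restore the counts.
                k_new = rng.choice(K, p=p)
                z[d][i] = k_new
                n_dk[d, k_new] += 1
                n_kw[k_new, w] += 1
                n_k[k_new] += 1

    return z, n_dk, n_kw
```

Notice that removing a word's current assignment from the counts before computing its conditional is exactly what lets the other assignments stand in for the collapsed parameters, and it is also why the updates have to happen one word at a time rather than in parallel.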