So this is a standard implementation of a Gibbs sampler, but we can actually do things that are a little bit fancier. For example, we can do something called collapsed Gibbs sampling, and we're going to describe this in the context of our LDA model. The idea here is that we can actually analytically marginalize over all of the uncertainty in our model parameters and just sample the word assignment variables. So we never have to sample our topic vocabulary distributions or any of the document-specific topic proportions. We just go through and iterate, sampling these word assignment variables.

And that seems pretty cool, because what we've done is dramatically reduce the space in which we're exploring in this Gibbs sampler. This can actually lead to much better performance in practice, because remember, we have very massive vocabularies in general, and so we have a distribution over the entire vocabulary for each one of our topics. That's a lot of parameters to think about learning. And likewise, for every document in our corpus, we have a set of probabilities over topics being present in that document. So being able to collapse these model parameters away and just look at these assignment variables in the documents can be quite helpful.

So in pictures, what we're saying is that we can completely collapse away all of these model parameters and just look at iteratively resampling our word assignment variables for every word in the document, and then every word in the corpus.

But there is a caveat here, and that's the fact that we now have to sequentially resample our words given all the other words in the corpus. We didn't discuss this previously, but I want to highlight it now. If you look at the uncollapsed sampler, our standard implementation of Gibbs sampling for LDA, when we looked at a given document, we ended up with a data-parallel problem for resampling the word indicator variables. You could resample each word indicator variable completely in parallel. All you needed to do was condition on the topic vocabulary distributions and the document-specific topic proportions, and then everything decoupled. There was no dependence on the assignments to other words in the document or in the corpus.
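[As a rough illustration of that decoupling, here is a minimal sketch, not the lecture's reference code, of the uncollapsed per-word resampling step. The names phi (topic-vocabulary distributions, K x V), theta_d (one document's topic proportions), and doc_words are my own illustrative choices. Conditioned on phi and theta_d, each word's assignment is drawn independently, which is why the loop below could run in parallel.]

import numpy as np

def resample_assignments_uncollapsed(doc_words, phi, theta_d):
    """Resample topic assignments for one document given the model parameters.

    doc_words: array of word ids for one document.
    phi: (K, V) topic-vocabulary distributions.
    theta_d: (K,) topic proportions for this document.
    """
    rng = np.random.default_rng()
    K = phi.shape[0]
    assignments = np.empty(len(doc_words), dtype=int)
    for i, w in enumerate(doc_words):   # each iteration is independent of the others
        p = theta_d * phi[:, w]         # P(topic) * P(word | topic)
        p /= p.sum()
        assignments[i] = rng.choice(K, p=p)
    return assignments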
But now, when we don't have these model parameters, what informs the assignment of a given word to a given topic? Well, the other words and the assignments that were made to those words. So we can think of all those other assignment variables in the corpus as a surrogate for the model parameters.

We'll discuss this in more detail in the coming slides, but the take-home message here is that we never have to sample our topic vocabulary distributions or the document-specific topic proportions. We just sample these word indicator variables. But we do so sequentially, losing the ability to parallelize across that operation.
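[To make that concrete, here is a minimal sketch of one collapsed resampling step for a single word, under the usual assumption of symmetric Dirichlet hyperparameters. The count matrices n_dk (document-topic counts), n_kw (topic-word counts), n_k (topic totals), and the names alpha, beta, V are my own illustrative notation, not necessarily the lecture's. The counts over every other word's current assignment play the role of the collapsed parameters, and because each update changes those counts, the sweep over words is inherently sequential.]

import numpy as np

def resample_word_collapsed(d, w, z_old, n_dk, n_kw, n_k, alpha, beta, V):
    """One collapsed Gibbs update for word w in document d, currently assigned topic z_old."""
    rng = np.random.default_rng()
    # Remove this word's current assignment from the counts ("all other words").
    n_dk[d, z_old] -= 1
    n_kw[z_old, w] -= 1
    n_k[z_old] -= 1
    # Conditional over topics given every other assignment in the corpus.
    p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
    p /= p.sum()
    z_new = rng.choice(len(p), p=p)
    # Add the new assignment back in; the next word sees these updated counts,
    # which is exactly why this step cannot be parallelized across words.
    n_dk[d, z_new] += 1
    n_kw[z_new, w] += 1
    n_k[z_new] += 1
    return z_new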