So this is a standard implementation of a Gibbs sampler, but we can actually do things that are a little bit fancier. For example, we can do something called collapsed Gibbs sampling, and we're going to describe this in the context of our LDA model. The idea here is that we can actually analytically marginalize over all of the uncertainty in our model parameters and just sample the word assignment variables. So we never have to sample our topic vocabulary distributions or any of the document-specific topic proportions. We just go through and iterate, sampling these word assignment variables.

And that seems pretty cool, because what we've done is dramatically reduce the space in which we're exploring in this Gibbs sampler. This can actually lead to much better performance in practice, because remember, we have very massive vocabularies in general, and so we have a distribution over the entire vocabulary for each one of our topics. That's a lot of parameters to think about learning. And likewise, for every document in our corpus, we have a set of probabilities over topics being present in that document. So being able to collapse these model parameters away and just look at these assignment variables in the documents can be quite helpful.

So in pictures, what we're saying is that we can completely collapse away all of these model parameters and just look at iteratively resampling our word assignment variables for every word in the document, and then every word in the corpus.

But there is a caveat here, and that's the fact that we now have to sequentially resample our words given all the other words in the corpus. We didn't discuss this previously, but I want to highlight it now. If you look at the uncollapsed sampler, our standard implementation of Gibbs sampling for LDA, when we looked at a given document, we ended up with a data-parallel problem for resampling the word indicator variables. You could resample each word indicator variable completely in parallel. All you needed to do was condition on the topic vocabulary distributions and the document-specific topic proportions, and then everything decoupled. There was no dependence on the assignments to other words in the document or in the corpus.
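[As a rough illustration of that decoupling, here is a minimal sketch, not the lecture's reference code, of the uncollapsed per-word resampling step. The names phi (topic-vocabulary distributions, K x V), theta_d (one document's topic proportions), and doc_words are my own illustrative choices. Conditioned on phi and theta_d, each word's assignment is drawn independently, which is why the loop below could run in parallel.]

import numpy as np

def resample_assignments_uncollapsed(doc_words, phi, theta_d):
    """Resample topic assignments for one document given the model parameters.

    doc_words: array of word ids for one document.
    phi: (K, V) topic-vocabulary distributions.
    theta_d: (K,) topic proportions for this document.
    """
    rng = np.random.default_rng()
    K = phi.shape[0]
    assignments = np.empty(len(doc_words), dtype=int)
    for i, w in enumerate(doc_words):   # each iteration is independent of the others
        p = theta_d * phi[:, w]         # P(topic) * P(word | topic)
        p /= p.sum()
        assignments[i] = rng.choice(K, p=p)
    return assignments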
But now, when we don't have these model parameters, what informs the assignment of a given word to a given topic? Well, the other words and the assignments that were made to those words. So we can think of all those other assignment variables in the corpus as a surrogate for the model parameters.

We'll discuss this in more detail in the coming slides, but the take-home message here is that we never have to sample our topic vocabulary distributions or the document-specific topic proportions. We just sample these word indicator variables. But we do so sequentially, losing the ability to parallelize across that operation.
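[To make that concrete, here is a minimal sketch of one collapsed resampling step for a single word, under the usual assumption of symmetric Dirichlet hyperparameters. The count matrices n_dk (document-topic counts), n_kw (topic-word counts), n_k (topic totals), and the names alpha, beta, V are my own illustrative notation, not necessarily the lecture's. The counts over every other word's current assignment play the role of the collapsed parameters, and because each update changes those counts, the sweep over words is inherently sequential.]

import numpy as np

def resample_word_collapsed(d, w, z_old, n_dk, n_kw, n_k, alpha, beta, V):
    """One collapsed Gibbs update for word w in document d, currently assigned topic z_old."""
    rng = np.random.default_rng()
    # Remove this word's current assignment from the counts ("all other words").
    n_dk[d, z_old] -= 1
    n_kw[z_old, w] -= 1
    n_k[z_old] -= 1
    # Conditional over topics given every other assignment in the corpus.
    p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
    p /= p.sum()
    z_new = rng.choice(len(p), p=p)
    # Add the new assignment back in; the next word sees these updated counts,
    # which is exactly why this step cannot be parallelized across words.
    n_dk[d, z_new] += 1
    n_kw[z_new, w] += 1
    n_k[z_new] += 1
    return z_new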