1
00:00:03,570 --> 00:00:05,730
In a previous video,

2
00:00:05,730 --> 00:00:09,760
we derived the formulas for updating Q of theta.

3
00:00:09,760 --> 00:00:15,330
In this video, we'll derive the formulas for updating Q of Z.

4
00:00:15,330 --> 00:00:17,650
So, I'll remind you that we're doing

5
00:00:17,650 --> 00:00:22,000
the variational expectation maximization algorithm. In the E-step,

6
00:00:22,000 --> 00:00:24,955
we try to minimize the KL divergence

7
00:00:24,955 --> 00:00:28,600
between the posterior distribution over the latent variables given

8
00:00:28,600 --> 00:00:37,040
the data and our variational distribution, which we search for in the factorized family.

9
00:00:37,040 --> 00:00:43,580
And we try to apply the formula that we derived for the mean field approximation.

10
00:00:43,580 --> 00:00:45,405
All right. Q of Z.

11
00:00:45,405 --> 00:00:51,045
So the formula would be again the logarithm of Q of

12
00:00:51,045 --> 00:00:59,835
Z equals the expectation over all variables except Z, which here is just theta.

13
00:00:59,835 --> 00:01:02,465
So, Q of theta.

14
00:01:02,465 --> 00:01:07,380
Logarithm of the joint distribution.

15
00:01:09,850 --> 00:01:18,860
Probability of theta, Z and W plus some constant.

16
00:01:18,860 --> 00:01:24,475
All right.

17
00:01:24,475 --> 00:01:29,765
Now we have to plug in the formula from above into this function.

18
00:01:29,765 --> 00:01:34,020
Well, before we do this let's see which terms are constants.

19
00:01:34,020 --> 00:01:37,175
We are trying to estimate the distribution over Z.

20
00:01:37,175 --> 00:01:40,540
So, let's see which terms do not depend on Z.

21
00:01:40,540 --> 00:01:47,010
So, the first term does not depend on Z so we can say that it is constant.

22
00:01:48,790 --> 00:01:53,095
What about this term?

23
00:01:53,095 --> 00:01:57,340
This term actually does depend on Z, since we have it here.

24
00:01:57,340 --> 00:02:03,905
And those two, inside the sum, also depend on Z. All right.

25
00:02:03,905 --> 00:02:09,915
So let's rewrite this formula here and see what we get.

26
00:02:09,915 --> 00:02:15,215
So, it would be the expectation with respect to Q of theta.

27
00:02:15,215 --> 00:02:20,610
Sum over all documents.

28
00:02:20,610 --> 00:02:28,490
Now we also have a sum over all words.

29
00:02:28,490 --> 00:02:34,090
N from one to ND. And also the sum over all topics.

30
00:02:34,090 --> 00:02:39,447
Sum over T from one to capital-T. All right,

31
00:02:39,447 --> 00:02:44,275
the indicator that ZDN equals

32
00:02:44,275 --> 00:02:53,325
T. And now we have the logarithm of theta DT,

33
00:02:53,325 --> 00:02:58,370
plus the logarithm of

34
00:02:58,370 --> 00:03:08,850
phi T, WDN, plus some constant.

35
00:03:08,850 --> 00:03:16,030
Okay. Now let's take the expectation and put it under the summation.

36
00:03:16,030 --> 00:03:19,565
So expectation is taken with respect to theta.

37
00:03:19,565 --> 00:03:22,075
This term doesn't depend on theta,

38
00:03:22,075 --> 00:03:23,535
and this one doesn't either.

39
00:03:23,535 --> 00:03:28,690
So we can take the expectation here.

40
00:03:28,690 --> 00:03:34,130
All right. So, now I write sum over three variables,

41
00:03:34,130 --> 00:03:39,470
sum over D, over documents, sum over words.

42
00:03:39,470 --> 00:03:44,487
And sum over topics.

43
00:03:44,487 --> 00:03:52,310
The indicator does not depend on theta, so we move the expectation further inside.

44
00:03:52,310 --> 00:03:54,670
The expectation of the logarithm

45
00:03:54,670 --> 00:04:04,860
of theta DT.

46
00:04:04,860 --> 00:04:11,945
Plus the logarithm of phi T, WDN.

47
00:04:11,945 --> 00:04:15,870
And again plus some constant.

48
00:04:15,870 --> 00:04:23,035
All right. This is the logarithm of the distribution over Z.

49
00:04:23,035 --> 00:04:26,670
Let us take the exponent of the left hand side and

50
00:04:26,670 --> 00:04:31,695
the right hand side, and we'll have Q of Z.

51
00:04:31,695 --> 00:04:34,470
It actually equals a product: since we take

52
00:04:34,470 --> 00:04:37,965
the exponent, the summations become products.

53
00:04:37,965 --> 00:04:49,045
Product over D from one to capital D. Product over N from one to ND.

54
00:04:49,045 --> 00:04:52,440
Now, here is the trick. We know that the indicators of ZDN

55
00:04:52,440 --> 00:04:58,865
sum up to one over T, since we assign only one topic to each word.

56
00:04:58,865 --> 00:05:05,334
And so, since here we have a summation, and here we have exactly a distribution over ZDN,

57
00:05:05,334 --> 00:05:07,620
we can see that we can write down

58
00:05:07,620 --> 00:05:12,805
the distribution Q of Z as a product of independent distributions.

59
00:05:12,805 --> 00:05:18,565
So it would be Q of ZDN.

60
00:05:18,565 --> 00:05:26,695
And Q of ZDN can be derived using this term. Let's do it.

61
00:05:26,695 --> 00:05:31,455
The probability that ZDN equals

62
00:05:31,455 --> 00:05:37,130
T would be proportional to the exponent of this term.

63
00:05:37,130 --> 00:05:40,055
So we can write it down as follows.

64
00:05:40,055 --> 00:05:47,136
So, it is phi T, WDN,

65
00:05:47,136 --> 00:05:52,970
times the exponent of the expectation.

66
00:05:52,970 --> 00:06:04,340
Here we can write

67
00:06:04,340 --> 00:06:09,650
down the expectation only with respect to theta DT since the thing

68
00:06:09,650 --> 00:06:15,445
that we're taking the expectation of depends only on theta DT.

69
00:06:15,445 --> 00:06:26,545
So, Q of theta DT of the logarithm of theta DT.

70
00:06:26,545 --> 00:06:29,495
What would be the normalization constant?

71
00:06:29,495 --> 00:06:32,060
So actually this thing should sum up to one.

72
00:06:32,060 --> 00:06:37,100
So we can compute the summation over the numerator with respect to

73
00:06:37,100 --> 00:06:43,505
all possible values of T. And there are only capital-T

74
00:06:43,505 --> 00:06:45,800
possible values of T. So,

75
00:06:45,800 --> 00:06:52,680
this would be the sum over T from one to capital-T of the same thing.

76
00:06:52,680 --> 00:06:55,245
So I write down T prime here.

77
00:06:55,245 --> 00:07:00,260
Phi T prime, WDN.

78
00:07:03,920 --> 00:07:13,205
Exponent of the expectation over Q of theta DT prime

79
00:07:13,205 --> 00:07:20,360
of the logarithm of theta DT prime.

80
00:07:20,360 --> 00:07:26,620
Also note that this thing equals gamma DN at position T.

81
00:07:26,620 --> 00:07:34,930
You can see here the definition.

82
00:07:34,930 --> 00:07:39,330
All right, so now we know the formulas for Q of Z,

83
00:07:39,330 --> 00:07:47,710
we also know the values of gamma DN and so we can iterate the updates of theta and Z.

84
00:07:47,710 --> 00:07:52,225
So, we start, for example, with some random initialization.

85
00:07:52,225 --> 00:07:55,930
We try to update Q of theta.

86
00:07:55,930 --> 00:07:57,460
We update it with this formula.

87
00:07:57,460 --> 00:08:02,310
where the gammas are computed on the previous step.

88
00:08:02,310 --> 00:08:06,845
So for this case those are the initial values of gamma.

89
00:08:06,845 --> 00:08:13,355
Then we go through this step where we update Q of Z and we update it with this formula.

90
00:08:13,355 --> 00:08:18,160
Here we need to compute the expected value of the logarithm.

91
00:08:18,160 --> 00:08:23,230
You can look up the value of this function on Wikipedia.

92
00:08:23,230 --> 00:08:28,220
So, this would be the expected value for the Dirichlet distribution.

93
00:08:28,220 --> 00:08:32,195
So, we completed the formulas for the E-step and in

94
00:08:32,195 --> 00:08:37,910
the next video we'll derive the update formulas for the M-step and the prediction.
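
The Q of Z update derived in this video can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the lecture's code: it assumes Q of theta D is a Dirichlet distribution with parameters `alpha_d`, so that the expected logarithm of theta DT is digamma(alpha DT) minus digamma of the sum of the parameters. The names `digamma`, `update_q_z`, `phi_col`, and `alpha_d` are hypothetical and chosen for this example.

```python
import math


def digamma(x):
    """Approximate the digamma function for x > 0 (stdlib-only helper)."""
    result = 0.0
    # Use psi(x) = psi(x + 1) - 1/x to shift x into the range where
    # the asymptotic series below is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 * inv - series


def update_q_z(phi_col, alpha_d):
    """Return gamma_dn, the distribution Q(z_dn = t) over topics t.

    phi_col[t] is phi[t][w_dn], the probability of the observed word
    under topic t. alpha_d[t] are the (assumed) Dirichlet parameters
    of Q(theta_d).
    """
    psi_total = digamma(sum(alpha_d))
    # log of the unnormalized weight: log phi + E[log theta_dt]
    log_w = [math.log(p) + digamma(a) - psi_total
             for p, a in zip(phi_col, alpha_d)]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_w)
    w = [math.exp(v - m) for v in log_w]
    total = sum(w)  # the normalization constant: sum over t' from 1 to T
    return [v / total for v in w]


gamma_dn = update_q_z([0.2, 0.5, 0.3], [1.0, 2.0, 3.0])
```

Here `gamma_dn` sums to one over the topics, as the normalization step in the derivation requires. When all entries of `alpha_d` are equal, the expected logarithms cancel in the ratio and the result is simply proportional to `phi_col`.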