1
00:00:00,000 --> 00:00:00,951
[MUSIC]

2
00:00:00,951 --> 00:00:08,916
In this video,
we will derive the formulas for the model.

3
00:00:08,916 --> 00:00:13,419
That is, we want to train the model
by finding the optimal values of phi.

4
00:00:13,419 --> 00:00:18,281
We'll do this by
maximizing the likelihood.

5
00:00:18,281 --> 00:00:23,480
So we take the probability
of our data, given the matrix phi.

6
00:00:23,480 --> 00:00:30,326
And we maximize it with respect to phi.

7
00:00:30,326 --> 00:00:35,695
We'll apply the variational
EM algorithm for this purpose.

8
00:00:35,695 --> 00:00:37,927
So in variational EM we have two steps.

9
00:00:43,393 --> 00:00:47,383
We'll try to minimize
the KL divergence

10
00:00:47,383 --> 00:00:51,066
between the variational distribution, and

11
00:00:51,066 --> 00:00:56,190
in this case we try to find
the solution in the following form.

12
00:00:56,190 --> 00:01:03,823
It would be the distribution over
theta times the distribution over Z.

13
00:01:03,823 --> 00:01:09,040
So, between this and
the probability of theta and

14
00:01:09,040 --> 00:01:15,205
z, the posterior probability
of them given the data.

15
00:01:15,205 --> 00:01:19,081
And minimize it.

16
00:01:19,081 --> 00:01:24,227
We minimize it with respect
to q of theta and q of z.

17
00:01:24,227 --> 00:01:29,496
All right, this is the E-step. And
on the M-step,

18
00:01:34,630 --> 00:01:40,339
We maximize the expected value

19
00:01:43,211 --> 00:01:45,550
of the logarithm

20
00:01:47,210 --> 00:01:52,367
of the joint probability of theta, z, and w.

21
00:01:56,921 --> 00:02:01,085
We maximize it
with respect to phi.
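
For reference, the two steps described above can be written compactly. This is a sketch in assumed notation that the lecture does not fix: \Phi is the matrix phi, w is the data, and q(\theta) q(z) is the factorized variational distribution.

```latex
\begin{align*}
\text{E-step:}\quad & \min_{q(\theta),\, q(z)}
  \mathrm{KL}\big( q(\theta)\, q(z) \,\big\|\, p(\theta, z \mid w, \Phi) \big), \\
\text{M-step:}\quad & \max_{\Phi}\;
  \mathbb{E}_{q(\theta)\, q(z)} \log p(\theta, z, w \mid \Phi).
\end{align*}
```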

22
00:02:03,165 --> 00:02:09,588
So our plan for now is to derive
the formulas for the E-step,

23
00:02:09,588 --> 00:02:14,796
that is, we want to find q of theta and
q of z.

24
00:02:14,796 --> 00:02:16,787
Let's start with q of theta.

25
00:02:21,493 --> 00:02:24,504
All right, so let's start with theta.

26
00:02:24,504 --> 00:02:29,559
The formulas for theta are as follows.

27
00:02:29,559 --> 00:02:30,493
So the log of q of theta.

28
00:02:30,493 --> 00:02:37,221
These are formulas from
the mean-field approximation.

29
00:02:37,221 --> 00:02:44,048
It equals the expected value with
respect to all variables except for

30
00:02:44,048 --> 00:02:49,051
theta and the only variable
that we have left is z, so

31
00:02:49,051 --> 00:02:54,659
with respect to q of z, of
the logarithm of this distribution:

32
00:02:57,069 --> 00:03:02,882
log of p of theta, z
given w, plus some constant.

33
00:03:06,853 --> 00:03:09,724
So we actually don't know this term.

34
00:03:09,724 --> 00:03:14,996
However, we can rewrite
it using Bayes' formula.

35
00:03:14,996 --> 00:03:20,894
So it would be equal to the ratio
between the joint probability,

36
00:03:20,894 --> 00:03:24,281
actually this isn't Bayes' formula,

37
00:03:24,281 --> 00:03:29,527
it is just the definition of
the conditional probability

38
00:03:29,527 --> 00:03:33,916
over the probability of w,
from this we have w.

39
00:03:33,916 --> 00:03:39,151
So this term, the denominator,
does not depend

40
00:03:39,151 --> 00:03:43,967
on theta, so it's just a constant.

41
00:03:43,967 --> 00:03:49,540
And so we can rewrite it as the expectation,

42
00:03:49,540 --> 00:03:54,551
with respect to q of z, of the logarithm of the joint

43
00:03:54,551 --> 00:04:01,678
distribution over theta, z, and
w, plus a constant.

44
00:04:01,678 --> 00:04:08,959
All right, so we are estimating
a distribution over theta so

45
00:04:08,959 --> 00:04:15,257
before we plug in this value
from this huge formula.

46
00:04:15,257 --> 00:04:20,082
Let's see which terms are constant
with respect to theta.

47
00:04:20,082 --> 00:04:22,441
So this term depends on theta.

48
00:04:22,441 --> 00:04:24,746
We have theta here.

49
00:04:24,746 --> 00:04:25,433
We have theta here.

50
00:04:25,433 --> 00:04:29,249
However in this one we don't have theta.

51
00:04:29,249 --> 00:04:31,259
So this is actually a constant.

52
00:04:35,883 --> 00:04:40,350
All right, so,
actually we still have a lot of terms.

53
00:04:40,350 --> 00:04:44,369
I'll try to solve the

54
00:04:44,369 --> 00:04:49,179
expectation with respect to q of z.

55
00:04:54,511 --> 00:04:58,214
And now we write out these terms.

56
00:04:58,214 --> 00:05:05,480
So it is the sum over all documents.

57
00:05:05,480 --> 00:05:11,886
Then we have a bracket, the sum over all topics of

58
00:05:14,765 --> 00:05:20,057
alpha t minus 1, times the logarithm of theta

59
00:05:20,057 --> 00:05:25,001
dt, plus the sum over words and topics.

60
00:05:25,001 --> 00:05:29,371
So, this would be n=1 to Nd,

61
00:05:29,371 --> 00:05:34,084
number of words in the document.

62
00:05:34,084 --> 00:05:39,810
Sum over all topics that can
be assigned to each word,

63
00:05:39,810 --> 00:05:45,547
the indicator that the latent
variable has the value t.

64
00:05:45,547 --> 00:05:53,550
So in this case we assign the topic
t to the word n of the document d.

65
00:05:53,550 --> 00:05:59,450
Times the logarithm of theta dt as well;
we already have it here.

66
00:05:59,450 --> 00:06:04,367
And we have to close this bracket.

67
00:06:05,470 --> 00:06:09,875
All right, we have an expectation and

68
00:06:09,875 --> 00:06:14,012
we can take it under the summation.

69
00:06:14,012 --> 00:06:17,458
So this term does not depend on z.

70
00:06:17,458 --> 00:06:22,237
And the only dependence on z is here.

71
00:06:22,237 --> 00:06:27,325
So it will be the sum over all documents.

72
00:06:27,325 --> 00:06:31,610
This term will not change.

73
00:06:31,610 --> 00:06:37,652
So the sum over t
from 1 to T, over all topics, of

74
00:06:37,652 --> 00:06:43,853
alpha t minus 1,
logarithm of theta dt plus,

75
00:06:43,853 --> 00:06:48,941
then we take the expectation here, so

76
00:06:48,941 --> 00:06:57,541
this will be the sum over all words,
[SOUND] the sum over all topics.

77
00:06:57,541 --> 00:07:02,505
Expectation with respect to q

78
00:07:02,505 --> 00:07:06,681
of z dn, of the indicator.

79
00:07:11,121 --> 00:07:17,872
And finally,
times the logarithm of theta dt.

80
00:07:17,872 --> 00:07:19,863
[SOUND] All right.

81
00:07:19,863 --> 00:07:24,977
So, let's denote this value,

82
00:07:24,977 --> 00:07:29,690
the expectation, as gamma dn.

83
00:07:32,692 --> 00:07:38,102
Now we can group the terms
that contain the logarithm

84
00:07:38,102 --> 00:07:42,936
of theta dt, and
get the following expression.

85
00:07:47,062 --> 00:07:49,439
We'll have a sum over all documents.

86
00:07:54,032 --> 00:07:58,202
Sum over all topics.

87
00:08:03,032 --> 00:08:09,710
We'll have alpha t minus 1 from this term.

88
00:08:09,710 --> 00:08:17,534
We'll have the sum over
all words of gamma dn.

89
00:08:21,644 --> 00:08:27,305
So summation gamma dn, and

90
00:08:27,305 --> 00:08:35,925
finally, times the logarithm of theta dt.

91
00:08:35,925 --> 00:08:44,126
And actually we should add
a constant here, here and here.
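
The derivation spoken above can be collected into one chain of equalities. This is a sketch that assumes the notation D for the number of documents, T for the number of topics, N_d for the number of words in document d, and a third index t on gamma to make its dependence on the topic explicit; [z_{dn} = t] is the indicator.

```latex
\begin{align*}
\log q(\theta)
  &= \mathbb{E}_{q(z)} \log p(\theta, z \mid w) + \text{const}
   = \mathbb{E}_{q(z)} \log p(\theta, z, w) + \text{const} \\
  &= \mathbb{E}_{q(z)} \sum_{d=1}^{D} \Big[ \sum_{t=1}^{T} (\alpha_t - 1) \log \theta_{dt}
     + \sum_{n=1}^{N_d} \sum_{t=1}^{T} [z_{dn} = t] \log \theta_{dt} \Big] + \text{const} \\
  &= \sum_{d=1}^{D} \sum_{t=1}^{T} \Big[ (\alpha_t - 1)
     + \sum_{n=1}^{N_d} \gamma_{dnt} \Big] \log \theta_{dt} + \text{const},
  \qquad \gamma_{dnt} = \mathbb{E}_{q(z_{dn})} [z_{dn} = t].
\end{align*}
```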

92
00:08:44,126 --> 00:08:49,016
All right, so what is

93
00:08:49,016 --> 00:08:53,914
this distribution?

94
00:08:53,914 --> 00:08:55,510
Do you recognize its form?

95
00:08:55,510 --> 00:09:01,197
Actually this is a Dirichlet
distribution again.

96
00:09:01,197 --> 00:09:07,364
So if we take the exponential of this
function to get q of theta,

97
00:09:11,555 --> 00:09:15,534
we will see that it equals

98
00:09:15,534 --> 00:09:20,378
the product over all documents,

99
00:09:20,378 --> 00:09:25,395
[SOUND] the product over all topics, of

100
00:09:25,395 --> 00:09:30,585
theta dt, where the power is this

101
00:09:30,585 --> 00:09:36,124
term: alpha t plus the sum of gamma dn, minus 1.

102
00:09:36,124 --> 00:09:41,327
And times some constant.

103
00:09:41,327 --> 00:09:45,628
So we can write the proportional
sign instead of equality.

104
00:09:45,628 --> 00:09:52,187
So actually this is
a Dirichlet distribution.

105
00:09:52,187 --> 00:09:58,066
We can write down [INAUDIBLE] that q
of theta equals the product

106
00:09:58,066 --> 00:10:03,154
over all documents of
the distribution over theta d,

107
00:10:03,154 --> 00:10:07,241
and since that is a standard distribution.

108
00:10:07,241 --> 00:10:11,754
So you recognise,
you should recognise the form here.

109
00:10:11,754 --> 00:10:19,120
q of theta d would be a Dirichlet distribution.

110
00:10:19,120 --> 00:10:22,306
With the following parameters.

111
00:10:22,306 --> 00:10:26,513
So we should add one to the power and

112
00:10:26,513 --> 00:10:30,584
we will have the following form.

113
00:10:30,584 --> 00:10:32,718
So this would be the vector alpha.

114
00:10:32,718 --> 00:10:38,451
Plus the vector of sums of gamma dn.

115
00:10:38,451 --> 00:10:43,832
So as a parameter of this distribution,

116
00:10:43,832 --> 00:10:50,014
we sum up the vector alpha and
the vector of sums of gamma dn.

117
00:10:50,014 --> 00:10:58,210
So gamma dn itself depends on t,
since here we had a value of t.

118
00:10:58,210 --> 00:11:05,370
All right, so we have derived the update
formula for q of theta.
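
The update just derived, q(theta_d) = Dirichlet(alpha + sum over n of gamma_dn), can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the function name `update_q_theta` and the list-of-lists layout for gamma (one row per word, one column per topic) are assumptions made here, and the gamma values are made up for the example.

```python
def update_q_theta(alpha, gamma_d):
    """Return the Dirichlet parameters of q(theta_d) for one document.

    alpha   -- list of T prior parameters alpha_t (assumed layout)
    gamma_d -- list of N_d rows, each a list of T values
               gamma_dn[t] = E_q[z_dn = t]; each row sums to 1
    """
    if not gamma_d:
        # A document with no words leaves the prior unchanged.
        return list(alpha)
    # Sum gamma over the words of the document, separately per topic.
    totals = [sum(column) for column in zip(*gamma_d)]
    # Posterior parameter per topic: alpha_t + sum_n gamma_dn[t].
    return [a + s for a, s in zip(alpha, totals)]


# Toy example: T = 3 topics, N_d = 4 words (values are illustrative only).
alpha = [0.5, 0.5, 0.5]
gamma_d = [
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
    [0.25, 0.25, 0.5],
]
params = update_q_theta(alpha, gamma_d)
print(params)  # [2.25, 1.25, 2.0]
```

Because each row of gamma sums to 1, the updated parameters sum to the sum of alpha plus the number of words in the document, which is a quick sanity check on any implementation.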

119
00:11:05,370 --> 00:11:08,219
Also note that here we use
the value of gamma dn.

120
00:11:08,219 --> 00:11:13,027
And the gamma dn can be approximated

121
00:11:13,027 --> 00:11:17,839
only using the distribution q of z.

122
00:11:17,839 --> 00:11:21,681
So before we have the training algorithm,

123
00:11:21,681 --> 00:11:26,694
we have to derive the value,
the logarithm of q of z, and

124
00:11:26,694 --> 00:11:34,080
also we'll have to compute the expectation, under this
distribution, of the indicator that z dn equals t.

125
00:11:34,080 --> 00:11:44,080
[MUSIC].