[MUSIC] In this video, we will derive the formulas for the model. That is, we want to train the model by finding the optimal values of Phi. We'll do this by maximizing the likelihood: we take the probability of our data given the matrix Phi and maximize it with respect to Phi. We'll apply the variational EM algorithm for this purpose. In variational EM we have two steps. On the E-step, we minimize the KL divergence between the variational distribution and the posterior probability of theta and z given the data. In this case we look for a solution in the following factorized form: the distribution over theta times the distribution over z. We minimize this KL divergence with respect to q of theta and q of z. On the M-step, we maximize the expected value of the logarithm of the joint probability of theta, z and w, and we maximize it with respect to Phi. So our plan for now is to derive the formulas for the E-step, that is, to find q of theta and q of z. Let's start with q of theta. The formulas for theta are as follows; they come from the mean-field approximation. The log of q of theta equals the expected value, with respect to all variables except theta (the only variable we have left is z, so with respect to q of z), of the logarithm of the distribution, log of p of theta and z given w, plus some constant. So we don't actually know this term. However, we can rewrite it. This isn't quite Bayes' formula, it is just the definition of conditional probability: the ratio of the joint probability of theta, z and w to the probability of w. The denominator does not depend on theta, so it's just a constant. And so we can rewrite it as the expectation,
with respect to q of z, of the logarithm of the joint distribution over theta, z and w, plus a constant. All right, so we are estimating a distribution over theta, so before we plug the joint probability from this huge formula in, let's see which terms are constant with respect to theta. This term depends on theta: we have theta here, and we have theta here. However, in this one we don't have theta, so it is actually a constant. All right, so we still have a lot of terms under the expectation with respect to q of z. Let's write them out. It is the sum over all documents; then, inside a bracket, the sum over all topics of alpha t minus 1 times the logarithm of theta dt, plus the sum over all words and topics. So this would be the sum over n from 1 to Nd, the number of words in the document, and the sum over all topics that can be assigned to each word, of the indicator that the latent variable zdn has the value t; in this case we assign the topic t to the word n of the document d. This is multiplied by the logarithm of theta dt as well, which we already have here. And we have to close this bracket. All right, we have an expectation, and we can move it under the summation. The first term does not depend on z, and in the second term the only dependence on z is in the indicator. So the result is the sum over all documents. The first term will not change: the sum over t from 1 to T, over all topics, of alpha t minus 1 times the logarithm of theta dt. Plus, taking the expectation inside, the sum over all words and the sum over all topics of the expectation, with respect to q of zdn, of the indicator, and finally times the logarithm of theta dt. All right. Let's denote this expectation by gamma dn. Now we can group the terms that multiply the logarithm of theta dt and get the following expression: the sum over all documents, the sum over all topics, of alpha t minus 1 from the first term, plus the sum over all words of gamma dn, all times the logarithm of theta dt. And actually we should add a constant here, here and here.
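The derivation just described can be written out compactly. This is a sketch of the spoken formulas; the notation gamma with an explicit topic index, gamma_dnt, is an assumption to make the dependence on t visible (the lecture writes gamma dn and notes it depends on t):

```latex
\begin{align*}
\log q(\theta)
  &= \mathbb{E}_{q(z)} \log p(\theta, z \mid w) + \text{const}
   = \mathbb{E}_{q(z)} \log p(\theta, z, w) + \text{const} \\
  &= \sum_{d=1}^{D}\Big[\sum_{t=1}^{T}(\alpha_t - 1)\log\theta_{dt}
   + \sum_{n=1}^{N_d}\sum_{t=1}^{T}
     \underbrace{\mathbb{E}_{q(z_{dn})}\,\mathbb{1}[z_{dn}=t]}_{=\,\gamma_{dnt}}
     \log\theta_{dt}\Big] + \text{const} \\
  &= \sum_{d=1}^{D}\sum_{t=1}^{T}
     \Big(\alpha_t - 1 + \sum_{n=1}^{N_d}\gamma_{dnt}\Big)\log\theta_{dt} + \text{const}.
\end{align*}
```

The second line drops p(w) from the conditional, since it does not depend on theta, and moves the expectation inside the sums; only the indicator depends on z.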
All right, so what is this distribution? Do you recognize its form? Actually, this is a Dirichlet distribution again. If we take the exponent of this function to get q of theta, we will see that it equals the product over all documents, the product over all topics, of theta dt raised to the power alpha t plus the sum of gamma dn, minus 1, times some constant. So we can write a proportionality sign instead of equality. So this is indeed a Dirichlet distribution. We can write down that q of theta equals the product over all documents of the distribution over theta d, and that is a standard Dirichlet distribution; you should recognize the form here. Q of theta d would be a Dirichlet distribution with the following parameters: we should add one to the power, so the parameter of this distribution is the vector alpha plus the vector of sums of gamma dn over the words of the document. Note that gamma dn itself depends on t, since the indicator contained the value of t. All right, so we have derived the update formula for q of theta. Note also that it uses the value of gamma dn, and gamma dn can be computed only from the distribution q of z. So before we have the training algorithm, we have to derive the logarithm of q of z, and also compute the expectation of the indicator that zdn equals t. [MUSIC]
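The resulting update is easy to compute in practice: the Dirichlet parameter of q of theta d is the vector alpha plus the per-document sums of gamma. Here is a minimal NumPy sketch with made-up shapes and names (`update_q_theta` and the `gamma` array layout are assumptions for illustration, not from the lecture):

```python
import numpy as np

def update_q_theta(alpha, gamma):
    """Dirichlet parameters of q(theta_d) for every document.

    alpha : (T,) prior parameter vector
    gamma : (D, N, T) responsibilities, gamma[d, n, t] = E_q[ 1[z_dn = t] ];
            each gamma[d, n] sums to 1 over topics.
    Returns a (D, T) array: alpha_t + sum_n gamma[d, n, t].
    """
    return alpha[None, :] + gamma.sum(axis=1)

# Toy example: 2 documents, 3 words each, 2 topics.
alpha = np.array([1.0, 1.0])
rng = np.random.default_rng(0)
gamma = rng.random((2, 3, 2))
gamma /= gamma.sum(axis=2, keepdims=True)  # normalize responsibilities
params = update_q_theta(alpha, gamma)       # (2, 2) Dirichlet parameters
```

Since each row of gamma sums to one, the parameters for document d always sum to sum(alpha) + N_d, which is a convenient sanity check.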