[MUSIC] All right, let's look at the mean field approximation. Here is how it works. As always in variational inference, we select a family of distributions Q. We select it in the following way: it is the set of all distributions that factorize over the dimensions of the latent variables. So q(z) will be a product of qi(zi), where qi is the distribution of the ith latent variable. And then we do the optimization by minimizing the KL divergence between the variational distribution and the full posterior. So here is an example. Here we have a distribution over two random variables. The true posterior is p*, and the factorized one would be q1(z1) q2(z2). Imagine that the true posterior is a normal distribution with mean 0 and some covariance matrix sigma. Then the factorized distribution would also be normal, but it would have a diagonal covariance matrix. If this was our true posterior, then the approximation would look like this. Note here that we don't have the off-diagonal elements of the covariance matrix in the red Gaussian. So the optimization works as follows. We start with some distribution q, and we perform coordinate descent with respect to the different factors of q. On the first step we optimize with respect to q1 and get a new distribution, then on the second step we optimize with respect to q2, and so on. We do this in a loop and optimize until convergence. Let's now derive the formulas for the update at each step of the mean field approximation. So, here's one step of coordinate descent. We are trying to minimize the KL divergence with respect to one factor, qk. Let's write down the value of the KL divergence. This would be an integral of the product over all dimensions, i from 1 to d, of qi, times the logarithm of a ratio: in the numerator, again the product over all dimensions of qi, and in the denominator, p*. And we integrate it over all z.
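The setup described so far can be written compactly as follows. This is only a sketch in notation assumed here for illustration: z = (z_1, ..., z_d) for the latent variables, p^* for the true posterior, and \hat{q} for the minimizer.

```latex
\begin{align}
\mathcal{Q} &= \Big\{\, q : q(z) = \prod_{i=1}^{d} q_i(z_i) \,\Big\}, \\
\hat{q} &= \arg\min_{q \in \mathcal{Q}} \mathrm{KL}\big(q \,\|\, p^*\big), \\
\mathrm{KL}\big(q \,\|\, p^*\big)
  &= \int \prod_{i=1}^{d} q_i(z_i)\,
     \log \frac{\prod_{i=1}^{d} q_i(z_i)}{p^*(z)} \, dz .
\end{align}
```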
We can take the products out of the logarithm, and it will become a sum of logarithms. And we can also take the denominator out as a separate integral. Now we'll have the sum over i from 1 to d of the integral of the product, and let's index this product with j, so that we have separate indices. So, the product over j from 1 to d of qj, times the logarithm of qi, dz, minus the integral with the denominator. Again, this would be the integral of the product over j from 1 to d of qj, times log p*, dz. [SOUND] Easy. All right, we're interested only in the component qk. So let's separate this summation into two parts: one with i = k, and all the others. So this would be equal to the integral of the product over j from 1 to d of qj, times log qk, dz. This is the term that we're particularly interested in. Plus the sum over all other components, so the sum over i not equal to k of the same integral: the product over j from 1 to d of qj, times log qi, dz. And minus this term: the integral of the product over j from 1 to d of qj, times log p*, dz. So let's find out which terms are constant with respect to qk. Let's start by rewriting the first term. It actually equals the integral of qk times the logarithm of qk, times the integral over all the other dimensions. So here is an integral of the product over j not equal to k of qj, and let me write the differential as dz with the subscript "not equal to k". Those are all the variables except for the kth one. And finally, we integrate over zk. This inner integral actually equals 1, since each qj is a distribution and the integral of a distribution equals 1. So finally we get the integral of qk times the logarithm of qk, dzk. All right, so this term surely depends on qk. The terms in the sum have a similar form: each one equals the integral of qi times the logarithm of qi, dzi, with i not equal to k. And since these don't depend on qk, we can say that they are just constants.
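Written out in symbols, the steps above look roughly like this. The notation dz_{\neq k} for integration over every variable except z_k is an assumption made here for readability.

```latex
\begin{align}
\mathrm{KL}\big(q \,\|\, p^*\big)
  &= \sum_{i=1}^{d} \int \prod_{j=1}^{d} q_j(z_j)\, \log q_i(z_i)\, dz
     \;-\; \int \prod_{j=1}^{d} q_j(z_j)\, \log p^*(z)\, dz, \\
\int \prod_{j=1}^{d} q_j(z_j)\, \log q_k(z_k)\, dz
  &= \int q_k(z_k) \log q_k(z_k)
     \underbrace{\Big[ \int \prod_{j \neq k} q_j(z_j)\, dz_{\neq k} \Big]}_{=\,1}
     dz_k
   = \int q_k(z_k) \log q_k(z_k)\, dz_k, \\
\mathrm{KL}\big(q \,\|\, p^*\big)
  &= \int q_k(z_k) \log q_k(z_k)\, dz_k
     \;-\; \int \prod_{j=1}^{d} q_j(z_j)\, \log p^*(z)\, dz
     \;+\; \mathrm{const}.
\end{align}
```

Here the constant collects the terms \int q_i \log q_i \, dz_i with i \neq k, none of which depends on q_k.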
All right, so here's our expression again. Let's continue deriving the formula. This would be equal to the integral of qk times the logarithm of qk, dzk, minus this integral, plus a constant. Let me separate dimension number k out of the second integral. So this would be the integral of qk times, and here I will write the integral over all the other dimensions, the product over j not equal to k of qj, times the logarithm of p*. Here we integrate over all zj with j not equal to k, so we write it down again as dz with the subscript "not equal to k", and we have to close the brackets. Outside the brackets we have dzk, and I guess that's it. All right, let's now put everything into one integral. We can group these as an integral of qk times the difference between this term and this term. So we'll have the integral of qk times the following difference: the logarithm of qk minus the integral of the product over j not equal to k of qj, times log p*, dz not equal to k. Finally, we close the bracket and integrate over the last dimension, zk. All right, what is this term? It actually equals the expectation of the logarithm of p* over all dimensions except for the kth. So this term equals the expected value, under q without the kth factor, of the logarithm of p*. Let me write it down like this. It is some function of zk, so let's call it h(zk). Now, it turns out we can modify it a little bit and get a distribution. For this, we just have to take the exponent of h and renormalize it. So let's define a new distribution that equals the exponent of h(zk) divided by the normalization constant, which is the integral over zk prime of e to the h(zk prime), dzk prime. This will be our new distribution, let's call it t. Then h(zk) equals the logarithm of t plus a constant, so I should write "+ const" here, and here we also have some constant. Now we can notice that our expression actually equals the KL divergence between the distributions qk and t. So this will be the integral of qk times the logarithm of the ratio between qk and t, dzk, plus some new constant, since the normalization constant from the denominator of t is absorbed into it.
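The grouping and the renormalization trick can be sketched in symbols as follows. The names h, t, and the normalizer Z follow the spoken derivation where possible; Z and the subscript q_{-k} ("q without the kth factor") are notation assumed here.

```latex
\begin{align}
\mathrm{KL}\big(q \,\|\, p^*\big)
  &= \int q_k(z_k) \Big[ \log q_k(z_k)
     - \int \prod_{j \neq k} q_j(z_j)\, \log p^*(z)\, dz_{\neq k} \Big] dz_k
     + \mathrm{const}, \\
h(z_k) &= \mathbb{E}_{q_{-k}} \log p^*(z)
        = \int \prod_{j \neq k} q_j(z_j)\, \log p^*(z)\, dz_{\neq k}, \\
t(z_k) &= \frac{e^{h(z_k)}}{Z}, \qquad
Z = \int e^{h(z_k')}\, dz_k', \qquad
h(z_k) = \log t(z_k) + \log Z, \\
\mathrm{KL}\big(q \,\|\, p^*\big)
  &= \int q_k(z_k) \log \frac{q_k(z_k)}{t(z_k)}\, dz_k + \mathrm{const}
   = \mathrm{KL}\big(q_k \,\|\, t\big) + \mathrm{const}.
\end{align}
```

The term \log Z does not depend on z_k, and q_k integrates to 1, so it only shifts the constant.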
And we try to minimize it. So, again, this is the KL divergence between qk and t, plus a constant, so we just have to minimize the KL divergence. And we already know what the answer is: the KL divergence is minimized when the two distributions are equal, so we have to take qk equal to the distribution t as written here. So, qk equals t. A better way to write it down is as follows. What we are actually saying is that those two terms are equal, this term equals this term. And so we have log qk = h(zk) + const, where h is the expectation over all variables except for the kth: the expectation over q without k of log p*, plus a constant. And so this is our final formula. We'll see how to use it in the next video. [MUSIC]
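As an illustration of the final formula, here is a minimal Python sketch for the Gaussian example from the beginning of the video. It rests on a standard result that is not derived in the lecture: when p* is a normal distribution with mean mu and precision matrix Lambda (the inverse of the covariance), taking the expectation of log p* over all factors except the kth gives a normal qk with variance 1 / Lambda_kk and a mean that depends on the current means of the other factors. The function name `mean_field_gaussian` and its parameters are hypothetical and chosen only for this example.

```python
import numpy as np


def mean_field_gaussian(mu, Sigma, init=None, n_sweeps=50):
    """Coordinate-wise mean-field updates for a Gaussian target N(mu, Sigma).

    Assumption (standard result, not derived in the lecture): applying
    log q_k = E_{q without k}[log p*] + const to a Gaussian p* gives
    q_k = N(m_k, 1 / Lam[k, k]) with
    m_k = mu_k - sum_{j != k} Lam[k, j] * (m_j - mu_j) / Lam[k, k],
    where Lam is the precision matrix (inverse of Sigma).
    """
    mu = np.asarray(mu, dtype=float)
    Lam = np.linalg.inv(np.asarray(Sigma, dtype=float))  # precision matrix
    d = mu.shape[0]
    # Means of the factors q_1..q_d; any starting point works.
    m = np.zeros(d) if init is None else np.array(init, dtype=float)

    for _ in range(n_sweeps):          # loop until (approximate) convergence
        for k in range(d):             # one coordinate step: update q_k only
            others = [j for j in range(d) if j != k]
            m[k] = mu[k] - Lam[k, others] @ (m[others] - mu[others]) / Lam[k, k]

    var = 1.0 / np.diag(Lam)           # factor variances never change
    return m, var


# Two correlated variables with mean 0, as in the lecture's picture.
Sigma = [[1.0, 0.8], [0.8, 1.0]]
means, variances = mean_field_gaussian([0.0, 0.0], Sigma, init=[1.0, 1.0])
print("factor means:", means)
print("factor variances:", variances)
```

In this sketch the factor means move toward the true mean, while each factor variance is 1 / Lambda_kk, which for this Sigma is 0.36 and so smaller than the true marginal variance of 1. The off-diagonal covariance is not represented at all, matching the diagonal-covariance picture described earlier.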