1 00:00:03,570 --> 00:00:05,730 In a previous video, 2 00:00:05,730 --> 00:00:09,760 we derived the formulas for updating Q of theta. 3 00:00:09,760 --> 00:00:15,330 In this video, we'll derive the formulas for updating Q of Z. 4 00:00:15,330 --> 00:00:17,650 So, I'll remind you that we're doing 5 00:00:17,650 --> 00:00:22,000 the variational expectation maximization algorithm. And on the E-step, 6 00:00:22,000 --> 00:00:24,955 we try to minimize the KL divergence 7 00:00:24,955 --> 00:00:28,600 between the posterior distribution over the latent variables given 8 00:00:28,600 --> 00:00:37,040 the data and our variational distribution that we are searching for in the factorized form. 9 00:00:37,040 --> 00:00:43,580 And we try to apply the formula that we derived for the mean field approximation. 10 00:00:43,580 --> 00:00:45,405 All right. Q of Z. 11 00:00:45,405 --> 00:00:51,045 So the formula would be again the logarithm of Q of 12 00:00:51,045 --> 00:00:59,835 Z equals the expectation over all variables except for Z, and that is theta. 13 00:00:59,835 --> 00:01:02,465 So, Q of theta. 14 00:01:02,465 --> 00:01:07,380 Logarithm of the joint distribution. 15 00:01:09,850 --> 00:01:18,860 Probability of theta, Z and W plus some constant. 16 00:01:18,860 --> 00:01:24,475 All right. 17 00:01:24,475 --> 00:01:29,765 Now we have to plug in the formula from above into this function. 18 00:01:29,765 --> 00:01:34,020 Well, before we do this let's see which terms are constants. 19 00:01:34,020 --> 00:01:37,175 We are trying to estimate the distribution over Z. 20 00:01:37,175 --> 00:01:40,540 So, let's see which terms do not depend on Z. 21 00:01:40,540 --> 00:01:47,010 So, the first term does not depend on Z so we can say that it is constant. 22 00:01:48,790 --> 00:01:53,095 What about this term? 23 00:01:53,095 --> 00:01:57,340 So this term actually depends on Z since we have it here. 
24 00:01:57,340 --> 00:02:03,905 And those two terms would also depend on Z. All right. 25 00:02:03,905 --> 00:02:09,915 So let's rewrite this formula here and see what we get. 26 00:02:09,915 --> 00:02:15,215 So, it would be the expectation with respect to Q of theta. 27 00:02:15,215 --> 00:02:20,610 Sum over all documents. 28 00:02:20,610 --> 00:02:28,490 Now we also have the sum over all words. 29 00:02:28,490 --> 00:02:34,090 N from one to ND. And also the sum over all topics. 30 00:02:34,090 --> 00:02:39,447 Sum over T from one to capital-T. All right, 31 00:02:39,447 --> 00:02:44,275 the indicator that ZDN equals 32 00:02:44,275 --> 00:02:53,325 T. And now we have the logarithm of theta DT, 33 00:02:53,325 --> 00:02:58,370 plus the logarithm of 34 00:02:58,370 --> 00:03:08,850 phi T WDN, plus a constant. 35 00:03:08,850 --> 00:03:16,030 Okay. Now let's take the expectation and put it under the summation. 36 00:03:16,030 --> 00:03:19,565 So the expectation is taken with respect to theta. 37 00:03:19,565 --> 00:03:22,075 This term doesn't depend on theta, 38 00:03:22,075 --> 00:03:23,535 and this one doesn't either. 39 00:03:23,535 --> 00:03:28,690 So we can take the expectation here. 40 00:03:28,690 --> 00:03:34,130 All right. So, now I write the sum over three variables, 41 00:03:34,130 --> 00:03:39,470 sum over D, over documents, sum over words. 42 00:03:39,470 --> 00:03:44,487 And sum over topics. 43 00:03:44,487 --> 00:03:52,310 The indicator does not depend on theta, so we move the expectation further in. 44 00:03:52,310 --> 00:03:54,670 The expectation of the logarithm 45 00:03:54,670 --> 00:04:04,860 of theta DT. 46 00:04:04,860 --> 00:04:11,945 Plus the logarithm of phi T WDN. 47 00:04:11,945 --> 00:04:15,870 And again plus a constant. 48 00:04:15,870 --> 00:04:23,035 All right. This is the logarithm of the distribution over Z. 
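For reference, the expression dictated up to this point can be written out as follows. This is a sketch of the spoken formula, not a copy of the slide: the symbol names ($\theta_{dt}$ for the topic weights of document $d$, $\phi_{t w}$ for the word probabilities of topic $t$, $z_{dn}$ and $w_{dn}$ for the topic and word at position $n$ of document $d$, $N_d$ for the document length) are assumed notation.

```latex
\log q(Z)
  = \mathbb{E}_{q(\Theta)} \log p(\Theta, Z, W) + \text{const}
  = \sum_{d=1}^{D} \sum_{n=1}^{N_d} \sum_{t=1}^{T}
      [z_{dn} = t]
      \left( \mathbb{E}_{q(\Theta)} \log \theta_{dt} + \log \phi_{t\, w_{dn}} \right)
    + \text{const}
```

The indicator $[z_{dn} = t]$ and $\log \phi_{t\, w_{dn}}$ do not depend on $\Theta$, which is why the expectation ends up only around $\log \theta_{dt}$.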
49 00:04:23,035 --> 00:04:26,670 Let us take the exponent of the left hand side and 50 00:04:26,670 --> 00:04:31,695 the right hand side and we'll have Q of Z. 51 00:04:31,695 --> 00:04:34,470 It actually equals the products: since we take 52 00:04:34,470 --> 00:04:37,965 the exponent, the summations become products. 53 00:04:37,965 --> 00:04:49,045 Product over D from one to capital-D. Product over N from one to ND. 54 00:04:49,045 --> 00:04:52,440 Now, here is the trick. We know that the indicators of ZDN 55 00:04:52,440 --> 00:04:58,865 sum up to one over T, since we assign only one topic to a word. 56 00:04:58,865 --> 00:05:05,334 And so since here we have a summation and each term here involves only one ZDN, 57 00:05:05,334 --> 00:05:07,620 we can see that we can write down 58 00:05:07,620 --> 00:05:12,805 the distribution Q of Z as a product of independent distributions. 59 00:05:12,805 --> 00:05:18,565 So it would be Q of ZDN. 60 00:05:18,565 --> 00:05:26,695 And Q of ZDN can be derived using this term. Let's do it. 61 00:05:26,695 --> 00:05:31,455 The probability that ZDN equals 62 00:05:31,455 --> 00:05:37,130 T would be proportional to the exponent of this term. 63 00:05:37,130 --> 00:05:40,055 So we can write it down as follows. 64 00:05:40,055 --> 00:05:47,136 So, it is phi T WDN, 65 00:05:47,136 --> 00:05:52,970 times the exponent of the expectation. 66 00:05:52,970 --> 00:06:04,340 Here we can write 67 00:06:04,340 --> 00:06:09,650 down the expectation only with respect to theta DT since the thing 68 00:06:09,650 --> 00:06:15,445 that we're taking the expectation of depends only on theta DT. 69 00:06:15,445 --> 00:06:26,545 So, Q of theta DT of the logarithm of theta DT. 70 00:06:26,545 --> 00:06:29,495 What would be the normalization constant? 71 00:06:29,495 --> 00:06:32,060 So actually this thing should sum up to one. 72 00:06:32,060 --> 00:06:37,100 So we can compute the summation of the numerator with respect to 73 00:06:37,100 --> 00:06:43,505 all possible values of T. 
And there are only capital-T 74 00:06:43,505 --> 00:06:45,800 possible values of T. So, 75 00:06:45,800 --> 00:06:52,680 this would be the sum over T prime from one to capital-T of the same thing. 76 00:06:52,680 --> 00:06:55,245 So let's write T prime here. 77 00:06:55,245 --> 00:07:00,260 Phi T prime WDN. 78 00:07:03,920 --> 00:07:13,205 Exponent of the expectation over Q of theta DT prime 79 00:07:13,205 --> 00:07:20,360 of the logarithm of theta DT prime. 80 00:07:20,360 --> 00:07:26,620 Also note that this thing equals gamma DN at position T. 81 00:07:26,620 --> 00:07:34,930 You can see here the definition. 82 00:07:34,930 --> 00:07:39,330 All right, so now we know the formulas for Q of Z, 83 00:07:39,330 --> 00:07:47,710 we also know the values of gamma DN and so we can iterate the updates of theta and Z. 84 00:07:47,710 --> 00:07:52,225 So, we start, for example, with some random initialization. 85 00:07:52,225 --> 00:07:55,930 We try to update Q of theta. 86 00:07:55,930 --> 00:07:57,460 We update it with this formula, 87 00:07:57,460 --> 00:08:02,310 where the gammas are computed on the previous step. 88 00:08:02,310 --> 00:08:06,845 So for this case those are the initial values of gamma. 89 00:08:06,845 --> 00:08:13,355 Then we go to this step where we update Q of Z and we update it with this formula. 90 00:08:13,355 --> 00:08:18,160 Here we need to compute the expected value of the logarithm. 91 00:08:18,160 --> 00:08:23,230 You can see the value of this function on Wikipedia. 92 00:08:23,230 --> 00:08:28,220 So, this would be the expected value for the Dirichlet distribution. 93 00:08:28,220 --> 00:08:32,195 So, we completed the formulas for the E-step and in 94 00:08:32,195 --> 00:08:37,910 the next video we'll derive the update formulas for the M-step and the prediction.
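The Q of Z update described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the lecturer's code: it assumes Q of theta for document D is a Dirichlet distribution with parameters `alpha_d` (so the expected logarithm is a difference of digamma functions), and the names `digamma`, `expected_log_theta`, `update_gamma`, `phi_column`, and `alpha_d` are hypothetical. The digamma function is approximated in pure Python so that the block has no third-party dependencies.

```python
import math


def digamma(x):
    """Approximate the digamma function psi(x) for x > 0."""
    result = 0.0
    # Use psi(x) = psi(x + 1) - 1/x to shift x into the range
    # where the asymptotic series below is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 * inv - series


def expected_log_theta(alpha_d):
    """E[log theta_dt] for each topic t.

    Assumption: q(theta_d) is Dirichlet(alpha_d), for which
    E[log theta_dt] = psi(alpha_dt) - psi(sum over t' of alpha_dt').
    """
    total = digamma(sum(alpha_d))
    return [digamma(a) - total for a in alpha_d]


def update_gamma(phi_column, alpha_d):
    """Return gamma_dn, i.e. q(z_dn = t) for every topic t.

    phi_column[t] is phi[t][w_dn], the probability of the observed
    word w_dn under topic t. The update is
    q(z_dn = t) proportional to phi[t][w_dn] * exp(E[log theta_dt]),
    normalized so that the values sum to one over the T topics.
    """
    e_log_theta = expected_log_theta(alpha_d)
    unnormalized = [p * math.exp(e) for p, e in zip(phi_column, e_log_theta)]
    norm = sum(unnormalized)
    return [u / norm for u in unnormalized]


# Hypothetical numbers: three topics, one observed word.
gamma_dn = update_gamma([0.5, 0.1, 0.2], [2.0, 1.0, 3.0])
```

With a symmetric `alpha_d` the expected-logarithm factor is the same for every topic, so the result reduces to the normalized `phi_column`; this gives a quick sanity check of the normalization step.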