1 00:00:04,772 --> 00:00:08,058 But before we continue our story about the Algorithm, 2 00:00:08,058 --> 00:00:12,165 I wanted to spend a little bit of time showing how these soft assignments 3 00:00:12,165 --> 00:00:16,600 are actually an application of something called Bayes' rule. 4 00:00:16,600 --> 00:00:21,150 And this is advanced material, so we're going to make this an optional video. 5 00:00:21,150 --> 00:00:24,030 And for those who are happy with the level of description 6 00:00:24,030 --> 00:00:28,600 of the responsibility equation in the previous video, then that's fine. 7 00:00:28,600 --> 00:00:30,550 Skip ahead to the next video. 8 00:00:30,550 --> 00:00:34,150 But for those who are interested in a little bit more detailed derivation and 9 00:00:34,150 --> 00:00:37,820 explanation of the equation, feel free to watch this video and 10 00:00:37,820 --> 00:00:39,079 see a few more details. 11 00:00:40,310 --> 00:00:43,610 Okay, so remember that our responsibilities were the probability 12 00:00:43,610 --> 00:00:46,500 of an assignment of observation i to cluster k, 13 00:00:46,500 --> 00:00:52,180 given our cluster parameters and given the observed data value xi. 14 00:00:53,200 --> 00:00:58,250 Well let's just call all those cluster parameters just params. 15 00:00:58,250 --> 00:01:03,950 And then what we see is that we can rewrite this equation and say pi k, 16 00:01:03,950 --> 00:01:10,590 so the prior probability of an assignment of zi=k is just what's written here. 17 00:01:10,590 --> 00:01:13,060 Probability zi=k given our parameters, 18 00:01:13,060 --> 00:01:16,910 where in this case the only parameter we actually need is pi k. 19 00:01:17,940 --> 00:01:21,045 And likewise for this likelihood term, 20 00:01:21,045 --> 00:01:26,523 we can write it generically as probability of our observed data point, 21 00:01:26,523 --> 00:01:30,005 xi given an assignment to cluster k, zi = k. 
22 00:01:30,005 --> 00:01:33,697 And given our model parameters where again here in this case, 23 00:01:33,697 --> 00:01:37,470 the only parameters that we're going to need are mu k and sigma k. 24 00:01:38,690 --> 00:01:41,260 Okay, but we're trying to write things more generically so 25 00:01:41,260 --> 00:01:44,760 that we can show how this is an application of Bayes' rule. 26 00:01:44,760 --> 00:01:49,334 To do this, we're going to say that the event zi = k, 27 00:01:49,334 --> 00:01:52,027 we're going to call that event A. 28 00:01:52,027 --> 00:01:56,743 And the event that we observe a value xi for our ith data point, 29 00:01:56,743 --> 00:01:58,840 we're going to call event B. 30 00:02:00,090 --> 00:02:05,300 Now using this, we can rewrite the terms in our numerator as probability 31 00:02:05,300 --> 00:02:10,640 of A given our parameters and probability of B given A and our parameters. 32 00:02:12,060 --> 00:02:16,913 Then we can do this same thing in the denominator where now we're going to 33 00:02:16,913 --> 00:02:21,615 call the event zi = j, an assignment of observation i to cluster j. 34 00:02:21,615 --> 00:02:24,360 We're going to call that some event C. 35 00:02:24,360 --> 00:02:28,104 And so we can rewrite each term in the sum as the probability of C given our 36 00:02:28,104 --> 00:02:32,418 parameters times the probability of B, still the same observed value, 37 00:02:32,418 --> 00:02:37,661 now given this event C and again our model parameters, and 38 00:02:37,661 --> 00:02:43,979 in the denominator we're going to sum over all these possible events C. 39 00:02:43,979 --> 00:02:46,890 And then putting all these pieces together, we have the following. 40 00:02:46,890 --> 00:02:51,535 We have that our responsibilities are the probability of A given B and 41 00:02:51,535 --> 00:02:56,855 our parameters, and we rewrote that on a previous slide in the following form. 
42 00:02:56,855 --> 00:03:03,366 Where here in the denominator, if we look at this thing here inside, 43 00:03:03,366 --> 00:03:07,745 we get that this is by definition equivalent to 44 00:03:07,745 --> 00:03:13,270 probability of events B and C given our model parameters. 45 00:03:13,270 --> 00:03:17,738 And so when we look at this sum, 46 00:03:17,738 --> 00:03:22,640 we have the sum over events C of the probability of 47 00:03:22,640 --> 00:03:27,122 events B and C given our parameters. 48 00:03:27,122 --> 00:03:32,005 So this means, just to be 49 00:03:32,005 --> 00:03:37,150 clear, events B and C. 50 00:03:37,150 --> 00:03:40,288 Then again, if we sum over one of these events, 51 00:03:40,288 --> 00:03:45,073 if we're looking at every time B occurs with a specific example of C, and 52 00:03:45,073 --> 00:03:48,720 then we're summing over all possible values of C, 53 00:03:48,720 --> 00:03:54,047 that's exactly equivalent to just the probability of that single event, 54 00:03:54,047 --> 00:03:56,424 B, given our model parameters. 55 00:03:58,986 --> 00:04:04,415 So then using this fact, this is how we get the denominator here, 56 00:04:04,415 --> 00:04:11,720 and now what we see is we have just a more general form of the standard Bayes' rule. 57 00:04:11,720 --> 00:04:16,680 In particular, the standard way 58 00:04:16,680 --> 00:04:21,770 we would write Bayes' rule is that probability of A given B 59 00:04:23,500 --> 00:04:28,506 is equal to probability of A times probability of B 60 00:04:28,506 --> 00:04:34,010 given A, divided by probability of B. 61 00:04:35,320 --> 00:04:39,810 And we see exactly this form, except the only difference is we're also conditioning 62 00:04:39,810 --> 00:04:42,790 on something, our model parameters. 63 00:04:42,790 --> 00:04:46,240 And we're doing so for every term in this Bayes' rule. 
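For reference, the chain of rewrites described in this video can be written out roughly as below. This is a sketch reconstructed from the spoken description, not a copy of the slides. The symbol r_ik for the responsibility and the shorthand "params" for all cluster parameters are notational assumptions. A is the event z_i = k, B is the event of observing the value x_i, and C ranges over the events z_i = j.

```latex
\begin{align*}
r_{ik} &= p(z_i = k \mid x_i, \text{params}) = p(A \mid B, \text{params}) \\
       &= \frac{p(z_i = k \mid \text{params})\; p(x_i \mid z_i = k, \text{params})}
               {\sum_{j} p(z_i = j \mid \text{params})\; p(x_i \mid z_i = j, \text{params})} \\
       &= \frac{p(A \mid \text{params})\; p(B \mid A, \text{params})}
               {\sum_{C} p(C \mid \text{params})\; p(B \mid C, \text{params})} \\
       &= \frac{p(A \mid \text{params})\; p(B \mid A, \text{params})}
               {\sum_{C} p(B, C \mid \text{params})} \\
       &= \frac{p(A \mid \text{params})\; p(B \mid A, \text{params})}
               {p(B \mid \text{params})}
\end{align*}
```

The last line is the standard Bayes' rule, p(A | B) = p(A) p(B | A) / p(B), with every term additionally conditioned on the model parameters.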
64 00:04:47,440 --> 00:04:53,060 Okay, so in summary, the derivation of the equation for 65 00:04:53,060 --> 00:04:56,960 the soft assignments is just an application of Bayes' rule. 66 00:04:56,960 --> 00:04:59,989 So that we can utilize known quantities from our model, 67 00:04:59,989 --> 00:05:04,308 like the prior probabilities of cluster assignments and the likelihood of data 68 00:05:04,308 --> 00:05:08,126 points, given a cluster assignment, to form this responsibility. 69 00:05:08,126 --> 00:05:13,099 [MUSIC]
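As a small numerical illustration of the summary above, here is a minimal Python sketch that forms the responsibilities from a prior and a likelihood. It assumes one-dimensional data and a Gaussian likelihood for each cluster. The function names `gaussian_pdf` and `responsibilities` and the example numbers are hypothetical, chosen only for illustration.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian with the given mean and standard deviation at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def responsibilities(x, weights, means, stds):
    """Soft assignment of one observation x to each cluster via Bayes' rule.

    weights[k] plays the role of the prior p(z = k | params) (pi k),
    and gaussian_pdf(x, means[k], stds[k]) the likelihood p(x | z = k, params).
    The Gaussian form is an assumption made for this sketch.
    """
    # Numerator terms: prior times likelihood, i.e. the joint p(x, z = k | params).
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    # Denominator: summing the joint over all clusters gives p(x | params).
    evidence = sum(joint)
    return [j / evidence for j in joint]


# Two equally weighted clusters; x = 1.0 is equally far from both means.
r = responsibilities(1.0, weights=[0.5, 0.5], means=[0.0, 2.0], stds=[1.0, 1.0])
print(r)
```

Because the observation sits halfway between two clusters with equal priors and equal spreads, both responsibilities come out equal, and in general the responsibilities for one observation sum to 1.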