But before we continue our story about the EM algorithm, I wanted to spend a little bit of time showing how these soft assignments are actually an application of something called Bayes' rule. This is advanced material, so we're going to make this an optional video. For those who are happy with the level of description of the responsibility equation in the previous video, that's fine; skip ahead to the next video. But for those who are interested in a more detailed derivation and explanation of the equation, feel free to watch this video and see a few more details.

Okay, so remember that our responsibilities were the probability of an assignment of observation i to cluster k, given our cluster parameters and given the observed data value xi. Let's just call all those cluster parameters "params". Then what we see is that we can rewrite this equation: pi k, the prior probability of the assignment zi = k, is just the probability that zi = k given our parameters, where in this case the only parameter we actually need is pi k. Likewise, we can write the likelihood term generically as the probability of our observed data point xi, given an assignment to cluster k (zi = k) and given our model parameters, where again the only parameters we actually need are mu k and sigma k.

Okay, but we're writing things more generically so that we can show how this is an application of Bayes' rule. To do this, we're going to call the event zi = k event A, and the event that we observe the value xi for our ith data point event B. Using this, we can rewrite the terms in our numerator as the probability of A given our parameters, times the probability of B given A and our parameters. Then we can do the same thing in the denominator, where now we consider the event zi = j, an assignment of observation i to cluster j.
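As a quick sketch in code, the two numerator terms, the prior probability of the cluster assignment and the likelihood of the data point under that cluster, can be computed as below. This assumes one-dimensional Gaussian clusters, and the specific parameter values (pi, mu, sigma) are made up purely for illustration:

```python
import math

def gaussian_pdf(x, mu, sigma):
    # Likelihood N(x | mu, sigma^2) of observing x under one cluster's Gaussian
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical 2-cluster model parameters ("params"), for illustration only
pi = [0.6, 0.4]      # prior probabilities: P(zi = k | params)
mu = [0.0, 5.0]      # cluster means
sigma = [1.0, 1.0]   # cluster standard deviations

x_i = 1.0  # observed data value for observation i

# Numerator of the responsibility for each cluster k:
#   P(zi = k | params) * P(xi | zi = k, params)
numerators = [pi[k] * gaussian_pdf(x_i, mu[k], sigma[k]) for k in range(2)]
```

Here cluster 0 (mean 0) produces a much larger numerator than cluster 1 (mean 5), since x = 1.0 sits close to the first cluster's mean.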
We're going to call that event C. So we can rewrite each term in the denominator as the probability of C given our parameters, times the probability of B, still the same observed value, given event C and our model parameters; and in the denominator we sum over all these possible events C.

Putting all these pieces together, we have the following: our responsibilities are the probability of A given B and our parameters, which we rewrote on the previous slide in the form shown. If we look at the term inside the sum in the denominator, it is by definition equivalent to the probability of events B and C given our model parameters. So the denominator is the sum over events C of the probability of B and C given our parameters. Just to be clear: if we sum this joint probability, looking at every time B occurs together with a specific value of C, over all possible values of C, that's exactly equivalent to the probability of the single event B given our model parameters. Using this fact, we get the denominator here, and now what we see is just a more general form of the standard Bayes' rule. In particular, the standard way we would write Bayes' rule is that the probability of A given B equals the probability of A times the probability of B given A, divided by the probability of B. We see exactly this form; the only difference is that we're also conditioning on something, our model parameters, and we're doing so in every term of Bayes' rule.

Okay, so in summary, the derivation of the equation for the soft assignments is just an application of Bayes' rule, which lets us use known quantities from our model, like the prior probabilities of cluster assignments and the likelihood of data points given a cluster assignment, to form these responsibilities.
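Putting Bayes' rule together end to end, here is a minimal sketch of the full responsibility computation. As before, this assumes one-dimensional Gaussian clusters with made-up parameter values, and `responsibilities` is a hypothetical helper name, not something from the lectures:

```python
import math

def gaussian_pdf(x, mu, sigma):
    # Likelihood N(x | mu, sigma^2) of observing x under one cluster's Gaussian
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def responsibilities(x_i, pi, mu, sigma):
    # Bayes' rule, conditioning every term on the model parameters:
    #   r_ik = P(zi=k | params) P(xi | zi=k, params)
    #          / sum_j P(zi=j | params) P(xi | zi=j, params)
    nums = [p * gaussian_pdf(x_i, m, s) for p, m, s in zip(pi, mu, sigma)]
    # Summing the joint probability over all events C (zi = j) marginalizes
    # out the cluster assignment, giving P(xi | params)
    total = sum(nums)
    return [n / total for n in nums]

# Hypothetical 2-cluster parameters, for illustration only
r = responsibilities(1.0, pi=[0.6, 0.4], mu=[0.0, 5.0], sigma=[1.0, 1.0])
# r sums to 1; cluster 0 takes essentially all the responsibility for x = 1.0
```

Because the denominator is just the sum of the numerators over all clusters, the responsibilities for a single observation always sum to one, which is what makes them valid soft assignments.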