The algorithm we're going to use to infer the cluster parameters, as well as the soft assignments, is something called expectation maximization, or EM. The idea is that we start with a set of unlabeled, observed inputs, such as the data shown here, and the goal is to output a set of soft assignments per data point, meaning an allocation of that data point to each one of the clusters. In this example, there are three different clusters: a fuchsia cluster, a green cluster, and a blue cluster, and the shading of each individual data point indicates the uncertainty of our assignment of that data point to each of these different clusters. A key question here is how we're going to output these soft assignments from just this set of unlabeled data points.

Well, to begin with, let's just assume we actually know the cluster parameters, and we simply want to compute the soft assignments having fixed the values of these cluster parameters. The soft assignments are quantified by something called the responsibility vector. For each observation i, we form a responsibility vector with elements r_i1, r_i2, all the way up to r_iK, where capital K is the total number of clusters in our mixture model. Each element r_ik represents the responsibility that cluster k takes for observation i. In particular, it's the probability that observation i is assigned to cluster k, so that's this term here, given (remember, this bar here means "given") the set of cluster weights and shapes. That's here; remember, the pi's are the weights, and the mus and sigmas specify the shapes of each of the clusters. And we're going to condition on the observed value of the ith data point, so that's x_i here.

Just to make sure this notation is very clear: when we write "probability of", the first thing is the random variable that the distribution is over, and then on the right-hand side of this "given" bar are a set of fixed values that define the probability distribution.

But before we get to specifying this probability in equations, let's gain some intuition by looking at a set of pictures. In this example, we're going to assume that there are just two clusters, this green and blue cluster, with spherically symmetric shapes, and we're going to assume that there are basically equal weightings between the two clusters, which we see from the fact that they have very similar numbers of data points, represented by these little stars. Then we're going to home in on a single observation, which is outlined with this pink color, and we're going to look at the soft assignment of this observation to these two clusters.

What we see in this first case is that the highlighted observation is closer to the center of the green cluster than it is to the blue cluster. As a result, the green cluster is going to take more responsibility for this observation than the blue cluster, though both still split responsibility for this observation. In contrast, in this situation here, notice that we've shifted our highlighted data point towards the blue cluster. Now this key observation is closer to the center of the blue cluster, and the blue cluster, as a result, is going to take more responsibility for this data point. But finally, if the key observation were somewhere in between the two clusters, then there would be basically split responsibility between the green and blue clusters, representing a lot of uncertainty in the cluster membership for that data point.
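Before we look at how unequal cluster weights change this picture, it may help to see the quantity we just described written out. Using z_i to denote the unobserved cluster assignment of observation i (a notational choice of mine, since the discussion above describes this quantity only in words), the responsibility is the conditional probability

$$ r_{ik} = P\big(z_i = k \,\big|\, \{\pi_j, \mu_j, \Sigma_j\}_{j=1}^{K},\ x_i\big), $$

where the pi's, mus, and sigmas to the right of the bar are the fixed cluster weights and shapes, and x_i is the observed value of the ith data point.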
But what if one cluster weighed much more heavily in the mixture model than the other cluster? For example, what if this green cluster had a lot more mass than the blue one? We see this now by the green cluster having lots and lots of data points and the blue cluster having just a few. In this case, if we look at the highlighted observation that was on the boundary of the green and blue clusters, we're still, of course, uncertain about the cluster membership, but the green cluster starts to seem like a much more likely explanation for this observed point. And so, because of this imbalance in the proportions of the clusters in our mixture model, the green cluster starts to take more responsibility for that point than the blue cluster, whereas previously responsibility was exactly split between the two clusters.

In equations, we're saying that we need to weigh this initial or prior probability that a given observation comes from any one of our clusters with how likely the observed value is under each of these clusters. So, that's the second term here. For example (let me switch colors here; I'll choose a red color), if we look at a data point here, maybe we would say that this data point is very unlikely under the green cluster, even though our prior probability on the green cluster is much higher. So we really have to consider both of these two terms together. Then, in order to make this a valid probability, we need to normalize over all possible cluster assignments.

These equations quantify the discussion that we had in the last section, where we talked about how to weigh the prior term, which represented, for example, the prior probability that an image grabbed from our set of images is a cloud image, with the likelihood term, which represented, for a given image, how likely the observed values of that image are under a given cluster assignment. And as we see now, it is indeed the combination of these two terms that determines our soft assignments.

So, in summary, if we know the cluster parameters, then computing these soft assignments is really simple. All we have to do is, for each one of our possible clusters, compute the prior times the likelihood to form our responsibility, and then just normalize this vector so that it sums to one over all the possible clusters. But the story isn't over, because we don't actually know our cluster parameters. That's something that we also have to somehow infer just from our unlabeled data.
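To make the prior-times-likelihood computation concrete, here is a minimal sketch, assuming a mixture of Gaussians and the availability of NumPy and SciPy; the function name and array layout are my own for illustration, not notation from the lecture. It implements

$$ r_{ik} = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}, $$

which is exactly the prior times the likelihood, normalized so that each responsibility vector sums to one.

```python
import numpy as np
from scipy.stats import multivariate_normal

def compute_responsibilities(X, weights, means, covariances):
    """Soft assignments given *fixed* cluster parameters (a sketch).

    X           : (n, d) array of observations x_i
    weights     : (K,) mixture weights pi_k, summing to one
    means       : (K, d) cluster means mu_k
    covariances : (K, d, d) cluster covariances Sigma_k
    Returns an (n, K) matrix whose row i is the responsibility vector r_i.
    """
    n, K = X.shape[0], len(weights)
    resp = np.zeros((n, K))
    for k in range(K):
        # prior probability of cluster k times the likelihood of each
        # observation under cluster k's Gaussian
        resp[:, k] = weights[k] * multivariate_normal.pdf(
            X, mean=means[k], cov=covariances[k])
    # normalize each row so responsibilities sum to one over all clusters
    return resp / resp.sum(axis=1, keepdims=True)
```

A point sitting between two equally weighted clusters comes out with roughly split responsibility, and raising one cluster's weight pi_k tilts the split toward that cluster, matching the pictures above.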