[MUSIC] Continuing our story: so far we've shown that if we know the cluster parameters, then the soft assignments are very easy to compute. Now let's consider the alternative. Let's assume that we know the cluster assignments and we want to infer our cluster parameters. For this part, we're going to assume that we have hard assignments of observations to clusters. In pictures, these hard assignments are shown by the colors of each data point, and when we have these hard assignments, we just have three different colors in this plot rather than the spectrum of colors that we showed earlier when we were talking about soft assignments.

But once we've fixed these cluster assignments, a question is: does this data point here, which is assigned to the green cluster, influence our estimation of the parameters of the fuchsia cluster or the blue cluster? The answer is no. Only the observations that are assigned to a given cluster inform the parameters of that cluster. We saw this in k-means when we were updating the means using just the observations assigned to a given cluster. But now we're going to have a more general form of update, where we're not just updating the centers of these clusters but also their shapes. Just to emphasize: fixing the cluster assignments means that our estimation problem decouples over our different clusters.

Let's go back to our image clustering task and assume that we store our data, the RGB values associated with each image, as well as these hard cluster assignments, in a table as shown here. The first thing we're going to do is split up this table based on these hard cluster assignments. So we're going to have one table of data points that are assigned to cluster three, one table of data points assigned to cluster two, and another table of data points assigned to cluster one. Then we're going to consider each one of these data tables completely independently when we're forming our parameter estimates for each one of these clusters.

So let's look, for example, at the data points associated with cluster three. To form our parameter estimates, we're going to do something called maximum likelihood estimation, which we saw in the classification course: we search over all possible parameter settings and find the settings that maximize the likelihood of our observed data under our specified model. Remember that each one of our clusters is specified by a Gaussian distribution that has two parameters: a mean and a covariance. Let's spend a little bit of time talking about the form of the maximum likelihood estimate of these two parameters.

Well, for the mean, the maximum likelihood estimate is exactly the sample mean: mu hat k equals 1 over N_k times the sum of the data points x_i assigned to cluster k. So what we're going to do is simply average the data points that are in cluster k. Here N_k represents the number of observations in cluster k, and we're summing over the data indices of points in this table that had this hard assignment to cluster three, for example. So we would just literally sum these vectors and then divide by three, the total number of observations in this cluster. Not to be confused with the label of cluster three; I mean the fact that there are three rows in this table.
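As a concrete illustration of this mean update, here is a minimal NumPy sketch. The array names, the toy RGB numbers, and the helper function are all hypothetical, assuming the data table and the hard assignments are stored as arrays; they are not from the lecture itself.

```python
import numpy as np

# Hypothetical toy version of the table in the lecture: each row holds an image's
# average (R, G, B) values, and `assignments` holds that row's hard cluster label.
data = np.array([[0.5, 0.6, 0.1],
                 [0.4, 0.5, 0.2],
                 [0.6, 0.5, 0.1],
                 [0.9, 0.1, 0.1],
                 [0.1, 0.2, 0.8]])
assignments = np.array([3, 3, 3, 2, 1])  # three rows hard-assigned to cluster 3

def cluster_mean(data, assignments, k):
    """Maximum likelihood estimate of cluster k's mean:
    sum the rows hard-assigned to cluster k and divide by N_k."""
    rows_in_k = data[assignments == k]   # the sub-table for cluster k only
    return rows_in_k.mean(axis=0)        # (1 / N_k) * sum of those rows

mu_hat_3 = cluster_mean(data, assignments, 3)
print(mu_hat_3)  # average of the three rows assigned to cluster 3
```

Note that only the rows assigned to cluster 3 enter the computation, which is exactly the decoupling across clusters described above.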
Okay, so that provides our estimate of the mean, and we denote our estimate with this little hat here. If you look at this equation, you'll see it's exactly the same equation that we used in k-means when we went to update the cluster centers: we looked at all data points assigned to a given cluster and computed the average of those data points.

But now we also have this covariance term, which determines the spread and orientation of the ellipses of the Gaussian. For this, the maximum likelihood estimate is given by what's called the sample covariance: we first subtract the estimated mean from each of our data points, and then compute the outer product of these centered data points, again summing only over the data points in cluster k and dividing by the total number of observations in cluster k. In the scalar case, our estimate of the variance in the kth cluster would be 1 over the number of observations in that cluster, summing over observations in cluster k; and here, instead of the transpose and outer product we have for vectors, we would just have scalars, so x_i minus mu hat k, which is a single number, squared.

Finally, for our cluster proportions, we simply count the number of observations in cluster k and divide by the total number of observations, and that forms our estimate of the weight on the kth cluster. Remember, pi k was the weight on the kth cluster, and the hat represents the fact that this is our estimate, in particular our maximum likelihood estimate. I want to emphasize here that the form of pi hat k is not specific to mixtures of Gaussians, which is what we've been focusing on in this module; it would also hold if you had, for example, mixtures of multinomials or mixtures of other distributions. Of course, when we talked about the mean and covariance estimates on the previous slide, that was specific to having Gaussians defining each one of our clusters.

So in summary, if we knew the cluster assignments, the assignment of each data point to a given cluster, then computing the estimates of the cluster parameters is very, very straightforward. But again, we don't know these hard assignments. So what are we going to do? [MUSIC]
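To tie the mean, covariance, and mixing-weight formulas together, here is a minimal NumPy sketch of the full per-cluster update under hard assignments. The function name and its inputs (a data array and a vector of hard assignments, as in the earlier sketch) are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def cluster_params(data, assignments, k):
    """Maximum likelihood estimates for cluster k given hard assignments:
    mean       = sample mean of the rows assigned to k
    covariance = (1 / N_k) * sum of outer products (x_i - mu_hat)(x_i - mu_hat)^T
    pi_hat     = N_k / N, the fraction of all observations assigned to k"""
    rows_in_k = data[assignments == k]
    N_k = rows_in_k.shape[0]
    mu_hat = rows_in_k.mean(axis=0)
    centered = rows_in_k - mu_hat                # subtract off the estimated mean
    sigma_hat = centered.T @ centered / N_k      # sum of outer products over N_k
    pi_hat = N_k / data.shape[0]                 # count in cluster / total count
    return mu_hat, sigma_hat, pi_hat
```

Calling this once per cluster k reproduces the updates described above; note that the covariance divides by N_k (the maximum likelihood estimate), not N_k minus 1.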