[MUSIC] Finally, I wanted to draw a connection between mixtures of Gaussians and our k-means algorithm. To do this, let's consider a mixture of Gaussians where we have spherically symmetric Gaussians that are the same across the different clusters, just with different cluster centers. This is equivalent to having covariance matrices with sigma squared along the diagonal, where that sigma squared is exactly the same for every element of the diagonal. So this is the covariance matrix associated with each of the different clusters. And then what we're going to do is take this variance term and drive it to 0, so that we have these infinitesimally tight clusters. Well, because we have these spherically symmetric clusters, when we go to compute the relative likelihood of assigning an observation to one cluster versus another, that just depends on the relative distances to the cluster centers. And because we've driven those variances to 0, those relative likelihoods go to either 1 or 0. This happens because of the certainty the clusters have, indicated by their 0 variances, or, in the limit, 0 variances. And of course, when we're computing our cluster responsibilities, we're also weighing in the relative proportions of these different clusters, but that is completely dominated by the significant differences in the likelihoods, with that ratio being either 0 or 1. So what ends up happening here is that data points are fully assigned to a single cluster, just based on the distance to that cluster center. And this is just like what we see in k-means. So when we apply our EM algorithm to this mixture of Gaussians, with variances that are of course the same across dimensions but shrinking to 0, in the E-step, when we're estimating our responsibilities, we get responsibilities that are just 0 or 1, for the reasons we just described. Then, when we go to do our M-step, we just update our cluster means; the variances are fixed, in the limit, at 0. And when we're estimating our cluster means, and of course our cluster proportions too, those cluster means just look at the data points that are now hard assigned to that cluster and use them to estimate the mean of that cluster. So what we see is that these two steps of our EM algorithm are exactly what we do in k-means, where we make hard assignments of observations to clusters just based on the distance to the cluster center, and then we recenter the clusters, that is, we just compute the average of the data points assigned to each cluster. [MUSIC]
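To make this concrete, here is a minimal Python sketch of the idea described above. It is not code from the lecture; the function names, the toy data, and the particular sigma squared values are illustrative assumptions. It computes E-step responsibilities for spherical Gaussians that share one variance, then shows how the soft assignments collapse toward 0/1 hard assignments as that variance shrinks, at which point the M-step mean update is just the average of the points assigned to each cluster, exactly the k-means update.

```python
import numpy as np

def e_step_responsibilities(X, means, weights, sigma_sq):
    # Responsibilities for a mixture of spherical Gaussians that all share
    # the same variance sigma_sq (hypothetical helper, not course code).
    # Squared distance from every point to every cluster center.
    dists = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    # Unnormalized log-responsibility: log(pi_k) - ||x - mu_k||^2 / (2 sigma^2).
    log_resp = np.log(weights)[None, :] - dists / (2.0 * sigma_sq)
    # Normalize in log space for numerical stability, then exponentiate.
    log_resp -= log_resp.max(axis=1, keepdims=True)
    resp = np.exp(log_resp)
    return resp / resp.sum(axis=1, keepdims=True)

def m_step_means(X, resp):
    # Responsibility-weighted averages; with 0/1 responsibilities this is
    # exactly the k-means re-centering step.
    return (resp.T @ X) / resp.sum(axis=0)[:, None]

# Toy data: two small groups of points near (0, 0) and (1, 1).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (5, 2)), rng.normal(1.0, 0.1, (5, 2))])
means = np.array([[0.0, 0.0], [1.0, 1.0]])
weights = np.array([0.5, 0.5])

for sigma_sq in [1.0, 0.1, 1e-6]:
    resp = e_step_responsibilities(X, means, weights, sigma_sq)
    print(f"sigma^2 = {sigma_sq:g}: largest responsibility per point =",
          np.round(resp.max(axis=1), 3))
    print("  updated means:\n", np.round(m_step_means(X, resp), 3))
# As sigma^2 shrinks, the responsibilities approach hard 0/1 assignments based
# only on the distance to the nearest center, and the mean update becomes the
# plain average of the points assigned to each cluster.
```

With this toy setup, the responsibilities are visibly soft at sigma squared equal to 1 and effectively 0 or 1 at the smallest variance, where the resulting mean updates coincide with k-means recentering each group of points.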