Finally, I wanted to draw a connection between mixtures of Gaussians and our k-means algorithm. To do this, let's consider a mixture of Gaussians where we have spherically symmetric Gaussians that are the same between the different clusters, just having different cluster centers. This is equivalent to having covariances with sigma squared along the diagonal, where that sigma squared is exactly equal in every element of the diagonal, and this same covariance matrix is associated with each of the different clusters. Then what we're going to do is take this variance term and drive it to 0, so we have these infinitesimally tight clusters. Because we have these spherically symmetric clusters, when we go to compute the relative likelihood of assigning an observation to one cluster versus another cluster, that depends only on the relative distances to the cluster centers. And because we've driven those variances to 0, those relative likelihoods go to either 1 or 0. This happens because of the certainty the clusters have, indicated by their variances going to 0 in the limit.
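To make this limit concrete, here is a minimal sketch of the responsibility computation for spherical Gaussians with shared variance sigma squared. The function name `responsibilities` and the toy numbers are my own illustration, not from the lecture; the computation is done in log space so it stays stable as sigma shrinks.

```python
import numpy as np

def responsibilities(x, centers, sigma, weights=None):
    """Responsibilities of each spherical Gaussian N(mu_k, sigma^2 I) for point x.

    With equal spherical covariances, the likelihood ratio between clusters
    depends only on the squared distances to the cluster centers.
    """
    centers = np.asarray(centers, dtype=float)
    k = len(centers)
    weights = np.full(k, 1.0 / k) if weights is None else np.asarray(weights, dtype=float)
    d2 = np.sum((centers - x) ** 2, axis=1)           # squared distances to each center
    # Work in log space for numerical stability as sigma -> 0.
    log_r = np.log(weights) - d2 / (2.0 * sigma ** 2)
    log_r -= log_r.max()                              # shift before exponentiating
    r = np.exp(log_r)
    return r / r.sum()

x = np.array([1.0, 0.0])
centers = [[0.0, 0.0], [3.0, 0.0]]
for sigma in [2.0, 0.5, 0.01]:
    # As sigma shrinks, the responsibilities collapse toward 0 or 1.
    print(sigma, responsibilities(x, centers, sigma))
```

With a wide sigma the point is softly shared between clusters; by sigma = 0.01 the responsibility for the nearer center is numerically 1, exactly the hard assignment described above.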
And of course, when we're computing our cluster responsibilities, we're also weighing in the relative proportions of these different clusters. But this is completely dominated by these significant differences in the likelihoods, with that ratio being either 0 or 1. So what ends up happening here is that datapoints are fully assigned to a single cluster, based just on the distance to that cluster center, and this is just like what we see in k-means. So when we apply our EM algorithm to this mixture of Gaussians, with variances that are the same across dimensions but shrinking to 0, then in the E-step, when we're estimating our responsibilities, we get responsibilities that are just 0 or 1, for the reasons that we just described. And when we go to do our M-step, we just update our cluster means; the variances are fixed, in the limit, at 0. When we're estimating our cluster means, and of course our cluster proportions too, each cluster mean just looks at the datapoints that are now hard assigned to that cluster and uses them to estimate the mean of that cluster. So what we see is that these two steps of our EM algorithm are exactly what we do in k-means.
That is, we make hard assignments of observations to clusters based just on the distance to the cluster center, and then we recenter the clusters, where we just compute the average of the datapoints assigned to each cluster.
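The two steps in this limit can be sketched as one k-means (Lloyd) iteration. The function `kmeans_limit_step` below is a hypothetical illustration: the E-step collapses to a hard nearest-center assignment, and the M-step recenters each cluster at the mean of its assigned points (the proportion update is omitted since, as noted above, it no longer affects the assignments).

```python
import numpy as np

def kmeans_limit_step(X, centers):
    """One EM iteration for spherical Gaussians in the sigma -> 0 limit.

    E-step limit: responsibilities become 0/1, i.e. a hard assignment of
    each point to its nearest center. M-step: each center becomes the mean
    of the points hard assigned to it. Together this is one k-means step.
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Hard E-step: squared distance from every point to every center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)
    # M-step: recenter; keep the old center if a cluster is empty.
    new_centers = np.array([
        X[assign == k].mean(axis=0) if np.any(assign == k) else centers[k]
        for k in range(len(centers))
    ])
    return assign, new_centers

X = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
assign, new_centers = kmeans_limit_step(X, [[1.0, 0.0], [9.0, 0.0]])
print(assign)        # hard 0/1 assignments by distance
print(new_centers)   # cluster means of the assigned points
```

Iterating this function until the assignments stop changing is exactly the standard k-means algorithm, which is the equivalence the lecture is pointing out.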