We showed that if you know the cluster parameters, then it's easy to compute the soft assignments. And then we showed that if you know the soft assignments, it's very straightforward to estimate the cluster parameters. So this motivates an iterative algorithm where we're going to cycle between these two steps. And the algorithm that we're going to present is a special case of a more generic algorithm called expectation maximization, or EM.

So more formally, the EM algorithm for mixtures of Gaussians is going to cycle between two steps. In the first step, which is called the E-step, we're going to estimate our cluster responsibilities given the current estimate of our cluster parameters. The equation for this in the mixture-of-Gaussians case is shown here, and it's just a specific instance of the equations that we showed previously. Then the second step of the algorithm, which is called the M-step, is to compute the maximum likelihood estimate of our cluster parameters given our current estimate of the soft assignments. And the update here, the maximum likelihood estimate based on a set of soft assignments, takes the form of the equations that we showed previously.

In pictures, the EM algorithm works as follows. The first step is to initialize our set of model parameters: our mixture weights at the zeroth iteration, our means, and our covariances. Those are represented by the ellipses shown in this picture, where each one has a center, a spread, and an orientation. Then, having fixed these cluster parameters, we compute our responsibilities. Our estimate of the responsibilities at the first iteration is computed just by taking that prior times the likelihood term for each cluster and then normalizing, so I won't repeat the equations here. But just to show in pictures: each one of these big circles is a data point, and it's also a pie chart indicating the responsibilities associated with that data point. So for example, this data point here might have a responsibility vector at the first iteration of something like 0.52 for the fuchsia cluster, 0.4 for the blue cluster, and 0.08 for the green cluster.
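Not from the lecture itself, but as a minimal sketch of the E-step just described, here is how the prior-times-likelihood-then-normalize computation might look in Python. It assumes NumPy and SciPy are available; the function name and array shapes are my own choices.

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, weights, means, covs):
    """Compute responsibilities (soft assignments) for a Gaussian mixture.

    X:       (N, d) array of data points
    weights: (K,) mixture weights pi_k
    means:   K mean vectors mu_k
    covs:    K covariance matrices Sigma_k
    Returns an (N, K) array whose rows each sum to 1.
    """
    N, K = X.shape[0], len(weights)
    resp = np.zeros((N, K))
    for k in range(K):
        # prior times likelihood: pi_k * N(x_i | mu_k, Sigma_k)
        resp[:, k] = weights[k] * multivariate_normal.pdf(X, means[k], covs[k])
    # normalize each row so the responsibilities for a point sum to one
    resp /= resp.sum(axis=1, keepdims=True)
    return resp
```

Each row of the result corresponds to one of the pie charts in the picture, for example a vector like [0.52, 0.4, 0.08].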
And then, having computed these responsibilities for each of our data points, we go and re-estimate our cluster parameters. So we're going to maximize the likelihood given our soft assignments r_ik at the first iteration. And this is going to lead to updated estimates of our cluster parameters: pi hat k at iteration one, mu hat k at iteration one, and Sigma hat k at iteration one. What you see in this picture is that these ellipses have shifted and changed.

Then, based on this new set of clusters, or rather the parameters associated with these clusters, we're going to recompute our responsibilities: our estimate of the responsibilities at the second iteration. And then we're going to rinse and repeat until convergence, and we'll talk about convergence more in the next section.

But here's our plot at convergence, and what you see is that our clusters are pretty clean. In my view, they capture the structure of the data fairly well. And, really critically, if we look at, for example, this observation here, we see that even after convergence there's still uncertainty represented in that observation's cluster assignment. And I want to emphasize that this is exactly what we want from this algorithm. In contrast to k-means, which would have made a hard decision about whether that observation was going to be associated with the blue cluster or the fuchsia cluster, here EM provides a sense of uncertainty about which cluster this observation should be associated with. And this is true for other observations as well, though not as much as for this observation that's clearly on the boundary of these two clusters.

But in summary, hopefully we now see in pictures the difference between what EM gives us and what k-means gives us. And again, just to emphasize, not only is EM giving us these soft assignments, but we also have estimates of the cluster parameters: the shapes of these different clusters and their locations in the space, not just the cluster centers.
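Again not from the lecture: here is a matching sketch of the M-step and of the overall iterate-until-convergence loop, continuing the e_step sketch above. The maximum likelihood updates are the responsibility-weighted versions of the usual estimates; the stopping test on the log-likelihood is one common choice, and the tolerance value is an arbitrary assumption (the lecture defers convergence to the next section).

```python
def m_step(X, resp):
    """Maximum likelihood estimates of the mixture parameters
    given an (N, K) array of soft assignments."""
    N, K = resp.shape
    Nk = resp.sum(axis=0)                 # effective number of points per cluster
    weights = Nk / N                      # pi hat k
    means = (resp.T @ X) / Nk[:, None]    # mu hat k: responsibility-weighted means
    covs = []
    for k in range(K):
        diff = X - means[k]
        # Sigma hat k: responsibility-weighted covariance
        covs.append((resp[:, k, None] * diff).T @ diff / Nk[k])
    return weights, means, covs

def em(X, weights, means, covs, max_iters=100, tol=1e-4):
    """Cycle between the E-step and M-step until the data log-likelihood
    stops improving."""
    prev_ll = -np.inf
    for _ in range(max_iters):
        resp = e_step(X, weights, means, covs)
        weights, means, covs = m_step(X, resp)
        # log-likelihood of the data under the current mixture
        ll = np.sum(np.log(sum(
            weights[k] * multivariate_normal.pdf(X, means[k], covs[k])
            for k in range(len(weights)))))
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights, means, covs, resp
```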
So let's watch this one more time. Before we do, just remember that each one of these circles, these little pie charts, is a data point, and the colors shown in the pie chart represent the responsibilities associated with that observation: how much it's currently associated with each of the three clusters in our model. And these ellipses represent the three different 2D Gaussians in this example. As we go through these plots, you'll see the updates to both the responsibilities and our estimates of these cluster parameters. So here is our initialization of the algorithm, and then this is after one iteration, two iterations, and at convergence.