[MUSIC] We showed that if you know the cluster parameters, then it's easy to compute the soft assignments. And then we showed that if you know the soft assignments, it's very straightforward to estimate the cluster parameters. So this motivates an iterative algorithm where we're going to cycle between these two steps. And the algorithm that we're going to present is a special case of a more generic algorithm called expectation maximization, or EM. So more formally, the EM algorithm for mixtures of Gaussians is going to cycle between two steps. In the first step, which is called the E-step, we're going to estimate our cluster responsibilities given the current estimate of our cluster parameters, and the equation for this in the mixture of Gaussians case is shown here, which is just a specific instance of the equations that we showed previously. Then the second step of the algorithm, which is called the M-step, is to compute the maximum likelihood estimate of our cluster parameters given our current estimate of the soft assignments. And so the update here, the maximum likelihood estimate based on a set of soft assignments, takes the form of the equations that we showed previously.

In pictures, the EM algorithm works as follows. The first step is to initialize our set of model parameters: our mixture weights at the zeroth iteration, our means, and our covariances, and those are represented by the ellipses shown in this picture, where each one has a center and then a spread and an orientation. Then, having fixed these cluster parameters, we compute our responsibilities. Our estimates of the responsibilities at the first iteration are computed just by taking that prior times the likelihood term for each cluster and then normalizing, so I won't repeat the equations here. But just to show it in pictures: each one of these big circles is a data point, and it's also a pie chart, which indicates the responsibilities associated with that data point. So, for example, this data point here might have a responsibility vector at the first iteration of something like 0.52 for the fuchsia cluster, 0.4 for the blue cluster, and 0.08 for the green cluster. And then, having computed these responsibilities for each of our data points, we go and re-estimate our cluster parameters. So we're going to maximize the likelihood given our soft assignments r_ik at the first iteration, and this is going to lead to updated estimates of our cluster parameters: pi hat k at iteration one, mu hat k at iteration one, and sigma hat k at iteration one. And what you see in this picture is that these ellipses have shifted and changed. So then, based on this new set of clusters, or really the parameters associated with these clusters, we're going to recompute our responsibilities, giving our estimates of the responsibilities at the second iteration. And then we're going to rinse and repeat until convergence, and we're going to talk about convergence more in the next section.

But here's our plot at convergence, and what you see is that our clusters are pretty clean. They, in my view, capture the structure of the data fairly well. And, really critically, if we look at, for example, this observation here, we see that even after convergence there's still uncertainty represented in that observation's cluster assignment. And I want to emphasize that this is exactly what we want from this algorithm.
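Before contrasting this with k-means, here is a minimal Python sketch of the E-step / M-step cycle just described, assuming a data matrix X of shape (N, d) and K clusters. The function name em_gaussian_mixture, the random initialization, the fixed iteration count, and the small diagonal term added to each covariance for numerical stability are illustrative assumptions, not details from the lecture.

```python
# Minimal sketch of EM for a mixture of Gaussians (assumed names and settings).
import numpy as np
from scipy.stats import multivariate_normal

def em_gaussian_mixture(X, K, n_iters=100):
    N, d = X.shape
    # Initialization (iteration 0): mixture weights, means, covariances.
    pi = np.full(K, 1.0 / K)
    mu = X[np.random.choice(N, K, replace=False)]
    Sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(K)])

    for _ in range(n_iters):
        # E-step: responsibilities r[i, k] proportional to pi_k * N(x_i | mu_k, Sigma_k).
        r = np.zeros((N, K))
        for k in range(K):
            r[:, k] = pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
        r /= r.sum(axis=1, keepdims=True)   # normalize so each row sums to 1

        # M-step: maximum likelihood estimates given the soft assignments.
        Nk = r.sum(axis=0)                  # effective number of points per cluster
        pi = Nk / N
        mu = (r.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (r[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)

    return pi, mu, Sigma, r
```

Calling pi, mu, Sigma, r = em_gaussian_mixture(X, K=3) on the 2D data from the pictures would return the mixture weights, means, and covariances (the ellipses), along with a responsibility matrix whose rows are exactly the pie charts drawn at each data point.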
In contrast to k-means, which would have made a hard decision about whether that boundary observation was going to be associated with the blue cluster or the fuchsia cluster, here EM provides a sense of uncertainty about which cluster this observation should be associated with. And this is true for other observations as well, just not as much as for this observation that's clearly on the boundary of these two clusters. But in summary, hopefully we now see in pictures the difference between what EM gives us and what k-means gives us. And again, just to emphasize, not only is it giving us these soft assignments, but we also have estimates of the cluster parameters, so the shapes of these different clusters and their locations in the space, not just the cluster centers.

So let's watch this one more time, and before we do, just remember that each one of these circles, or these little pie charts, is a data point, and the colors shown in the pie chart represent the responsibilities associated with that observation, so how much it's currently associated with any one of the three clusters in our model. And these ellipses represent the three different 2D Gaussians in this example, and as we go through these plots, they're going to show the updates to both the responsibilities and our estimates of these cluster parameters. So here is our initialization of the algorithm, and then this is after one iteration, two iterations, and at convergence. [MUSIC]
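As a hypothetical illustration of that contrast, the responsibility vector quoted earlier for the boundary observation would collapse to a single cluster label under a k-means-style hard assignment, whereas EM keeps the full vector.

```python
import numpy as np

# Responsibilities for the boundary point discussed above (fuchsia, blue, green).
responsibilities = np.array([0.52, 0.40, 0.08])

# A k-means-style hard assignment keeps only the single most responsible cluster...
hard_assignment = int(responsibilities.argmax())   # -> 0 (fuchsia)

# ...while EM retains the whole vector, preserving the uncertainty
# between the fuchsia and blue clusters.
```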