[MUSIC] Well, to move towards something that we might be able to do, let's talk about what would change if we only knew the soft assignments instead of the hard assignments. That is, how would we estimate our cluster parameters from a set of soft assignments of observations to our set of clusters? Well now, instead of having an observation in a single cluster and no other cluster, each observation is really going to be in every cluster. It's going to have some split allocation across our different clusters based on the responsibility vector, the soft assignment of that observation. And so we're going to think about how to update our cluster parameters based on allocating a single observation across our entire set of clusters.

So when we go to do maximum likelihood estimation from soft assignments, we're going to form a data table where for every observation, every row of this table, we introduce a set of weights that corresponds to the responsibility vector. This looks very similar to what we did in boosting with weighted observations. But now, instead of having a single weight associated with each row, each data point, we have a set of weights, and this set of weights sums to one. And just like in boosting with weighted observations, these weights are going to modify the row operations that we perform when doing our maximum likelihood estimation.

But when we have these split allocations of data points across our multiple clusters, it's hard to think about counting observations in a cluster. So in this case, what we can do is compute the total weight in each one of these clusters, which is just the sum of the responsibilities in that cluster, and think of this as the effective number of observations in each one of our clusters.

And then what we're going to do is pretty similar to what we did when we had hard assignments, where we're going to form cluster-specific data tables. But now, instead of dividing up our data table into "here are the ones in cluster one, here are the ones in cluster two, cluster three," every observation is going to appear in each of these data tables, because some responsibility is taken by each cluster for that observation. Of course it could be zero, but generically, let's think of each cluster taking some responsibility for a given observation. And so we're going to form a table where we specify the cluster weights associated with each one of our observations. So here would be the table associated with cluster one, and here are the tables for cluster two and cluster three.

Then for each one of our clusters, we're going to compute a maximum likelihood estimate of the cluster parameters, but with these weights modifying the row operations that we're doing. So here are the updated maximum likelihood estimates, where we're accounting for the weights on our observations, which we see here. In particular, we see that the weights modify every row operation: every time we touch a given data element x_i, it's going to be multiplied by r_ik. The other thing I want to note is that when we compute the total number of observations in the cluster, we're now going to call this N_k^soft, based on our set of soft assignments. And this is going to be the effective number of observations in that cluster, which is just the sum of the responsibilities in that cluster. We do a similar modification when we go to estimate our cluster proportions.
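To make these weighted updates concrete, here is a minimal sketch in NumPy of the soft-assignment maximum likelihood estimates. The function name m_step_soft and the argument names X and R are purely illustrative (they are not from the lecture), and the sketch assumes the responsibilities are stored as an N-by-K matrix whose rows sum to one.

```python
import numpy as np

def m_step_soft(X, R):
    """Soft-assignment MLE of cluster proportions, means, and covariances.

    X : (N, d) array of observations.
    R : (N, K) array of responsibilities; each row sums to one.
    """
    N, d = X.shape
    K = R.shape[1]

    # Effective number of observations per cluster: N_k^soft = sum_i r_ik
    Nk_soft = R.sum(axis=0)                      # shape (K,)

    # Cluster proportions: pi_k = N_k^soft / N
    pi = Nk_soft / N

    # Weighted means: mu_k = (1 / N_k^soft) * sum_i r_ik * x_i
    mu = (R.T @ X) / Nk_soft[:, None]            # shape (K, d)

    # Weighted covariances:
    # Sigma_k = (1 / N_k^soft) * sum_i r_ik (x_i - mu_k)(x_i - mu_k)^T
    Sigma = np.empty((K, d, d))
    for k in range(K):
        diff = X - mu[k]                         # shape (N, d)
        Sigma[k] = (R[:, k, None] * diff).T @ diff / Nk_soft[k]

    return pi, mu, Sigma
```

Note the design choice: everywhere the hard-assignment estimates would sum only over the observations in cluster k, this sketch sums over all observations with each term weighted by r_ik, and divides by N_k^soft rather than by a raw count.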
Where now, instead of calculating the total number of observations in a cluster, we compute the effective number of observations in that cluster. And then we simply divide by the total number of observations in the data set, which is equivalent to the total effective number of observations, just the sum of this vector here. So in equations, our updated estimate of the cluster proportions is written as follows, where we simply replace N_k with this N_k^soft.

Note that if our responsibilities just took values zero or one, that is, if we were making hard assignments of observations to clusters, then things would just default back to what we showed in the previous section. In particular, we can think of this responsibility vector, which now has a single one in one of these clusters, as representing a one-hot encoding of the cluster assignment. So to see the equivalence between what we have here and what we showed in the previous section, we can look at our set of equations for the maximum likelihood estimates based on soft assignments, and we can show that if we actually plug in hard assignments, then we get out the set of equations we presented in section 2A.

To begin with, we can look at our estimate of the cluster proportions and note that if r_ik just takes values in the set {0, 1}, then this sum is just going to count observation i in cluster k when r_ik equals one. So this sum here just defaults to counting the number of observations in cluster k, and that's exactly what we had before.

Now if we go to our estimate of the mean, when we think about multiplying by r_ik, we're only going to be adding x_i to this sum (remember, here our sum is over all the data points) if x_i is in cluster k. So we only add x_i if i is in cluster k, that is, if r_ik equals one. And in this case this equation reduces to 1/N_k times the sum over i in cluster k of x_i, which again is exactly what we had before.

And finally, for the estimate of the covariance, it's going to be the same as above, where we're only going to be adding these outer products if observation i is in cluster k, and things default to the same equation as in the hard assignment case. Just to make this explicit, it's 1/N_k times the sum over i in cluster k of (x_i - mu_hat_k)(x_i - mu_hat_k) transpose.

So this at least serves as a little sanity check that the equations we presented in this section for our maximum likelihood estimates based on soft assignments might at least not be wrong, because we showed that if we plug in hard assignments, we get out the maximum likelihood estimates that we knew to be true from before. What we've seen is that, based just on soft assignments, it's still straightforward to compute our estimates of the cluster parameters. [MUSIC]
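As a concrete version of that sanity check, here is a small hedged example (the random data and variable names are made up for illustration) that feeds one-hot responsibilities built from hard assignments into the m_step_soft sketch above and verifies that it recovers the per-cluster sample means, maximum likelihood covariances, and cluster proportions from the hard-assignment case.

```python
import numpy as np

# Hypothetical data purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
z = rng.integers(0, 3, size=100)     # hard cluster assignments in {0, 1, 2}
R_hard = np.eye(3)[z]                # one-hot encoding of those assignments

pi, mu, Sigma = m_step_soft(X, R_hard)

for k in range(3):
    members = X[z == k]
    Nk = len(members)
    # With one-hot responsibilities, the soft formulas match the hard-assignment MLEs.
    assert np.isclose(pi[k], Nk / len(X))
    assert np.allclose(mu[k], members.mean(axis=0))
    diff = members - members.mean(axis=0)
    assert np.allclose(Sigma[k], diff.T @ diff / Nk)
```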