[MUSIC] Well, let's start talking about clustering algorithms, and to begin with, let's talk about the bread-and-butter clustering algorithm called k-means. The k-means algorithm assumes the simple definition of score we described before, where you just measure the distance between an observation and a cluster center as the way of associating data points with specific clusters. In particular here, these little gray boxes are going to be our data points to cluster, so this might be data point x_i.

The first step of the algorithm is to initialize our cluster centers. These circles represent our cluster centers, and we're going to notate them with the Greek letter mu, so we have mu_1, mu_2, and mu_3. Here we're looking at an example with k = 3 clusters, so we have three cluster centers.

Having defined those cluster centers, the next step of the algorithm is to assign observations to clusters based on the minimum distance to a cluster center. In equations, that's written as

z_i = arg min_j ||mu_j - x_i||^2

where mu_j is the jth cluster center, x_i is our observation, and z_i is the cluster label for observation i. Remember, this is our inferred cluster label for the ith observation, whereas in supervised learning we assumed we were given a label y_i.

What we're doing with this function is this thing called arg min, so let's describe what arg min does. Arg min returns the index j of the cluster whose center is closest to observation x_i. This is in contrast to min, which, if we think about minimizing a function, returns the value of the objective, which here would be the minimum value of this distance calculation. So just to be clear: by writing arg min, we're searching over all cluster centers mu_1, mu_2, mu_3 with the ith observation fixed, varying the cluster center, and returning the index where this distance was smallest. If we had just written min, we would return the minimum distance itself, rather than the index of the cluster that achieved that minimum distance. It's important to explain this equation because we're going to see it throughout the rest of the module.

Okay, so the point here is that we're computing distances to cluster centers, and each of these colored points got assigned to a cluster because its distance to that cluster's center was smaller than the distances to any of the other cluster centers. This partitioning of the 2D space into different colored regions is called a Voronoi tessellation. What it represents is that if any point falls into, for example, this blue region, then the distance to this cluster center is smaller than the distance to any of the other cluster centers. So this is the region in which any observation will get assigned to the blue cluster, any observation in this region will get assigned to the green cluster, and finally, any observation here will get assigned to the red cluster. But I want to emphasize that this is for visualization only; you don't need to compute the tessellation to run k-means.

Okay, so far we've initialized our cluster centers and assigned each of our observations to the cluster center it was closest to.
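To make the assignment step concrete, here is a minimal sketch in NumPy. This is not code from the course; the function name assign_clusters and the array shapes are assumptions for illustration. It computes the squared distance from every observation to every cluster center, then applies arg min along the cluster axis, so it returns indices rather than distances:

```python
import numpy as np

def assign_clusters(X, centers):
    # X: (n, d) data matrix, centers: (k, d) cluster centers.
    # Squared Euclidean distance from every point to every center -> (n, k).
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # arg min returns the *index* j of the closest center for each point;
    # dists.min(axis=1) would instead return the minimum distance itself.
    return dists.argmin(axis=1)
```

Note that swapping argmin for min in the last line is exactly the min versus arg min distinction described above: min gives the smallest distance, arg min gives which cluster achieved it.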
And then the next step in the k-means algorithm is to update the positions of the cluster centers based on the points that were assigned to them. In particular, we're going to compute the mean of the data points in each cluster, or equivalently, we can think of this as the center of mass of those data points, and that becomes our new cluster center. In equations, the new cluster center for the jth cluster is

mu_j = (1/n_j) * sum over {i : z_i = j} of x_i

where n_j is the number of observations in cluster j, and the sum is over all observations i such that z_i = j, which means observation i is in cluster j. Just to make sure this is clear: if we look at this sum, how many terms appear in it? n_j terms, which is the size of the set of observations whose cluster indicator is j. So this equation says we sum over all our observations in cluster j, then divide by the total number of observations in the cluster, and that gives us the average, or center of mass, of that cluster.

Finally, the last stage of k-means (it won't actually be the last step, but the last stage) is to repeat the last two steps. We keep iterating between assigning observations to cluster centers and updating the cluster centers, on and on until convergence, and we're going to talk about the notion of convergence a little bit later. [MUSIC]
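As a sketch of these remaining two stages (again with hypothetical helper names, building on the assign_clusters function above), the update step averages the points assigned to each cluster, and the outer loop alternates the two steps until the assignments stop changing, which is one common convergence check:

```python
def update_centers(X, z, k):
    # New mu_j = (1/n_j) * sum of x_i with z_i = j: the mean (center of
    # mass) of the points currently assigned to cluster j.
    # Assumes every cluster keeps at least one point (no empty clusters).
    return np.array([X[z == j].mean(axis=0) for j in range(k)])

def kmeans(X, k, max_iter=100, seed=0):
    # Initialize centers at k randomly chosen data points (one simple choice).
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    z = assign_clusters(X, centers)
    for _ in range(max_iter):
        centers = update_centers(X, z, k)    # update step
        z_new = assign_clusters(X, centers)  # assignment step
        if np.array_equal(z, z_new):         # converged: no label changed
            break
        z = z_new
    return z, centers
```

For simplicity, this sketch assumes no cluster ever empties out during the iterations; a production implementation would handle empty clusters explicitly, for example by re-seeding them.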