[MUSIC] Well, let's start talking about clustering algorithms, and to begin with, let's talk about the bread-and-butter clustering algorithm called k-means. The k-means algorithm assumes the simple definition of score we described before, where you just measure the distance between an observation and a cluster center as the way of associating data points with specific clusters. In particular here, these little gray boxes are going to be our data points to cluster, so this might be data point x_i.

The first step of the algorithm is to initialize our cluster centers. These circles represent our cluster centers, and we're going to notate them with the Greek letter mu, so we have mu_1, mu_2, and mu_3. Here we're looking at an example with k = 3 clusters, so we have three cluster centers.

Having defined those cluster centers, the next step of the algorithm is to assign observations to clusters based on the minimum distance to a cluster center. In equations, that's written as

z_i = arg min_j ||mu_j - x_i||^2

where mu_j is the jth cluster center, x_i is our observation, and z_i is the cluster label for observation i. Remember, this is our inferred cluster label for the ith observation, whereas in supervised learning we assumed we were given a label y_i.

What we're doing with this function is this thing called arg min, so let's describe what arg min does. Arg min returns the index j of the cluster whose center is closest to observation x_i. This is in contrast to min, which, if we think about minimizing a function, returns the value of the objective, which here would be the minimum value of this distance calculation. So just to be clear: by writing arg min, we're searching over all cluster centers mu_1, mu_2, mu_3 with the ith observation fixed, varying the cluster center, and returning the index where this distance was smallest. If we had just written min, we would return the minimum distance itself, rather than the index of the cluster that achieved that minimum distance. It's important to explain this equation because we're going to see it throughout the rest of the module.

Okay, so the point here is that we're computing distances to cluster centers, and each of these colored points got assigned to a cluster because its distance to that cluster's center was smaller than the distances to any of the other cluster centers. This partitioning of the 2D space into different colored regions is called a Voronoi tessellation. What it represents is that if any point falls into, for example, this blue region, then the distance to this cluster center is smaller than the distance to any of the other cluster centers. So this is the region in which any observation will get assigned to the blue cluster, any observation in this region will get assigned to the green cluster, and finally, any observation here will get assigned to the red cluster. But I want to emphasize that this is for visualization only; you don't need to compute the tessellation to run k-means.

Okay, so far we've initialized our cluster centers and assigned each of our observations to the cluster center it was closest to.
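To make the assignment step concrete, here is a minimal sketch in NumPy. This is not code from the course; the function name assign_clusters and the array shapes are assumptions for illustration. It computes the squared distance from every observation to every cluster center, then applies arg min along the cluster axis, so it returns indices rather than distances:

```python
import numpy as np

def assign_clusters(X, centers):
    # X: (n, d) data matrix, centers: (k, d) cluster centers.
    # Squared Euclidean distance from every point to every center -> (n, k).
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # arg min returns the *index* j of the closest center for each point;
    # dists.min(axis=1) would instead return the minimum distance itself.
    return dists.argmin(axis=1)
```

Note that swapping argmin for min in the last line is exactly the min versus arg min distinction described above: min gives the smallest distance, arg min gives which cluster achieved it.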
And then the next step in the k-means algorithm is to update the positions of the cluster centers based on the points that were assigned to them. In particular, we're going to compute the mean of the data points in each cluster, or equivalently, we can think of this as the center of mass of those data points, and that becomes our new cluster center. In equations, the new cluster center for the jth cluster is

mu_j = (1/n_j) * sum over {i : z_i = j} of x_i

where n_j is the number of observations in cluster j, and the sum is over all observations i such that z_i = j, which means observation i is in cluster j. Just to make sure this is clear: if we look at this sum, how many terms appear in it? n_j terms, which is the size of the set of observations whose cluster indicator is j. So this equation says we sum over all our observations in cluster j, then divide by the total number of observations in the cluster, and that gives us the average, or center of mass, of that cluster.

Finally, the last stage of k-means (it won't actually be the last step, but the last stage) is to repeat the last two steps. We keep iterating between assigning observations to cluster centers and updating the cluster centers, on and on until convergence, and we're going to talk about the notion of convergence a little bit later. [MUSIC]
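As a sketch of these remaining two stages (again with hypothetical helper names, building on the assign_clusters function above), the update step averages the points assigned to each cluster, and the outer loop alternates the two steps until the assignments stop changing, which is one common convergence check:

```python
def update_centers(X, z, k):
    # New mu_j = (1/n_j) * sum of x_i with z_i = j: the mean (center of
    # mass) of the points currently assigned to cluster j.
    # Assumes every cluster keeps at least one point (no empty clusters).
    return np.array([X[z == j].mean(axis=0) for j in range(k)])

def kmeans(X, k, max_iter=100, seed=0):
    # Initialize centers at k randomly chosen data points (one simple choice).
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    z = assign_clusters(X, centers)
    for _ in range(max_iter):
        centers = update_centers(X, z, k)    # update step
        z_new = assign_clusters(X, centers)  # assignment step
        if np.array_equal(z, z_new):         # converged: no label changed
            break
        z = z_new
    return z, centers
```

For simplicity, this sketch assumes no cluster ever empties out during the iterations; a production implementation would handle empty clusters explicitly, for example by re-seeding them.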