So, we've discussed sensitivity to initialization. Now let's turn to a method that provides a smarter and more reliable way of initializing our algorithm, referred to as k-means++. How we initialize our k-means algorithm is really, really critical to the quality of the solution found. One way to think about a smart initialization is as follows. The first step is to select the first cluster center uniformly at random from among our data points, defining it to be one of those data points. The second step is, for each of our observations, to compute the distance to the nearest cluster center. On the first pass there's only one cluster center, so it's just the distance to that center; but once there are multiple cluster centers, as we go through this initialization scheme, we maintain for each observation the distance to its closest cluster center. Then, when we go to sample our next cluster center, we choose each observation with probability proportional to its distance squared. So we bias toward selecting an observation that's far from the existing cluster centers. And we repeat these steps until we've sampled k different cluster centers. A code sketch of this procedure appears at the end of this section.

So let's visualize what this means. Again, here are the data points that we wish to cluster, the same example that we've been using. The first step of our k-means++ algorithm is just to select one of these data points at random and call it the first cluster center. The next step is to compute distances to this cluster center, and then, based on the squares of these distances, we sample a new cluster center. Remember that we're more likely to select a data point as a cluster center if that data point is far away, and the fact that the algorithm looks at the distances squared makes that even more pronounced. So we're very likely to sample one of the data points that's far from this cluster center as the next cluster center. Maybe we select this observation as the next cluster center, and that's going to be the green cluster. Now we repeat this process, computing the distance from each point to its nearest cluster center. Maybe these are the distances we compute; it gets a little hard to tell whether this distance is smaller than that one, but you get the point. Based on these, we again compute the square of each distance and then sample a point. This one here is really far away from its nearest cluster center, so it's very likely to get selected, and indeed, in this example, it is. So our initialization of three cluster centers for our k-means algorithm is at these locations.

Running k-means++ to initialize our k-means algorithm is definitely more computationally costly than just randomly selecting a set of cluster centers, but the subsequent k-means run often converges more rapidly. So k-means++ tends to improve the quality of the local optimum found and, in addition, reduce the runtime, because the algorithm converges more rapidly to that high-quality solution. In practice, running k-means++ tends to be a really helpful thing to do.
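To make the initialization steps concrete, here's a minimal sketch in Python. This isn't code from the lecture: the function name kmeans_pp_init, its parameters, and the use of NumPy are all my own assumptions; it simply follows the steps described above (pick the first center at random, then repeatedly sample new centers with probability proportional to squared distance from the nearest existing center).

```python
import numpy as np

def kmeans_pp_init(X, k, seed=None):
    """k-means++ initialization: choose k initial centers from the rows of X.

    X    : (n, d) array of data points
    k    : number of cluster centers to choose
    seed : optional seed for reproducibility
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Step 1: pick the first center uniformly at random among the data points.
    centers = [X[rng.integers(n)]]

    for _ in range(1, k):
        # Step 2: for each point, squared distance to its *nearest* chosen center.
        diffs = X[:, None, :] - np.asarray(centers)[None, :, :]   # (n, m, d)
        dists_sq = np.min((diffs ** 2).sum(axis=2), axis=1)       # (n,)

        # Step 3: sample the next center with probability proportional to the
        # squared distance, biasing toward points far from existing centers.
        probs = dists_sq / dists_sq.sum()
        centers.append(X[rng.choice(n, p=probs)])

    # Repeat until k centers have been sampled; return them as an array.
    return np.asarray(centers)
```

For example, calling kmeans_pp_init(X, k=3) on the toy data from the lecture would return three rows of X to use as the initial cluster centers for a standard k-means run.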