So, we've discussed sensitivity to initialization. Now let's turn to a method that provides a smarter and more reliable way of initializing our algorithm, referred to as k-means++. How we initialize our k-means algorithm is really, really critical to the quality of the solution found. One way to think about a smart initialization is as follows. The first step is to select the first cluster center uniformly at random from among our data points, defining it to be one of those data points. The second step is, for each of our observations, to compute the distance to the nearest cluster center. On the first pass there's only one cluster center, so it's just the distance to that center; but once there are multiple cluster centers, as we go through this initialization scheme, we maintain for each observation the distance to its closest cluster center. Then, when we go to sample our next cluster center, we choose each observation with probability proportional to its distance squared. So we bias toward selecting an observation that's far from the existing cluster centers. And we repeat these steps until we've sampled k different cluster centers. A code sketch of this procedure appears at the end of this section.

So let's visualize what this means. Again, here are the data points that we wish to cluster, the same example that we've been using. The first step of our k-means++ algorithm is just to select one of these data points at random and call it the first cluster center. The next step is to compute distances to this cluster center, and then, based on the squares of these distances, we sample a new cluster center. Remember that we're more likely to select a data point as a cluster center if that data point is far away, and the fact that the algorithm looks at the distances squared makes that even more pronounced. So we're very likely to sample one of the data points that's far from this cluster center as the next cluster center. Maybe we select this observation as the next cluster center, and that's going to be the green cluster. Now we repeat this process, computing the distance from each point to its nearest cluster center. Maybe these are the distances we compute; it gets a little hard to tell whether this distance is smaller than that one, but you get the point. Based on these, we again compute the square of each distance and then sample a point. This one here is really far away from its nearest cluster center, so it's very likely to get selected, and indeed, in this example, it is. So our initialization of three cluster centers for our k-means algorithm is at these locations.

Running k-means++ to initialize our k-means algorithm is definitely more computationally costly than just randomly selecting a set of cluster centers, but the subsequent k-means run often converges more rapidly. So k-means++ tends to improve the quality of the local optimum found and, in addition, reduce the runtime, because the algorithm converges more rapidly to that high-quality solution. In practice, running k-means++ tends to be a really helpful thing to do.
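To make the initialization steps concrete, here's a minimal sketch in Python. This isn't code from the lecture: the function name kmeans_pp_init, its parameters, and the use of NumPy are all my own assumptions; it simply follows the steps described above (pick the first center at random, then repeatedly sample new centers with probability proportional to squared distance from the nearest existing center).

```python
import numpy as np

def kmeans_pp_init(X, k, seed=None):
    """k-means++ initialization: choose k initial centers from the rows of X.

    X    : (n, d) array of data points
    k    : number of cluster centers to choose
    seed : optional seed for reproducibility
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Step 1: pick the first center uniformly at random among the data points.
    centers = [X[rng.integers(n)]]

    for _ in range(1, k):
        # Step 2: for each point, squared distance to its *nearest* chosen center.
        diffs = X[:, None, :] - np.asarray(centers)[None, :, :]   # (n, m, d)
        dists_sq = np.min((diffs ** 2).sum(axis=2), axis=1)       # (n,)

        # Step 3: sample the next center with probability proportional to the
        # squared distance, biasing toward points far from existing centers.
        probs = dists_sq / dists_sq.sum()
        centers.append(X[rng.choice(n, p=probs)])

    # Repeat until k centers have been sampled; return them as an array.
    return np.asarray(centers)
```

For example, calling kmeans_pp_init(X, k=3) on the toy data from the lecture would return three rows of X to use as the initial cluster centers for a standard k-means run.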