[MUSIC] So that was our deep dive into retrieval, where a lot of our focus was not only on how we represent our data and compute distances, which are critical to the performance of the nearest neighbor algorithm, but also on how we scale up nearest neighbor search to really large data sets, using KD-trees and locality-sensitive hashing, and using both of these to perform approximate nearest neighbor search.

For the remainder of the course, we turn to the idea of clustering. In particular, in our second module, we first discussed what clustering means, and we returned to our document analysis, where the goal was to discover groups of related articles. The first algorithm we presented for performing clustering was k-means, which is probably the most widely used clustering algorithm. Remember that in this algorithm we initialize by placing a set of cluster centers, and then the algorithm iterates between making hard assignments of data points to the nearest cluster center and updating the cluster centers. This procedure repeats until convergence; a minimal code sketch of this loop appears after this recap.

In particular, we cast k-means as a coordinate descent algorithm, and through this we know that it converges to a local optimum of the objective. So, importantly, we are not necessarily going to converge to the global optimum; the algorithm only converges to a local one. We went through this in pictures, showing that different initializations can converge to quite different solutions. In practice, people therefore often run multiple initializations of the algorithm and choose the best one. And we discussed how initialization matters significantly, both for the quality of the solution found and for the convergence rate.

But, just like in our first module on retrieval, one of our big emphases in this second module was the scalability of the procedure. For this, we introduced a generic framework called MapReduce for parallelizing computations across multiple machines under certain specific setups. In particular, we talked about the MapReduce abstraction in terms of a map step that has to be data-parallel over elements, and a reduce step that performs some aggregation and has to be data-parallel over keys. If your algorithm has this structure, then you can use MapReduce to parallelize and distribute the computations.

And we showed that k-means indeed has this type of structure: the classify step, which makes the hard assignments of data points to cluster centers, is data-parallel over elements, and the recenter step, which recomputes the center of each cluster, is data-parallel over keys. So we have a map and a reduce step here, although in k-means we have to iterate between these steps again and again, which is a non-standard use of MapReduce. But it can definitely be applied in this context and gives us really good scaling properties for k-means. A map/reduce-style sketch of one k-means iteration also appears below. [MUSIC]
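
To make the recap concrete, here is a minimal sketch of the k-means loop described above: alternate hard assignments and center updates until convergence, and compare multiple random initializations by their objective value. This assumes NumPy; the function name `kmeans` and its parameters are illustrative, not the course's reference implementation.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal k-means sketch: alternate hard assignments and center updates."""
    rng = np.random.default_rng(seed)
    # Initialize by placing centers at k randomly chosen data points.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Classify step: hard-assign each point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        z = dists.argmin(axis=1)
        # Recenter step: move each center to the mean of its assigned points
        # (keep the old center if a cluster ends up empty).
        new_centers = np.array([
            X[z == j].mean(axis=0) if np.any(z == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):  # converged to a local optimum
            break
        centers = new_centers
    # Objective: sum of squared distances to assigned centers,
    # useful for comparing runs from different initializations.
    obj = ((X - centers[z]) ** 2).sum()
    return centers, z, obj

# Illustrative usage: run several initializations, keep the lowest objective.
# X = np.random.default_rng(1).normal(size=(200, 2))
# centers, z, obj = min((kmeans(X, k=3, seed=s) for s in range(10)),
#                       key=lambda result: result[2])
```

Because each run only reaches a local optimum, the multiple-initialization loop in the usage comment is what gives you a shot at a better solution, exactly as discussed in the recap.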
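And here is a single-machine sketch of the map/reduce decomposition of one k-means iteration: the map step classifies points (data-parallel over elements) and the reduce step aggregates per cluster id (data-parallel over keys). In a real deployment, a MapReduce framework would distribute these steps across machines and the outer loop would re-launch the job each iteration; the names `kmeans_map`, `kmeans_reduce`, and `kmeans_iteration` are assumptions for this sketch.

```python
import numpy as np
from collections import defaultdict

def kmeans_map(x, centers):
    # Classify step (map, data-parallel over elements): emit
    # (nearest-center id, (point, 1)); the count of 1 lets the
    # reducer rebuild a mean from partial sums.
    j = int(np.argmin(np.linalg.norm(centers - x, axis=1)))
    return j, (x, 1)

def kmeans_reduce(emitted):
    # Recenter step (reduce, data-parallel over keys): sum points
    # and counts per cluster id, then divide to get the new mean.
    sums = defaultdict(lambda: 0.0)
    counts = defaultdict(int)
    for j, (x, n) in emitted:
        sums[j] = sums[j] + x
        counts[j] += n
    return {j: sums[j] / counts[j] for j in sums}

def kmeans_iteration(X, centers):
    # One k-means iteration is one full map+reduce pass; k-means
    # re-runs this pass with the updated centers until convergence,
    # which is the non-standard, iterated use of MapReduce above.
    emitted = [kmeans_map(x, centers) for x in X]   # parallel over elements
    new_centers = kmeans_reduce(emitted)            # parallel over keys
    return np.array([new_centers.get(j, centers[j])
                     for j in range(len(centers))])
```

Note that the reducer only needs per-key sums and counts, which is what makes the recenter step an aggregation that can run independently for each cluster id.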