[MUSIC] Even though we plowed through just a ton of material, as we just reviewed in the recap of past modules, I wanted to spend some time discussing another really popular form of clustering called hierarchical clustering. But before we delve into the specifics of hierarchical clustering, let's spend a little bit of time motivating why one might want to use it.

One reason is that it allows us to avoid that pesky problem of having to fix the number of clusters ahead of time. Though, of course, we have to set other things instead, and we'll discuss this. Another benefit is that the dendrogram, which captures the results of the hierarchical clustering, allows us to very quickly visualize clusterings at different granularities without having to rerun the algorithm. Additionally, hierarchical clustering typically allows us to specify any distance metric we want for defining distances between points, whereas, for example, when we looked at k-means, we were implicitly specifying Euclidean distance as our distance metric. And finally, hierarchical clustering can allow us to capture more complex cluster shapes than the types of things we saw with k-means or our mixture models. For example, in k-means, remember that we were implicitly assuming spherically symmetric clusters, and when we looked at mixtures of Gaussians, we were assuming that our clusters had elliptical shapes. These can both be rather restrictive assumptions. But what we're going to show is that through hierarchical clustering, we can capture more intricate shapes, such as the challenging cluster structures shown here. These were examples we showed when we introduced the notion of clustering and k-means, and we said that those algorithms would struggle with these types of challenging cluster structures, even though, visually, they just pop out at us.

Well, there are two main classes of hierarchical clustering algorithms. One is called divisive, or top-down, where we start with all the data in one cluster and then we recursively split this cluster. One example of a divisive algorithm would be to recursively apply k-means. The other approach is called agglomerative clustering, or bottom-up, where we start with every data point in its own cluster and then we recursively merge clusters. An example of this is called single linkage, which we're going to describe. [MUSIC]
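To make the agglomerative idea concrete before the next video, here is a minimal sketch of single-linkage clustering using SciPy's scipy.cluster.hierarchy module. The half-moon data, the random seed, and the choice of SciPy (rather than any code from this course) are all assumptions for illustration; the data mimics the kind of non-spherical cluster structure k-means struggles with.

```python
# A minimal sketch of agglomerative (bottom-up) clustering with single linkage.
# Assumptions: SciPy's hierarchy module as the implementation, and synthetic
# half-moon data standing in for the challenging cluster shapes from the lecture.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# Hypothetical data: two interleaved half-moons, a shape k-means handles poorly.
theta = rng.uniform(0, np.pi, 100)
moon1 = np.c_[np.cos(theta), np.sin(theta)] + rng.normal(scale=0.05, size=(100, 2))
moon2 = np.c_[1 - np.cos(theta), 0.5 - np.sin(theta)] + rng.normal(scale=0.05, size=(100, 2))
X = np.vstack([moon1, moon2])

# Single linkage: start with every point in its own cluster, then recursively
# merge the two clusters whose closest pair of points is nearest. Note we can
# pass any distance metric here, not just Euclidean.
Z = linkage(X, method='single', metric='euclidean')

# The linkage matrix Z encodes the full dendrogram, so we can cut it at any
# granularity without rerunning the algorithm, e.g. to extract two clusters:
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels[:5], labels[-5:])  # points within each moon should share a label
```

Note how the two motivations from the lecture show up directly: the linkage matrix Z captures the whole dendrogram, so different cuts give different numbers of clusters for free, and with these well-separated moons, single linkage should typically recover both shapes, something k-means would not.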