[MUSIC] Even though we plowed through just a ton of material, as we just reviewed in the recap of past modules, I wanted to spend some time discussing another really popular form of clustering called hierarchical clustering. But before we delve into the specifics of hierarchical clustering, let's spend a little bit of time motivating why one might want to use it.

One reason is that it allows us to avoid that pesky problem of having to fix the number of clusters ahead of time. Though, of course, we have to set other things instead, and we'll discuss this. Another benefit is that the dendrogram, which captures the results of the hierarchical clustering, allows us to very quickly visualize clusterings at different granularities without having to rerun the algorithm. Additionally, hierarchical clustering typically allows us to specify any distance metric we want for defining distances between points, whereas, for example, when we looked at k-means, we were implicitly specifying Euclidean distance as our distance metric. And finally, hierarchical clustering can allow us to capture more complex cluster shapes than the types of things we saw with k-means or our mixture models. For example, in k-means, remember that we were implicitly assuming spherically symmetric clusters, and when we looked at mixtures of Gaussians, we were assuming that our clusters had elliptical shapes. These can both be rather restrictive assumptions. But what we're going to show is that through hierarchical clustering, we can capture more intricate shapes, such as the challenging cluster structures shown here. These were examples we showed when we introduced the notion of clustering and k-means, and we said that those algorithms would struggle with these types of challenging cluster structures, even though, visually, they just pop out at us.

Well, there are two main classes of hierarchical clustering algorithms. One is called divisive, or top-down, where we start with all the data in one cluster and then we recursively split this cluster. One example of a divisive algorithm would be to recursively apply k-means. The other approach is called agglomerative clustering, or bottom-up, where we start with every data point in its own cluster and then we recursively merge clusters. An example of this is called single linkage, which we're going to describe. [MUSIC]
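To make the agglomerative idea concrete before the next video, here is a minimal sketch of single-linkage clustering using SciPy's scipy.cluster.hierarchy module. The half-moon data, the random seed, and the choice of SciPy (rather than any code from this course) are all assumptions for illustration; the data mimics the kind of non-spherical cluster structure k-means struggles with.

```python
# A minimal sketch of agglomerative (bottom-up) clustering with single linkage.
# Assumptions: SciPy's hierarchy module as the implementation, and synthetic
# half-moon data standing in for the challenging cluster shapes from the lecture.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# Hypothetical data: two interleaved half-moons, a shape k-means handles poorly.
theta = rng.uniform(0, np.pi, 100)
moon1 = np.c_[np.cos(theta), np.sin(theta)] + rng.normal(scale=0.05, size=(100, 2))
moon2 = np.c_[1 - np.cos(theta), 0.5 - np.sin(theta)] + rng.normal(scale=0.05, size=(100, 2))
X = np.vstack([moon1, moon2])

# Single linkage: start with every point in its own cluster, then recursively
# merge the two clusters whose closest pair of points is nearest. Note we can
# pass any distance metric here, not just Euclidean.
Z = linkage(X, method='single', metric='euclidean')

# The linkage matrix Z encodes the full dendrogram, so we can cut it at any
# granularity without rerunning the algorithm, e.g. to extract two clusters:
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels[:5], labels[-5:])  # points within each moon should share a label
```

Note how the two motivations from the lecture show up directly: the linkage matrix Z captures the whole dendrogram, so different cuts give different numbers of clusters for free, and with these well-separated moons, single linkage should typically recover both shapes, something k-means would not.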