Even though we plowed through just a ton of material, as we just reviewed in the recap of past modules, I wanted to spend some time discussing another really popular form of clustering called hierarchical clustering.

But before we delve into the specifics of hierarchical clustering, let's spend a little bit of time motivating why one might want to use it. One reason is that it allows us to avoid that pesky problem of having to fix the number of clusters ahead of time. Though, of course, we have to set other things instead, and we'll discuss this. Another benefit is that the dendrogram, which captures the results of the hierarchical clustering, allows us to very quickly visualize clusterings at different granularities without having to rerun the algorithm. Additionally, hierarchical clustering typically allows us to specify any distance metric we want for defining distances between points, whereas, for example, when we looked at k-means, we were implicitly specifying Euclidean distance as our distance metric.

And finally, hierarchical clustering can allow us to capture more complex shapes to our clusters than the types of things that we saw with k-means or our mixture models. For example, in k-means, remember that we were implicitly assuming spherically symmetric clusters, and when we looked at mixtures of Gaussians, we were assuming that our clusters had these elliptical shapes. These can both be rather restrictive assumptions. But what we're going to show is that, through hierarchical clustering, we can capture more intricate shapes, such as the challenging cluster structures shown here. These were examples that we showed when we introduced the notion of clustering and k-means, where we said that those algorithms would struggle with those types of challenging cluster structures, even though visually they just pop out at us.

Well, there are two main classes of hierarchical clustering algorithms. One is called divisive, or top-down, where we start with all the data in one cluster and then we recursively split this cluster. One example of a divisive algorithm would be to recursively apply k-means.
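To make that divisive, top-down idea concrete, here is a minimal sketch of recursively applying 2-means, assuming scikit-learn's KMeans; the helper name `divisive_kmeans` and the depth and minimum-size stopping rules are illustrative choices, not something specified in the lecture.

```python
import numpy as np
from sklearn.cluster import KMeans

def divisive_kmeans(X, depth=0, max_depth=3, min_size=5):
    """Recursively split the data into two clusters with k-means.

    Returns a list of arrays, one per leaf cluster. The stopping rules
    (max_depth, min_size) stand in for the other things we have to set
    instead of fixing the number of clusters up front.
    """
    if depth == max_depth or len(X) < min_size:
        return [X]
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    left, right = X[labels == 0], X[labels == 1]
    return (divisive_kmeans(left, depth + 1, max_depth, min_size)
            + divisive_kmeans(right, depth + 1, max_depth, min_size))

# Toy usage on random 2-D points.
X = np.random.RandomState(0).rand(200, 2)
leaves = divisive_kmeans(X)
print([len(c) for c in leaves])
```

Each level of the recursion corresponds to one level of the resulting tree, so the hierarchy is built from the top down.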
The other approach is called agglomerative clustering, or bottom-up, where we start with every data point in its own cluster and then we recursively merge clusters. An example of this is called single linkage, which we're going to describe.
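For the bottom-up side, here is a minimal sketch of single-linkage agglomerative clustering, assuming SciPy's hierarchy tools; the Manhattan metric and the choice of cutting the tree into 2 and 4 clusters are just illustrations of the points above (arbitrary distance metrics, and reading off different granularities from one dendrogram without rerunning the algorithm).

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram
import matplotlib.pyplot as plt

X = np.random.RandomState(0).rand(50, 2)

# Any distance metric supported by pdist can be used here; 'cityblock'
# (Manhattan) is just an example of not being tied to Euclidean distance
# the way k-means implicitly is.
D = pdist(X, metric='cityblock')

# Single linkage: repeatedly merge the two clusters whose closest pair
# of points is nearest, until everything is in one cluster.
Z = linkage(D, method='single')

# The merge tree Z is computed once; we can cut it at different
# granularities without rerunning the algorithm.
labels_coarse = fcluster(Z, t=2, criterion='maxclust')
labels_fine = fcluster(Z, t=4, criterion='maxclust')

# The dendrogram visualizes all granularities at once.
dendrogram(Z)
plt.show()
```

Because single linkage only looks at the closest pair of points between two clusters, it can follow elongated, non-elliptical cluster shapes of the kind that k-means and mixtures of Gaussians struggle with.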