Even though we plowed through just a ton of material, as we just reviewed in the recap of past modules, I wanted to spend some time discussing another really popular form of clustering called hierarchical clustering.

But before we delve into the specifics of hierarchical clustering, let's spend a little bit of time motivating why one might want to use it. One reason is that it allows us to avoid that pesky problem of having to fix the number of clusters ahead of time. Though, of course, we have to set other things instead, and we'll discuss this. Another benefit is that the dendrogram, which captures the results of the hierarchical clustering, allows us to very quickly visualize clusterings at different granularities without having to rerun the algorithm. Additionally, hierarchical clustering typically allows us to specify any distance metric we want for defining distances between points, whereas, for example, when we looked at k-means, we were implicitly specifying Euclidean distance as our distance metric.

And finally, hierarchical clustering can allow us to capture more complex shapes to our clusters than the types of things that we saw with k-means or our mixture models. For example, in k-means, remember that we were implicitly assuming spherically symmetric clusters, and when we looked at mixtures of Gaussians, we were assuming that our clusters had these elliptical shapes. These can both be rather restrictive assumptions. But what we're going to show is that, through hierarchical clustering, we can capture more intricate shapes, such as the challenging cluster structures shown here. These were examples that we showed when we introduced the notion of clustering and k-means, where we said that those algorithms would struggle with those types of challenging cluster structures, even though visually they just pop out at us.

Well, there are two main classes of hierarchical clustering algorithms. One is called divisive, or top-down, where we start with all the data in one cluster and then we recursively split this cluster. One example of a divisive algorithm would be to recursively apply k-means.
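To make that divisive, top-down idea concrete, here is a minimal sketch of recursively applying 2-means, assuming scikit-learn's KMeans; the helper name `divisive_kmeans` and the depth and minimum-size stopping rules are illustrative choices, not something specified in the lecture.

```python
import numpy as np
from sklearn.cluster import KMeans

def divisive_kmeans(X, depth=0, max_depth=3, min_size=5):
    """Recursively split the data into two clusters with k-means.

    Returns a list of arrays, one per leaf cluster. The stopping rules
    (max_depth, min_size) stand in for the other things we have to set
    instead of fixing the number of clusters up front.
    """
    if depth == max_depth or len(X) < min_size:
        return [X]
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    left, right = X[labels == 0], X[labels == 1]
    return (divisive_kmeans(left, depth + 1, max_depth, min_size)
            + divisive_kmeans(right, depth + 1, max_depth, min_size))

# Toy usage on random 2-D points.
X = np.random.RandomState(0).rand(200, 2)
leaves = divisive_kmeans(X)
print([len(c) for c in leaves])
```

Each level of the recursion corresponds to one level of the resulting tree, so the hierarchy is built from the top down.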
The other approach is called agglomerative clustering, or bottom-up, where we start with every data point in its own cluster and then we recursively merge clusters. An example of this is called single linkage, which we're going to describe.
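For the bottom-up side, here is a minimal sketch of single-linkage agglomerative clustering, assuming SciPy's hierarchy tools; the Manhattan metric and the choice of cutting the tree into 2 and 4 clusters are just illustrations of the points above (arbitrary distance metrics, and reading off different granularities from one dendrogram without rerunning the algorithm).

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram
import matplotlib.pyplot as plt

X = np.random.RandomState(0).rand(50, 2)

# Any distance metric supported by pdist can be used here; 'cityblock'
# (Manhattan) is just an example of not being tied to Euclidean distance
# the way k-means implicitly is.
D = pdist(X, metric='cityblock')

# Single linkage: repeatedly merge the two clusters whose closest pair
# of points is nearest, until everything is in one cluster.
Z = linkage(D, method='single')

# The merge tree Z is computed once; we can cut it at different
# granularities without rerunning the algorithm.
labels_coarse = fcluster(Z, t=2, criterion='maxclust')
labels_fine = fcluster(Z, t=4, criterion='maxclust')

# The dendrogram visualizes all granularities at once.
dendrogram(Z)
plt.show()
```

Because single linkage only looks at the closest pair of points between two clusters, it can follow elongated, non-elliptical cluster shapes of the kind that k-means and mixtures of Gaussians struggle with.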