So one way to compactly represent the results of hierarchical clustering is through something called a dendrogram. And we're going to explain the dendrogram in the context of agglomerative clustering, even though this type of representation can be used for other hierarchical clustering approaches as well.

So the dendrogram representation is as follows. Along the x-axis, we place each one of our data points, carefully ordered so that we get a very nice visual out of this. And along the y-axis, we indicate the distance between different clusters. More specifically, when we're thinking about single linkage, where we look at the minimum distance between the sets of points in any pair of clusters, the height of a specific merge point represents the distance between the two clusters being merged.

So let's describe this a little bit more. If we look at a given merge point, below it we have two separate trees, and each of those trees represents a different cluster. So in this example, we have a blue cluster: all the points colored in blue are data points assigned to that blue cluster at some iteration of the algorithm. And likewise, we have a green cluster. The height of the merge between the blue branch and the green branch indicates the minimum distance between any point in the blue cluster and any point in the green cluster. That minimum distance is where we place the merge of the blue branch with the green branch, and that specifies the height along the y-axis.

So throughout this tree, we see the different clusters that are present and the distances between those clusters as the different merges were made throughout the algorithm, going from every data point being in its own cluster all the way to all the data points being in one cluster. And likewise, if we look at any path down the dendrogram, that path indicates the cluster membership of a given data point throughout all the different merge steps: what cluster that data point belonged to, and the sequence of merges made for those clusters.

So in summary, we see that the dendrogram captures all the critical elements of the hierarchical clustering result.
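As a concrete illustration (this sketch is not from the lecture; the toy data, random seed, and plot labels are all assumptions for demonstration), here is one way to build and plot a single-linkage dendrogram with SciPy, where the height of each link is exactly the merge distance just described:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Hypothetical toy data: two loose groups of 2D points.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.3, size=(10, 2)),
    rng.normal(loc=[3.0, 3.0], scale=0.3, size=(10, 2)),
])

# Single-linkage agglomerative clustering. Each row of Z records one merge:
# (cluster i, cluster j, distance between them, size of the merged cluster).
# The third column is the merge height plotted on the dendrogram's y-axis.
Z = linkage(X, method="single")

# Leaves (data indices) are laid out along the x-axis, reordered so that
# branches don't cross; each U-shaped link sits at its merge distance.
dendrogram(Z)
plt.ylabel("single-linkage distance at merge")
plt.show()
```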
What are those critical elements? The different cluster memberships, which cluster is merged with which, and the distances between those clusters at the point at which they merged.

Well, one important thing we can do with the dendrogram is extract a partition. Because remember, if we run agglomerative clustering from beginning all the way to the end, we start with all data points in separate, individual clusters and end with all data points in one cluster. And what we really want is to produce some clustering of our data points that's somewhere in between: not at the granularity of every point being its own cluster, and not at the very coarse granularity of everything being one cluster.

So this leads to the question of how we extract a partition. How do we define a set of clusters to produce from this hierarchical clustering procedure? One really simple approach is to perform a cut along the y-axis of the dendrogram. Then every branch that crosses the line we chose defines a separate cluster. So in this example, we get fuchsia, blue, green, orange, and gray clusters.

But remember, in this visualization each of these data points just represents a different data index in our dataset. What we can do is go back to our original feature space and visualize what the resulting clusters look like. So if our data were in 2D, like all the examples we've been giving because of our nice 2D slide representation, this might be the clustering associated with the cut shown on the previous slide. And as we vary where this cut point is, we're looking at different possible clusterings of our dataset at different granularities, going from really, really fine granularity all the way up to very coarse granularities.

So let's spend a little time thinking about what it means to cut the dendrogram at some level D*. Well, what we're saying is that, for the resulting clusters produced by this cut, there is no pair of clusters at a distance less than D* that has not already been merged. So that means D* is the minimum distance between our clusters at this level of the clustering.
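Continuing the hedged sketch from above (again, the data, the cut height d_star, and the plot are illustrative assumptions, not values from the lecture), SciPy's fcluster performs exactly this kind of cut: with criterion="distance", every merge below the threshold is applied, so no pair of the returned clusters sits at a distance less than d_star without having been merged:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, fcluster

# Same hypothetical toy data as in the earlier sketch.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.3, size=(10, 2)),
    rng.normal(loc=[3.0, 3.0], scale=0.3, size=(10, 2)),
])
Z = linkage(X, method="single")

# Cut the dendrogram at height D* (the value here is illustrative). With
# criterion="distance", all merges at heights up to d_star are applied, so
# the resulting flat clusters are pairwise at least d_star apart.
d_star = 1.0
labels = fcluster(Z, t=d_star, criterion="distance")

# labels[i] is the flat cluster id of data point i. Coloring the points by
# label visualizes the partition back in the original 2D feature space.
plt.scatter(X[:, 0], X[:, 1], c=labels)
plt.title(f"clusters from cutting the dendrogram at D* = {d_star}")
plt.show()
```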