[MUSIC] Well, let's start talking about clustering algorithms, and to begin with, let's talk about the bread-and-butter clustering algorithm called k-means.

The k-means algorithm assumes the simple definition of score we described before, where you just measure the distance between an observation and a cluster center as the way of associating data points with specific clusters.

In particular here, these little gray boxes are going to be our data points to cluster, so this might be data point x_i. The first step of the algorithm is to initialize our cluster centers. These circles represent our cluster centers, and we're going to denote them with the Greek letter mu, so we have mu_1, mu_2, and mu_3. Here we're looking at an example with k = 3 clusters, so we have three cluster centers.

Having defined those cluster centers, the next step of the algorithm is to assign observations to clusters based on the minimum distance to a cluster center. In equations, that's written as

    z_i = arg min_j ||mu_j - x_i||^2

where mu_j is our jth cluster center, x_i is our observation, and z_i is the cluster label for observation i. Remember, this is going to be our inferred cluster label for the ith observation, whereas in supervised learning we assumed we were given a label y_i.

What we're doing here with this function is this thing called arg min, so let's describe what arg min does. Arg min returns the index j of the cluster whose center is closest to observation x_i. This is in contrast to min: if we think about minimizing a function, min returns the value of the objective, which here would be the minimum value of this distance calculation. So just to be clear, by writing arg min we're asking, as we search over all cluster centers mu_j (mu_1, mu_2, mu_3) with the ith observation fixed and the cluster center varying, which index gave the smallest distance. Had we just written min, we would return the minimum distance itself, rather than the index of the cluster that achieved that minimum distance.
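To make the assignment step concrete, here is a minimal sketch in Python with NumPy. The names assign_clusters, data, and centers are just illustrative choices, not anything defined in the lecture; it assumes the observations are rows of one array and the cluster centers are rows of another.

```python
import numpy as np

def assign_clusters(data, centers):
    """Assign each row of `data` to the index of the nearest row of `centers`.

    This is the arg min step: for each observation x_i we return the index j
    minimizing ||mu_j - x_i||^2, not the minimum distance value itself.
    """
    # Squared Euclidean distance from every point to every center,
    # giving an array of shape (num_points, num_centers).
    diffs = data[:, np.newaxis, :] - centers[np.newaxis, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    # argmin returns the index of the closest center; using min/np.min here
    # would instead return the smallest distance, as discussed above.
    return np.argmin(sq_dists, axis=1)
```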
Okay, it's important to explain this equation, because we're going to see it throughout the rest of the module.

So the point here is that we're computing distances to cluster centers, and all of these colored points were ones where the distance to... okay, this blue is not visible at all, let me switch colors just for this one. The distances to these cluster centers were smaller than had we computed distances to any of the other cluster centers, and that's how the points got assigned to those clusters.

This partitioning of the 2D space into differently colored regions is called a Voronoi tessellation. What it represents is that if any point falls into, for example, this blue region here, then its distance to this cluster center is smaller than its distance to any of the other cluster centers. So this is the region in which any observation will get assigned to the blue cluster, any observation here will get assigned to the green cluster, and finally, any observation here will get assigned to the red cluster. But I want to emphasize that this is for visualization only. You don't need to compute it to do k-means.

Okay, so far we've initialized our cluster centers, and we've assigned each of our observations to whichever cluster center it was closest to. The next step in the k-means algorithm is to update the positions of the cluster centers based on the points that were assigned to them. In particular, we're going to compute the mean of the data points in each cluster, or equivalently, we can think of this as the center of mass of those data points, and that becomes our new cluster center. In equations, the new center for the jth cluster is

    mu_j = (1 / n_j) * sum_{i : z_i = j} x_i

where n_j is the number of observations in cluster j, and the sum runs over all observations i such that z_i = j, which means observation i is in cluster j. So just to make sure this is clear: if we look at this sum, how many terms appear in it? n_j terms. That's the size of the set, the number of observations whose cluster indicator is j.
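Here is a matching sketch of this update step, continuing with the illustrative names from before. It assumes every cluster has at least one assigned point (n_j > 0); handling empty clusters is a separate design choice not covered here.

```python
def update_centers(data, assignments, k):
    """Recompute each center as the mean (center of mass) of its points.

    Implements mu_j = (1 / n_j) * sum of x_i over all i with z_i == j.
    Assumes each cluster j has n_j > 0 assigned points.
    """
    new_centers = np.empty((k, data.shape[1]))
    for j in range(k):
        # The n_j rows of `data` whose cluster indicator z_i equals j.
        members = data[assignments == j]
        # Summing those rows and dividing by n_j is exactly the mean.
        new_centers[j] = members.mean(axis=0)
    return new_centers
```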
Okay, so this equation just says we sum over all our observations in cluster j, and then we divide by the total number of observations in the cluster, and that gives us the average, or the center of mass, of that cluster.

Finally, the last stage of k-means (it won't actually be a single last step) is to repeat the two steps we just did. So we're going to keep iterating between assigning observations to cluster centers and updating the cluster centers, going on and on and on until convergence, and we're going to talk about the notion of convergence a little bit later. [MUSIC]
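Putting the two steps together, a sketch of the full iteration, using the assign_clusters and update_centers functions from the earlier sketches, might look like the following. Stopping when the assignments no longer change is one common convergence criterion (the lecture returns to convergence later), and initializing by sampling k data points at random is just one simple choice among many.

```python
def kmeans(data, k, max_iter=100, seed=0):
    """Plain k-means sketch: initialize, then alternate the assignment and
    update steps until the assignments stop changing or max_iter is hit."""
    rng = np.random.default_rng(seed)
    # Initialize centers by picking k distinct data points at random
    # (one simple, commonly used initialization; not the only option).
    centers = data[rng.choice(len(data), size=k, replace=False)]
    assignments = None
    for _ in range(max_iter):
        new_assignments = assign_clusters(data, centers)
        if assignments is not None and np.array_equal(assignments, new_assignments):
            break  # converged: no observation changed cluster
        assignments = new_assignments
        centers = update_centers(data, assignments, k)
    return centers, assignments
```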