Finally, I wanted to draw a connection between mixtures of Gaussians and our k-means algorithm. To do this, let's consider a mixture of Gaussians where we have spherically symmetric Gaussians that are the same between the different clusters, just having different cluster centers. This is equivalent to having covariances with sigma squared along the diagonal, where that sigma squared is exactly equal in every element of the diagonal, and this same covariance matrix is associated with each of the different clusters. Then what we're going to do is take this variance term and drive it to 0, so we have these infinitesimally tight clusters. Because we have these spherically symmetric clusters, when we go to compute the relative likelihood of assigning an observation to one cluster versus another cluster, that depends only on the relative distances to the cluster centers. And because we've driven those variances to 0, those relative likelihoods go to either 1 or 0. This happens because of the certainty the clusters have, indicated by their variances going to 0 in the limit.
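To make this limit concrete, here is a minimal sketch of the responsibility computation for spherical Gaussians with shared variance sigma squared. The function name `responsibilities` and the toy numbers are my own illustration, not from the lecture; the computation is done in log space so it stays stable as sigma shrinks.

```python
import numpy as np

def responsibilities(x, centers, sigma, weights=None):
    """Responsibilities of each spherical Gaussian N(mu_k, sigma^2 I) for point x.

    With equal spherical covariances, the likelihood ratio between clusters
    depends only on the squared distances to the cluster centers.
    """
    centers = np.asarray(centers, dtype=float)
    k = len(centers)
    weights = np.full(k, 1.0 / k) if weights is None else np.asarray(weights, dtype=float)
    d2 = np.sum((centers - x) ** 2, axis=1)           # squared distances to each center
    # Work in log space for numerical stability as sigma -> 0.
    log_r = np.log(weights) - d2 / (2.0 * sigma ** 2)
    log_r -= log_r.max()                              # shift before exponentiating
    r = np.exp(log_r)
    return r / r.sum()

x = np.array([1.0, 0.0])
centers = [[0.0, 0.0], [3.0, 0.0]]
for sigma in [2.0, 0.5, 0.01]:
    # As sigma shrinks, the responsibilities collapse toward 0 or 1.
    print(sigma, responsibilities(x, centers, sigma))
```

With a wide sigma the point is softly shared between clusters; by sigma = 0.01 the responsibility for the nearer center is numerically 1, exactly the hard assignment described above.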
And of course, when we're computing our cluster responsibilities, we're also weighing in the relative proportions of these different clusters. But this is completely dominated by these significant differences in the likelihoods, with that ratio being either 0 or 1. So what ends up happening here is that datapoints are fully assigned to a single cluster, based just on the distance to that cluster center, and this is just like what we see in k-means. So when we apply our EM algorithm to this mixture of Gaussians, with variances that are the same across dimensions but shrinking to 0, then in the E-step, when we're estimating our responsibilities, we get responsibilities that are just 0 or 1, for the reasons that we just described. And when we go to do our M-step, we just update our cluster means; the variances are fixed, in the limit, at 0. When we're estimating our cluster means, and of course our cluster proportions too, each cluster mean just looks at the datapoints that are now hard assigned to that cluster and uses them to estimate the mean of that cluster. So what we see is that these two steps of our EM algorithm are exactly what we do in k-means.
That is, we make hard assignments of observations to clusters based just on the distance to the cluster center, and then we recenter the clusters, where we just compute the average of the datapoints assigned to each cluster.
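The two steps in this limit can be sketched as one k-means (Lloyd) iteration. The function `kmeans_limit_step` below is a hypothetical illustration: the E-step collapses to a hard nearest-center assignment, and the M-step recenters each cluster at the mean of its assigned points (the proportion update is omitted since, as noted above, it no longer affects the assignments).

```python
import numpy as np

def kmeans_limit_step(X, centers):
    """One EM iteration for spherical Gaussians in the sigma -> 0 limit.

    E-step limit: responsibilities become 0/1, i.e. a hard assignment of
    each point to its nearest center. M-step: each center becomes the mean
    of the points hard assigned to it. Together this is one k-means step.
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Hard E-step: squared distance from every point to every center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)
    # M-step: recenter; keep the old center if a cluster is empty.
    new_centers = np.array([
        X[assign == k].mean(axis=0) if np.any(assign == k) else centers[k]
        for k in range(len(centers))
    ])
    return assign, new_centers

X = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
assign, new_centers = kmeans_limit_step(X, [[1.0, 0.0], [9.0, 0.0]])
print(assign)        # hard 0/1 assignments by distance
print(new_centers)   # cluster means of the assigned points
```

Iterating this function until the assignments stop changing is exactly the standard k-means algorithm, which is the equivalence the lecture is pointing out.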