We showed that if you know the cluster parameters, then it's easy to compute the soft assignments. And then we showed that if you know the soft assignments, it's very straightforward to estimate the cluster parameters. So this motivates an iterative algorithm where we're going to cycle between these two steps. And the algorithm that we're going to present is a special case of a more generic algorithm called expectation maximization, or EM.

So more formally, the EM algorithm for mixtures of Gaussians is going to cycle between two steps. In the first step, which is called the E-step, we're going to estimate our cluster responsibilities given the current estimate of our cluster parameters. The equation for this in the mixture-of-Gaussians case is shown here, and it's just a specific instance of the equations that we showed previously. Then the second step of the algorithm, which is called the M-step, is to compute the maximum likelihood estimate of our cluster parameters given our current estimate of the soft assignments. And the update here, the maximum likelihood estimate based on a set of soft assignments, takes the form of the equations that we showed previously.

In pictures, the EM algorithm works as follows. The first step is to initialize our set of model parameters: our mixture weights at the zeroth iteration, our means, and our covariances. Those are represented by the ellipses shown in this picture, where each one has a center, a spread, and an orientation. Then, having fixed these cluster parameters, we compute our responsibilities. Our estimate of the responsibilities at the first iteration is computed just by taking that prior times the likelihood term for each cluster and then normalizing, so I won't repeat the equations here. But just to show in pictures: each one of these big circles is a data point, and it's also a pie chart indicating the responsibilities associated with that data point. So for example, this data point here might have a responsibility vector at the first iteration of something like 0.52 for the fuchsia cluster, 0.4 for the blue cluster, and 0.08 for the green cluster.
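Not from the lecture itself, but as a minimal sketch of the E-step just described, here is how the prior-times-likelihood-then-normalize computation might look in Python. It assumes NumPy and SciPy are available; the function name and array shapes are my own choices.

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, weights, means, covs):
    """Compute responsibilities (soft assignments) for a Gaussian mixture.

    X:       (N, d) array of data points
    weights: (K,) mixture weights pi_k
    means:   K mean vectors mu_k
    covs:    K covariance matrices Sigma_k
    Returns an (N, K) array whose rows each sum to 1.
    """
    N, K = X.shape[0], len(weights)
    resp = np.zeros((N, K))
    for k in range(K):
        # prior times likelihood: pi_k * N(x_i | mu_k, Sigma_k)
        resp[:, k] = weights[k] * multivariate_normal.pdf(X, means[k], covs[k])
    # normalize each row so the responsibilities for a point sum to one
    resp /= resp.sum(axis=1, keepdims=True)
    return resp
```

Each row of the result corresponds to one of the pie charts in the picture, for example a vector like [0.52, 0.4, 0.08].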
And then, having computed these responsibilities for each of our data points, we go and re-estimate our cluster parameters. So we're going to maximize the likelihood given our soft assignments r_ik at the first iteration. And this is going to lead to updated estimates of our cluster parameters: pi hat k at iteration one, mu hat k at iteration one, and Sigma hat k at iteration one. What you see in this picture is that these ellipses have shifted and changed.

Then, based on this new set of clusters, or rather the parameters associated with these clusters, we're going to recompute our responsibilities: our estimate of the responsibilities at the second iteration. And then we're going to rinse and repeat until convergence, and we'll talk about convergence more in the next section.

But here's our plot at convergence, and what you see is that our clusters are pretty clean. In my view, they capture the structure of the data fairly well. And, really critically, if we look at, for example, this observation here, we see that even after convergence there's still uncertainty represented in that observation's cluster assignment. And I want to emphasize that this is exactly what we want from this algorithm. In contrast to k-means, which would have made a hard decision about whether that observation was going to be associated with the blue cluster or the fuchsia cluster, here EM provides a sense of uncertainty about which cluster this observation should be associated with. And this is true for other observations as well, though not as much as for this observation that's clearly on the boundary of these two clusters.

But in summary, hopefully we now see in pictures the difference between what EM gives us and what k-means gives us. And again, just to emphasize, not only is EM giving us these soft assignments, but we also have estimates of the cluster parameters: the shapes of these different clusters and their locations in the space, not just the cluster centers.
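Again not from the lecture: here is a matching sketch of the M-step and of the overall iterate-until-convergence loop, continuing the e_step sketch above. The maximum likelihood updates are the responsibility-weighted versions of the usual estimates; the stopping test on the log-likelihood is one common choice, and the tolerance value is an arbitrary assumption (the lecture defers convergence to the next section).

```python
def m_step(X, resp):
    """Maximum likelihood estimates of the mixture parameters
    given an (N, K) array of soft assignments."""
    N, K = resp.shape
    Nk = resp.sum(axis=0)                 # effective number of points per cluster
    weights = Nk / N                      # pi hat k
    means = (resp.T @ X) / Nk[:, None]    # mu hat k: responsibility-weighted means
    covs = []
    for k in range(K):
        diff = X - means[k]
        # Sigma hat k: responsibility-weighted covariance
        covs.append((resp[:, k, None] * diff).T @ diff / Nk[k])
    return weights, means, covs

def em(X, weights, means, covs, max_iters=100, tol=1e-4):
    """Cycle between the E-step and M-step until the data log-likelihood
    stops improving."""
    prev_ll = -np.inf
    for _ in range(max_iters):
        resp = e_step(X, weights, means, covs)
        weights, means, covs = m_step(X, resp)
        # log-likelihood of the data under the current mixture
        ll = np.sum(np.log(sum(
            weights[k] * multivariate_normal.pdf(X, means[k], covs[k])
            for k in range(len(weights)))))
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights, means, covs, resp
```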
So let's watch this one more time. Before we do, just remember that each one of these circles, these little pie charts, is a data point, and the colors shown in the pie chart represent the responsibilities associated with that observation: how much it's currently associated with each of the three clusters in our model. And these ellipses represent the three different 2D Gaussians in this example. As we go through these plots, you'll see the updates to both the responsibilities and our estimates of these cluster parameters. So here is our initialization of the algorithm, and then this is after one iteration, two iterations, and at convergence.