Now that we've provided some background on Gaussian distributions, we can turn to a very important special case of a mixture model, one that we're going to emphasize quite a lot in this course and in the assignment, and that's called a mixture of Gaussians.

Remember that for any one of our image categories, and for any dimension of our observed vector, like the blue intensity in an image, we're going to assume a Gaussian distribution to model that random variable. So for example, for forest images, if we just look at the blue intensity, then we might have the Gaussian distribution shown with the green curve here, which is centered about the value 0.42. And I want to mention that we're actually assuming a Gaussian for the entire three-dimensional RGB vector. That Gaussian can have, and will have, correlation structure between these different intensities, because the amounts of red, green, and blue in an image tend not to be independent, especially within a given image class. But for the sake of illustration, and to keep the drawings simple, we're just going to look at one dimension, like this blue intensity. But really, in your head, imagine these Gaussians in that higher-dimensional space.

Okay, but to go back to what we were saying, we have a Gaussian per category: forests, sunsets, and clouds. But remember, of course, we don't have those labels for our images, so we can't separate out each one of those classes and analyze it separately to learn the parameters of its distribution. Instead, what we have is this pretty nasty space of intensities mixed over all these different images. So again, if we just look at the slice along the blue intensity value, then maybe our histogram would look like the three-humped distribution that we showed before. This is just a histogram of the values in our data set. But we're going to take a model-based approach, and somehow we want to model this distribution, the distribution over blue intensity in our entire data set. And the question is, how are we going to do this? Well, to do this, we're going to take each one of our category-specific distributions and simply average them together.
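As a quick illustration of that simple averaging, here is a minimal Python sketch; the category names match the lecture, but the means and standard deviations (including the 0.42 center for forests) are placeholder values chosen just for illustration, not parameters estimated from any real data set.

    import numpy as np
    from scipy.stats import norm

    # Hypothetical 1D Gaussians over blue intensity, one per image category.
    # All parameters here are illustrative placeholders.
    category_params = {
        "forest": (0.42, 0.05),   # (mean, standard deviation)
        "sunset": (0.75, 0.08),
        "cloud":  (0.90, 0.04),
    }

    x = np.linspace(0.0, 1.0, 500)  # grid of blue-intensity values

    # Simple (unweighted) average of the category-specific densities.
    densities = [norm.pdf(x, mu, sigma) for mu, sigma in category_params.values()]
    mixture_density = np.mean(densities, axis=0)

    # Each component integrates to 1, so their average integrates to 1 as well.
    print(np.trapz(mixture_density, x))  # approximately 1.0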
So the resulting density is going to be an average of each one of these category-specific Gaussians. But a simple average assumes that each one of these categories appears in equal proportion in our data set, so that we have equal numbers of sunset, cloud, and forest images. But what if, for example, there are many more forest images than sunset or cloud images in our data set? Well, in that case, we'd want to weight the forest distribution more heavily in this mixture, in this average. So we would do a weighted average over these different distributions, where the forest distribution gets a higher weight.

More formally, we introduce a set of cluster weights, pi k, one for each cluster k. So in this example, where we're going to assume a model with three different clusters, which happens to correspond to the truth where we have forests, clouds, and sunsets, we'd have pi 1, pi 2, and pi 3, and these weights capture the relative proportions of these images in our entire data set. Each one of these weights has to live between 0 and 1, and the sum of the weights across all of the clusters has to equal exactly 1. So when we do our weighted average using these weights, the result is something called a convex combination.

So remember, or I didn't actually mention this, so you might not know it, but a Gaussian is a distribution. That means, if we look at all the mass, if we integrate that distribution over our random variable, the result is 1; probabilities sum to 1, or in the continuous case, they integrate to 1. Now, when we look at this mixture model, because we're taking a convex combination of these individual Gaussians, each of which integrates to 1, the overall result still integrates to 1, so it still provides a valid distribution.

And then, finally, to complete our mixture model specification, or our mixture of Gaussians in particular, not only do we have a mixture weight for each one of our different clusters, we also have a cluster-specific mean and variance term in one dimension.
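Written out, here is a minimal LaTeX sketch of the one-dimensional mixture density just described, using K for the number of clusters; the notation is assumed for illustration rather than copied from the course slides:

\[
p(x) \;=\; \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(x \mid \mu_k, \sigma_k^2\right),
\qquad 0 \le \pi_k \le 1, \qquad \sum_{k=1}^{K} \pi_k = 1.
\]

Because each component is itself a distribution,

\[
\int p(x)\,dx \;=\; \sum_{k=1}^{K} \pi_k \int \mathcal{N}\!\left(x \mid \mu_k, \sigma_k^2\right) dx \;=\; \sum_{k=1}^{K} \pi_k \;=\; 1,
\]

so the convex combination is again a valid density.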
And remember that those means and variances specify the location and the spread of each one of these different distributions that comprise our mixture model.

When we're thinking about mixtures of Gaussians in higher dimensions, I'm only going to go up to pictures in two dimensions, not the three-dimensional Gaussians that we would actually be working with for this RGB vector, but two dimensions is as much as I can manage to plot. We end up with just a generalization of the cluster parameters I talked about on the previous slide. The mixture weights are exactly the same; they don't change or vary with dimensionality. But now, instead of just a mean and a scalar variance, we have a mean vector and a covariance matrix for each one of these Gaussians. So remember, these means and covariances specify not only the location but also the shape and the orientation of these different ellipses. And then, when we're thinking about the mixture weights, it's pretty hard to annotate them on these types of contour plots. Just think of taking each one of these distributions, which are shown in green, fuchsia, and blue, and weighting them, with the weight pi k coming out of the board, out of the slide here. And that represents how much we're emphasizing each one of these different mixture components in the overall mixture.
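To make the higher-dimensional version concrete, here is a minimal Python sketch of evaluating a two-dimensional mixture-of-Gaussians density; the weights, mean vectors, and covariance matrices are made-up illustrative values, not parameters from the course:

    import numpy as np
    from scipy.stats import multivariate_normal

    # Illustrative 2D mixture of Gaussians: one weight, mean vector, and
    # covariance matrix per cluster. All values are placeholders.
    weights = np.array([0.5, 0.3, 0.2])                # pi_k: each in [0, 1], summing to 1
    means = [np.array([0.4, 0.3]),
             np.array([0.7, 0.8]),
             np.array([0.2, 0.9])]
    covariances = [np.array([[0.010,  0.004], [ 0.004, 0.020]]),  # off-diagonal terms give
                   np.array([[0.020, -0.005], [-0.005, 0.010]]),  # the ellipses their orientation
                   np.array([[0.015,  0.000], [ 0.000, 0.015]])]

    def mixture_density(x):
        """Weighted sum of the component Gaussian densities evaluated at x."""
        return sum(pi_k * multivariate_normal.pdf(x, mean=mu_k, cov=cov_k)
                   for pi_k, mu_k, cov_k in zip(weights, means, covariances))

    print(mixture_density(np.array([0.45, 0.35])))  # density at one 2D point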