1
00:00:00,000 --> 00:00:04,336
[MUSIC]

2
00:00:04,336 --> 00:00:08,524
And when we're thinking about using
mixture models to do clustering,

3
00:00:08,524 --> 00:00:13,150
note that they can also be used just to
do what's called density estimation.

4
00:00:13,150 --> 00:00:18,780
So estimate those types of curves over
the histograms that we drew earlier.

5
00:00:18,780 --> 00:00:22,490
But in our case, we're going to focus
in on the clustering application.

6
00:00:22,490 --> 00:00:27,060
And there, there's a really important
other variable that we're going to

7
00:00:27,060 --> 00:00:30,210
introduce, and
that is the cluster indicator,

8
00:00:30,210 --> 00:00:33,240
the assignment variable for
every one of our observations.

9
00:00:33,240 --> 00:00:39,272
So we have, this is the cluster

10
00:00:39,272 --> 00:00:45,780
assignment for observation Xi.

11
00:00:45,780 --> 00:00:50,060
So this is exactly the same variable that
we had in k-means that was assigning

12
00:00:50,060 --> 00:00:55,580
observations to clusters, but in that
case just using the cluster center.

13
00:00:55,580 --> 00:00:58,940
Okay, so let's step back and
think about what our model is saying.

14
00:00:58,940 --> 00:01:04,170
And the first question we can think about
is, what's the probability that the ith

15
00:01:04,170 --> 00:01:09,550
data point in our data set is
associated with the kth cluster?

16
00:01:09,550 --> 00:01:12,980
So for example, when we're talking
about our images we could say,

17
00:01:12,980 --> 00:01:16,450
what's the probability that
the ith image I see is,

18
00:01:16,450 --> 00:01:20,620
let's say,
in the cluster of clouds images?

19
00:01:20,620 --> 00:01:24,950
And let's talk about this before we've
actually observed what the image is.

20
00:01:24,950 --> 00:01:27,530
All we know is it's the ith
index in our data set.

21
00:01:29,140 --> 00:01:34,520
Okay, well this is fully specified
by the mixture weight pi k,

22
00:01:34,520 --> 00:01:39,540
because that tells us how prevalent
cloud images are in our data set.

23
00:01:41,290 --> 00:01:43,250
So that's given right here.

24
00:01:43,250 --> 00:01:50,170
And if we don't observe the content of the
image, then we just are caring about how

25
00:01:50,170 --> 00:01:55,180
many cloud images do we have relative to
forest images relative to sunset images.

26
00:01:55,180 --> 00:01:58,310
So we say that the prior probability,

27
00:01:58,310 --> 00:02:02,940
that the ith image is assigned
to cluster k, is given by pi k.

28
00:02:05,730 --> 00:02:10,870
Another question is, what if I know
that an image comes from cluster k?

29
00:02:10,870 --> 00:02:12,080
So I'm going to fix that.

30
00:02:12,080 --> 00:02:14,260
I already know that it's a clouds image.

31
00:02:15,600 --> 00:02:16,680
Now I can say,

32
00:02:16,680 --> 00:02:22,166
what's the likelihood of observing the RGB
vector associated with this image?

33
00:02:22,166 --> 00:02:28,040
So Xi, given that the image
came from the kth cluster,

34
00:02:28,040 --> 00:02:31,370
this cluster of cloud images.

35
00:02:33,140 --> 00:02:40,370
And in this case what we do is we simply
go to, this is the distribution of

36
00:02:42,280 --> 00:02:46,633
cloud images, or distribution of blue for

37
00:02:46,633 --> 00:02:51,810
cloud images.

38
00:02:51,810 --> 00:02:54,600
And we say okay,
let's take this one image I have.

39
00:02:54,600 --> 00:02:57,490
This is my Xi image.

40
00:02:57,490 --> 00:03:02,466
And I look at its blue intensity and
I say, under this distribution for clouds,

41
00:03:02,466 --> 00:03:04,580
how likely is it?

42
00:03:04,580 --> 00:03:06,735
Well, it's pretty likely.

43
00:03:06,735 --> 00:03:14,050
So it's reasonable to say
that this was a clouds image.

44
00:03:14,050 --> 00:03:17,420
But I can also look at
this probability under,

45
00:03:17,420 --> 00:03:19,540
remember this was the forest category.

46
00:03:21,460 --> 00:03:22,120
And I could say,

47
00:03:22,120 --> 00:03:25,190
well what's the likelihood of this
image under the forest category?

48
00:03:25,190 --> 00:03:27,910
Well, it's not that high.

49
00:03:27,910 --> 00:03:31,580
But on the other hand,
what we know is that

50
00:03:31,580 --> 00:03:36,970
there are many more forest images
in our data set than cloud images.

51
00:03:36,970 --> 00:03:40,600
So what we're going to be doing when
we're going to form our soft assignments,

52
00:03:40,600 --> 00:03:43,360
which we'll talk about
in the next section,

53
00:03:43,360 --> 00:03:46,130
is we're going to be thinking
about weighting these two terms.

54
00:03:46,130 --> 00:03:50,380
Saying, well what's the prior
probability that this image

55
00:03:50,380 --> 00:03:53,010
is from any one of these
different classes?

56
00:03:53,010 --> 00:03:58,380
So in this case,
it's most likely a forest image.

57
00:03:58,380 --> 00:04:02,240
But then I say, okay, well now I've
observed the content of this image,

58
00:04:02,240 --> 00:04:08,320
the RGB vector for this image, and
I want to say, I need to weight that in.

59
00:04:08,320 --> 00:04:16,332
And under the sunset category,
it's extremely, extremely unlikely.

60
00:04:16,332 --> 00:04:19,320
There is basically zero
probability of observing

61
00:04:19,320 --> 00:04:21,410
this blue intensity value
under that category.

62
00:04:21,410 --> 00:04:26,470
So I can rule it out regardless of
what the weight is on that category.

63
00:04:26,470 --> 00:04:29,630
But for these other categories,
these other clusters,

64
00:04:29,630 --> 00:04:35,750
there's going to be some competition
between how much I'm likely to just see

65
00:04:35,750 --> 00:04:40,950
images of that type versus how
likely it is under that category.

66
00:04:40,950 --> 00:04:44,010
And we're going to use
both of these things to

67
00:04:44,010 --> 00:04:47,069
represent our uncertainty
about the cluster assignment.

68
00:04:48,170 --> 00:04:51,500
So just to circle back and make sure
we're very clear when we're looking at

69
00:04:51,500 --> 00:04:57,300
the probability of an observed RGB vector,

70
00:04:57,300 --> 00:05:02,050
RGB for image i,
given that it's in cluster k,

71
00:05:03,430 --> 00:05:09,720
then this is just a single Gaussian
with mean mu k and covariance sigma k.

72
00:05:09,720 --> 00:05:14,300
And this is referred to
as the likelihood term,

73
00:05:14,300 --> 00:05:17,150
whereas before we called
this the prior term.
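
For reference, the two terms named here can be written side by side. The symbol z_i for the cluster assignment of observation x_i is an assumed name, since the transcript does not spell out the notation used on the slide; pi_k, mu_k and Sigma_k are the mixture weight, mean and covariance mentioned in the lecture.

```latex
\begin{align*}
p(z_i = k) &= \pi_k && \text{(prior term)} \\
p(x_i \mid z_i = k, \mu_k, \Sigma_k) &= \mathcal{N}(x_i \mid \mu_k, \Sigma_k) && \text{(likelihood term)}
\end{align*}
```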

74
00:05:19,104 --> 00:05:23,435
And I want to point out that this image,
indeed there should be

75
00:05:23,435 --> 00:05:28,104
uncertainty about whether it's
assigned to the clouds cluster or

76
00:05:28,104 --> 00:05:31,879
the forest cluster,
because here we see some trees.

77
00:05:34,114 --> 00:05:36,620
And here we see some clouds.

78
00:05:38,270 --> 00:05:42,761
So it'd be natural to have uncertainty
on the assignment of this image.

79
00:05:42,761 --> 00:05:46,979
[MUSIC]