Now that we've provided some background on Gaussian distributions, we can turn to a very important special case of a mixture model, one that we're going to emphasize quite a lot in this course and in the assignment, and that's called a mixture of Gaussians.

Remember that for any one of our image categories, and for any dimension of our observed vector, like the blue intensity in an image, we're going to assume a Gaussian distribution to model that random variable. So for example, for forest images, if we just look at the blue intensity, then we might have the Gaussian distribution shown with the green curve here, which is centered about the value 0.42. And I want to mention that we're actually assuming a Gaussian for the entire three-dimensional RGB vector. That Gaussian can have, and will have, correlation structure between these different intensities, because the amounts of red, green, and blue in an image tend not to be independent, especially within a given image class. But for the sake of illustration, and to keep the drawings simple, we're just going to look at one dimension, like this blue intensity. But really, in your head, imagine these Gaussians in that higher-dimensional space.

Okay, but to go back to what we were saying, we have a Gaussian per category: forests, sunsets, and clouds. But remember, of course, we don't have those labels for our images, so we can't separate out each one of those classes and analyze it separately to learn the parameters of its distribution. Instead, what we have is this pretty nasty space of intensities mixed over all these different images. So again, if we just look at the slice along the blue intensity value, then maybe our histogram would look like the three-humped distribution that we showed before. This is just a histogram of the values in our data set. But we're going to take a model-based approach, and somehow we want to model this distribution, the distribution over blue intensity in our entire data set. And the question is, how are we going to do this? Well, to do this, we're going to take each one of our category-specific distributions and simply average them together.
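As a quick illustration of that simple averaging, here is a minimal Python sketch; the category names match the lecture, but the means and standard deviations (including the 0.42 center for forests) are placeholder values chosen just for illustration, not parameters estimated from any real data set.

    import numpy as np
    from scipy.stats import norm

    # Hypothetical 1D Gaussians over blue intensity, one per image category.
    # All parameters here are illustrative placeholders.
    category_params = {
        "forest": (0.42, 0.05),   # (mean, standard deviation)
        "sunset": (0.75, 0.08),
        "cloud":  (0.90, 0.04),
    }

    x = np.linspace(0.0, 1.0, 500)  # grid of blue-intensity values

    # Simple (unweighted) average of the category-specific densities.
    densities = [norm.pdf(x, mu, sigma) for mu, sigma in category_params.values()]
    mixture_density = np.mean(densities, axis=0)

    # Each component integrates to 1, so their average integrates to 1 as well.
    print(np.trapz(mixture_density, x))  # approximately 1.0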
So the resulting density is going to be an average of each one of these category-specific Gaussians. But a simple average assumes that each one of these categories appears in equal proportion in our data set, so that we have equal numbers of sunset, cloud, and forest images. But what if, for example, there are many more forest images than sunset or cloud images in our data set? Well, in that case, we'd want to weight the forest distribution more heavily in this mixture, in this average. So we would do a weighted average over these different distributions, where the forest distribution gets a higher weight.

More formally, we introduce a set of cluster weights, pi k, one for each cluster k. So in this example, where we're going to assume a model with three different clusters, which happens to correspond to the truth where we have forests, clouds, and sunsets, we'd have pi 1, pi 2, and pi 3, and these weights capture the relative proportions of these images in our entire data set. Each one of these weights has to live between 0 and 1, and the sum of the weights across all of the clusters has to equal exactly 1. So when we do our weighted average using these weights, the result is something called a convex combination.

So remember, or I didn't actually mention this, so you might not know it, but a Gaussian is a distribution. That means, if we look at all the mass, if we integrate that distribution over our random variable, the result is 1; probabilities sum to 1, or in the continuous case, they integrate to 1. Now, when we look at this mixture model, because we're taking a convex combination of these individual Gaussians, each of which integrates to 1, the overall result still integrates to 1, so it still provides a valid distribution.

And then, finally, to complete our mixture model specification, or our mixture of Gaussians in particular, not only do we have a mixture weight for each one of our different clusters, we also have a cluster-specific mean and variance term in one dimension.
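Written out, here is a minimal LaTeX sketch of the one-dimensional mixture density just described, using K for the number of clusters; the notation is assumed for illustration rather than copied from the course slides:

\[
p(x) \;=\; \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(x \mid \mu_k, \sigma_k^2\right),
\qquad 0 \le \pi_k \le 1, \qquad \sum_{k=1}^{K} \pi_k = 1.
\]

Because each component is itself a distribution,

\[
\int p(x)\,dx \;=\; \sum_{k=1}^{K} \pi_k \int \mathcal{N}\!\left(x \mid \mu_k, \sigma_k^2\right) dx \;=\; \sum_{k=1}^{K} \pi_k \;=\; 1,
\]

so the convex combination is again a valid density.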
And remember that those means and variances specify the location and the spread of each one of these different distributions that comprise our mixture model.

When we're thinking about mixtures of Gaussians in higher dimensions, I'm only going to go up to pictures in two dimensions, not the three-dimensional Gaussians that we would actually be working with for this RGB vector, but two dimensions is as much as I can manage to plot. We end up with just a generalization of the cluster parameters I talked about on the previous slide. The mixture weights are exactly the same; they don't change or vary with dimensionality. But now, instead of just a mean and a scalar variance, we have a mean vector and a covariance matrix for each one of these Gaussians. So remember, these means and covariances specify not only the location but also the shape and the orientation of these different ellipses. And then, when we're thinking about the mixture weights, it's pretty hard to annotate them on these types of contour plots. Just think of taking each one of these distributions, which are shown in green, fuchsia, and blue, and weighting them, with the weight pi k coming out of the board, out of the slide here. And that represents how much we're emphasizing each one of these different mixture components in the overall mixture.
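To make the higher-dimensional version concrete, here is a minimal Python sketch of evaluating a two-dimensional mixture-of-Gaussians density; the weights, mean vectors, and covariance matrices are made-up illustrative values, not parameters from the course:

    import numpy as np
    from scipy.stats import multivariate_normal

    # Illustrative 2D mixture of Gaussians: one weight, mean vector, and
    # covariance matrix per cluster. All values are placeholders.
    weights = np.array([0.5, 0.3, 0.2])                # pi_k: each in [0, 1], summing to 1
    means = [np.array([0.4, 0.3]),
             np.array([0.7, 0.8]),
             np.array([0.2, 0.9])]
    covariances = [np.array([[0.010,  0.004], [ 0.004, 0.020]]),  # off-diagonal terms give
                   np.array([[0.020, -0.005], [-0.005, 0.010]]),  # the ellipses their orientation
                   np.array([[0.015,  0.000], [ 0.000, 0.015]])]

    def mixture_density(x):
        """Weighted sum of the component Gaussian densities evaluated at x."""
        return sum(pi_k * multivariate_normal.pdf(x, mean=mu_k, cov=cov_k)
                   for pi_k, mu_k, cov_k in zip(weights, means, covariances))

    print(mixture_density(np.array([0.45, 0.35])))  # density at one 2D point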