[MUSIC] Now that we've provided some background on Gaussian distributions, we can turn to a very important special case of a mixture model, one that we're going to emphasize quite a lot in this course and in the assignment, and that's called a mixture of Gaussians. Remember that for any one of our image categories, and for any dimension of our observed vector, like the blue intensity in an image, we're going to assume a Gaussian distribution to model that random variable. So, for example, for forest images, if we just look at the blue intensity, we might have the Gaussian distribution shown with the green curve here, which is centered about the value 0.42. And I want to mention that we're actually assuming a Gaussian for the entire three-dimensional RGB vector. That Gaussian can have correlation structure, and it will have correlation structure between these different intensities, because the amounts of red, green, and blue in an image tend not to be independent, especially within a given image class. But for the sake of illustration, and to keep all the drawings simple, we're just going to look at one dimension, like this blue intensity here. But really, in your head, imagine these Gaussians in that higher-dimensional space.

Okay, going back to what we were saying, we have a Gaussian per category: forests, sunsets, and clouds. But remember, of course, we don't have the labels of our images, so we can't separate out each one of those classes and analyze it on its own to learn the parameters of those distributions. Instead, what we have is this pretty messy collection of intensities mixed over all these different images. So again, if we just look at the slice along the blue intensity value, our histogram might look like the three-humped distribution we showed before; this is just a histogram of the values in our data set. But we're going to take a model-based approach, and somehow we want to model this distribution, the distribution over blue intensity in our entire data set. And the question is, how are we going to do this?

Well, to do this, we're going to take each one of our category-specific distributions and simply average them together, so the resulting density is going to be an average of these category-specific Gaussians. But simple averaging assumes that each of these different categories appears in equal proportion in our data set, so that we have equal numbers of sunset, cloud, and forest images. But what if, for example, there are many more forest images than sunset or cloud images in our data set? Well, in that case, we'd want to weight the forest distribution more heavily in this mixture, in this average. So we would take a weighted average over these different distributions, where the forest distribution gets a higher weight in that average.

More formally, we introduce a set of cluster weights, pi k, one for each cluster k. So in this example, where we're going to assume a model with three different clusters, which happens to correspond to the truth, where we have forests, clouds, and sunsets, we'd have pi 1, pi 2, and pi 3, where these weights capture the relative proportions of these images in our entire data set. Each one of these weights has to lie between 0 and 1, and the sum of the weights across all the clusters has to equal exactly 1. And so what this means is that when we do our weighted average with these weights, we're forming what's called a convex combination.
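To make this concrete, here is a minimal Python sketch of the one-dimensional mixture density described above. The forest mean of 0.42 echoes the example in the lecture; all the other weights, means, and standard deviations are made-up numbers purely for illustration:

```python
import numpy as np
from scipy.stats import norm

# Illustrative parameters for a 3-component mixture over blue intensity.
# Only the forest mean (0.42) comes from the lecture; every other number
# here is hypothetical.
weights = np.array([0.5, 0.3, 0.2])    # pi_k: each in [0, 1], summing to 1
means   = np.array([0.42, 0.75, 0.20]) # mu_k: e.g., forests, clouds, sunsets
stdevs  = np.array([0.05, 0.08, 0.06]) # sigma_k: spread of each cluster

# The convex-combination constraints on the cluster weights.
assert np.all(weights >= 0) and np.isclose(weights.sum(), 1.0)

def mixture_density(x):
    """p(x) = sum_k pi_k * N(x | mu_k, sigma_k^2): a convex combination
    of the category-specific Gaussians."""
    return sum(pi * norm.pdf(x, loc=mu, scale=sd)
               for pi, mu, sd in zip(weights, means, stdevs))

# Evaluate the mixture density at one blue-intensity value.
print(mixture_density(0.42))
```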
So remember, or actually I didn't mention this explicitly, so you might not know it: a Gaussian is a distribution. That means if we look at all the mass, if we integrate that distribution over our random variable, we get 1; probabilities add to 1, or in the continuous case, they integrate to 1. Now, when we look at this mixture model, because we're taking a convex combination of these individual Gaussians, each of which integrates to 1, the overall result is still going to integrate to 1, so it still provides a valid distribution.

And then, finally, to complete our mixture model specification, or our mixture of Gaussians in particular, not only do we have a mixture weight for each one of our different clusters, we also have a cluster-specific mean and variance term in one dimension. And remember that those means and variances specify the location and the spread of each one of these different distributions that comprise our mixture model.

When we're thinking about mixtures of Gaussians in higher dimensions, and I'm only going to picture two dimensions, not the three-dimensional Gaussians we'd actually be working with for this RGB vector, since two dimensions is as much as I can manage to plot, we end up with just a generalization of the cluster parameters I talked about on the previous slide. The mixture weights are exactly the same; they don't change or vary with dimensionality. But now, instead of just a mean and a scalar variance, we have a mean vector and a covariance matrix for each one of these Gaussians. So remember, these means and covariances specify not only the location but also the shape and the orientation of these different ellipses. And when we're thinking about the mixture weights, they're pretty hard to annotate on these types of contour plots. Just think of taking each one of these distributions, which are shown in these green, fuchsia, and blue colors, and weighting them, where this weight, pi k, is coming out of the board, out of the slide here. And that represents how much we're emphasizing each one of these different mixture components in the overall mixture.
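As a small illustration of these higher-dimensional parameters, here is a Python sketch of a two-dimensional mixture, where each component now has a mean vector and a covariance matrix instead of a scalar mean and variance. Every number below is hypothetical, chosen only so the covariance matrices are valid (positive definite); the off-diagonal entries are what let each ellipse tilt, capturing the correlation structure mentioned earlier:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative 2-D mixture parameters; all values are hypothetical.
weights = [0.5, 0.3, 0.2]              # pi_k: same role as in the 1-D case
means   = [np.array([0.42, 0.55]),     # mu_k: now a mean vector per cluster
           np.array([0.75, 0.30]),
           np.array([0.20, 0.80])]
covs    = [np.array([[0.010,  0.004],  # Sigma_k: covariance matrices; the
                     [0.004,  0.015]]),  # off-diagonal terms encode the
           np.array([[0.020, -0.006],   # correlation between dimensions,
                     [-0.006, 0.012]]),  # i.e., the tilt of each ellipse
           np.array([[0.008,  0.002],
                     [0.002,  0.008]])]

def mixture_density_2d(x):
    """Convex combination of 2-D Gaussians:
    p(x) = sum_k pi_k * N(x | mu_k, Sigma_k)."""
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=cov)
               for pi, mu, cov in zip(weights, means, covs))

# Evaluate the 2-D mixture density at a single point.
print(mixture_density_2d(np.array([0.4, 0.5])))
```

[MUSIC]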