[MUSIC] When specifying a Gaussian distribution over two random variables, we assign probability density to the pair of random variables everywhere over the range of values that each of these variables can take. And as in 1D, there's just a single hump to this distribution. So, for example, here would be the place with the most probability: it's most likely that the blue and green random variables, the intensities of blue and green in the images, fall within this range. If you were to drop down this point and look at the blue value and the green value, that would represent the mean of this distribution, and also its most probable point, because in a Gaussian the mean is equivalent to the mode of the distribution. And when we walk away from that mode, we walk into regions of lower probability. Another way to view a Gaussian distribution in two dimensions, and the one that's more commonly used because it can be shown on a 2D plot, is what's called a contour plot. A contour plot is just a bird's-eye view of this three-dimensional mesh plot. You take the three-dimensional mesh, look down at it from the very top, and plot these curves, these ellipses of equal probability, where the coloring represents the intensity, so how high the probability was. So again, this center point is the region of highest probability, and the blue shading represents low probability, just like in the mesh plot. So this ring would be lower probability, and we see lower and lower probability as we walk out in this direction, or this direction, or any direction moving away from the peak. The question is just how rapidly it drops off: it drops off most rapidly in this direction, least rapidly in this direction here, and somewhere in between for the directions between these two. Okay, so this contour plot is going to be our standard representation of a Gaussian in two dimensions.
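The idea that the mean is the mode, and that the density drops off in every direction away from it, can be sketched numerically. This is a minimal illustration, not from the lecture: the mean and covariance values below are hypothetical blue/green intensity parameters chosen just for the example.

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Density of a multivariate Gaussian N(x | mu, Sigma)."""
    d = len(mu)
    diff = x - mu
    inv = np.linalg.inv(Sigma)
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return norm * np.exp(-0.5 * diff @ inv @ diff)

# Hypothetical blue/green intensity parameters:
mu = np.array([0.4, 0.6])
Sigma = np.array([[0.01, 0.005],
                  [0.005, 0.02]])

p_mode = gaussian_pdf(mu, mu, Sigma)        # density at the mean (= the mode)
p_away = gaussian_pdf(mu + 0.1, mu, Sigma)  # density a step away from the mode
print(p_mode > p_away)
```

Evaluating the density at the mean and at any point away from it shows the drop-off: the mean always gives the highest value, which is exactly the single hump seen in the mesh plot.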
In two dimensions, a Gaussian is fully specified by a mean vector and a covariance matrix. The mean vector has elements that center the distribution along each dimension. So, for example, in this case mu1 centers the distribution along the blue axis, the blue intensity, and mu2 centers the distribution along the green intensity. And the crosshairs pinpoint the center of the distribution jointly in the blue and green intensity space. Then the covariance matrix specifies the spread and orientation of the distribution. Along the diagonal of this covariance matrix we have the variance terms for each of the dimensions. So in the top left-hand corner we have sigma blue squared, the variance along just the blue intensity dimension of the observation vector, and likewise sigma green squared is the variance along just the green dimension. But then we have these other parameters, sigma blue, green and sigma green, blue, and those are actually the same value because this is a symmetric matrix, though that's not a detail you really need to know. These off-diagonal terms specify the correlation structure of the distribution, so the orientation of these ellipses. So let's look at a few examples of covariance structures that we could specify. One is a diagonal covariance with equal elements along the diagonal. By specifying a diagonal covariance, what we're saying is that there's no correlation between our two random variables, because there are zeros in the off-diagonals. And what this means is that one random variable, like the blue intensity, being high or low doesn't relate to whether the other random variable, like the green intensity, is high or low. And furthermore, by having equal values of the variances along the diagonal, we end up with this circular shape to the distribution, because we're saying that the spread along each of these two dimensions is exactly the same.
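The circular shape of a diagonal, equal-variance covariance can be checked directly: with such a covariance, the density depends only on the distance from the mean, not the direction. A small sketch, with a variance value chosen arbitrarily for illustration:

```python
import numpy as np

mu = np.zeros(2)
Sigma = 0.04 * np.eye(2)  # diagonal covariance, equal variances on the diagonal
inv = np.linalg.inv(Sigma)

def density(x):
    """2D Gaussian density N(x | mu, Sigma)."""
    diff = x - mu
    return np.exp(-0.5 * diff @ inv @ diff) / (2 * np.pi * np.sqrt(np.linalg.det(Sigma)))

# Three points at the same distance from the mean, in different directions:
r = 0.3
p_east  = density(mu + r * np.array([1.0, 0.0]))
p_north = density(mu + r * np.array([0.0, 1.0]))
p_diag  = density(mu + r * np.array([1.0, 1.0]) / np.sqrt(2))
print(np.allclose([p_north, p_diag], p_east))
```

All three points sit on the same contour, which is why the equal-probability curves are circles rather than ellipses in this case.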
In contrast, if we were to specify different variances along the diagonal, that means the spread in each of these dimensions is different, and so what we end up with are these axis-aligned ellipses. And finally, if we consider a full covariance that allows for correlation between our two random variables, we get these non-axis-aligned ellipses. So in the example that we're showing here, if the blue intensity is high, it's more likely that the green intensity is also high; these are what are called positively correlated random variables. And if the blue intensity is low, it's more likely that the green intensity is also low. Okay, well, we can define a Gaussian in any number of dimensions, not just one or two. In this case, we have a mean vector where the number of elements is exactly the same as the number of dimensions we're in, the dimensionality of the random vector x that we're looking at. And for the covariance matrix, let's say we're looking at a random vector x that has d different dimensions; then our covariance matrix is going to be a d-by-d matrix. There are going to be d variance parameters along the diagonal of that matrix, and the off-diagonals are going to capture the correlation structure amongst these d different random variables. The matrix also has to be something that's called positive semidefinite. But again, if you don't know what that means, that's okay; I just want to put that out there as a word of warning that there are some constraints on what the elements of this matrix can be. Okay, but notationally, we're going to use a very similar notation to what we used in the one-dimensional case. So we have this N of x, and then this bar, but now we have mu and this capital Sigma instead of just the variance term, little sigma squared. [MUSIC]
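Two of the ideas above, positive correlation from a full covariance and the positive-semidefinite constraint, can be sketched with a quick numerical check. The specific mean and covariance values here are hypothetical, chosen only so the example runs; they are not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical full covariance with positive off-diagonal entries,
# i.e. positively correlated blue/green intensities:
mu = np.array([0.4, 0.6])
Sigma = np.array([[0.02, 0.012],
                  [0.012, 0.03]])

# The constraint mentioned above: a valid covariance matrix must be
# positive semidefinite, i.e. all its eigenvalues are >= 0.
eigvals = np.linalg.eigvalsh(Sigma)
print(np.all(eigvals >= 0))

# Draw many samples; high blue intensity tends to go with high green:
samples = rng.multivariate_normal(mu, Sigma, size=100_000)
corr = np.corrcoef(samples[:, 0], samples[:, 1])[0, 1]
print(corr > 0)
```

The eigenvalue check is one standard way to verify positive semidefiniteness, and the empirical correlation of the samples comes out positive because the off-diagonal terms of Sigma are positive.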