1 00:00:00,000 --> 00:00:04,556 [MUSIC] 2 00:00:04,556 --> 00:00:08,589 When specifying a Gaussian distribution over two random variables, 3 00:00:08,589 --> 00:00:13,169 we assign probability density for the pair of random variables everywhere over 4 00:00:13,169 --> 00:00:16,546 the range of values that each of these variables can take. 5 00:00:16,546 --> 00:00:22,330 And as in 1D, there's just a single hump to this distribution. 6 00:00:22,330 --> 00:00:28,170 So for example, here this would be the place 7 00:00:28,170 --> 00:00:32,000 where there's the most probability. 8 00:00:34,500 --> 00:00:39,810 So it's most likely that the blue and green random variables, 9 00:00:39,810 --> 00:00:46,210 so the intensities of blue and 10 00:00:46,210 --> 00:00:49,620 green in the images, fall within this range. 11 00:00:51,260 --> 00:00:56,440 And if you were to drop down from this point and look at the blue value and 12 00:00:56,440 --> 00:01:01,160 the green value, that would represent 13 00:01:01,160 --> 00:01:05,280 the mean of this distribution, in the most probable region, because in 14 00:01:05,280 --> 00:01:09,605 a Gaussian the mean is also equivalent to the mode of the distribution. 15 00:01:09,605 --> 00:01:14,100 And when we walk away from that mode, 16 00:01:14,100 --> 00:01:20,059 we are walking to regions of lower probability. 17 00:01:20,059 --> 00:01:25,321 And another way to view a Gaussian distribution in two dimensions, and 18 00:01:25,321 --> 00:01:31,109 the one that's more commonly used because it can be shown on 2D plots, 19 00:01:31,109 --> 00:01:34,410 is what's called a contour plot. 20 00:01:34,410 --> 00:01:38,900 And a contour plot is just a bird's eye view of this mesh plot, 21 00:01:38,900 --> 00:01:40,370 this three-dimensional mesh plot.
22 00:01:40,370 --> 00:01:47,241 So you take this three-dimensional mesh, you look down at it from the very top and 23 00:01:47,241 --> 00:01:52,535 you plot these curves, these ellipses of equal probability, 24 00:01:56,516 --> 00:02:00,527 where the coloring represents the intensity, so 25 00:02:00,527 --> 00:02:03,240 how high it was in probability. 26 00:02:03,240 --> 00:02:10,286 So again this center point is the region of highest probability. 27 00:02:13,141 --> 00:02:18,860 And this blue shading represents low probability just like in the mesh plot. 28 00:02:18,860 --> 00:02:23,210 So this ring would be lower probability. 29 00:02:25,880 --> 00:02:30,670 And we see lower probability as we walk out in this direction or 30 00:02:30,670 --> 00:02:31,460 in this direction. 31 00:02:33,650 --> 00:02:38,640 Or in this direction, or any direction moving away from the peak. 32 00:02:38,640 --> 00:02:41,480 The question is just how rapidly it drops off. 33 00:02:41,480 --> 00:02:46,570 And it drops off most rapidly in this direction, 34 00:02:46,570 --> 00:02:51,900 least rapidly in this direction here. 35 00:02:53,550 --> 00:02:58,510 And somewhere in between, between these two different directions that we've shown. 36 00:02:58,510 --> 00:03:02,550 Okay, so this contour plot is going to be our standard representation of a Gaussian 37 00:03:02,550 --> 00:03:04,580 in two dimensions. 38 00:03:04,580 --> 00:03:09,120 In two dimensions, a Gaussian is fully specified by a mean vector and 39 00:03:09,120 --> 00:03:11,270 a covariance matrix. 40 00:03:11,270 --> 00:03:15,860 So this mean vector has elements that center the distribution along 41 00:03:15,860 --> 00:03:16,710 every dimension. 42 00:03:16,710 --> 00:03:21,510 So for example in this case, mu1 centers the distribution 43 00:03:21,510 --> 00:03:25,800 along the blue axis, so the blue intensity. 44 00:03:25,800 --> 00:03:30,920 And mu2 centers the distribution along the green intensity.
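The drop-off behavior described for the contour plot can be checked numerically. The following is a minimal sketch rather than anything shown in the lecture: the function name `gaussian_2d_density`, the mean, and the covariance values are all assumptions chosen for illustration. It uses the standard 2D Gaussian density formula, evaluates the density at the mean, and then takes the same size step away from the mean in three different directions.

```python
import math


def gaussian_2d_density(x, mean, cov):
    """Density of a 2D Gaussian at the point x = (blue, green)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("this sketch needs an invertible covariance matrix")
    dx1, dx2 = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mean)^T cov^{-1} (x - mean), using the 2x2 inverse.
    quad = (d * dx1 * dx1 - (b + c) * dx1 * dx2 + a * dx2 * dx2) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


# Assumed example values (not from the lecture): mean blue/green intensities
# and a covariance with positive correlation between the two.
mean = (0.5, 0.4)
cov = [[0.02, 0.012], [0.012, 0.02]]

step = 0.1                   # same distance from the mean in every direction
r = step / math.sqrt(2.0)    # per-coordinate offset for the 45-degree directions

peak = gaussian_2d_density(mean, mean, cov)
along_diagonal = gaussian_2d_density((mean[0] + r, mean[1] + r), mean, cov)
along_blue_axis = gaussian_2d_density((mean[0] + step, mean[1]), mean, cov)
across_diagonal = gaussian_2d_density((mean[0] + r, mean[1] - r), mean, cov)

for name, value in [
    ("peak (mean)", peak),
    ("along diagonal", along_diagonal),
    ("along blue axis", along_blue_axis),
    ("across diagonal", across_diagonal),
]:
    print(f"{name:>16}: {value:.3f}")
```

With these assumed numbers the density is highest at the mean, drops least along the long axis of the ellipse, most across it, and by an intermediate amount along the blue axis.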
45 00:03:30,920 --> 00:03:36,040 And then the crosshairs pinpoint the center of the distribution jointly 46 00:03:36,040 --> 00:03:39,690 in the blue and green intensity space. 47 00:03:39,690 --> 00:03:43,660 Then the covariance matrix specifies the spread and 48 00:03:43,660 --> 00:03:46,370 orientation of the distribution. 49 00:03:46,370 --> 00:03:51,290 Along the diagonal of this covariance matrix we have the variance terms for 50 00:03:51,290 --> 00:03:52,790 each of the dimensions. 51 00:03:52,790 --> 00:03:58,650 So in the top left-hand corner we have sigma blue squared, so that's the variance 52 00:03:58,650 --> 00:04:04,240 along just the blue intensity direction or dimension of the observation vector. 53 00:04:05,950 --> 00:04:07,970 And likewise for sigma green squared, 54 00:04:07,970 --> 00:04:11,490 that's the variance just along that one dimension. 55 00:04:11,490 --> 00:04:17,750 But then we have this other parameter, sigma blue-green, or green-blue, 56 00:04:17,750 --> 00:04:21,830 and actually those are the same value because it's a symmetric matrix, 57 00:04:21,830 --> 00:04:24,108 though that's not a detail you really need to know. 58 00:04:24,108 --> 00:04:29,599 But anyway, what those off-diagonal terms specify is the correlation 59 00:04:29,599 --> 00:04:37,270 structure of this distribution, so what's the orientation of these ellipses. 60 00:04:37,270 --> 00:04:40,500 So let's look at a few examples of covariance structures that we 61 00:04:40,500 --> 00:04:41,690 could specify. 62 00:04:41,690 --> 00:04:47,430 So one is a diagonal covariance with equal elements along that diagonal. 63 00:04:47,430 --> 00:04:49,740 First, by specifying a diagonal covariance, 64 00:04:49,740 --> 00:04:52,370 what we're saying is that there's no correlation 65 00:04:52,370 --> 00:04:57,330 between our two random variables, because there are zeros in the off-diagonals.
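The layout of the 2 by 2 covariance matrix described above can be written out as a small sketch. The helper names `covariance_matrix` and `correlation`, the parameter `rho` (a correlation coefficient between -1 and 1), and the numeric values are assumptions for illustration; the lecture only names the variances and the shared off-diagonal term.

```python
import math


def covariance_matrix(sigma_blue, sigma_green, rho):
    """Build [[var_blue, cov_bg], [cov_bg, var_green]] from spreads and correlation."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    cov_bg = rho * sigma_blue * sigma_green  # the shared off-diagonal term
    return [
        [sigma_blue ** 2, cov_bg],
        [cov_bg, sigma_green ** 2],
    ]


def correlation(cov):
    """Recover the correlation coefficient from a 2x2 covariance matrix."""
    return cov[0][1] / math.sqrt(cov[0][0] * cov[1][1])


# Assumed example values, not from the lecture.
full = covariance_matrix(sigma_blue=0.2, sigma_green=0.1, rho=0.6)
diagonal = covariance_matrix(sigma_blue=0.2, sigma_green=0.1, rho=0.0)

print(full)                   # off-diagonal entries are equal (symmetric)
print(correlation(full))      # close to the rho that was passed in
print(diagonal)               # zero off-diagonals: no correlation
```

Setting `rho` to zero gives the diagonal covariance case, where the off-diagonal entries vanish.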
66 00:04:57,330 --> 00:05:02,450 And what this means is that having one random variable, like this 67 00:05:02,450 --> 00:05:07,140 blue intensity, being high or low doesn't relate to whether the other 68 00:05:07,140 --> 00:05:12,070 random variable, like the green intensity, is high or low. 69 00:05:12,070 --> 00:05:17,790 And furthermore, by having equal values of the variances along the diagonal, 70 00:05:17,790 --> 00:05:22,470 we end up with this circular shape to the distribution, because we are saying 71 00:05:22,470 --> 00:05:27,340 that the spread along each one of these two dimensions is exactly the same. 72 00:05:27,340 --> 00:05:33,260 In contrast, if we were to specify different variances along the diagonal, 73 00:05:33,260 --> 00:05:37,320 what this means is that the spread in each of these dimensions is different and 74 00:05:37,320 --> 00:05:40,960 so what we end up with are these axis-aligned ellipses. 75 00:05:42,910 --> 00:05:46,970 And finally, if we consider a full covariance that allows for correlation 76 00:05:46,970 --> 00:05:53,970 between our two random variables, we can provide these non-axis-aligned ellipses. 77 00:05:53,970 --> 00:05:58,600 So in this example that we're showing here, if the blue intensity is high, 78 00:05:58,600 --> 00:06:02,520 it's more likely that the green intensity is also high. 79 00:06:02,520 --> 00:06:06,550 So these would be what are called positively correlated random variables. 80 00:06:07,570 --> 00:06:09,880 And if the blue intensity is low, 81 00:06:09,880 --> 00:06:13,850 it's more likely that the green intensity is also low. 82 00:06:13,850 --> 00:06:19,330 Okay, well, we can define a Gaussian in any number of dimensions, not just one or two.
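The three 2D covariance structures just described (equal variances on the diagonal, different variances on the diagonal, and a full covariance) can be explored by drawing samples and measuring them. This is a minimal sketch under stated assumptions: the function names, the spreads, the correlation value 0.8, and the sample count are all made up for illustration, and correlated pairs are produced by mixing two independent standard normal draws, which is one common construction and not something the lecture specifies.

```python
import math
import random


def sample_gaussian_2d(mean, sigma1, sigma2, rho, n, seed=0):
    """Draw n (blue, green) pairs with the given spreads and correlation rho."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x1 = mean[0] + sigma1 * z1
        # Mixing z1 into the second coordinate induces correlation rho.
        x2 = mean[1] + sigma2 * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        samples.append((x1, x2))
    return samples


def empirical_stats(samples):
    """Return (std of first coord, std of second coord, correlation)."""
    n = len(samples)
    m1 = sum(s[0] for s in samples) / n
    m2 = sum(s[1] for s in samples) / n
    v1 = sum((s[0] - m1) ** 2 for s in samples) / n
    v2 = sum((s[1] - m2) ** 2 for s in samples) / n
    c12 = sum((s[0] - m1) * (s[1] - m2) for s in samples) / n
    return math.sqrt(v1), math.sqrt(v2), c12 / math.sqrt(v1 * v2)


mean = (0.5, 0.4)  # assumed example mean
structures = {
    "equal diagonal (circles)": (0.1, 0.1, 0.0),
    "unequal diagonal (axis-aligned)": (0.2, 0.05, 0.0),
    "full (tilted ellipses)": (0.1, 0.1, 0.8),
}

results = {}
for name, (s1, s2, rho) in structures.items():
    results[name] = empirical_stats(sample_gaussian_2d(mean, s1, s2, rho, 20000))
    std1, std2, corr = results[name]
    print(f"{name}: std_blue={std1:.3f} std_green={std2:.3f} corr={corr:.3f}")
```

In the full covariance case the measured correlation should come out clearly positive, matching the statement that a high blue intensity makes a high green intensity more likely.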
83 00:06:19,330 --> 00:06:23,840 And in this case, we have a mean vector where the number of elements in that 84 00:06:23,840 --> 00:06:27,590 vector is exactly the same as the number of dimensions that we're in, 85 00:06:27,590 --> 00:06:32,180 the dimensionality of the random vector x that we're looking at. 86 00:06:32,180 --> 00:06:34,910 And this covariance matrix has a corresponding dimension. 87 00:06:34,910 --> 00:06:40,520 Let's say we're looking at a random vector x that has d different dimensions, 88 00:06:40,520 --> 00:06:45,430 then our covariance matrix is going to be a d by d matrix. 89 00:06:45,430 --> 00:06:50,050 There are going to be d variance parameters along the diagonal of that matrix. 90 00:06:50,050 --> 00:06:53,940 And then these off-diagonals are going to capture this correlation structure amongst 91 00:06:53,940 --> 00:06:56,660 these d different random variables. 92 00:06:56,660 --> 00:07:00,450 The matrix is also something that's called positive semidefinite. 93 00:07:00,450 --> 00:07:02,790 But again, if you don't know what that means, that's okay. 94 00:07:02,790 --> 00:07:07,000 I just want to put that out there as a word of warning that there are some 95 00:07:07,000 --> 00:07:11,000 constraints on what the elements of this matrix can be. 96 00:07:11,000 --> 00:07:14,670 Okay, but notationally, we're going to use a very similar notation to what we used in 97 00:07:14,670 --> 00:07:15,974 the one-dimensional case. 98 00:07:15,974 --> 00:07:20,020 So we have this N of x, and then this bar. 99 00:07:20,020 --> 00:07:23,926 But now we have mu and this capital sigma instead of just the variance term, 100 00:07:23,926 --> 00:07:25,296 little sigma squared. 101 00:07:25,296 --> 00:07:29,909 [MUSIC]
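For reference, the notation described above can be written out as follows. The lecture only gives the notation in words; the explicit density formula is the standard one for a d-dimensional Gaussian and is included here as a supplement. It assumes the covariance matrix is invertible (positive definite), which is slightly stronger than the positive semidefinite condition mentioned in the lecture.

```latex
% One-dimensional case: mean \mu and variance \sigma^2
\mathcal{N}(x \mid \mu, \sigma^2)

% d-dimensional case: mean vector \mu \in \mathbb{R}^d and
% d-by-d covariance matrix \Sigma
\mathcal{N}(x \mid \mu, \Sigma)
  = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}
    \exp\!\left( -\tfrac{1}{2} (x - \mu)^{\top} \Sigma^{-1} (x - \mu) \right)

% Two-dimensional example from the lecture (blue and green intensities)
\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix},
\qquad
\Sigma = \begin{bmatrix}
  \sigma_{\text{blue}}^2 & \sigma_{\text{blue,green}} \\
  \sigma_{\text{green,blue}} & \sigma_{\text{green}}^2
\end{bmatrix},
\qquad
\sigma_{\text{blue,green}} = \sigma_{\text{green,blue}}
```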