1
00:00:00,000 --> 00:00:04,556
[MUSIC]

2
00:00:04,556 --> 00:00:08,589
When specifying a Gaussian distribution
over two random variables,

3
00:00:08,589 --> 00:00:13,169
we assign probability density for the pair
of random variables everywhere over

4
00:00:13,169 --> 00:00:16,546
the range of values that each
of these variables can take.

5
00:00:16,546 --> 00:00:22,330
And as in 1D, there's just a single
hump to this distribution.

6
00:00:22,330 --> 00:00:28,170
So for example,
here this would be the place

7
00:00:28,170 --> 00:00:32,000
where there's most probability.

8
00:00:34,500 --> 00:00:39,810
So it's most likely that the blue and
green random variables,

9
00:00:39,810 --> 00:00:46,210
that is, the intensities of blue and

10
00:00:46,210 --> 00:00:49,620
green in the images
fall within this range.

11
00:00:51,260 --> 00:00:56,440
So if you were to drop down from this
point and look at the blue value and

12
00:00:56,440 --> 00:01:01,160
the green value, that would represent

13
00:01:01,160 --> 00:01:05,280
the mean of this distribution, in
the most probable region, because in

14
00:01:05,280 --> 00:01:09,605
a Gaussian the mean is also equivalent
to the mode of the distribution.

15
00:01:09,605 --> 00:01:14,100
And when we walk away from that mode,

16
00:01:14,100 --> 00:01:20,059
we are walking to regions
of lower probability.
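
To make the "single hump" concrete, here is a minimal sketch that evaluates a two-dimensional Gaussian density at the mean and at two points farther away. The function name `gaussian2d_pdf` and the mean and covariance values are hypothetical, chosen only for illustration; they do not come from the lecture.

```python
import math


def gaussian2d_pdf(x, mean, cov):
    """Density of a 2D Gaussian at the point x = (x1, x2).

    cov is ((s11, s12), (s21, s22)) and is assumed to be symmetric
    and positive definite so that its inverse exists.
    """
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    d1 = x[0] - mean[0]
    d2 = x[1] - mean[1]
    # Quadratic form (x - mean)^T cov^{-1} (x - mean), using the 2x2 inverse.
    quad = (s22 * d1 * d1 - (s12 + s21) * d1 * d2 + s11 * d2 * d2) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


# Hypothetical blue/green intensity parameters (illustrative values only).
mean = (0.4, 0.6)
cov = ((0.02, 0.01), (0.01, 0.03))

peak = gaussian2d_pdf(mean, mean, cov)          # density at the mean (the mode)
nearby = gaussian2d_pdf((0.5, 0.6), mean, cov)  # a short step away from the mean
far = gaussian2d_pdf((0.8, 0.6), mean, cov)     # a longer step away
print(peak, nearby, far)
```

The density is largest at the mean and shrinks the farther the point moves from it, which is the behavior described for the mesh plot.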

17
00:01:20,059 --> 00:01:25,321
And another way to view a Gaussian
distribution in two dimensions, and

18
00:01:25,321 --> 00:01:31,109
the one that's more commonly used because
it can be shown on 2D plots,

19
00:01:31,109 --> 00:01:34,410
is what's called a contour plot.

20
00:01:34,410 --> 00:01:38,900
And a contour plot is just a bird's
eye view of this mesh plot,

21
00:01:38,900 --> 00:01:40,370
this three-dimensional mesh plot.

22
00:01:40,370 --> 00:01:47,241
So you take this three-dimensional mesh,
you look down at it from the very top and

23
00:01:47,241 --> 00:01:52,535
you plot these curves,
these ellipses of equal probability.

24
00:01:56,516 --> 00:02:00,527
Where the coloring
represents the intensity, so

25
00:02:00,527 --> 00:02:03,240
how high the probability is.

26
00:02:03,240 --> 00:02:10,286
So again this center point is
the region of highest probability.

27
00:02:13,141 --> 00:02:18,860
And this blue shading represents low
probability just like in the mesh plot.

28
00:02:18,860 --> 00:02:23,210
So this ring would be lower probability.

29
00:02:25,880 --> 00:02:30,670
And we see lower probability as
we walk out in this direction or

30
00:02:30,670 --> 00:02:31,460
in this direction.

31
00:02:33,650 --> 00:02:38,640
Or in this direction, or
any direction moving away from the peak.

32
00:02:38,640 --> 00:02:41,480
The question is just how
rapidly it drops off.

33
00:02:41,480 --> 00:02:46,570
And it drops off most
rapidly in this direction,

34
00:02:46,570 --> 00:02:51,900
least rapidly in this direction here.

35
00:02:53,550 --> 00:02:58,510
And somewhere in between, between these
two different directions that we've shown.
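
As a rough sketch of where those fastest and slowest directions come from: a standard linear-algebra fact, not stated in the lecture, is that the contour ellipses line up with the eigenvectors of the covariance matrix. The density drops off least rapidly along the direction with the largest eigenvalue and most rapidly along the direction with the smallest. The helper name `ellipse_axes` and the example matrix are hypothetical.

```python
import math


def ellipse_axes(cov):
    """Eigenvalues and major-axis angle of a symmetric 2x2 covariance.

    Returns (largest eigenvalue, smallest eigenvalue, angle in radians
    of the major axis measured from the first coordinate axis).
    """
    (a, b), (_, d) = cov
    mid = 0.5 * (a + d)
    half_gap = math.hypot(0.5 * (a - d), b)
    lam_big = mid + half_gap    # variance along the slowest drop-off direction
    lam_small = mid - half_gap  # variance along the fastest drop-off direction
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    return lam_big, lam_small, theta


# Hypothetical covariance with positive correlation (illustrative values).
lam_big, lam_small, theta = ellipse_axes(((2.0, 1.0), (1.0, 2.0)))
print(lam_big, lam_small, math.degrees(theta))
```

For this example the major axis points along the diagonal of the plot, so the density falls off slowest along that diagonal and fastest perpendicular to it.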

36
00:02:58,510 --> 00:03:02,550
Okay, so this contour plot is going to be
our standard representation of a Gaussian

37
00:03:02,550 --> 00:03:04,580
in two dimensions.

38
00:03:04,580 --> 00:03:09,120
In two dimensions, a Gaussian is fully
specified by a mean vector and

39
00:03:09,120 --> 00:03:11,270
the covariance matrix.

40
00:03:11,270 --> 00:03:15,860
So this mean vector has elements
that center the distribution along

41
00:03:15,860 --> 00:03:16,710
every dimension.

42
00:03:16,710 --> 00:03:21,510
So for example in this case,
mu1 centers the distribution

43
00:03:21,510 --> 00:03:25,800
along the blue axis, so the blue intensity.

44
00:03:25,800 --> 00:03:30,920
And mu2 centers the distribution
along the green intensity.

45
00:03:30,920 --> 00:03:36,040
And then the crosshairs pinpoint
the center of the distribution jointly

46
00:03:36,040 --> 00:03:39,690
in the blue and green intensity space.

47
00:03:39,690 --> 00:03:43,660
Then the covariance matrix
specifies the spread and

48
00:03:43,660 --> 00:03:46,370
orientation of the distribution.

49
00:03:46,370 --> 00:03:51,290
Along the diagonal of this covariance
matrix we have the variance terms for

50
00:03:51,290 --> 00:03:52,790
each of the dimensions.

51
00:03:52,790 --> 00:03:58,650
So in the top left-hand corner we just have
sigma blue squared, so that's the variance

52
00:03:58,650 --> 00:04:04,240
along just the blue intensity direction or
dimension of the observation vector.

53
00:04:05,950 --> 00:04:07,970
And likewise for sigma green squared,

54
00:04:07,970 --> 00:04:11,490
that's the variance just
along that one dimension.

55
00:04:11,490 --> 00:04:17,750
But then we have this other parameter,
sigma blue-green, or sigma green-blue,

56
00:04:17,750 --> 00:04:21,830
and actually those are the same
value because it's a symmetric matrix;

57
00:04:21,830 --> 00:04:24,108
that's not actually a detail
you really need to know.

58
00:04:24,108 --> 00:04:29,599
But anyway, what those off-diagonal
terms specify is the correlation

59
00:04:29,599 --> 00:04:37,270
structure of this distribution, so
what's the orientation of these ellipses.
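
Written out, the two parameters described here would look roughly as follows. The subscripts B and G for blue and green are an assumed shorthand; the lecture only names the entries verbally.

```latex
\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix},
\qquad
\Sigma = \begin{bmatrix}
  \sigma_B^2     & \sigma_{B,G} \\
  \sigma_{G,B}   & \sigma_G^2
\end{bmatrix},
\qquad
\sigma_{B,G} = \sigma_{G,B}
```

Here \mu_1 and \mu_2 center the distribution along the blue and green axes, the diagonal entries are the per-dimension variances, and the two equal off-diagonal entries carry the correlation structure.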

60
00:04:37,270 --> 00:04:40,500
So let's look at a few examples
of covariance structures that we

61
00:04:40,500 --> 00:04:41,690
could specify.

62
00:04:41,690 --> 00:04:47,430
So one is a diagonal covariance with
equal elements along that diagonal.

63
00:04:47,430 --> 00:04:49,740
First, by specifying
a diagonal covariance,

64
00:04:49,740 --> 00:04:52,370
what we're saying is that
there's no correlation

65
00:04:52,370 --> 00:04:57,330
between our two random variables, because
there are zeros in the off-diagonals.

66
00:04:57,330 --> 00:05:02,450
And what this means is that having
one random variable like this

67
00:05:02,450 --> 00:05:07,140
blue intensity, being high or
low doesn't relate to whether the other

68
00:05:07,140 --> 00:05:12,070
random variable like the green
intensity is high or low.

69
00:05:12,070 --> 00:05:17,790
And furthermore, by having equal values of
the variances along the diagonal,

70
00:05:17,790 --> 00:05:22,470
we end up with this circular shape to
the distribution because we are saying

71
00:05:22,470 --> 00:05:27,340
that the spread along each one of these
two dimensions is exactly the same.

72
00:05:27,340 --> 00:05:33,260
In contrast, if we were to specify
different variances along the diagonal,

73
00:05:33,260 --> 00:05:37,320
what this means is that the spread in
each of these dimensions is different and

74
00:05:37,320 --> 00:05:40,960
so what we end up with are these
axis-aligned ellipses.

75
00:05:42,910 --> 00:05:46,970
And finally, if we consider a full
covariance that allows for correlation

76
00:05:46,970 --> 00:05:53,970
between our two random variables we can
provide these non-axis-aligned ellipses.

77
00:05:53,970 --> 00:05:58,600
So in this example that we're showing
here, if the blue intensity is high,

78
00:05:58,600 --> 00:06:02,520
it's more likely that the green
intensity is also high.

79
00:06:02,520 --> 00:06:06,550
So these would be what are called
positively correlated random variables.

80
00:06:07,570 --> 00:06:09,880
And if the blue intensity is low,

81
00:06:09,880 --> 00:06:13,850
it's more likely that the green
intensity is also low.
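
The three covariance structures can be compared with a small simulation. This is only a sketch: the function names, the seed, the sample size, and all three covariance matrices are hypothetical choices, and the sampler assumes a symmetric positive definite 2x2 covariance.

```python
import math
import random


def sample_gaussian2d(mean, cov, n, seed=0):
    """Draw n samples from a 2D Gaussian using a 2x2 Cholesky factor."""
    rng = random.Random(seed)
    (s11, s12), (_, s22) = cov
    # Lower-triangular factor L with cov = L L^T.
    l11 = math.sqrt(s11)
    l21 = s12 / l11
    l22 = math.sqrt(s22 - l21 * l21)
    samples = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        samples.append((mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2))
    return samples


def correlation(samples):
    """Empirical correlation between the two coordinates."""
    n = len(samples)
    m1 = sum(p[0] for p in samples) / n
    m2 = sum(p[1] for p in samples) / n
    v1 = sum((p[0] - m1) ** 2 for p in samples) / n
    v2 = sum((p[1] - m2) ** 2 for p in samples) / n
    c12 = sum((p[0] - m1) * (p[1] - m2) for p in samples) / n
    return c12 / math.sqrt(v1 * v2)


mean = (0.0, 0.0)
spherical = ((1.0, 0.0), (0.0, 1.0))  # diagonal, equal variances: circles
diagonal = ((4.0, 0.0), (0.0, 1.0))   # diagonal, unequal: axis-aligned ellipses
full = ((2.0, 1.5), (1.5, 2.0))       # off-diagonal terms: tilted ellipses

corr_spherical = correlation(sample_gaussian2d(mean, spherical, 5000))
corr_diagonal = correlation(sample_gaussian2d(mean, diagonal, 5000))
corr_full = correlation(sample_gaussian2d(mean, full, 5000))
print(corr_spherical, corr_diagonal, corr_full)
```

With the two diagonal covariances the empirical correlation should come out close to zero, while the full covariance should give a clearly positive value, matching the positively correlated case described above.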

82
00:06:13,850 --> 00:06:19,330
Okay well, we can define a Gaussian in any
number of dimensions, not just one or two.

83
00:06:19,330 --> 00:06:23,840
And in this case, we have a mean vector
where the number of elements in that

84
00:06:23,840 --> 00:06:27,590
vector is exactly the same as
the number of dimensions that we're in,

85
00:06:27,590 --> 00:06:32,180
the dimensionality of the random
vector x that we're looking at.

86
00:06:32,180 --> 00:06:34,910
And this covariance matrix has a dimension, too.

87
00:06:34,910 --> 00:06:40,520
Let's say we're looking at a random
vector x that has d different dimensions,

88
00:06:40,520 --> 00:06:45,430
then our covariance matrix is
going to be a d by d matrix.

89
00:06:45,430 --> 00:06:50,050
There are going to be d variance parameters
along the diagonal of that matrix.

90
00:06:50,050 --> 00:06:53,940
And then these off-diagonals are going to
capture this correlation structure amongst

91
00:06:53,940 --> 00:06:56,660
these d different random variables.

92
00:06:56,660 --> 00:07:00,450
The matrix is also something that's
called positive semidefinite.

93
00:07:00,450 --> 00:07:02,790
But again, if you don't know
what that means, that's okay.

94
00:07:02,790 --> 00:07:07,000
I just want to put that out there as
a word of warning that there are some

95
00:07:07,000 --> 00:07:11,000
constraints on what the elements
of this matrix can be.
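
For the two-dimensional case, those constraints can be checked by hand. The sketch below uses the fact that a symmetric 2x2 matrix is positive semidefinite exactly when both diagonal entries and the determinant are non-negative. The function name and tolerance are hypothetical, and this shortcut is specific to 2x2 matrices; in d dimensions one would typically check eigenvalues instead.

```python
def is_valid_covariance_2x2(cov, tol=1e-12):
    """Check that a 2x2 matrix is symmetric and positive semidefinite."""
    (a, b), (c, d) = cov
    symmetric = abs(b - c) <= tol
    # Non-negative variances and non-negative determinant.
    psd = a >= -tol and d >= -tol and (a * d - b * c) >= -tol
    return symmetric and psd


ok = is_valid_covariance_2x2(((1.0, 0.5), (0.5, 1.0)))
too_correlated = is_valid_covariance_2x2(((1.0, 2.0), (2.0, 1.0)))
not_symmetric = is_valid_covariance_2x2(((1.0, 0.5), (0.2, 1.0)))
print(ok, too_correlated, not_symmetric)
```

The second matrix fails because its off-diagonal entry is larger than the variances allow, which is one example of the kind of constraint being warned about.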

96
00:07:11,000 --> 00:07:14,670
Okay, but notationally, we're going to use
a very similar notation to what we used in

97
00:07:14,670 --> 00:07:15,974
the one-dimensional case.

98
00:07:15,974 --> 00:07:20,020
So we have this N of x, and then this bar.

99
00:07:20,020 --> 00:07:23,926
But now we have mu and this capital
sigma instead of just the variance term,

100
00:07:23,926 --> 00:07:25,296
little sigma squared.
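
For reference, the notation described here is usually written as below, next to the standard multivariate Gaussian density. The density formula is not given in this part of the lecture; it is the textbook form and assumes the covariance matrix is invertible.

```latex
% One dimension: mean \mu and variance \sigma^2
\mathcal{N}(x \mid \mu, \sigma^2)

% d dimensions: mean vector \mu and d-by-d covariance matrix \Sigma
\mathcal{N}(x \mid \mu, \Sigma)
  = \frac{1}{(2\pi)^{d/2} \, |\Sigma|^{1/2}}
    \exp\!\left( -\tfrac{1}{2} (x - \mu)^{\top} \Sigma^{-1} (x - \mu) \right)
```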

101
00:07:25,296 --> 00:07:29,909
[MUSIC]