1
00:00:00,000 --> 00:00:04,546
[MUSIC]

2
00:00:04,546 --> 00:00:08,962
In this video, we will derive the formulas
for training a Gaussian process.

3
00:00:08,962 --> 00:00:11,504
So here's our setup again.

4
00:00:11,504 --> 00:00:17,087
We have the probability of our new
point given all previous points.

5
00:00:17,087 --> 00:00:21,155
And it equals to the ratio
between the joint probability and

6
00:00:21,155 --> 00:00:23,362
the marginal probability.

7
00:00:23,362 --> 00:00:28,629
So we have here the denominator, the joint
probability over the known points.

8
00:00:28,629 --> 00:00:33,005
And in the numerator we have the known
points and the unknown point.

9
00:00:33,005 --> 00:00:37,296
Let me denote the unknown point as f star.

10
00:00:37,296 --> 00:00:41,671
So this would be equal to f star.

11
00:00:41,671 --> 00:00:46,962
This vector here would be f.

12
00:00:46,962 --> 00:00:49,477
And so as we saw in the previous video,

13
00:00:49,477 --> 00:00:53,337
we'll have the ratio between
two normal distributions.

14
00:00:53,337 --> 00:00:58,159
The one in the numerator would
have the f star and f vector,

15
00:00:58,159 --> 00:01:02,798
given the mean 0 and
the covariance matrix as follows.

16
00:01:02,798 --> 00:01:05,464
And the same would be in the denominator.

17
00:01:05,464 --> 00:01:09,838
We'll have a normal distribution over f.

18
00:01:09,838 --> 00:01:12,921
The mean is 0 and
the covariance matrix is C.

19
00:01:12,921 --> 00:01:15,380
All right, let's write them down.

20
00:01:15,380 --> 00:01:20,620
So it will be proportional

21
00:01:20,620 --> 00:01:27,296
to exponent of minus one-half.

22
00:01:27,296 --> 00:01:31,289
This vector transposed,

23
00:01:31,289 --> 00:01:36,148
it would be f star, f transpose,

24
00:01:37,363 --> 00:01:40,671
covariance matrix,

25
00:01:44,461 --> 00:01:49,073
k transpose, k, C,

26
00:01:49,073 --> 00:01:55,225
inverse and this vector

27
00:01:55,225 --> 00:02:00,462
again, so f star f.

28
00:02:00,462 --> 00:02:05,297
So this is a term from the numerator, and
we have also a term from the denominator.

29
00:02:05,297 --> 00:02:06,879
We'll have plus,

30
00:02:11,004 --> 00:02:16,304
Plus f transposed

31
00:02:16,304 --> 00:02:20,546
C inversed f.
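
Written out, the ratio described so far can be sketched as follows. This assumes the joint covariance of (f star, f) has the block form with K(0), k and C that is used later in the video; the bold-free notation is a choice made here for readability.

```latex
p(f_* \mid f)
  = \frac{\mathcal{N}\!\left(
      \begin{bmatrix} f_* \\ f \end{bmatrix}
      \,\middle|\, 0,\
      \begin{bmatrix} K(0) & k^\top \\ k & C \end{bmatrix}
    \right)}
    {\mathcal{N}(f \mid 0,\ C)}
  \propto \exp\!\left(
      -\frac{1}{2}
      \begin{bmatrix} f_* \\ f \end{bmatrix}^\top
      \begin{bmatrix} K(0) & k^\top \\ k & C \end{bmatrix}^{-1}
      \begin{bmatrix} f_* \\ f \end{bmatrix}
      + \frac{1}{2}\, f^\top C^{-1} f
    \right)
```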

32
00:02:20,546 --> 00:02:26,462
All right,
now let's see what this term equals to.

33
00:02:26,462 --> 00:02:34,837
So we'll have exponent of minus
one-half times the following thing.

34
00:02:34,837 --> 00:02:39,754
We'll have to see what is
the inverse matrix for this term.

35
00:02:39,754 --> 00:02:44,087
Let's write it down as
some arbitrary matrix.

36
00:02:44,087 --> 00:02:49,274
We'll have components d,

37
00:02:49,274 --> 00:02:53,337
b transpose, b, A.

38
00:02:53,337 --> 00:02:57,013
So this is just the inverse
matrix of this matrix, but

39
00:02:57,013 --> 00:02:59,671
we don't know the components of it.

40
00:02:59,671 --> 00:03:04,171
So we can plug it in here, and we'll have,

41
00:03:06,629 --> 00:03:12,629
d times f star squared.

42
00:03:17,378 --> 00:03:21,379
Plus 2 times,

43
00:03:23,420 --> 00:03:30,797
We'll have b transposed
f times the f star.

44
00:03:33,421 --> 00:03:37,962
And we also have some constants
that do not depend on f star.

45
00:03:37,962 --> 00:03:41,171
So let me write down some const.

46
00:03:46,212 --> 00:03:50,837
All right,
now we have to complete the square.

47
00:03:50,837 --> 00:03:56,341
So it would be exponent of -1 over,

48
00:03:56,341 --> 00:03:59,891
let's take this term,

49
00:03:59,891 --> 00:04:04,337
d here, out of the brackets.

50
00:04:04,337 --> 00:04:07,671
So we'll put it here.

51
00:04:07,671 --> 00:04:15,046
This would be 1 over 2d power -1.

52
00:04:15,046 --> 00:04:20,773
We'll have f star here,

53
00:04:20,773 --> 00:04:28,961
plus b transposed f over d, all squared.

54
00:04:28,961 --> 00:04:34,462
And times some multiplicative constant,
let me write down here as proportional to.

55
00:04:36,755 --> 00:04:41,905
Okay, so from this we can easily see that

56
00:04:41,905 --> 00:04:47,212
the mean value is -b transposed f over d.

57
00:04:47,212 --> 00:04:53,753
So mu equals to -b transposed f over d.

58
00:04:53,753 --> 00:04:55,837
And the variance is simply 1 over d.

59
00:05:00,504 --> 00:05:03,046
So those are our formulas.

60
00:05:05,961 --> 00:05:13,492
And the result would be the normal
distribution over f star,

61
00:05:13,492 --> 00:05:20,296
given mean mu from here,
and variance sigma squared.
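
The completing-the-square step can be written out as a short sketch. It uses the same d and b as the unknown blocks of the inverse matrix, and "const" collects everything that does not depend on f star.

```latex
-\frac{1}{2}\left( d\, f_*^2 + 2\, b^\top f\, f_* + \text{const} \right)
  = -\frac{d}{2}\left( f_* + \frac{b^\top f}{d} \right)^{2} + \text{const}'
\quad\Longrightarrow\quad
p(f_* \mid f) = \mathcal{N}\!\left( f_* \mid \mu,\ \sigma^2 \right),
\qquad
\mu = -\frac{b^\top f}{d},
\qquad
\sigma^2 = \frac{1}{d}
```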

62
00:05:20,296 --> 00:05:24,775
All right, so we haven't finished yet,
since we have d and

63
00:05:24,775 --> 00:05:28,462
b here and
we don't know the values for them yet.

64
00:05:28,462 --> 00:05:33,062
So those are just some
parameters that should be

65
00:05:33,062 --> 00:05:37,088
equal to the inverse matrix of this term.

66
00:05:37,088 --> 00:05:38,420
So let's derive them.

67
00:05:45,921 --> 00:05:50,610
Okay, so the simplest way to
derive them is to remember,

68
00:05:50,610 --> 00:05:54,226
since this is an inverse of this matrix,
and

69
00:05:54,226 --> 00:05:58,056
then their product would
be the identity matrix.

70
00:05:58,056 --> 00:06:01,503
So we have K of 0,

71
00:06:01,503 --> 00:06:08,798
k transpose, k,
C, times the inverse in this form.

72
00:06:08,798 --> 00:06:12,100
So it is d b transposed b A,

73
00:06:12,100 --> 00:06:17,212
should be equal to the identity matrix.

74
00:06:17,212 --> 00:06:21,155
So we'll have scalar 1 here,

75
00:06:21,155 --> 00:06:27,380
two vectors of 0 here,
and the identity matrix here.

76
00:06:27,380 --> 00:06:31,629
All right, so
actually we have many equations here.

77
00:06:31,629 --> 00:06:39,041
We can multiply these matrices explicitly,

78
00:06:39,041 --> 00:06:44,046
so we'll have K of 0 times d +

79
00:06:44,046 --> 00:06:48,671
k transpose b equals to 1.

80
00:06:48,671 --> 00:06:54,813
We have K of 0 b transpose

81
00:06:54,813 --> 00:07:02,129
+ k transpose A equals to 0.

82
00:07:02,129 --> 00:07:06,107
And two more terms,

83
00:07:06,107 --> 00:07:12,191
those are k d + C b equals to 0,

84
00:07:12,191 --> 00:07:17,575
and finally kb transpose +

85
00:07:17,575 --> 00:07:23,671
CA equals to identity matrix.
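
For reference, the block product and the four equations it yields can be sketched like this, with d a scalar, b a vector and A a matrix (the unknown blocks of the inverse):

```latex
\begin{bmatrix} K(0) & k^\top \\ k & C \end{bmatrix}
\begin{bmatrix} d & b^\top \\ b & A \end{bmatrix}
=
\begin{bmatrix} 1 & 0^\top \\ 0 & I \end{bmatrix}
\quad\Longrightarrow\quad
\begin{aligned}
K(0)\, d + k^\top b &= 1, &\qquad K(0)\, b^\top + k^\top A &= 0^\top, \\
k\, d + C\, b &= 0, &\qquad k\, b^\top + C A &= I
\end{aligned}
```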

86
00:07:23,671 --> 00:07:28,193
All right, so we're interested
in values for b and d, and

87
00:07:28,193 --> 00:07:30,879
we don't have to find A actually.

88
00:07:30,879 --> 00:07:33,047
So let's see what we have then.

89
00:07:33,047 --> 00:07:38,848
So in this term, we have b here and

90
00:07:38,848 --> 00:07:42,796
d here, b, b, d, b.

91
00:07:42,796 --> 00:07:46,218
All right, actually we need to pick
just two equations,

92
00:07:46,218 --> 00:07:47,088
from those four.

93
00:07:47,088 --> 00:07:51,503
We need this equation
since it has both d and b.

94
00:07:51,503 --> 00:07:55,087
And this one,
since also we have here d and b.

95
00:07:55,087 --> 00:07:58,779
And we can notice here that the number
of equations equals the number of unknown

96
00:07:58,779 --> 00:07:59,480
parameters.

97
00:07:59,480 --> 00:08:03,548
So we can hope that we
will be able to solve it.

98
00:08:03,548 --> 00:08:08,671
So let's start from the second equation.

99
00:08:08,671 --> 00:08:16,026
We can see that from this equation,

100
00:08:16,026 --> 00:08:21,671
b equals to -C inverse k d.

101
00:08:21,671 --> 00:08:25,796
We can plug in b from this formula here.

102
00:08:25,796 --> 00:08:30,889
And then we'll have k of 0 times

103
00:08:30,889 --> 00:08:37,879
d - k transpose C
inverse k d equals to 1.

104
00:08:39,630 --> 00:08:46,963
So from this formula,
we can see that, We have d here.

105
00:08:46,963 --> 00:08:52,487
And so the value for d would be,

106
00:08:52,487 --> 00:08:57,161
d equal to 1 over K of 0 - k

107
00:08:57,161 --> 00:09:01,422
transpose c inverse k.

108
00:09:04,128 --> 00:09:08,754
The final step is to plug in
the value of d from here to b.

109
00:09:08,754 --> 00:09:13,196
So we'll have b equal to

110
00:09:13,196 --> 00:09:18,081
-C inverse k over K of 0 -

111
00:09:18,081 --> 00:09:22,753
k transpose C inverse k.
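
As a sanity check, the expressions for d and b can be compared numerically against a directly inverted joint covariance. This is a minimal sketch: the random matrix, its size and the variable names are assumptions made for illustration, and it uses d = 1 / (K(0) - k^T C^{-1} k), which is what substituting b = -C^{-1} k d into the first equation gives.

```python
import numpy as np

# Build a random symmetric positive definite joint covariance
# for (f_star, f); the size n = 4 is an arbitrary choice.
rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n + 1, n + 1))
joint = M @ M.T + (n + 1) * np.eye(n + 1)

# Split it into the blocks used in the derivation.
K0 = joint[0, 0]      # scalar K(0)
k = joint[1:, 0]      # vector k
C = joint[1:, 1:]     # matrix C

# Closed-form blocks of the inverse.
C_inv_k = np.linalg.solve(C, k)
d = 1.0 / (K0 - k @ C_inv_k)
b = -C_inv_k * d

# Compare with the corresponding blocks of the explicit inverse.
joint_inv = np.linalg.inv(joint)
d_matches = bool(np.isclose(d, joint_inv[0, 0]))
b_matches = bool(np.allclose(b, joint_inv[1:, 0]))
print(d_matches, b_matches)
```

Using `np.linalg.solve` instead of forming C inverse explicitly is a common numerical choice; it does not change the formulas.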

112
00:09:22,753 --> 00:09:28,128
Now we need to plug in d and b into
the formulas for mean and the variance.

113
00:09:28,128 --> 00:09:30,421
So the variance would be the inverse of d.

114
00:09:30,421 --> 00:09:35,426
So it would be K of 0 - k

115
00:09:35,426 --> 00:09:40,963
transposed C inverse k.

116
00:09:40,963 --> 00:09:44,608
And the mean would be, so

117
00:09:44,608 --> 00:09:50,171
we have b here, we should write down.

118
00:09:50,171 --> 00:09:54,312
I'm just going to write it
down more carefully, so

119
00:09:54,312 --> 00:09:57,046
it would be -b transpose f over d.

120
00:09:57,046 --> 00:10:03,912
So -b transposed is
simply k transpose C inverse,

121
00:10:03,912 --> 00:10:09,151
since C is symmetric, times f over K of

122
00:10:09,151 --> 00:10:15,131
0 - k transpose C inverse k, and over d.

123
00:10:15,131 --> 00:10:20,060
So it would be K of 0 - k

124
00:10:20,060 --> 00:10:25,255
transpose C inverse k.

125
00:10:25,255 --> 00:10:30,212
And here we have two similar terms,
we can cancel them out.

126
00:10:30,212 --> 00:10:33,755
So these two.

127
00:10:33,755 --> 00:10:38,712
And so, finally we have that
the mean value equals to,

128
00:10:40,921 --> 00:10:45,296
k transpose C inverse f.

129
00:10:45,296 --> 00:10:48,504
And so, those are our final formulas.

130
00:10:48,504 --> 00:10:52,796
So sigma squared equals to this term.

131
00:10:52,796 --> 00:11:00,921
The variance K of 0 - k
transpose C inverse k.

132
00:11:00,921 --> 00:11:07,129
And for
the mean we have k transpose C inverse f.
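
The final formulas, mean k^T C^{-1} f and variance K(0) - k^T C^{-1} k, can be tried out in a few lines. This is only a sketch under stated assumptions: the squared-exponential kernel, the names `rbf_kernel` and `gp_predict`, the `length_scale` and `jitter` parameters, and the toy data are all hypothetical and not from the video.

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=1.0):
    """Squared-exponential kernel between two 1-D arrays (assumed kernel)."""
    diff = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_predict(x_train, f_train, x_star, length_scale=1.0, jitter=1e-10):
    """Predictive mean and variance at one new point x_star."""
    x_new = np.array([x_star])
    # C: covariance of the known points (tiny jitter for stability).
    C = rbf_kernel(x_train, x_train, length_scale)
    C = C + jitter * np.eye(len(x_train))
    # k: covariance between the known points and the new point.
    k = rbf_kernel(x_train, x_new, length_scale)[:, 0]
    # K(0): kernel of the new point with itself.
    K0 = rbf_kernel(x_new, x_new, length_scale)[0, 0]
    mean = k @ np.linalg.solve(C, f_train)   # k^T C^{-1} f
    var = K0 - k @ np.linalg.solve(C, k)     # K(0) - k^T C^{-1} k
    return mean, var

# Toy data, chosen only for illustration.
x_train = np.array([-1.0, 0.0, 1.5])
f_train = np.sin(x_train)

mean_at_train, var_at_train = gp_predict(x_train, f_train, 0.0)
mean_far, var_far = gp_predict(x_train, f_train, 10.0)
print(mean_at_train, var_at_train, mean_far, var_far)
```

At a training point the variance shrinks to roughly zero and the mean reproduces the observed value, while far away from the data the prediction falls back to the prior mean 0 with variance close to K(0).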

133
00:11:07,129 --> 00:11:17,129
[MUSIC]