1 00:00:00,000 --> 00:00:04,546 [MUSIC] 2 00:00:04,546 --> 00:00:08,962 In this video, we will derive the formulas for training a Gaussian process. 3 00:00:08,962 --> 00:00:11,504 So here's our setup again. 4 00:00:11,504 --> 00:00:17,087 We have the probability of our new point given all previous points. 5 00:00:17,087 --> 00:00:21,155 And it equals the ratio between two 6 00:00:21,155 --> 00:00:23,362 joint probabilities. 7 00:00:23,362 --> 00:00:28,629 So we have here in the denominator the joint probability over the known points. 8 00:00:28,629 --> 00:00:33,005 And in the numerator we have the known points and the unknown point. 9 00:00:33,005 --> 00:00:37,296 Let me denote the unknown point as f star. 10 00:00:37,296 --> 00:00:41,671 So this would be equal to f star. 11 00:00:41,671 --> 00:00:46,962 This vector here would be f. 12 00:00:46,962 --> 00:00:49,477 And so as we saw in the previous video, 13 00:00:49,477 --> 00:00:53,337 we'll have the ratio between two normal distributions. 14 00:00:53,337 --> 00:00:58,159 The one in the numerator would be over the f star and f vector, 15 00:00:58,159 --> 00:01:02,798 with mean 0 and the covariance matrix as follows. 16 00:01:02,798 --> 00:01:05,464 And the same would be in the denominator. 17 00:01:05,464 --> 00:01:09,838 We'll have a normal distribution over f. 18 00:01:09,838 --> 00:01:12,921 The mean is 0 and the covariance matrix is C. 19 00:01:12,921 --> 00:01:15,380 All right, let's write them down. 20 00:01:15,380 --> 00:01:20,620 So it will be proportional 21 00:01:20,620 --> 00:01:27,296 to exponent of minus one-half, 22 00:01:27,296 --> 00:01:31,289 this vector transposed, 23 00:01:31,289 --> 00:01:36,148 it would be f star, f, transposed, 24 00:01:37,363 --> 00:01:40,671 covariance matrix, 25 00:01:44,461 --> 00:01:49,073 K of 0, k transpose, k, C, 26 00:01:49,073 --> 00:01:55,225 inverse, and this vector 27 00:01:55,225 --> 00:02:00,462 again, so f star, f. 
28 00:02:00,462 --> 00:02:05,297 So this is a term from the numerator, and we also have a term from the denominator. 29 00:02:05,297 --> 00:02:06,879 We'll have plus, 30 00:02:11,004 --> 00:02:16,304 plus f transposed 31 00:02:16,304 --> 00:02:20,546 C inverse f. 32 00:02:20,546 --> 00:02:26,462 All right, now let's see what this term equals. 33 00:02:26,462 --> 00:02:34,837 So we'll have exponent of minus one-half times the following thing. 34 00:02:34,837 --> 00:02:39,754 We'll have to see what the inverse matrix is for this term. 35 00:02:39,754 --> 00:02:44,087 Let's write it down as some arbitrary matrix. 36 00:02:44,087 --> 00:02:49,274 We'll have components d, 37 00:02:49,274 --> 00:02:53,337 b transpose, b, A. 38 00:02:53,337 --> 00:02:57,013 So this is just the inverse matrix of this matrix, but 39 00:02:57,013 --> 00:02:59,671 we don't know the components of it. 40 00:02:59,671 --> 00:03:04,171 So we can plug it in here, and we'll have, 41 00:03:06,629 --> 00:03:12,629 d times f star squared. 42 00:03:17,378 --> 00:03:21,379 Plus 2 times, 43 00:03:23,420 --> 00:03:30,797 we'll have b transposed f times f star. 44 00:03:33,421 --> 00:03:37,962 And we also have some terms that do not depend on f star. 45 00:03:37,962 --> 00:03:41,171 So let me write down some const. 46 00:03:46,212 --> 00:03:50,837 All right, now we have to complete the square. 47 00:03:50,837 --> 00:03:56,341 So it would be exponent of minus 1 over, 48 00:03:56,341 --> 00:03:59,891 let's take this term, 49 00:03:59,891 --> 00:04:04,337 d here, out of the brackets. 50 00:04:04,337 --> 00:04:07,671 So we'll put it here. 51 00:04:07,671 --> 00:04:15,046 This would be 1 over 2 d to the power minus 1. 52 00:04:15,046 --> 00:04:20,773 We'll have f star here, 53 00:04:20,773 --> 00:04:28,961 plus b transposed f over d, all squared. 54 00:04:28,961 --> 00:04:34,462 And times some multiplicative constant; let me write it down here as proportional to. 
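The spoken steps up to this point can be written out as a worked derivation. This is a sketch reconstructed from the narration, not a copy of the slide: the symbols $K(0)$ (prior variance of the new point) and $k$ (covariances between the new point and the known points) are assumed to match the block matrix named later in the video, and $d$, $b$, $A$ are the unknown blocks of its inverse.

```latex
\begin{aligned}
p(f_* \mid f) = \frac{p(f_*, f)}{p(f)}
 &\propto \exp\!\left(
   -\tfrac12
   \begin{pmatrix} f_* \\ f \end{pmatrix}^{\!\top}
   \begin{pmatrix} K(0) & k^\top \\ k & C \end{pmatrix}^{-1}
   \begin{pmatrix} f_* \\ f \end{pmatrix}
   + \tfrac12 f^\top C^{-1} f \right), \\[4pt]
\begin{pmatrix} K(0) & k^\top \\ k & C \end{pmatrix}^{-1}
 &= \begin{pmatrix} d & b^\top \\ b & A \end{pmatrix}, \\[4pt]
\begin{pmatrix} f_* \\ f \end{pmatrix}^{\!\top}
\begin{pmatrix} d & b^\top \\ b & A \end{pmatrix}
\begin{pmatrix} f_* \\ f \end{pmatrix}
 &= d f_*^2 + 2\, b^\top f \, f_* + f^\top A f, \\[4pt]
p(f_* \mid f)
 &\propto \exp\!\left( -\tfrac12 \left( d f_*^2 + 2\, b^\top f \, f_* \right) \right)
 \propto \exp\!\left( -\frac{1}{2 d^{-1}} \left( f_* + \frac{b^\top f}{d} \right)^{2} \right).
\end{aligned}
```

The terms $f^\top A f$ and $f^\top C^{-1} f$ do not involve $f_*$, so they are absorbed into the proportionality constant.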
55 00:04:36,755 --> 00:04:41,905 Okay, so from this we can easily see that 56 00:04:41,905 --> 00:04:47,212 the mean value is minus b transposed f over d. 57 00:04:47,212 --> 00:04:53,753 So mu equals minus b transposed f over d. 58 00:04:53,753 --> 00:04:55,837 And the variance is simply 1 over d. 59 00:05:00,504 --> 00:05:03,046 So those are our formulas. 60 00:05:05,961 --> 00:05:13,492 And the result would be the normal distribution over f star, 61 00:05:13,492 --> 00:05:20,296 with mean mu from here, and variance sigma squared. 62 00:05:20,296 --> 00:05:24,775 All right, so we are not finished yet, since we have d and 63 00:05:24,775 --> 00:05:28,462 b here and we don't know the values for them yet. 64 00:05:28,462 --> 00:05:33,062 So those are just the components of 65 00:05:33,062 --> 00:05:37,088 the inverse of this matrix. 66 00:05:37,088 --> 00:05:38,420 So let's derive them. 67 00:05:45,921 --> 00:05:50,610 Okay, so the simplest way to derive them is to remember that, 68 00:05:50,610 --> 00:05:54,226 since this is the inverse of this matrix, 69 00:05:54,226 --> 00:05:58,056 their product would be the identity matrix. 70 00:05:58,056 --> 00:06:01,503 So we have K of 0, 71 00:06:01,503 --> 00:06:08,798 k transposed, k, C, times the inverse in this form. 72 00:06:08,798 --> 00:06:12,100 So it is d, b transposed, b, A, 73 00:06:12,100 --> 00:06:17,212 and the product should be equal to the identity matrix. 74 00:06:17,212 --> 00:06:21,155 So we'll have scalar 1 here, 75 00:06:21,155 --> 00:06:27,380 two vectors of zeros here, and the identity matrix here. 76 00:06:27,380 --> 00:06:31,629 All right, so actually we have many equations here. 77 00:06:31,629 --> 00:06:39,041 We can multiply these matrices explicitly, 78 00:06:39,041 --> 00:06:44,046 so we'll have K of 0 times d + 79 00:06:44,046 --> 00:06:48,671 k transpose b equals 1. 80 00:06:48,671 --> 00:06:54,813 We have K of 0 times b transpose 81 00:06:54,813 --> 00:07:02,129 + k transpose A equals 0. 
82 00:07:02,129 --> 00:07:06,107 And two more equations, 83 00:07:06,107 --> 00:07:12,191 those are k d + C b equals 0, 84 00:07:12,191 --> 00:07:17,575 and finally k b transpose + 85 00:07:17,575 --> 00:07:23,671 C A equals the identity matrix. 86 00:07:23,671 --> 00:07:28,193 All right, so we're interested in the values of b and d, and 87 00:07:28,193 --> 00:07:30,879 we don't actually have to find A. 88 00:07:30,879 --> 00:07:33,047 So let's see what we have then. 89 00:07:33,047 --> 00:07:38,848 So in these equations, we have b here and 90 00:07:38,848 --> 00:07:42,796 d here, b, b, d, b. 91 00:07:42,796 --> 00:07:46,218 All right, actually we need to pick two equations 92 00:07:46,218 --> 00:07:47,088 from those four. 93 00:07:47,088 --> 00:07:51,503 We need this equation since it has both d and b. 94 00:07:51,503 --> 00:07:55,087 And this one, since here we also have d and b. 95 00:07:55,087 --> 00:07:58,779 And we can notice here that the number of equations equals the number of unknown 96 00:07:58,779 --> 00:07:59,480 parameters. 97 00:07:59,480 --> 00:08:03,548 So we can hope that we will be able to solve it. 98 00:08:03,548 --> 00:08:08,671 So let's start from the second equation. 99 00:08:08,671 --> 00:08:16,026 We can see that from this equation, 100 00:08:16,026 --> 00:08:21,671 b equals minus C inverse k d. 101 00:08:21,671 --> 00:08:25,796 We can plug in b from this formula here. 102 00:08:25,796 --> 00:08:30,889 And then we'll have K of 0 times 103 00:08:30,889 --> 00:08:37,879 d minus k transpose C inverse k d equals 1. 104 00:08:39,630 --> 00:08:46,963 So from this formula, we can see that we have d here. 105 00:08:46,963 --> 00:08:52,487 And so the value for d would be, 106 00:08:52,487 --> 00:08:57,161 d equals 1 over K of 0 minus k 107 00:08:57,161 --> 00:09:01,422 transpose C inverse k. 108 00:09:04,128 --> 00:09:08,754 The final step is to plug in the value of d from here into b. 
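For reference, the block equations and the solution for d can be laid out as follows. This is a sketch in the same assumed notation ($K(0)$, $k$, $C$ for the blocks of the joint covariance and $d$, $b$, $A$ for the blocks of its inverse); only the first and third equations are needed, because they are the two that contain both $d$ and $b$ without $A$.

```latex
\begin{aligned}
\begin{pmatrix} K(0) & k^\top \\ k & C \end{pmatrix}
\begin{pmatrix} d & b^\top \\ b & A \end{pmatrix}
 &= \begin{pmatrix} 1 & 0^\top \\ 0 & I \end{pmatrix}, \\[4pt]
K(0)\, d + k^\top b &= 1, \\
K(0)\, b^\top + k^\top A &= 0^\top, \\
k\, d + C b &= 0, \\
k\, b^\top + C A &= I, \\[4pt]
k\, d + C b = 0 \;&\Rightarrow\; b = -\,C^{-1} k\, d, \\
K(0)\, d + k^\top \left( -\,C^{-1} k\, d \right) = 1
 \;&\Rightarrow\; d = \frac{1}{K(0) - k^\top C^{-1} k}.
\end{aligned}
```

The minus sign in the denominator comes directly from substituting $b = -C^{-1} k\, d$ into the first equation.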
109 00:09:08,754 --> 00:09:13,196 So we'll have b equal to 110 00:09:13,196 --> 00:09:18,081 minus C inverse k over K of 0 minus 111 00:09:18,081 --> 00:09:22,753 k transpose C inverse k. 112 00:09:22,753 --> 00:09:28,128 Now we need to plug in d and b into the formulas for the mean and the variance. 113 00:09:28,128 --> 00:09:30,421 So the variance would be the inverse of d. 114 00:09:30,421 --> 00:09:35,426 So it would be K of 0 minus k 115 00:09:35,426 --> 00:09:40,963 transposed C inverse k. 116 00:09:40,963 --> 00:09:44,608 And the mean would be, so 117 00:09:44,608 --> 00:09:50,171 we have b here, we should write it down. 118 00:09:50,171 --> 00:09:54,312 I'm just going to write it down more carefully, so 119 00:09:54,312 --> 00:09:57,046 it would be minus b transpose f over d. 120 00:09:57,046 --> 00:10:03,912 So minus b transposed is simply k transpose C inverse, 121 00:10:03,912 --> 00:10:09,151 since C is symmetric, times f over K of 122 00:10:09,151 --> 00:10:15,131 0 minus k transpose C inverse k, and over d. 123 00:10:15,131 --> 00:10:20,060 So it would be times K of 0 minus k 124 00:10:20,060 --> 00:10:25,255 transpose C inverse k. 125 00:10:25,255 --> 00:10:30,212 And here we have two similar terms, so we can cancel them out. 126 00:10:30,212 --> 00:10:33,755 So these two. 127 00:10:33,755 --> 00:10:38,712 And so, finally we have that the mean value equals 128 00:10:40,921 --> 00:10:45,296 k transposed C inverse f. 129 00:10:45,296 --> 00:10:48,504 And so, those are our final formulas. 130 00:10:48,504 --> 00:10:52,796 So sigma squared equals this term. 131 00:10:52,796 --> 00:11:00,921 The variance is K of 0 minus k transpose C inverse k. 132 00:11:00,921 --> 00:11:07,129 And for the mean we have k transpose C inverse f. 133 00:11:07,129 --> 00:11:17,129 [MUSIC]
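The final formulas can be checked numerically. The following is a minimal sketch, not code from the course: the squared-exponential kernel, the small `jitter` added to the diagonal of C, and the helper names `rbf_kernel`, `gp_blocks`, `gp_predict`, and `gp_predict_via_block_inverse` are all assumptions made for illustration. It computes the mean and variance once from the closed-form expressions (mean k^T C^{-1} f, variance K(0) - k^T C^{-1} k) and once by inverting the full joint covariance matrix and reading off d and b, as in the derivation.

```python
import numpy as np


def rbf_kernel(x1, x2, length_scale=1.0):
    """Squared-exponential kernel (an assumed choice, not taken from the video)."""
    diff = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_blocks(x_train, x_new, jitter=1e-9):
    """Return the blocks K(0), k, C of the joint covariance of (f*, f)."""
    x_new_arr = np.array([x_new], dtype=float)
    # Jitter on the diagonal is a common numerical safeguard (assumption).
    C = rbf_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    k = rbf_kernel(x_train, x_new_arr)[:, 0]
    K0 = rbf_kernel(x_new_arr, x_new_arr)[0, 0]
    return K0, k, C


def gp_predict(x_train, f_train, x_new):
    """Closed form: mean = k^T C^{-1} f, variance = K(0) - k^T C^{-1} k."""
    K0, k, C = gp_blocks(x_train, x_new)
    C_inv_k = np.linalg.solve(C, k)  # C^{-1} k without forming the inverse
    mean = C_inv_k @ f_train         # equals k^T C^{-1} f because C is symmetric
    variance = K0 - k @ C_inv_k
    return mean, variance


def gp_predict_via_block_inverse(x_train, f_train, x_new):
    """Same quantities via d and b, the blocks of the inverse joint covariance."""
    K0, k, C = gp_blocks(x_train, x_new)
    n = len(x_train)
    joint = np.empty((n + 1, n + 1))
    joint[0, 0] = K0
    joint[0, 1:] = k
    joint[1:, 0] = k
    joint[1:, 1:] = C
    inv = np.linalg.inv(joint)
    d = inv[0, 0]   # scalar block
    b = inv[1:, 0]  # vector block
    return -(b @ f_train) / d, 1.0 / d


if __name__ == "__main__":
    x = np.array([-2.0, -1.0, 0.0, 1.5])
    f = np.sin(x)  # toy observations, purely illustrative
    print(gp_predict(x, f, 0.5))
    print(gp_predict_via_block_inverse(x, f, 0.5))
```

Both functions should agree up to floating-point error. The closed form uses `np.linalg.solve` rather than an explicit inverse, which is generally the more stable choice.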