1 00:00:00,000 --> 00:00:03,754 [MUSIC] 2 00:00:03,754 --> 00:00:06,118 So that's just the critic. 3 00:00:06,118 --> 00:00:11,407 That's just the probability a that with which probability 4 00:00:11,407 --> 00:00:15,857 will accept and move proposed by Markov. 5 00:00:15,857 --> 00:00:19,730 And that's just in the way the detail equation holds. 6 00:00:19,730 --> 00:00:25,041 So here we have the desired distribution, pi, which we want to sample from. 7 00:00:25,041 --> 00:00:29,541 And we have the Markov chain Q, which we want to fix by a critic A. 8 00:00:29,541 --> 00:00:34,037 And we want to fix this A such that this [INAUDIBLE] equation holds. 9 00:00:34,037 --> 00:00:39,296 So pi is indeed a stationary distribution for the mark of chain T, and 10 00:00:39,296 --> 00:00:45,024 that's the mark of chain T converge to this point from any starting point. 11 00:00:45,024 --> 00:00:47,920 So let's see how we can choose A such that this property holds. 12 00:00:47,920 --> 00:00:54,356 Let's, compose 13 00:00:54,356 --> 00:00:59,650 this equality and, move all the parts that depends on A in one part. 14 00:00:59,650 --> 00:01:06,466 So, A(x to x'), 15 00:01:09,067 --> 00:01:12,877 Divided by A(x' to x), 16 00:01:17,508 --> 00:01:20,103 Equals to the following equation, so we have 17 00:01:22,540 --> 00:01:26,389 On the one side we have Y of X prime. 18 00:01:30,123 --> 00:01:33,310 And the Q of X prime to x, and 19 00:01:33,310 --> 00:01:38,443 in the denominator we have the same thing but 20 00:01:38,443 --> 00:01:41,790 With swapping x and x prime. 21 00:01:44,168 --> 00:01:50,047 [SOUND] So, this is something computable right? 22 00:01:50,047 --> 00:01:51,903 We know y and q. 23 00:01:51,903 --> 00:01:54,016 So, we can compute this thing for any two values. 24 00:01:54,016 --> 00:01:56,171 Let's call this, ratio raw. 25 00:01:59,014 --> 00:02:03,642 Let's assume for a moment that this row is less than 1. 26 00:02:03,642 --> 00:02:07,522 So it's always greater than 0 because it's some ratio of probabilities. 27 00:02:07,522 --> 00:02:14,239 But let's assume that it's also less than 1. 28 00:02:14,239 --> 00:02:17,300 So it's just an assumption, it may not hold too much, let's assume that. 29 00:02:17,300 --> 00:02:19,817 In this case, we have a ratio of eights, 30 00:02:19,817 --> 00:02:23,382 which would make equal to which is less than one. 31 00:02:23,382 --> 00:02:28,493 So let's just use A of x prime to be your one 32 00:02:33,038 --> 00:02:35,194 And A ox X prime to X to be 1. 33 00:02:41,933 --> 00:02:46,017 This way the desire to holds true right, 34 00:02:46,017 --> 00:02:51,408 because A of X to X prime divided to X prime to X Will be [INAUDIBLE]. 35 00:02:51,408 --> 00:02:56,807 And what you may notice that we could have chosen here some other numbers, 36 00:02:56,807 --> 00:03:01,002 like 4 divided by 2, for example, or 1 divided by 2. 37 00:03:01,002 --> 00:03:02,682 But this will also work, 38 00:03:02,682 --> 00:03:07,018 the overall mark of chain will deconverge to the distribution. 39 00:03:07,018 --> 00:03:08,930 [INAUDIBLE]. 40 00:03:08,930 --> 00:03:13,591 Because this I think is the probability we accepts they move. 41 00:03:13,591 --> 00:03:17,868 And the way to increase probability will necessarily inject more moves. 42 00:03:17,868 --> 00:03:23,417 So we risk more capacity for our more [INAUDIBLE] which will reject. 43 00:03:23,417 --> 00:03:28,399 So this choose is the maximum acceptance probability we 44 00:03:28,399 --> 00:03:32,321 can make while keeping the ratio to be low And 45 00:03:32,321 --> 00:03:36,146 what to do if the ratio is greater than one. 46 00:03:36,146 --> 00:03:39,398 Well we can just choose to be A of X prime, 47 00:03:39,398 --> 00:03:43,727 X to X prime to be divided by one, and this P to be one N. 48 00:03:43,727 --> 00:03:48,665 So we can summarize this to cases that are always 49 00:03:48,665 --> 00:03:54,097 less than one and they're always greater than one, 50 00:03:54,097 --> 00:03:58,417 in the form of the expression A of X prime, 51 00:03:58,417 --> 00:04:04,818 of X dot prime, To be equal to minimal. 52 00:04:07,247 --> 00:04:12,857 Within two values, 1 and this ratio, 53 00:04:12,857 --> 00:04:18,467 so and pie of X prime times Q of X prime dot 54 00:04:18,467 --> 00:04:28,021 X And 55 00:04:28,021 --> 00:04:33,252 the same thing while changing the parameters x to x prime and vice versa. 56 00:04:40,295 --> 00:04:42,940 So this expression works both for 57 00:04:42,940 --> 00:04:48,044 the case where the ratio is less than than 1 and greater than 1. 58 00:04:48,044 --> 00:04:51,633 You can check it by considering two cases. 59 00:04:51,633 --> 00:04:54,907 And in one case it will be just 1 because it's. 60 00:04:54,907 --> 00:04:58,335 Minimum between 1 and something greater than 1. 61 00:04:58,335 --> 00:05:01,447 In other case it will be wrong. 62 00:05:01,447 --> 00:05:10,040 So this and by choosing this critic we indeed can prove that the overall mark of 63 00:05:10,040 --> 00:05:15,169 defined by this kind of procedure will indeed. 64 00:05:15,169 --> 00:05:18,070 Follow this equation. 65 00:05:18,070 --> 00:05:22,457 So the distribution probably will be stationary for our Markov chain, and 66 00:05:22,457 --> 00:05:27,065 that we will convert to distribution for any starting point. 67 00:05:27,065 --> 00:05:33,811 Last, by choosing this critic, we may sample from the distribution y. 68 00:05:33,811 --> 00:05:38,724 To summarise the metropolis Hastings approach to building 69 00:05:38,724 --> 00:05:43,251 chain that converges to the desired distribution pi. 70 00:05:43,251 --> 00:05:48,127 We have to start with sum of mac chain q, which we don't have anything to 71 00:05:48,127 --> 00:05:53,004 do with the distribution pie and then this mark of chain on each step will 72 00:05:53,004 --> 00:05:58,321 propose symbols and we'll have a correct with sometimes rejects this move. 73 00:05:58,321 --> 00:06:01,922 And ask the mark of chain to stay where it was at the various straight. 74 00:06:01,922 --> 00:06:06,807 And we just proved that if you defined your coreject to be like this, 75 00:06:06,807 --> 00:06:12,294 if your acceptance from bill will be this expression minimum between one and 76 00:06:12,294 --> 00:06:13,252 some ratio. 77 00:06:13,252 --> 00:06:17,885 Then no matter from which Markov chain Q is started with, 78 00:06:17,885 --> 00:06:23,106 you will necessarily converge to the desired distribution pi. 79 00:06:23,106 --> 00:06:26,954 Note that this acceptance probability, it's the only place 80 00:06:26,954 --> 00:06:30,951 where we use our distribution pi, which we want to sample from. 81 00:06:30,951 --> 00:06:35,700 And note that we don't have to know this distribution pi exactly. 82 00:06:35,700 --> 00:06:38,422 We may know it up to normalization constant. 83 00:06:38,422 --> 00:06:43,337 because here, we'll have a ratio between our distribution at two different points. 84 00:06:43,337 --> 00:06:46,816 It doesn't matter how we normalize this distribution. 85 00:06:46,816 --> 00:06:49,997 The ratio will not care about the normalization, 86 00:06:49,997 --> 00:06:52,798 because if we divide each of the parts of this 87 00:06:52,798 --> 00:06:56,826 ratio by the normalization constant z, nothing will change. 88 00:06:56,826 --> 00:07:02,022 So, this method, as well as the Gibbs sampling, can be used when 89 00:07:02,022 --> 00:07:07,412 you don't know the normalization constant of your distribution. 90 00:07:07,412 --> 00:07:13,210 We have discussed that we can choose q to be anything in distribution number one. 91 00:07:13,210 --> 00:07:16,876 Well obviously, the Q should be nonzero anywhere, so 92 00:07:16,876 --> 00:07:19,199 the ratio will be well-defined. 93 00:07:19,199 --> 00:07:21,947 But of course, the efficiency and 94 00:07:21,947 --> 00:07:27,082 the properties of your algorithm will depend on the choice of Q. 95 00:07:27,082 --> 00:07:29,120 There are kind of two opposing forces here. 96 00:07:29,120 --> 00:07:33,234 So first of all, Q should make large steps, 97 00:07:33,234 --> 00:07:39,242 it should spread out and kind of roll freely in the sample space, 98 00:07:39,242 --> 00:07:44,933 to make the samples less correlated and to converge faster. 99 00:07:44,933 --> 00:07:50,636 On the other hand, if you have too large steps, your critic will reject them, too 100 00:07:50,636 --> 00:07:56,354 often, and you will waste your capacity by staying at the same place for too long. 101 00:07:56,354 --> 00:08:00,297 Because if you're now, for example, already converged, and 102 00:08:00,297 --> 00:08:05,262 you're in the high-density region, then if you're Q proposes all this to make 103 00:08:05,262 --> 00:08:10,008 too-large moves, then you will move outside your high density region, and 104 00:08:10,008 --> 00:08:12,513 the will not be happy with that. 105 00:08:12,513 --> 00:08:18,428 It will say I don't want to go there, the probability distribution 106 00:08:18,428 --> 00:08:23,326 curve ball doesn't work, isn't high in that region. 107 00:08:23,326 --> 00:08:24,640 So in the next video, 108 00:08:24,640 --> 00:08:29,611 we'll see some small demo of how this approach can work in One dimensional case. 109 00:08:29,611 --> 00:08:39,611 [SOUND]