So that is the critic: it is simply the probability with which we will accept a move proposed by the Markov chain Q, and it has to be chosen so that the detailed balance equation holds. Here we have the desired distribution pi, which we want to sample from, and we have the Markov chain Q, which we want to fix with a critic A. We want to choose A such that the detailed balance equation holds, so that pi is indeed a stationary distribution for the corrected Markov chain T, and T converges to pi from any starting point.

Let's see how we can choose A so that this property holds. Let's rearrange the equality and move everything that depends on A to one side:

A(x → x') / A(x' → x) = π(x') Q(x' → x) / (π(x) Q(x → x')).

The right-hand side is something computable: we know pi and Q, so we can evaluate it for any two points. Let's call this ratio ρ. Assume for a moment that ρ is less than 1. It is always greater than 0, because it is a ratio of probabilities, but let's also assume it is less than 1; this is just an assumption, and it may not always hold. In this case we need a ratio of A's that equals something less than 1, so let's simply take A(x → x') = ρ and A(x' → x) = 1. Then the desired equality holds, because A(x → x') divided by A(x' → x) is ρ / 1 = ρ.

Notice that we could have chosen some other pair of numbers with the same ratio, for example ρ/2 and 1/2. That would also work, and the overall Markov chain would still converge to the distribution pi. But it would be wasteful, because A is the probability with which we accept a move: decreasing it means rejecting more moves, so we would waste more capacity staying in the same place. Our choice gives the maximum acceptance probability we can have while keeping the ratio equal to ρ.

What do we do if the ratio is greater than 1? We can simply choose A(x → x') = 1 and A(x' → x) = 1/ρ. We can summarize these two cases, ρ less than 1 and ρ greater than 1, in a single expression:

A(x → x') = min(1, π(x') Q(x' → x) / (π(x) Q(x → x'))),

and the same thing with x and x' swapped. This expression works both when the ratio is less than 1 and when it is greater than 1; you can check it by considering the two cases. In one case it is just 1, because it is the minimum between 1 and something greater than 1; in the other case it equals ρ. With this critic we can indeed prove that the Markov chain defined by this procedure satisfies the detailed balance equation, so the distribution pi is stationary for our Markov chain, and we will converge to it from any starting point. Thus, by choosing this critic, we can sample from the distribution pi.
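To make this concrete, here is a minimal Python sketch of this accept/reject rule. The function and argument names (metropolis_hastings, q_sample, q_density) are illustrative choices, not notation from the lecture, and the target density pi only needs to be known up to a constant.

```python
import numpy as np

def metropolis_hastings(pi, q_sample, q_density, x0, n_steps, rng=None):
    """Minimal Metropolis-Hastings sampler (illustrative sketch).

    pi        -- (possibly unnormalized) target density pi(x)
    q_sample  -- draws a proposal x' given the current state x
    q_density -- proposal density, q_density(x, x_prime) = Q(x -> x')
    x0        -- starting point of the chain
    """
    rng = rng or np.random.default_rng()
    x = x0
    samples = []
    for _ in range(n_steps):
        x_prime = q_sample(x)  # move proposed by the Markov chain Q
        # ratio rho = pi(x') Q(x' -> x) / (pi(x) Q(x -> x'))
        rho = (pi(x_prime) * q_density(x_prime, x)) / (pi(x) * q_density(x, x_prime))
        # the critic: accept the move with probability A(x -> x') = min(1, rho)
        if rng.uniform() < min(1.0, rho):
            x = x_prime
        # otherwise stay at the previous state; either way, record a sample
        samples.append(x)
    return np.array(samples)
```

Note that for a symmetric proposal, where Q(x → x') = Q(x' → x), the two q_density terms cancel and only the ratio pi(x')/pi(x) remains.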
To summarize the Metropolis-Hastings approach to building a chain that converges to the desired distribution pi: we start with some Markov chain Q, which may have nothing to do with the distribution pi. On each step this Markov chain proposes a move, and we have a critic which sometimes rejects the move and asks the Markov chain to stay where it was on the previous step. And we have just shown that if you define your critic like this, so that the acceptance probability is the minimum between 1 and that ratio, then no matter which Markov chain Q you started with, you will necessarily converge to the desired distribution pi.

Note that this acceptance probability is the only place where we use the distribution pi we want to sample from. And note that we don't have to know this distribution pi exactly: we may know it only up to a normalization constant, because here we take a ratio of the distribution at two different points. It doesn't matter how we normalize the distribution; the ratio does not care about the normalization, because if we divide both parts of the ratio by the normalization constant Z, nothing changes. So this method, just like Gibbs sampling, can be used when you don't know the normalization constant of your distribution.

We have discussed that we can choose Q to be almost any distribution. Well, obviously Q should be nonzero everywhere, so that the ratio is well-defined, but of course the efficiency and the properties of the algorithm will depend on the choice of Q. There are two opposing forces here. On the one hand, Q should make large steps; it should spread out and roam freely over the sample space, to make the samples less correlated and to converge faster. On the other hand, if the steps are too large, the critic will reject them too often, and you will waste your capacity by staying at the same place for too long. If, for example, you have already converged and are in the high-density region, and Q proposes moves that are too large, those moves take you outside the high-density region, and the critic will not be happy with that: it will say, I don't want to go there, the probability distribution isn't high in that region. In the next video, we'll see a small demo of how this approach can work in the one-dimensional case.
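As a rough preview, here is a one-dimensional sketch of that trade-off. The standard-normal target, Gaussian random-walk proposal, and the particular step sizes are assumptions chosen for illustration, not the settings of the actual demo in the next video.

```python
import numpy as np

# Target known only up to its normalization constant:
# the missing 1/sqrt(2*pi) factor cancels in the acceptance ratio.
def pi_unnormalized(x):
    return np.exp(-0.5 * x ** 2)

def run_demo(step_size, n_steps=5000, seed=0):
    rng = np.random.default_rng(seed)
    x, accepted, samples = 0.0, 0, []
    for _ in range(n_steps):
        # symmetric Gaussian random-walk proposal, so the Q terms cancel
        x_prime = x + step_size * rng.standard_normal()
        rho = pi_unnormalized(x_prime) / pi_unnormalized(x)
        if rng.uniform() < min(1.0, rho):
            x, accepted = x_prime, accepted + 1
        samples.append(x)
    return np.array(samples), accepted / n_steps

for step in (0.1, 1.0, 20.0):
    samples, acc_rate = run_demo(step)
    print(f"step {step:5.1f}: acceptance rate {acc_rate:.2f}, sample std {samples.std():.2f}")
```

Very small steps are accepted almost always but explore the space slowly and give highly correlated samples, while very large steps are rejected most of the time, so the chain keeps staying in place; a moderate step size balances the two.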