1 00:00:03,870 --> 00:00:08,380 In the previous video we discussed one way 2 00:00:08,380 --> 00:00:11,930 to build a Markov chains that converged to a zag distribution, 3 00:00:11,930 --> 00:00:14,135 namely the Gibbs sampling. 4 00:00:14,135 --> 00:00:17,260 And we have discussed that it has some downsides. 5 00:00:17,260 --> 00:00:20,425 It produces highly correlated samples 6 00:00:20,425 --> 00:00:25,315 and it converts somewhat slowly to the distribution. 7 00:00:25,315 --> 00:00:28,705 And both of these downsides came from one property, 8 00:00:28,705 --> 00:00:33,150 that Gibbs is doing kind of local and small steps in the sample space. 9 00:00:33,150 --> 00:00:37,893 In this video, we're going to discuss some other scheme to build Markov chains, 10 00:00:37,893 --> 00:00:41,690 which instead of producing just one predefined Markov chain, 11 00:00:41,690 --> 00:00:43,510 will give you a whole family of Markov chains. 12 00:00:43,510 --> 00:00:47,435 And you can choose the one that has the desired properties like, 13 00:00:47,435 --> 00:00:53,085 it converges faster and or maybe it produces less correlated samples. 14 00:00:53,085 --> 00:00:58,900 This family of techniques is called Metropolis-Hastings and the idea is 15 00:00:58,900 --> 00:01:06,285 to apply the rejection sampling idea, two Markov chains. 16 00:01:06,285 --> 00:01:09,400 So let's start with 17 00:01:09,400 --> 00:01:15,830 sum Markov chain which maybe doesn't have anything to do with the desired distribution B. 18 00:01:15,830 --> 00:01:18,415 But on the bright side, 19 00:01:18,415 --> 00:01:23,055 this Markov chain Q is kind of through freely in the sample space. 20 00:01:23,055 --> 00:01:27,325 It's doing very large steps and thus it can converge 21 00:01:27,325 --> 00:01:31,843 faster and it produces almost uncorrelated samples. 22 00:01:31,843 --> 00:01:35,193 It's a nice Markov chain but it doesn't produce what we want, 23 00:01:35,193 --> 00:01:37,070 dissembles from the distribution B. 24 00:01:37,070 --> 00:01:39,745 And now let's introduce a critic. 25 00:01:39,745 --> 00:01:42,325 And a critic of something that says, 26 00:01:42,325 --> 00:01:45,870 wait a minute you can go there right now, 27 00:01:45,870 --> 00:01:53,310 so please stay where you are and wait until you generate some reasonable direction to go. 28 00:01:53,310 --> 00:01:58,330 And the critic make sure that the Markov chain we have doesn't 29 00:01:58,330 --> 00:02:04,765 go in some regions where the probability of our desired distribution B is low. 30 00:02:04,765 --> 00:02:07,405 Well, at least it doesn't go there too often. 31 00:02:07,405 --> 00:02:10,265 And we will accept the proposal, 32 00:02:10,265 --> 00:02:15,790 the generated sample from the transition probabilities of the Markov chain Q with 33 00:02:15,790 --> 00:02:19,085 probability given by the critic and otherwise 34 00:02:19,085 --> 00:02:24,175 we will object and will stay where we were at the previous equation. 35 00:02:24,175 --> 00:02:28,370 So, this is the general idea and 36 00:02:28,370 --> 00:02:32,810 in the following part we will 37 00:02:32,810 --> 00:02:35,270 discuss how to build such a critic for 38 00:02:35,270 --> 00:02:38,635 a given Markov chain Q and the desired distribution B. 39 00:02:38,635 --> 00:02:45,190 So first let's discuss how this overall scheme will look like in terms of a Markov chain. 40 00:02:45,190 --> 00:02:52,760 So we had some Markov chain Q and we changed its procedure to generate points. 41 00:02:52,760 --> 00:02:57,335 And so now this overall scheme generates its points like this. 42 00:02:57,335 --> 00:03:02,885 It transitions from the point X to X prime with probability. 43 00:03:02,885 --> 00:03:07,442 Well, first of all we need to generate this X prime from the proposal distribution Q 44 00:03:07,442 --> 00:03:12,425 from the first Markov chain Q and then we have to accept this point. 45 00:03:12,425 --> 00:03:15,450 And this happens with probability given by the critic. 46 00:03:15,450 --> 00:03:16,726 So A of X to X prime. 47 00:03:16,726 --> 00:03:21,140 This is true for all points that are not equal. 48 00:03:21,140 --> 00:03:24,145 So if X not equal to X prime, 49 00:03:24,145 --> 00:03:29,490 and if they're equal we have something a little bit more complicated because we again 50 00:03:29,490 --> 00:03:35,605 could have proposed to move to X and then accepted that this proposal. 51 00:03:35,605 --> 00:03:40,720 Or we can propose to move anywhere else and then reject that proposal, 52 00:03:40,720 --> 00:03:45,855 which happens with probability one minus the probability given by the critic, 53 00:03:45,855 --> 00:03:48,816 one is accepted in probability. 54 00:03:48,816 --> 00:03:53,320 So, this is the transitional probability of overall Markov chain defined by 55 00:03:53,320 --> 00:03:59,705 the process where we propose some points and then accept them with some probability. 56 00:03:59,705 --> 00:04:04,570 And now we want our Markov chain to generate points from the desired distribution. 57 00:04:04,570 --> 00:04:10,560 So, we want some distribution point to be stationary for this Markov chain 58 00:04:10,560 --> 00:04:13,710 T. And the idea here is 59 00:04:13,710 --> 00:04:18,820 to let alone the choice of the underlying command of Markov chain Q. 60 00:04:18,820 --> 00:04:24,400 So give this freedom to the user of the method and try to choose A, 61 00:04:24,400 --> 00:04:29,565 such that this quota holds. 62 00:04:29,565 --> 00:04:30,785 So we have as input, 63 00:04:30,785 --> 00:04:35,120 the Markov chain Q and the desired distribution pie. 64 00:04:35,120 --> 00:04:36,925 And you want to choose the critic 65 00:04:36,925 --> 00:04:46,146 such the distribution pie is indeed stationary for the final Markov chain G. 66 00:04:46,146 --> 00:04:49,600 How can we do it? 67 00:04:49,600 --> 00:04:53,840 Well, it's not easy because this equation is kind of complicated. 68 00:04:53,840 --> 00:04:57,300 It has summation inside and this complicated definition of 69 00:04:57,300 --> 00:05:00,930 the transition probability T. So, 70 00:05:00,930 --> 00:05:05,835 to solve this problem we'll have to introduce one more concept, 71 00:05:05,835 --> 00:05:08,320 which is called the Detailed Balance equation. 72 00:05:08,320 --> 00:05:13,540 So, recall the definition of the stationary distribution, it's like this. 73 00:05:13,540 --> 00:05:18,075 And Detailed balance equation is the following equation, 74 00:05:18,075 --> 00:05:22,570 which gives you a sufficient condition for the stationary distribution. 75 00:05:22,570 --> 00:05:28,905 So, if the detailed balance holds then the distribution pie isn't necessarily 76 00:05:28,905 --> 00:05:32,170 a stationary one for 77 00:05:32,170 --> 00:05:38,770 the transition probability D. And what does the Detailed Balance equation says us. 78 00:05:38,770 --> 00:05:44,115 Well, it says that if we started from the distribution pie and made one step, 79 00:05:44,115 --> 00:05:45,580 one timestep of our Markov chain G, 80 00:05:45,580 --> 00:05:49,230 then the probability that we started 81 00:05:49,230 --> 00:05:53,375 from X and move to X prime is the same as doing the reverse. 82 00:05:53,375 --> 00:05:57,275 Is the same as starting from X prime and moving to X. 83 00:05:57,275 --> 00:05:59,685 And if it's true, then the probability distribution pie is stationary for 84 00:05:59,685 --> 00:06:06,480 G. And it's kind of easy to prove so let's just write down 85 00:06:06,480 --> 00:06:08,910 the right hand side of the definition of 86 00:06:08,910 --> 00:06:12,930 the stationary distribution and then substitute inside 87 00:06:12,930 --> 00:06:16,455 that definition substitutes by 88 00:06:16,455 --> 00:06:20,940 effects times the transition probability by the first equation, 89 00:06:20,940 --> 00:06:23,085 by the detailed balance equation. 90 00:06:23,085 --> 00:06:26,990 And thus we'll get a summation is the X of 91 00:06:26,990 --> 00:06:33,230 this expression and the point expressed doesn't depend on x at all. 92 00:06:33,230 --> 00:06:40,075 So we can move pie of X prime outside the summation to get something with this. 93 00:06:40,075 --> 00:06:44,170 And finally summation of T of X brand X, 94 00:06:44,170 --> 00:06:50,020 with respect to X is just one because this is some kind of probability distribution. 95 00:06:50,020 --> 00:06:54,995 And if we sum out all possible outcomes, it will be one. 96 00:06:54,995 --> 00:06:58,660 So, finally we have proven that if the Detailed Balance 97 00:06:58,660 --> 00:07:03,740 holds then this solution isn't necessarily a stationary one. 98 00:07:03,740 --> 00:07:07,150 So, returning to our problem. 99 00:07:07,150 --> 00:07:09,850 We have a Markov chain Q, 100 00:07:09,850 --> 00:07:12,300 which doesn't produce us samples which we all 101 00:07:12,300 --> 00:07:18,175 want and we have a distribution of pie which we want to simplify. 102 00:07:18,175 --> 00:07:23,530 Now we want to define a critic such 103 00:07:23,530 --> 00:07:31,507 that the Markov chain that was created by this critic, 104 00:07:31,507 --> 00:07:35,485 so the Markov chain T, will indeed produce samples from the distribution pie. 105 00:07:35,485 --> 00:07:38,795 So, the distribution pie is stationary. 106 00:07:38,795 --> 00:07:43,450 And since this property is hard to check, 107 00:07:43,450 --> 00:07:48,345 instead we're going to just check that the distribution pie will be, 108 00:07:48,345 --> 00:07:51,920 the detailed balance equation will be true for it. 109 00:07:51,920 --> 00:07:54,970 So, in the next video, 110 00:07:54,970 --> 00:07:58,625 we're going to use Blackboard to find the critic such that 111 00:07:58,625 --> 00:08:00,490 this detailed balance equation holds for 112 00:08:00,490 --> 00:08:04,610 the given Markov chain Q and the distribution pie.