1
00:00:00,000 --> 00:00:03,754
[MUSIC]

2
00:00:03,754 --> 00:00:06,118
So that's just the critic.

3
00:00:06,118 --> 00:00:11,407
That's just the probability
a that with which probability

4
00:00:11,407 --> 00:00:15,857
will accept and move proposed by Markov.

5
00:00:15,857 --> 00:00:19,730
And that's just in the way
the detail equation holds.

6
00:00:19,730 --> 00:00:25,041
So here we have the desired distribution,
pi, which we want to sample from.

7
00:00:25,041 --> 00:00:29,541
And we have the Markov chain Q,
which we want to fix by a critic A.

8
00:00:29,541 --> 00:00:34,037
And we want to fix this A such that
this [INAUDIBLE] equation holds.

9
00:00:34,037 --> 00:00:39,296
So pi is indeed a stationary distribution
for the mark of chain T, and

10
00:00:39,296 --> 00:00:45,024
that's the mark of chain T converge to
this point from any starting point.

11
00:00:45,024 --> 00:00:47,920
So let's see how we can choose
A such that this property holds.

12
00:00:47,920 --> 00:00:54,356
Let's, compose

13
00:00:54,356 --> 00:00:59,650
this equality and, move all the parts
that depends on A in one part.

14
00:00:59,650 --> 00:01:06,466
So, A(x to x'),

15
00:01:09,067 --> 00:01:12,877
Divided by A(x' to x),

16
00:01:17,508 --> 00:01:20,103
Equals to the following equation,
so we have

17
00:01:22,540 --> 00:01:26,389
On the one side we have Y of X prime.

18
00:01:30,123 --> 00:01:33,310
And the Q of X prime to x, and

19
00:01:33,310 --> 00:01:38,443
in the denominator we
have the same thing but

20
00:01:38,443 --> 00:01:41,790
With swapping x and x prime.

21
00:01:44,168 --> 00:01:50,047
[SOUND] So,
this is something computable right?

22
00:01:50,047 --> 00:01:51,903
We know y and q.

23
00:01:51,903 --> 00:01:54,016
So, we can compute this thing for
any two values.

24
00:01:54,016 --> 00:01:56,171
Let's call this, ratio raw.

25
00:01:59,014 --> 00:02:03,642
Let's assume for
a moment that this row is less than 1.

26
00:02:03,642 --> 00:02:07,522
So it's always greater than 0 because
it's some ratio of probabilities.

27
00:02:07,522 --> 00:02:14,239
But let's assume that
it's also less than 1.

28
00:02:14,239 --> 00:02:17,300
So it's just an assumption, it may
not hold too much, let's assume that.

29
00:02:17,300 --> 00:02:19,817
In this case, we have a ratio of eights,

30
00:02:19,817 --> 00:02:23,382
which would make equal to
which is less than one.

31
00:02:23,382 --> 00:02:28,493
So let's just use A of
x prime to be your one

32
00:02:33,038 --> 00:02:35,194
And A ox X prime to X to be 1.

33
00:02:41,933 --> 00:02:46,017
This way the desire to holds true right,

34
00:02:46,017 --> 00:02:51,408
because A of X to X prime divided to
X prime to X Will be [INAUDIBLE].

35
00:02:51,408 --> 00:02:56,807
And what you may notice that we could
have chosen here some other numbers,

36
00:02:56,807 --> 00:03:01,002
like 4 divided by 2, for
example, or 1 divided by 2.

37
00:03:01,002 --> 00:03:02,682
But this will also work,

38
00:03:02,682 --> 00:03:07,018
the overall mark of chain will
deconverge to the distribution.

39
00:03:07,018 --> 00:03:08,930
[INAUDIBLE].

40
00:03:08,930 --> 00:03:13,591
Because this I think is
the probability we accepts they move.

41
00:03:13,591 --> 00:03:17,868
And the way to increase probability
will necessarily inject more moves.

42
00:03:17,868 --> 00:03:23,417
So we risk more capacity for
our more [INAUDIBLE] which will reject.

43
00:03:23,417 --> 00:03:28,399
So this choose is the maximum
acceptance probability we

44
00:03:28,399 --> 00:03:32,321
can make while keeping
the ratio to be low And

45
00:03:32,321 --> 00:03:36,146
what to do if the ratio
is greater than one.

46
00:03:36,146 --> 00:03:39,398
Well we can just choose
to be A of X prime,

47
00:03:39,398 --> 00:03:43,727
X to X prime to be divided by one,
and this P to be one N.

48
00:03:43,727 --> 00:03:48,665
So we can summarize this
to cases that are always

49
00:03:48,665 --> 00:03:54,097
less than one and
they're always greater than one,

50
00:03:54,097 --> 00:03:58,417
in the form of
the expression A of X prime,

51
00:03:58,417 --> 00:04:04,818
of X dot prime, To be equal to minimal.

52
00:04:07,247 --> 00:04:12,857
Within two values, 1 and this ratio,

53
00:04:12,857 --> 00:04:18,467
so and
pie of X prime times Q of X prime dot

54
00:04:18,467 --> 00:04:28,021
X And

55
00:04:28,021 --> 00:04:33,252
the same thing while changing the
parameters x to x prime and vice versa.

56
00:04:40,295 --> 00:04:42,940
So this expression works both for

57
00:04:42,940 --> 00:04:48,044
the case where the ratio is less
than than 1 and greater than 1.

58
00:04:48,044 --> 00:04:51,633
You can check it by considering two cases.

59
00:04:51,633 --> 00:04:54,907
And in one case it will
be just 1 because it's.

60
00:04:54,907 --> 00:04:58,335
Minimum between 1 and
something greater than 1.

61
00:04:58,335 --> 00:05:01,447
In other case it will be wrong.

62
00:05:01,447 --> 00:05:10,040
So this and by choosing this critic we
indeed can prove that the overall mark of

63
00:05:10,040 --> 00:05:15,169
defined by this kind of
procedure will indeed.

64
00:05:15,169 --> 00:05:18,070
Follow this equation.

65
00:05:18,070 --> 00:05:22,457
So the distribution probably will be
stationary for our Markov chain, and

66
00:05:22,457 --> 00:05:27,065
that we will convert to distribution for
any starting point.

67
00:05:27,065 --> 00:05:33,811
Last, by choosing this critic,
we may sample from the distribution y.

68
00:05:33,811 --> 00:05:38,724
To summarise the metropolis
Hastings approach to building

69
00:05:38,724 --> 00:05:43,251
chain that converges to
the desired distribution pi.

70
00:05:43,251 --> 00:05:48,127
We have to start with sum of mac chain q,
which we don't have anything to

71
00:05:48,127 --> 00:05:53,004
do with the distribution pie and
then this mark of chain on each step will

72
00:05:53,004 --> 00:05:58,321
propose symbols and we'll have a correct
with sometimes rejects this move.

73
00:05:58,321 --> 00:06:01,922
And ask the mark of chain to stay
where it was at the various straight.

74
00:06:01,922 --> 00:06:06,807
And we just proved that if you
defined your coreject to be like this,

75
00:06:06,807 --> 00:06:12,294
if your acceptance from bill will be
this expression minimum between one and

76
00:06:12,294 --> 00:06:13,252
some ratio.

77
00:06:13,252 --> 00:06:17,885
Then no matter from which
Markov chain Q is started with,

78
00:06:17,885 --> 00:06:23,106
you will necessarily converge
to the desired distribution pi.

79
00:06:23,106 --> 00:06:26,954
Note that this acceptance probability,
it's the only place

80
00:06:26,954 --> 00:06:30,951
where we use our distribution pi,
which we want to sample from.

81
00:06:30,951 --> 00:06:35,700
And note that we don't have to
know this distribution pi exactly.

82
00:06:35,700 --> 00:06:38,422
We may know it up to
normalization constant.

83
00:06:38,422 --> 00:06:43,337
because here, we'll have a ratio between
our distribution at two different points.

84
00:06:43,337 --> 00:06:46,816
It doesn't matter how we
normalize this distribution.

85
00:06:46,816 --> 00:06:49,997
The ratio will not care
about the normalization,

86
00:06:49,997 --> 00:06:52,798
because if we divide
each of the parts of this

87
00:06:52,798 --> 00:06:56,826
ratio by the normalization constant z,
nothing will change.

88
00:06:56,826 --> 00:07:02,022
So, this method, as well as
the Gibbs sampling, can be used when

89
00:07:02,022 --> 00:07:07,412
you don't know the normalization
constant of your distribution.

90
00:07:07,412 --> 00:07:13,210
We have discussed that we can choose q to
be anything in distribution number one.

91
00:07:13,210 --> 00:07:16,876
Well obviously,
the Q should be nonzero anywhere, so

92
00:07:16,876 --> 00:07:19,199
the ratio will be well-defined.

93
00:07:19,199 --> 00:07:21,947
But of course, the efficiency and

94
00:07:21,947 --> 00:07:27,082
the properties of your algorithm
will depend on the choice of Q.

95
00:07:27,082 --> 00:07:29,120
There are kind of two
opposing forces here.

96
00:07:29,120 --> 00:07:33,234
So first of all,
Q should make large steps,

97
00:07:33,234 --> 00:07:39,242
it should spread out and
kind of roll freely in the sample space,

98
00:07:39,242 --> 00:07:44,933
to make the samples less correlated and
to converge faster.

99
00:07:44,933 --> 00:07:50,636
On the other hand, if you have too large
steps, your critic will reject them, too

100
00:07:50,636 --> 00:07:56,354
often, and you will waste your capacity by
staying at the same place for too long.

101
00:07:56,354 --> 00:08:00,297
Because if you're now, for
example, already converged, and

102
00:08:00,297 --> 00:08:05,262
you're in the high-density region,
then if you're Q proposes all this to make

103
00:08:05,262 --> 00:08:10,008
too-large moves, then you will move
outside your high density region, and

104
00:08:10,008 --> 00:08:12,513
the will not be happy with that.

105
00:08:12,513 --> 00:08:18,428
It will say I don't want to go there,
the probability distribution

106
00:08:18,428 --> 00:08:23,326
curve ball doesn't work,
isn't high in that region.

107
00:08:23,326 --> 00:08:24,640
So in the next video,

108
00:08:24,640 --> 00:08:29,611
we'll see some small demo of how this
approach can work in One dimensional case.

109
00:08:29,611 --> 00:08:39,611
[SOUND]