1
00:00:00,000 --> 00:00:05,006
Now we're going to see more undecidable
problems. We begin with Rice's theorem

2
00:00:05,006 --> 00:00:09,820
which tells us that almost every question
we can ask about the recursively

3
00:00:09,820 --> 00:00:14,891
enumerable language is undecidable, and we
then introduce a problem called post

4
00:00:14,891 --> 00:00:19,968
correspondence problem, which we also show
as undecidable. Post's problem does not

5
00:00:19,968 --> 00:00:24,872
appear to have anything to do with Turing
machines, so the fact that we can show it

6
00:00:24,872 --> 00:00:29,126
is undecidable is a valuable step on our
road to showing real problem's

7
00:00:29,126 --> 00:00:33,912
undecidable. However, Post's problem is
still just a game. But we can use Post's

8
00:00:33,912 --> 00:00:38,520
problem to show some real problems, for
example questions about grammars, to be

9
00:00:38,520 --> 00:00:42,929
undecidable. Well, first, see Rice's
theorem. That theorem involves what are

10
00:00:42,929 --> 00:00:47,658
called properties of languages. Formally,
a property of languages is any set of

11
00:00:47,658 --> 00:00:52,510
languages; the languages that we say have
this property. For example, the property

12
00:00:52,510 --> 00:00:57,482
of being infinite is the set of infinite
languages. The property of being empty is

13
00:00:57,482 --> 00:01:01,695
the set containing only the empty
language. Well properties like being

14
00:01:01,695 --> 00:01:06,393
infinite apply to any language, even not
recursively enumerable ones. We need to

15
00:01:06,393 --> 00:01:11,032
represent languages, and the touring
machine is the most powerful tool we have

16
00:01:11,032 --> 00:01:16,027
for representing languages. So we'll talk
about properties of recursively enumerable

17
00:01:16,027 --> 00:01:21,564
languages only. Solution. Consider a
property to be a problem about touring

18
00:01:21,564 --> 00:01:26,544
machines. Given a code for touring
machine, does it define a language with

19
00:01:26,544 --> 00:01:34,025
that property? So for any property P we
can define a language L sub P, the set of

20
00:01:34,025 --> 00:01:40,881
binary strings that represent a Turing
machine whose language has the property P.

21
00:01:40,881 --> 00:01:50,620
So for example, L sub infinite. So for
example, L sub infinite. Is the set of

22
00:01:50,620 --> 00:01:57,473
codes of term machines that define an
infinite language. There are two

23
00:01:57,473 --> 00:02:02,198
properties we call trivial, for reasons
that will be obvious. For these two

24
00:02:02,198 --> 00:02:07,242
properties, P, L sub is decidable. One of
these trivial properties is the always

25
00:02:07,242 --> 00:02:11,839
false property that contains no
recursively enumerable languages. We can

26
00:02:11,839 --> 00:02:16,436
think of this property as, is not a
recursively enumerable language. This

27
00:02:16,436 --> 00:02:20,905
property is actually true of some
languages, but those are outside the

28
00:02:20,905 --> 00:02:26,140
recursively enumerable class, obviously.
And we're talking about properties only as

29
00:02:26,140 --> 00:02:31,010
applied to recursively enumerable
languages. How do we decide this property?

30
00:02:31,010 --> 00:02:35,928
Given an input W, ignore it, and say no.
Notice that even if W represents an

31
00:02:35,928 --> 00:02:40,714
invalid Turing Machine code, we have
agreed to take all those strings as

32
00:02:40,714 --> 00:02:46,363
representing a Turing Machine that accepts
the empty language. But the empty language

33
00:02:46,363 --> 00:02:51,747
is a recursively enumerable language. So
we are correct in saying that W's Turing

34
00:02:51,747 --> 00:02:58,444
Machine does not have this property. The
second trivial property is the always true

35
00:02:58,444 --> 00:03:03,898
property. We can express the property as
is a recursively enumerable language. The

36
00:03:03,898 --> 00:03:10,779
algorithm for this property is to ignore
the input and say yes. And Rice's theorem

37
00:03:10,779 --> 00:03:18,015
says, for every property P except the two
trivial ones, L sub P is undecidable. An

38
00:03:18,015 --> 00:03:22,850
important part of the proof of Rice's
theorem, which we'll be working on for a

39
00:03:22,850 --> 00:03:27,870
while, is the idea of a reduction. There
are actually several different notions of

40
00:03:27,870 --> 00:03:32,766
reduction. We'll come back to that point
later. But the simplest one, and the one

41
00:03:32,766 --> 00:03:37,476
we need for Rice's theorem, is an
algorithm that takes an input string W for

42
00:03:37,476 --> 00:03:42,682
one language, or a problem L, and converts
it to another string X that is an input to

43
00:03:42,682 --> 00:03:47,764
another language or problem, L Prime. With
the property that X is in L Prime, if and

44
00:03:47,764 --> 00:03:53,801
only if W is in L. The value of having
such a reduction is that it tells us L is

45
00:03:53,801 --> 00:03:58,938
no harder than L prime, at least as far as
decidability is concerned. If I know L

46
00:03:58,938 --> 00:04:04,140
prime is recursively enumerable, than I
also have a touring machine program for L.

47
00:04:05,140 --> 00:04:10,929
If I know L prime is recursive, then I
also have and algorithm for L. In either

48
00:04:10,929 --> 00:04:16,571
case the solution for L is to take the
input W convert it to X see what the

49
00:04:16,571 --> 00:04:22,509
Turing Machine for L prime does, and take
its output to be that answer for W. The

50
00:04:22,509 --> 00:04:28,224
thing that turns a sting W into X is
called a transducer. So far we have only

51
00:04:28,224 --> 00:04:34,460
talked about a Turing Machine whose only
output is the yes or no that is implied by

52
00:04:34,460 --> 00:04:40,353
accepting or halting without accepting
respectfully. But we can get other kinds

53
00:04:40,353 --> 00:04:45,185
of output from a multi tape track machine
if it doesn't make one of its tapes to be

54
00:04:45,185 --> 00:04:49,786
the output tape and we got what is written
there when the machine halts is its

55
00:04:49,786 --> 00:04:54,676
output. We could, for example imagine such
a [inaudible] it takes W on its input tape

56
00:04:54,676 --> 00:05:00,586
and writes X on its output tape and then
halts. So if we reduce language L to L

57
00:05:00,586 --> 00:05:05,946
prime using such a transducer and L prime
is decidable, that is, there is an

58
00:05:05,946 --> 00:05:11,520
algorithm to test membership in L prime,
then there is also an algorithm for L.

59
00:05:11,520 --> 00:05:16,745
That is, we start by applying the
transducer to input W, and producing

60
00:05:16,745 --> 00:05:22,577
output X. We apply the algorithm for L
Prime to input X, and decide whether or

61
00:05:22,577 --> 00:05:28,105
not X is in L Prime. Either way, the
algorithm rendered, renders a decision,

62
00:05:28,105 --> 00:05:34,316
either yes or no. And these two steps for
an algorithm for L. The whole thing takes

63
00:05:34,316 --> 00:05:41,302
input W, and tells us whether or not W is
in L. We sometimes see a reduction use the

64
00:05:41,302 --> 00:05:45,645
discover and algorithm for L, but the more
important use of the idea is the

65
00:05:45,645 --> 00:05:50,162
contrapositive. If we already know that
there is no algorithm for L, then there

66
00:05:50,162 --> 00:05:54,621
cannot be any algorithm for L prime.
That's how we're going to use reductions.

67
00:05:54,621 --> 00:05:59,022
We'll take one problem that we may not be
interested in but that we know is

68
00:05:59,022 --> 00:06:03,539
undecidable and reduce it to another
problem that we are really interested in

69
00:06:03,539 --> 00:06:07,998
but whose status we don't yet know, and
then we will conclude that the second

70
00:06:07,998 --> 00:06:12,897
problem is also undecidable. Moreover,
when we get to intractability, or the

71
00:06:12,897 --> 00:06:17,872
theory of NP-completeness, we'll be using
reductions there fast, to argue that a

72
00:06:17,872 --> 00:06:23,856
problem like L prime can't be solved
faster than the problem L. There are more

73
00:06:23,856 --> 00:06:28,456
powerful forms of reduction that let us
reach the same conclusion. That is if

74
00:06:28,456 --> 00:06:34,973
there is an algorithm for L prime then
there is an algorithm for L. We actually

75
00:06:34,973 --> 00:06:39,679
saw one example of a more complex
reduction where we started by assuming

76
00:06:39,679 --> 00:06:45,325
that there was an algorithm for L sub U,
the universal language. And showed how we

77
00:06:45,325 --> 00:06:52,351
could then construct an algorithm for LD
which we know does not exist. However, the

78
00:06:52,351 --> 00:06:57,552
hypothetical algorithm we constructed for
L's of D involved more than just turning

79
00:06:57,552 --> 00:07:02,815
one string into another. We first, have to
check whether the input w is a well-formed

80
00:07:02,815 --> 00:07:07,766
code for turning machine and if not we
answer the question directly. More after,

81
00:07:07,766 --> 00:07:13,066
moreover. After doing a transduction where
we turned input W into W111W, we had to

82
00:07:13,066 --> 00:07:18,525
compliment the answer that we got from the
hypothetical algorithm for L sub U. That

83
00:07:18,525 --> 00:07:24,051
is we turned a yes answer into no and vice
versa. In the simple version of reduction,

84
00:07:24,051 --> 00:07:29,379
we are not allowed to modify answers in
this way; we have to take whatever answer

85
00:07:29,379 --> 00:07:35,350
we get. We'll revisit this issue of more
general kinds of reductions when we talk

86
00:07:35,350 --> 00:07:39,668
about [inaudible] completeness. There, the
simple reduction is called the Carp

87
00:07:39,668 --> 00:07:43,818
Reduction and more general kinds of
reductions are called Cook Reductions.

88
00:07:43,818 --> 00:07:48,136
Steve Cook and Richard Carp by the way
were the people who made the earliest

89
00:07:48,136 --> 00:07:53,306
contributions to NP Completeness Theory.
We're now ready to prove [inaudible]

90
00:07:53,306 --> 00:07:59,199
theorem. The idea is that we can reduce l,
sub u to l sub b for any none [inaudible]

91
00:07:59,199 --> 00:08:04,878
property b. Then since we know l sub u is
undecidable, it follows that l sub b is

92
00:08:04,878 --> 00:08:10,842
undecidable. Here's how we reduce value to
l p. The input to l u is touring machine m

93
00:08:10,842 --> 00:08:16,380
and an input w for m. The output will be
the code for touring machine in prime.

94
00:08:17,360 --> 00:08:23,110
That is at least the right form, since LP
is a set of Turing Machine codes. Those

95
00:08:23,110 --> 00:08:28,278
for which the language of the machine has
property P. Of course, for the

96
00:08:28,278 --> 00:08:33,956
transduction from M and W to M Prime to
work, we must arrange that M Prime has

97
00:08:33,956 --> 00:08:39,416
property P, if and only if M accepts W.
That is, M Prime is an L sub P, if and

98
00:08:39,416 --> 00:08:45,740
only if M and W is an L sub U. We'll
design M prime to be a 2-tape machine. On

99
00:08:45,740 --> 00:08:51,524
the first tape, M prime simulates another
particular Turing machine, which we'll

100
00:08:51,524 --> 00:08:57,701
call M sub L, on its own input, say X. On
the second tape, M prime simulates M on W.

101
00:08:57,701 --> 00:09:03,482
It is important to understand the M prime
has only its own input x which the

102
00:09:03,482 --> 00:09:09,635
transducer does not deal with or see. The
transducer knows about the M's of L, it is

103
00:09:09,635 --> 00:09:16,314
build into the design of the transducer.
The transducer gets to see the code for M

104
00:09:16,314 --> 00:09:22,244
and the string W, that is on its own
input. Thus the transducer can build both

105
00:09:22,244 --> 00:09:28,174
the moves of M and the input W into the
transition function of the machine M

106
00:09:28,174 --> 00:09:35,194
prime, that is its own output. None of M,
M sub L, or W is input to M prime. We're

107
00:09:35,194 --> 00:09:40,190
going to assume that the empty line does
not have property P. If that is not the

108
00:09:40,190 --> 00:09:46,043
case, then consider the complimented piece
AQ. Surely we loved your language than his

109
00:09:46,043 --> 00:09:50,670
property Q. But if we could prove that Q
around this of [inaudible] then P also

110
00:09:50,670 --> 00:09:55,413
must be on this side of [inaudible]. That
is well P were in cursive language, then

111
00:09:55,413 --> 00:09:59,923
still it be a Q is [inaudible] class of
cursive language is disclosed in the

112
00:09:59,923 --> 00:10:07,182
complementation. So let L be any language
with property P. We know L exists because

113
00:10:07,182 --> 00:10:16,000
P is not trivial. Also, let M sub L be the
turning machine that accepts L. Here's how

114
00:10:16,000 --> 00:10:22,542
M prime behaves. To begin, M prime writes
W on its second take. It can do that

115
00:10:22,542 --> 00:10:28,706
because the transducer seeing W generates
a sequence of states that write W one bit

116
00:10:28,706 --> 00:10:34,267
at a time. M Prime, from its start state,
enters each of these states in turn. Then

117
00:10:34,267 --> 00:10:39,484
M Prime moves the tape head for tape two
to the left end. Goes to the start state

118
00:10:39,484 --> 00:10:44,700
of M, and simulates W on input W. Again, M
Prime can do that, because the transducer

119
00:10:44,700 --> 00:10:49,143
sees both M and W, and makes the
transition function of M part of the

120
00:10:49,143 --> 00:10:54,356
transition function of M Prime. Suppose,
during the simulation of M on W, M enters

121
00:10:54,356 --> 00:10:59,674
an accepting state. Then M Prime goes to
the start state of M sub L, and simulates

122
00:10:59,674 --> 00:11:04,927
M sub L on the input X to M Prime, which
has been just sitting there on tape one

123
00:11:04,927 --> 00:11:10,310
being ignored so far. Where does M prime
get the transition function for M sub L? M

124
00:11:10,310 --> 00:11:15,759
sub L is a particular Turing Machine, one
that accepts a language L with property P.

125
00:11:15,759 --> 00:11:21,406
Plus the transducer itself can be designed
so that it writes the transitions of M sub

126
00:11:21,406 --> 00:11:27,541
L out as part of the transition function
of M Prime. If M [inaudible] accepts the

127
00:11:27,541 --> 00:11:32,931
input adds, then M Prime enters an
accepting state. If M [inaudible] never

128
00:11:32,931 --> 00:11:38,008
accepts ads, now it is M Prime. Does
M-Prime ever get the stage where it

129
00:11:38,008 --> 00:11:43,451
simulates an MC-bell? An M-Prime accepts
the same language as MC-bell does, that is

130
00:11:43,451 --> 00:11:48,629
L. But if M does not accept W then M-Prime
never gets to the stage as where it

131
00:11:48,629 --> 00:11:54,971
simulates MC-bell and therefore M-Prime
accept nothing. Here's a picture to help

132
00:11:54,971 --> 00:12:03,475
us remember what M prime does. It first
simulates M on input W. So far M prime's

133
00:12:03,475 --> 00:12:12,088
own input, X is ignored. But if M accepts
W, then M prime simulates M sub L on its

134
00:12:12,088 --> 00:12:21,626
own input X. And N prime accepts X if and
only if X is in L. So to summarize what we

135
00:12:21,626 --> 00:12:27,011
know about M prime. First, suppose M
accepts W. Then M prime simulates M

136
00:12:27,011 --> 00:12:33,953
[inaudible] on X and accepts X if and only
if X is in L. That is, if M excepts W,

137
00:12:33,953 --> 00:12:41,960
then the language of M prime is L. We know
L has property P, so M prime is an LP. Now

138
00:12:41,960 --> 00:12:48,956
look at the opposite case where N does not
accept W. An M prime never even starts the

139
00:12:48,956 --> 00:12:54,620
simulation of ML and therefore M prime
cannot accepts input x. That is, in this

140
00:12:54,620 --> 00:13:00,647
case, the language of M prime is the empty
language. We know the empty language does

141
00:13:00,647 --> 00:13:08,507
not have property P. Thus if M does not
accept W, M prime is not in the language

142
00:13:08,507 --> 00:13:14,512
LP. We concluded the out rhythm we
described for converting m and w to m

143
00:13:14,512 --> 00:13:19,883
prime is reduction of l sub u to l p. And
therefore, that l p is undecidable. That

144
00:13:19,883 --> 00:13:25,455
is rises theorem. This is a picture that
reveals the argument for why the existence

145
00:13:25,455 --> 00:13:31,162
or reduction from l u to l p proves that l
p is undecidable. We have a real reduction

146
00:13:31,162 --> 00:13:36,734
out rhythm. I hope you're convinced that
you can program this out rhythm if you are

147
00:13:36,734 --> 00:13:42,317
paid enough. We then contradict the
hypothesis that LP has an algorithm by

148
00:13:42,317 --> 00:13:48,658
supposing that algorithm existed. Then you
could put together the reduction plus the

149
00:13:48,658 --> 00:13:53,755
hypothetical algorithm. To build an
algorithm for LU. Since we already proved

150
00:13:53,755 --> 00:13:58,544
that there's no such thing, we have to
look at what of this story hasn't been

151
00:13:58,544 --> 00:14:02,959
proved. The finger points at the
hypothetical algorithm for LP. Since we

152
00:14:02,959 --> 00:14:08,037
didn't prove it exists, we just assumed it
did. Thus, the assumption must be

153
00:14:08,037 --> 00:14:13,124
responsible for the false conclusion. And
we can conclude instead, that there is no

154
00:14:13,124 --> 00:14:18,026
algorithm for property P. Thanks to Rice's
theorem, we suddenly have an infinite

155
00:14:18,026 --> 00:14:23,176
collection of undecideable problems about
the languages defined by Turing Machines.

156
00:14:23,176 --> 00:14:27,643
Here is just a tiny sample of the
questions that are undecideable about

157
00:14:27,643 --> 00:14:34,668
Turing Machines M. Is M's language
regular, or is it context free? Does this

158
00:14:34,668 --> 00:14:40,631
language include at least one string that
is a palindrome? That is, a string that is

159
00:14:40,631 --> 00:14:46,378
the same as its reversal. Is the language
empty? That is, does M accept any string

160
00:14:46,378 --> 00:14:51,737
at all? Does the language contain at least
1000 strings? But Rice's theorem also

161
00:14:51,737 --> 00:14:56,370
applies to programs since you can write a
program to simulate a turning machine.

162
00:14:56,370 --> 00:15:01,285
That tells us any nontrivial question
about what a program does will also be

163
00:15:01,285 --> 00:15:06,136
undecidable. I want to emphasize about
what a program does. There are lots of

164
00:15:06,136 --> 00:15:11,179
questions about what a program looks like
that are decidable. For example, I can

165
00:15:11,179 --> 00:15:16,095
tell whether a program uses more than
twenty variable names, but that's not a

166
00:15:16,095 --> 00:15:21,393
question about what the program does. An
example of a question about what a program

167
00:15:21,393 --> 00:15:26,646
does is, does the program eventually hall,
halt on any input? That is undecidable.

168
00:15:26,646 --> 00:15:32,443
Or, does the program correctly sort its
input? That's undecidable. Or, does this

169
00:15:32,443 --> 00:15:38,480
line of code ever get executed? That's
undecidable. We're now going to take up

170
00:15:38,480 --> 00:15:43,664
post correspondence problem,
affectionately known as PCP. Pcp is the

171
00:15:43,664 --> 00:15:48,197
first example of a problem that is
undecidable, yet doesn't involve touring

172
00:15:48,197 --> 00:15:53,424
machines or programs. Which are really the
same thing. As we said, PCP is not really

173
00:15:53,424 --> 00:15:57,961
important by itself since it's just the
made up problem, but it leads us to proofs

174
00:15:57,961 --> 00:16:02,442
that many other problems are undecidable.
These problems are unrelated to touring

175
00:16:02,442 --> 00:16:06,758
machines but are related to matters like
context-free languages that were not

176
00:16:06,758 --> 00:16:11,074
developed for the purpose of exposing
undecidable problems. That is, we studied

177
00:16:11,074 --> 00:16:15,445
grammars for their use in matters like
describing the structure of programming

178
00:16:15,445 --> 00:16:19,871
languages. We had no intent to describe
problems that turn out to be undecidable.

179
00:16:19,871 --> 00:16:24,186
It just turned out that there were
undecidable problems lurking in the theory

180
00:16:24,186 --> 00:16:29,825
of context-free grammars. An instance of
PCP is a list of corresponding strings.

181
00:16:29,825 --> 00:16:37,063
That is, a list of pairs of strings over
some particular alphabet sigma. The, the

182
00:16:37,063 --> 00:16:43,656
incident has some number of pairs, say N.
The first pair is W1X1; the second is W2X2

183
00:16:43,656 --> 00:16:49,627
and so on. None of these strings can be
the empty string. The PCP question is,

184
00:16:49,627 --> 00:16:55,350
given a list of corresponding pairs,
whether we can find a list of one or more

185
00:16:55,350 --> 00:17:00,999
[inaudible] I1 thought IK that we can
treat as the sequence of pairs that we

186
00:17:00,999 --> 00:17:06,942
choose. There can be repetitions in this
list, the property of the list that makes

187
00:17:06,942 --> 00:17:13,032
the answer to this instance of PCP be yes,
is that when we take the first component

188
00:17:13,032 --> 00:17:24,480
from each pair on the list, that is. Wi1
is actually the first component of the.

189
00:17:24,480 --> 00:17:34,142
Pair that we indexed as I sub one I can't
draw on the slide two levels of subscripts

190
00:17:34,142 --> 00:17:43,122
then a W, the second W I two is the...
That W from this list that has in the... I

191
00:17:43,122 --> 00:17:51,662
that is from the I two pair on that list
and so on. Then we get the same string if

192
00:17:51,662 --> 00:17:57,592
we take the Ws from the pairs as if we
took the second component with the Xs from

193
00:17:57,592 --> 00:18:03,450
the pairs. Okay, such a list of integers
is called the solution to the instance of

194
00:18:03,450 --> 00:18:12,040
PCP. Here is a simple example instance of
PCP. The alphabet will be zero and one.

195
00:18:12,860 --> 00:18:25,312
There are two pairs; the first pair is
zero and 01. The second pair Is 100 and

196
00:18:25,312 --> 00:18:32,641
001. Okay. Every time a list of integers
has one, it refers to this pair and every

197
00:18:32,641 --> 00:18:40,062
time it has a two, it refers to that pair.
Okay. In this case, it turns out; there is

198
00:18:40,062 --> 00:18:46,981
no solution to this instance of PCP. The
analysis of this simple instance is not so

199
00:18:46,981 --> 00:18:52,493
easy. The easy part is that no solution
can start with the second pair. Why? The

200
00:18:52,493 --> 00:18:58,076
reason is that if we choose two as the
first integer in the so-called solution

201
00:18:58,076 --> 00:19:03,730
list, then the first string made from the
Ws that are first component will begin

202
00:19:03,730 --> 00:19:09,761
with this one. But the second string, the
one made from the corresponding X's that

203
00:19:09,761 --> 00:19:14,497
are the second components, begins with a
zero. I don't care how you continue a

204
00:19:14,497 --> 00:19:19,110
string that begins with one can never
equal a string that begins with zero.

205
00:19:19,110 --> 00:19:24,030
However, it is possible that the solution
list begins with index one because for

206
00:19:24,030 --> 00:19:28,520
pair one the two strings are not
inconsistent, that is, one is a prefix of

207
00:19:28,520 --> 00:19:33,380
the other. So let's see if we can build a
solution starting with the first pair.

208
00:19:33,680 --> 00:19:38,878
Since the two strings are not equal, we
have to add another pair. We can't choose

209
00:19:38,878 --> 00:19:44,206
pair one because the first string would
then begin 00 and the second string would

210
00:19:44,206 --> 00:19:48,885
begin 01. That could never lead to a
solution. So the second index in the

211
00:19:48,885 --> 00:19:54,515
solution must be two. Now both strings
begin with 0100 and the second string has

212
00:19:54,515 --> 00:20:00,300
an additional one. We're back where we
were at the previous choice. We can't pick

213
00:20:00,300 --> 00:20:06,446
one as the third index but we can pick two
and we get a match up to a point with the

214
00:20:06,446 --> 00:20:11,660
second string again having an additional
one. If you think about it, we can never

215
00:20:11,660 --> 00:20:16,141
make the two strings be the same because
to do so we'd eventually have to use a

216
00:20:16,141 --> 00:20:20,735
pair of the second string was shorter than
the first. But there is no such pair in

217
00:20:20,735 --> 00:20:25,894
this instance. We conclude that this
instance has answer no. On the other hand,

218
00:20:25,894 --> 00:20:33,929
let's change the instance a little by
adding a third pair, 110 and ten. Now the

219
00:20:33,929 --> 00:20:42,377
list of indexes thirteen is the solution.
From the first strings of the pair is we

220
00:20:42,377 --> 00:20:52,794
get zero followed by 110 which is. 0110
i.e. 0110 is a zero followed by a 110.

221
00:20:52,794 --> 00:21:04,818
From the corresponding second strings we
get 01 which is this followed by ten

222
00:21:04,818 --> 00:21:13,061
that's that, and that also is 0110. In
fact there's an infinitive solutions for

223
00:21:13,061 --> 00:21:18,042
this instance, any string of events that
exist in the language of regular

224
00:21:18,042 --> 00:21:23,705
expression one to star two. That is after
the one, we can use pair two any number of

225
00:21:23,705 --> 00:21:28,890
times, including zero obviously, and
finally use the pair three. So here's the

226
00:21:28,890 --> 00:21:34,075
plan for proving PCPs undesirable. We're
gonna start by introducing a modified

227
00:21:34,075 --> 00:21:39,479
version of the problem called MPCP, or the
modified post correspondence problem. The

228
00:21:39,479 --> 00:21:44,362
modification is that a solution to MPCP
must begin with the first pair. But

229
00:21:44,362 --> 00:21:49,734
otherwise the modified and original PCP
problems are the same. We'll show how to

230
00:21:49,734 --> 00:21:55,370
reduce L sub U, the universal Turing
machine language, to MPCP. And then we'll

231
00:21:55,370 --> 00:22:01,155
show how to reduce MPCP to P sub P. In
fact, this part is easier so we'll do it

232
00:22:01,155 --> 00:22:07,607
first. Just to get the distinction between
PCP and MPCP, observe that if we treat the

233
00:22:07,607 --> 00:22:13,313
list of three pairs from our previous
example of PCP as an instance of MPCP,

234
00:22:13,313 --> 00:22:19,020
then the answer is yes. The reason is that
there is a sequence within the Xs,

235
00:22:19,020 --> 00:22:24,802
beginning with one. Namely, 1-3, that
produces equal strings from the first and

236
00:22:24,802 --> 00:22:31,182
second components of the pairs. But we can
reorder the pairs; say by putting 110 and

237
00:22:31,182 --> 00:22:37,495
eleven first. As an instance of PCP the
order of the pairs doesn't matter, there's

238
00:22:37,495 --> 00:22:42,868
still a solution. But as an instance of
mpcp, there is now no solution. Any

239
00:22:42,868 --> 00:22:48,506
solution would have to begin with the
index one. But then the first of the two

240
00:22:48,506 --> 00:22:54,506
strings would begin one, one zero. And the
second would begin one zero. No matter how

241
00:22:54,506 --> 00:23:00,506
we extend these strings. They're going to
disagree in their second positions. Before

242
00:23:00,506 --> 00:23:06,145
we can talk about reductions, we need to
express pcp and mpcp as languages. The

243
00:23:06,145 --> 00:23:11,865
instances of these problems can have any
alphabet. So it is not im- immediately

244
00:23:11,865 --> 00:23:17,905
obvious how to express the set of all PCP
instances that have a solution. Again the

245
00:23:17,905 --> 00:23:22,507
problem is, is that there no finite
alphabet for this set, and languages just

246
00:23:22,507 --> 00:23:28,377
need a finite alphabet. So we need to code
symbols in a finite alphabet. We'll

247
00:23:28,377 --> 00:23:35,093
represent the odd symbol of an alphabet by
the symbol A, followed by I and binary.

248
00:23:35,093 --> 00:23:41,477
Thus, we only need three symbols to
represent any number of symbols. The order

249
00:23:41,477 --> 00:23:47,032
of symbols, the alphabet, is not
important. For example, if we have an

250
00:23:47,032 --> 00:23:53,499
instance over alphabet, A through Z, we
could represent A by A1, B by A10, and so

251
00:23:53,499 --> 00:23:59,688
on. But we could also represent. Z by A1
and Y by A10, and so on. It should be

252
00:23:59,688 --> 00:24:06,677
clear that the particular symbols used for
the alphabet of the PCP instance does not

253
00:24:06,677 --> 00:24:13,256
affect the outcome. And the only other
symbols we need to represent instances is

254
00:24:13,256 --> 00:24:20,081
the comma and left and right parentheses.
Thus the alphabet for the language of PCP

255
00:24:20,081 --> 00:24:26,553
instances will have an alphabet of six
symbols the A, zero, one. Comma, left

256
00:24:26,553 --> 00:24:38,298
paren and right paren. [sound], [sound]
Now that we have a finite alphabet for

257
00:24:38,298 --> 00:24:43,332
coding instances we can define two
languages. L sub PCP, the language of

258
00:24:43,332 --> 00:24:49,338
coded instances of PCP that have a
solution, and L sub MPCP is the same for

259
00:24:49,338 --> 00:24:55,069
the modified problem. Now we can do the
reduction of MPCP to PCP. We'll describe

260
00:24:55,069 --> 00:25:00,514
the transformation only but you should be
able to visualize the process being

261
00:25:00,514 --> 00:25:06,030
performed by a touring machine transducer.
Our input is an instance of MPCP. The

262
00:25:06,030 --> 00:25:11,685
output instance of PCP will use the same
alphabet as the input instance plus two

263
00:25:11,685 --> 00:25:17,270
new symbols, the star and the dollar sign.
Of course all symbols new ones and the

264
00:25:17,270 --> 00:25:22,995
symbols of the input instance of MPCP will
be coded using A, zeroes and ones as we

265
00:25:22,995 --> 00:25:28,918
just described. For each pair. W, X in the
input instance. There's a pair in the

266
00:25:28,918 --> 00:25:35,172
output instance based on W and X. For
every first string W of the pair, we add

267
00:25:35,172 --> 00:25:43,556
star after every symbol. So for example.
Zero, one, Becomes zero star, one star.

268
00:25:43,556 --> 00:25:51,212
And for X, the second string of the pair,
we instead, add a star before every

269
00:25:51,212 --> 00:25:58,867
character. So, for example, 1-1 becomes
star one, star one. The output instance

270
00:25:58,867 --> 00:26:07,544
also has a new pair unrelated to any input
pair. This pair is dollar sign paired with

271
00:26:07,544 --> 00:26:13,984
star dollar sign. And the output also has
a second pair based on the first input

272
00:26:13,984 --> 00:26:20,182
pair. This pair has stars placed after the
symbols with the first string, and before

273
00:26:20,182 --> 00:26:26,081
the symbols with the second string. Just
like in rules one and two on the slide.

274
00:26:26,081 --> 00:26:31,906
But the first string also has a star at
the beginning. So, for example, the pair

275
00:26:31,906 --> 00:26:44,520
01 paired with zero. Would become star
zero star one star paired with star zero.

276
00:26:46,840 --> 00:26:51,809
Here's an example. On the left is the
instance of MPCP that is input to the

277
00:26:51,809 --> 00:26:57,242
transducer. On the right are those pairs
with the added stars. Notice that for each

278
00:26:57,242 --> 00:27:02,807
pair, the stars come after the symbols on
the first string and before the symbols on

279
00:27:02,807 --> 00:27:09,122
the second string. And we always have this
pair which will serve to make the strings

280
00:27:09,122 --> 00:27:14,684
of the PCP instance match when there is a
solution to the MPCP instance. Its job is

281
00:27:14,684 --> 00:27:20,045
to add the missing start of the second
string. And this pair also comes from the

282
00:27:20,045 --> 00:27:25,953
first pair. It is just like the first pair
of the PCP instance. That is this. And

283
00:27:25,953 --> 00:27:32,563
this pair also comes from the first pair.
It is just like the first pair of the PCP

284
00:27:32,563 --> 00:27:39,132
instance. Except for the star at the
beginning of the first string. Notice that

285
00:27:39,132 --> 00:27:46,172
this is the only pair in the entire PCP
instance where both strings start with the

286
00:27:46,172 --> 00:27:52,080
same symbol. For example, this pair, has
zero. As the first symbol of the first

287
00:27:52,080 --> 00:27:57,767
string and star as the first symbol of the
second string. Okay, all these pairs have

288
00:27:57,767 --> 00:28:03,385
second strings that begin with star and
the first string that begins with zero or

289
00:28:03,385 --> 00:28:08,654
one. Suppose the MPCP instance has a
solution, a sequence of indexes that yield

290
00:28:08,654 --> 00:28:14,187
string W when we use the first strings in
the pairs and the same string W when we

291
00:28:14,187 --> 00:28:19,721
use the second string. Then we can get a
solution to the constructive PCP instance.

292
00:28:19,721 --> 00:28:25,323
In this case, the solution string will be
W patterned with stars before each symbol

293
00:28:25,323 --> 00:28:33,938
of W and also at the end followed by the
dollar sign. For example. If W is one zero

294
00:28:33,938 --> 00:28:43,259
one. In a PCP instances solution is star
one, star zero, star one Star, dollar

295
00:28:43,259 --> 00:28:48,877
sign. To get the PCP solution, we use the
same sequence of indexes as in the

296
00:28:48,877 --> 00:28:54,721
solution to the MPCP instance, with two
exceptions. First, we use the index with

297
00:28:54,721 --> 00:29:00,565
the special pair as the first index.
Recall, this pair was constructed from the

298
00:29:00,565 --> 00:29:06,184
first pair of the MPCP instance with the
extra star. And we also add to the

299
00:29:06,184 --> 00:29:12,102
solution of the PC instance. The index of
the ender pair, star, dollar and dollar,

300
00:29:12,102 --> 00:29:18,193
to the end of the list of indexes. Thus
the solution to the input MPCP instance

301
00:29:18,193 --> 00:29:24,412
means that the output PCP instance also
has a solution. But the other direction

302
00:29:24,412 --> 00:29:30,554
also holds. If the PCP instance has a
solution we can modify to get a solution

303
00:29:30,554 --> 00:29:36,490
to the input MPCP instance. The first
index in the PCP solution must be the

304
00:29:36,490 --> 00:29:41,974
special pair because that's the only pair
in the PCP instance where both strings

305
00:29:41,974 --> 00:29:47,391
start with the same symbol. Change this
index to one, the index of the first pair

306
00:29:47,391 --> 00:29:52,808
of the NPCP index. Leave all the other
indexes unchanged but remove from the PCP

307
00:29:52,808 --> 00:29:58,293
solution the last index which must be the
index of the ender pair. Why? That's the

308
00:29:58,293 --> 00:30:03,810
only pair where both strings end with the
same symbol. Thus we've shown that the

309
00:30:03,810 --> 00:30:09,390
answer to the questions, does the input
MPCP instance have a solution, and does

310
00:30:09,390 --> 00:30:15,114
the output PCP instance have a solution,
are the same. That means we have a valid

311
00:30:15,114 --> 00:30:20,694
reduction from MPCP to PCP. If we can
prove MPCP is undecidable, which we'll do

312
00:30:20,694 --> 00:30:26,640
next, then we have a proof that PCP is
also undecidable. So our next task is to

313
00:30:26,640 --> 00:30:33,794
show how to reduce L sub U to NPCP. Given
an instance M and W, of L sub U, will

314
00:30:33,794 --> 00:30:39,426
convert this to an instance of, of MPCP
whose only possible solutions yield

315
00:30:39,426 --> 00:30:45,360
strings that represent the sequence of
ID's that N enters when its input is W.

316
00:30:45,980 --> 00:30:53,667
More precisely, suppose this sequence is a
sequence of IDs that M enters starting

317
00:30:53,667 --> 00:31:01,054
with the initial ID with input W. Then any
solution to the constructed NPCP instance

318
00:31:01,054 --> 00:31:06,840
will yield a pair of strings that begin a
sequence of IDs, separated by pound signs.

319
00:31:06,840 --> 00:31:11,558
However, until M enters an accepting
state, when we look at the two strings of

320
00:31:11,558 --> 00:31:16,338
the partial solution, one form from the
first strings of the indexed pairs, and

321
00:31:16,338 --> 00:31:21,179
the second from the second strings of the
indexed pairs. The second string will

322
00:31:21,179 --> 00:31:26,020
always be a full ID ahead of the first
string. Only if M accepts, will we be ab-,

323
00:31:26,020 --> 00:31:30,923
will we be able to choose pairs that make
the first string grow faster than the

324
00:31:30,923 --> 00:31:35,967
second. And eventually make the two
strings become identical. We're going to

325
00:31:35,967 --> 00:31:40,625
assume as we can that the turning machine
M from the input to L's of U has a

326
00:31:40,625 --> 00:31:45,524
semi-infinite tape and never moves left
from the initial head position, that's is

327
00:31:45,524 --> 00:31:49,396
given the actual binary string
representing M we can modify the

328
00:31:49,396 --> 00:31:54,614
represented machine to mark its left end
of tape. As we did when we described the

329
00:31:54,614 --> 00:31:59,206
construction of a turning machine with
semi-infinite tape from one that had a too

330
00:31:59,206 --> 00:32:04,697
infinite tape. Then we can perform the
reduction on the new Turing Machine that

331
00:32:04,697 --> 00:32:10,158
we know does the same as M, rather than on
M itself. However in what follows, we will

332
00:32:10,158 --> 00:32:15,422
continue to refer to the input machine as
M. The MPCP instance we construct will

333
00:32:15,422 --> 00:32:20,884
have an alphabet that consists of all the
states and tapes, and [inaudible] of M or

334
00:32:20,884 --> 00:32:25,885
rather, the modified M plus the special
marker pound sign. We assume that the

335
00:32:25,885 --> 00:32:31,083
symbols used for states and tapes and
[inaudible] of this joint, so we can tell

336
00:32:31,083 --> 00:32:37,585
one from another in the ID. Here's the
beginning of the construction of the MPCP

337
00:32:37,585 --> 00:32:43,140
instance from M and W. The first pair has
a first string that is just the pound sign

338
00:32:43,140 --> 00:32:48,498
but a second string that is the entire
first ID with input W surrounded by pound

339
00:32:48,498 --> 00:32:53,789
signs. Note that the transducer gets the
CW on its input so it can generate this

340
00:32:53,789 --> 00:33:01,061
pair. We'll add the pair ##. It lets us
add markers to the end of one ID in the

341
00:33:01,061 --> 00:33:07,193
first string at the same time we add it to
the following ID in the second string. Add

342
00:33:07,193 --> 00:33:13,181
pair XX for every tape symbol X. This pair
lets us add an X to the ID we're forming

343
00:33:13,181 --> 00:33:18,880
in the first string at the same time we
add an X to the next ID which is being

344
00:33:18,880 --> 00:33:24,793
formed on the second string. Of course in
order to make sure that the strings match,

345
00:33:24,793 --> 00:33:29,915
it must be the case that the X appears as
the first unmatched symbol of the second

346
00:33:29,915 --> 00:33:34,975
string. They call the second string as one
id as heads that these paired in effect.

347
00:33:34,975 --> 00:33:40,096
Let's copy the tape of one id to the next
id but prevents it from any changes that

348
00:33:40,096 --> 00:33:44,910
are not just [inaudible] by the move of
them. Here's a picture of copying id's.

349
00:33:44,910 --> 00:33:50,025
Suppose we have just reached a point in
the sequence of indexes when the second

350
00:33:50,025 --> 00:33:55,013
string has a complete ID more than the
first string but otherwise the strings

351
00:33:55,013 --> 00:33:59,958
match. We can add to the solution we're
constructing the pair A, A. That has the

352
00:33:59,958 --> 00:34:05,056
effect of putting the first symbol of the
new ID at the end of the first string. And

353
00:34:05,056 --> 00:34:09,790
also extending the second string by the
same symbol. That's what we want; since

354
00:34:09,790 --> 00:34:14,888
there's no way this A could change in the
next ID. We can also then add the index of

355
00:34:14,888 --> 00:34:19,743
the pair, to the solution, extending the
new ID one symbol further. This might or

356
00:34:19,743 --> 00:34:24,659
might not be the right thing to do. The
problem is that, if the move of M in state

357
00:34:24,659 --> 00:34:29,514
Q with symbol C is to move left, then the
second symbol of the new ID is the new

358
00:34:29,514 --> 00:34:34,150
state, and B will be the third symbol. But
just because we can choose the pair

359
00:34:34,209 --> 00:34:39,169
doesn't mean we have to. If M is going to
move left, there will be another choice of

360
00:34:39,169 --> 00:34:43,889
next index that simulates the move
correctly and the sequence where is chosen

361
00:34:43,889 --> 00:34:49,110
instead will fail to yield the solution to
the NBCB instance we're constructing. Now

362
00:34:49,110 --> 00:34:57,706
we need to add the pairs that reflect the
moves of M. For every straight Q and

363
00:34:57,706 --> 00:35:06,241
[inaudible] symbol X, if the move of M is
to the right, say PYR, we have. A pair in

364
00:35:06,241 --> 00:35:13,058
which Q to the left of X, that's this, can
be replaced by a P to the right of Y. That

365
00:35:13,058 --> 00:35:19,710
correctly reflects the change in ID that
occurs at the rightward move and if the

366
00:35:19,710 --> 00:35:26,199
move is to the left, then we have a family
of pairs, one for each symbol Z which

367
00:35:26,199 --> 00:35:32,400
could appear to the left of the state.
Note that we arranged that M can never

368
00:35:32,400 --> 00:35:38,799
move left from its initial position, so
there is no possibility that the state is

369
00:35:38,799 --> 00:35:45,434
at the left end of the ID if M moves left.
[sound] So if the leftward move has state

370
00:35:45,434 --> 00:35:52,224
Q and symbol X replaced by state P and
symbol Y, then for every Z, there is. A

371
00:35:52,224 --> 00:36:01,862
pair that puts P to the left of the Z, and
replaces the X, by the Y. One other

372
00:36:01,862 --> 00:36:06,718
possibility is that in the current ID the
state is at the right end of the ID

373
00:36:06,718 --> 00:36:11,512
scanning a blank we've never visited
before. If so, the state will actually be

374
00:36:11,512 --> 00:36:16,057
to the left of the pound sign. And the
pairs are almost the same but when

375
00:36:16,057 --> 00:36:21,037
constructing the second string we should
imagine an extra blank in front of the

376
00:36:21,037 --> 00:36:26,366
pound sign. The blank is replaced by the
new symbol Y of course. Following a

377
00:36:26,366 --> 00:36:33,595
previous example, suppose that in state Q,
scanning C, M does indeed move right,

378
00:36:33,595 --> 00:36:41,114
going to state P and writing E in place of
C. Had M moved left in that situation, the

379
00:36:41,114 --> 00:36:47,031
sequence of pair choices would be dead in
the water, unable to continue. However, in

380
00:36:47,031 --> 00:36:52,803
that case, we could have chosen not to use
the pair but instead, use a pair that

381
00:36:52,803 --> 00:36:58,360
incorporated the left move, and handled
the B properly. In this case, we have a

382
00:36:58,360 --> 00:37:03,916
pair with QC as the first string. So we
can match the string on the bottom. It

383
00:37:03,916 --> 00:37:09,580
also extends the bottom string with EP,
reflecting a move of M. Once the move has

384
00:37:09,580 --> 00:37:14,740
been handled we can just match symbol
against symbol here the pair dd is used.

385
00:37:15,400 --> 00:37:21,476
And once we are at the end of the ID we
use the pair of pound signs to separate

386
00:37:21,476 --> 00:37:27,628
the IDs. And we're now ready to continue
the sequence of pair choices to copy the

387
00:37:27,628 --> 00:37:36,041
new ID, which is abed. And make it change
by another move. But we need some more

388
00:37:36,041 --> 00:37:41,201
pairs in the MPCP instance so that in case
M accepts we can find a solution. Note

389
00:37:41,201 --> 00:37:46,169
that if M never accepts then only the
rules given so far can ever be used and

390
00:37:46,169 --> 00:37:51,584
these can never lead to a solution because
the second string is always one id longer

391
00:37:51,584 --> 00:37:59,702
than the first. However, if M answers the
final state F. Then it is possible for the

392
00:37:59,702 --> 00:38:06,652
F to let's say, eat the neighboring
symbols of an ID. We no longer have real

393
00:38:06,652 --> 00:38:13,368
IDs of M but it doesn't matter. We
simulated M enough to know that M accepts

394
00:38:13,368 --> 00:38:20,349
W and our only concern then is to make
sure that there's a solution to the MPC

395
00:38:20,349 --> 00:38:28,549
instance based on M and W. Thus we add to
the instance of MPCP pairs XFYF. Fyf and

396
00:38:28,549 --> 00:38:38,391
XF, F for all tape symbols. X and Y. Using
these pairs, together with pairs of the

397
00:38:38,391 --> 00:38:44,490
form XX, the copy products of IDs as we
have done. We eventually get to an ID that

398
00:38:44,490 --> 00:38:50,288
is only the state of F. Of course, that
isn't an actual ID of M, but it doesn't

399
00:38:50,288 --> 00:38:55,861
matter anymore. And one more pair, F
pound, pound, paired with the pound sign,

400
00:38:55,861 --> 00:39:01,358
will end the two string being formed,
making them the same, and yielding a

401
00:39:01,358 --> 00:39:07,457
solution to the MCPC instance. So, here's
an example where M has entered the final

402
00:39:07,457 --> 00:39:14,008
state F, And the sequence of choice. Is
either copying a symbol, or allowing f to

403
00:39:14,008 --> 00:39:20,376
eat the adjacent symbol or symbols.
Eventually leads to identical strings. So

404
00:39:20,376 --> 00:39:27,079
here's what it looks like. Well we just
have to copy that a. But now, the f can in

405
00:39:27,079 --> 00:39:33,531
effect eat the b and c adjacent to it.
We've got to copy the d. We got to copy

406
00:39:33,531 --> 00:39:40,369
the e. We have to copy the pound sign.
Now, the f can eat the a and the d. But we

407
00:39:40,369 --> 00:39:47,116
have to copy the E. Have to copy the pound
sign. Now F doesn't have anything that it

408
00:39:47,116 --> 00:39:57,497
can eat to its left but it still will eat
the E. And finally, Copy the pound sign

409
00:39:57,497 --> 00:40:04,556
and now, F pound, pound pay with pound
makes everything evened up, and we have

410
00:40:04,556 --> 00:40:11,336
our solution. Now we know that PCP is
undecidable because we successfully

411
00:40:11,336 --> 00:40:18,578
reduced L sub too, it's O language and
some PCP. And we're going to reduce PCP to

412
00:40:18,578 --> 00:40:24,415
the problem is a context free grammar
ambiguous. Then, for the first time, we'll

413
00:40:24,415 --> 00:40:29,342
have an undecideable problem that arose
naturally. That is, not because we were

414
00:40:29,342 --> 00:40:34,459
looking for undecideable problems. Before
we can talk about any problem involving

415
00:40:34,459 --> 00:40:39,323
grammars, we need to find a code for
grammars using a finite alphabet, just as

416
00:40:39,323 --> 00:40:43,871
we did for PCP. So to start the encoding
of grammars, let's represent the

417
00:40:43,871 --> 00:40:49,446
[inaudible] terminal by symbol A followed
by I and binary. That may look. Like with

418
00:40:49,446 --> 00:40:54,615
us forgetting what the actual terminal
symbols are, but since were interested in

419
00:40:54,615 --> 00:40:59,979
ambiguity we can rename the terminals any
way we like as long as we don't name two

420
00:40:59,979 --> 00:41:04,954
terminals the same, and the ambiguity or
un-ambiguity of the grammar will not

421
00:41:04,954 --> 00:41:10,253
change. Then we'll represent the I-theorem
by capital A followed by I in binary, we

422
00:41:10,253 --> 00:41:16,600
will assume that A-1 is always the start
symbol. The right arrow between the head

423
00:41:16,600 --> 00:41:21,439
and body of a production will be
represented by itself. Likewise, the c,

424
00:41:21,439 --> 00:41:26,900
commas separating productions and the
symbol epsilon will represent themselves.

425
00:41:27,700 --> 00:41:37,399
For example, this grammar is represented
by this string right here. The bar

426
00:41:37,399 --> 00:41:44,800
connecting alternative bodies for S has
been expanded so that there are two

427
00:41:44,800 --> 00:41:56,498
separate productions. Okay, this. Right
here, represents, this represents S goes

428
00:41:56,498 --> 00:42:06,115
to 0S1. And this word wrap presents S goes
to capital A. Finally this represents

429
00:42:06,115 --> 00:42:13,354
capital A goes to little c. Suppose we
have a PCP instance with k pairs and let

430
00:42:13,354 --> 00:42:19,575
the ith pair be wixi. We need k index
symbols to represent the numbers of pairs

431
00:42:19,575 --> 00:42:25,953
and we'll use A1 through Ak which we may
choose to symbols that do not appear in

432
00:42:25,953 --> 00:42:31,859
the PCP instance itself. Notice that we
are going to talk about reducing an

433
00:42:31,859 --> 00:42:38,395
unquoted instance of PCP. So it could have
any alphabet to an uncoded grammar which

434
00:42:38,395 --> 00:42:44,612
could also have any symbols. However, the
process we describe really takes place

435
00:42:44,612 --> 00:42:51,143
with all the symbols coded in the manners
we have described. It is easier to follow

436
00:42:51,143 --> 00:42:57,358
the construction in the uncoded form so
that's what we'll do. The list language

437
00:42:57,358 --> 00:43:03,417
for the first half of each pair, the
strings W1 through WK, has a context-free

438
00:43:03,417 --> 00:43:09,790
grammar with productions. A goes to WIAaI.
It also has the same sort of production

439
00:43:09,790 --> 00:43:16,356
that, but with A omitted from the body.
The latter kind of productions, these, are

440
00:43:16,356 --> 00:43:23,492
used in the derivations. All the strings
in this language which are some sequence

441
00:43:23,492 --> 00:43:30,924
of the w I's with repetitions allowed and
in any order with the corresponding index

442
00:43:30,924 --> 00:43:38,355
symbols following but in reverse for
example here's a derivation, okay, a could

443
00:43:38,355 --> 00:43:47,003
derive in one step, let's say w1a, little
a1. And then, in one more step, we could

444
00:43:47,003 --> 00:43:53,480
use, let's say the terminal production for
W2s. We get W1, W2, and then A2, A1.

445
00:43:53,480 --> 00:44:00,571
Notice that the sequence of A's is the
reverse of the sequence of W's. And we can

446
00:44:00,571 --> 00:44:07,661
do the same thing with the second string
from each pair. We'll use variable B, and

447
00:44:07,661 --> 00:44:13,890
XIs in place of WIs. But the idea is the
same. The language of strings generated

448
00:44:13,890 --> 00:44:19,347
from A and B each consists concatenation
of strings either from the WIs of its A,

449
00:44:19,347 --> 00:44:24,603
or the XIs of its B, and these are the
first or second components of the pairs.

450
00:44:24,603 --> 00:44:29,925
They are followed by the reverses of the
sequence of indexes of the pairs from

451
00:44:29,925 --> 00:44:35,940
which these strings came. Here's an
example of a PCP instance over alphabet

452
00:44:35,940 --> 00:44:43,069
AB. We'll use one, two, and three as the
index symbols. So the grammar for a start

453
00:44:43,069 --> 00:44:49,294
symbol A is this, for example the second
pair whose first string is BAA, gives rise

454
00:44:49,294 --> 00:44:55,596
to these two productions. Here's BAA with
the variable A in the middle and then two

455
00:44:55,596 --> 00:45:01,289
which is the index, and then there's
another production that's the same but

456
00:45:01,289 --> 00:45:07,432
without the capital A. And here is the
grammar for the second strings in the pair

457
00:45:07,432 --> 00:45:13,496
with start symbol B. As an example if we
choose the three pairs in order one, two,

458
00:45:13,496 --> 00:45:22,100
three then there is a derivation from A
that looks like this. So A, use the first

459
00:45:22,400 --> 00:45:30,743
Choice. That's this production. Then we'll
use the second choice which is this

460
00:45:30,743 --> 00:45:38,672
production and that gives us A, B, A, A
capital A the two and the one. And then

461
00:45:38,672 --> 00:45:47,226
we'll use the third pair but we wanna end
it and so we'll use that production and

462
00:45:47,226 --> 00:45:57,266
that gives us A, B, A, A, B, A, A sorry B,
B, A [sound]. It?s that. And then three,

463
00:45:57,266 --> 00:46:05,805
two, one. We're now ready to show how to
reduce PCP to the ambiguity problem. Given

464
00:46:05,805 --> 00:46:11,205
the PCP instance, construct the two list
languages with variable A for the first

465
00:46:11,205 --> 00:46:16,336
strings of the pairs and B, for the second
strings of the pairs. Then, add the

466
00:46:16,336 --> 00:46:21,669
productions S goes to A and S goes to B.
Of course, S is the start symbol of the

467
00:46:21,669 --> 00:46:26,523
grammar. 'Kay, we'll show the resulting
grammar is ambiguous if and only if

468
00:46:26,523 --> 00:46:31,700
there's a solution to this instance of
PCP. But first let's look at an example

469
00:46:31,700 --> 00:46:37,208
that should actually expose how the proof
in general works. 'Kay, here's the grammar

470
00:46:37,208 --> 00:46:41,986
constructed from the PCP instance that we
saw earlier. It is the sets of

471
00:46:41,986 --> 00:46:46,961
A-productions and B-productions we saw
plus the two productions for S. 'Kay,

472
00:46:46,964 --> 00:46:52,057
notice there is a solution one, three.
That is, the first pair was A, A, B. And

473
00:46:52,057 --> 00:47:00,787
the third pair was BBA with BA. When we
take the first strings of each pair, we

474
00:47:00,787 --> 00:47:09,740
get ABBA, this followed by that. And when
we take the second strings of each pair,

475
00:47:09,740 --> 00:47:18,582
AB and BA, we also get ABBA. And this
common string followed by the index string

476
00:47:18,582 --> 00:47:25,748
in reverse, which is that. You have two
left most derivations. One starting with a

477
00:47:25,748 --> 00:47:32,038
and the other starting with b. Here is the
derivation starting with a. [sound]. And

478
00:47:32,038 --> 00:47:37,684
here's the derivation starting with B. So
here's the proof this construction is a

479
00:47:37,684 --> 00:47:43,051
correct reduction from PCP to ambiguity.
First, suppose the PCP instance has a

480
00:47:43,051 --> 00:47:48,766
solution say represented by the sequence
of index symbols A1 to Ak. Note there can

481
00:47:48,766 --> 00:47:54,482
be repeats on this, in this sequence. And
W1 to Wk is the same string as X1 through

482
00:47:54,482 --> 00:48:00,679
XK, that's what it means for this, for
this index sequence to be a solution. Then

483
00:48:00,679 --> 00:48:06,066
the two left most derivations with the
string that is W1 through WK or

484
00:48:06,066 --> 00:48:12,135
prevalently it's X1 through XK and is
followed by AK down to A1. One starts with

485
00:48:12,135 --> 00:48:18,675
S it goes to A and the other starts with S
and goes to B. For the converse, we're

486
00:48:18,675 --> 00:48:23,293
going to show that if there are two
leftmost derivations of the string of

487
00:48:23,293 --> 00:48:27,974
terminals, then one must begin. S goes to
A, the other must begin, S goes to B.

488
00:48:27,974 --> 00:48:33,154
Suppose there are two leftmost derivations
that begin with S goes to A. Then we can

489
00:48:33,154 --> 00:48:38,334
look at the sequence if index symbols, and
the resulting terminal string, in reverse,

490
00:48:38,334 --> 00:48:42,890
and learn exactly the sequence that
productions used. Except for the last

491
00:48:42,890 --> 00:48:47,882
production used, each should be the unique
A production that generates the index

492
00:48:47,882 --> 00:48:53,808
symbol, and has an A in the body. And the
final production would be the one wi, with

493
00:48:53,808 --> 00:48:58,965
the last, that is left most index symbol,
and no a in the body. The same idea w

494
00:48:58,965 --> 00:49:04,256
works for derivations that start with s
goes to d. There can be only one with a

495
00:49:04,256 --> 00:49:10,812
given sequence of index symbols. Here's an
example. If a derivation starts with S

496
00:49:10,812 --> 00:49:17,154
goes to A and produces a terminal string
with index sequence 2321 and here's look

497
00:49:17,154 --> 00:49:23,110
at derivation [inaudible] has to look
like, okay. First of all we start with A.

498
00:49:23,110 --> 00:49:28,668
The first production used must generate
the one at the right end. And we need to

499
00:49:28,668 --> 00:49:34,436
keep the A, so this is the only choice. So
we've generated W1 off to the left of the

500
00:49:34,436 --> 00:49:39,647
A. The next index symbol from the right
end is a two. So we have to use this

501
00:49:39,647 --> 00:49:45,137
production. Again, we still need to keep
that A there, because we have more index

502
00:49:45,137 --> 00:49:50,249
symbols to go. Then, the third from left
index symbol is three, so this is the

503
00:49:50,249 --> 00:49:55,324
production we have to use. And then the
last index symbol is two, that is the

504
00:49:55,324 --> 00:50:00,686
left-most index symbol. Is two. So we're
gonna get rid of that A, and use this

505
00:50:00,686 --> 00:50:06,389
production. That's the only way we could
generate a string with 2321 at the end,

506
00:50:06,389 --> 00:50:12,381
and no other index symbols. We can prove
many things about context free grammars to

507
00:50:12,381 --> 00:50:18,374
be undecideable as well. But to do so, we
need to show that the complement of a list

508
00:50:18,374 --> 00:50:23,457
language is also context free. Remember
that the compliment of a context free

509
00:50:23,457 --> 00:50:28,423
language need to not be context free, but
fortunately these are. For the proof, we

510
00:50:28,423 --> 00:50:33,450
find it easy to construct the push down
automaton, in fact the deterministic push

511
00:50:33,450 --> 00:50:38,478
down automaton, for the compliment. The
PDA reconstruct will start with the bottom

512
00:50:38,478 --> 00:50:44,800
marker on its stack. At first, some
symbols from the PCP instance, not indexed

513
00:50:44,800 --> 00:50:51,089
symbols, will appear. The PDA simply
pushes them onto the stack. After the

514
00:50:51,089 --> 00:50:56,928
first index symbol arrives, start checking
that the proper strings appear on the

515
00:50:56,928 --> 00:51:02,841
stack. That is, if we see an index symbol
AI, then check that the proper WI appears

516
00:51:02,841 --> 00:51:08,753
on the stack with the right end of WI at
the top. The PDA pops all the symbols of

517
00:51:08,753 --> 00:51:13,618
WI if so. Note that we're assuming this is
the complement of the list language based

518
00:51:13,618 --> 00:51:18,052
on the first components. If it is for the
second components, we'll check that XI

519
00:51:18,052 --> 00:51:22,316
appears on the top of the stack. Again,
with the right end at the top. What we

520
00:51:22,316 --> 00:51:26,301
seem to be doing is accepting the list
language itself, rather than the

521
00:51:26,301 --> 00:51:30,734
complement. But I didn't tell you when the
PDA accepts. In fact, it accepts every

522
00:51:30,734 --> 00:51:35,280
input with only the exception of when it
has found a string in the list language.

523
00:51:35,800 --> 00:51:41,055
That is, if the input so far was some
sequence of PCP symbols followed by some

524
00:51:41,055 --> 00:51:46,310
sequence of index symbols. And these
sequences are not empty and the bottom of

525
00:51:46,310 --> 00:51:51,733
stack marker is now exposed then the PDA
does not accept this string. It will

526
00:51:51,733 --> 00:51:56,674
accept any other string that follows,
which cannot then also be a solution to

527
00:51:56,674 --> 00:52:01,615
the PCP instance. Now we can use the
compliment language. Let LA and LB be the

528
00:52:01,615 --> 00:52:07,945
list languages for the first and second
components of an instance of PCP. And let,

529
00:52:07,945 --> 00:52:14,790
LA complement and LB complement be their
complements. Now all four of these

530
00:52:14,790 --> 00:52:19,994
languages are context-free languages.
Let's consider the union of the complement

531
00:52:19,994 --> 00:52:25,003
languages. This is also a context-free
language by the fact that context-free

532
00:52:25,003 --> 00:52:30,601
languages are closed under
complementation. I claim that this union

533
00:52:30,601 --> 00:52:36,560
equals Sigma star, where Sigma is the
alphabet consisting of all the symbols

534
00:52:36,560 --> 00:52:42,675
used by the languages involved, if and
only if there is no solution to the PCP

535
00:52:42,675 --> 00:52:49,104
instance. Suppose there were a solution,
say represented by the index symbols a one

536
00:52:49,104 --> 00:52:56,200
through a n. Then this string, yes, is not
in the complement of L a. And the equal

537
00:52:56,200 --> 00:53:04,557
string, this, is not in the compliment to
be. Thus if there is a solution there is a

538
00:53:04,557 --> 00:53:15,437
string missing from the union. Conversely
a suppose string, y and down to A1, let's

539
00:53:15,437 --> 00:53:29,767
say this. [sound], [sound]. Is missing
from the union. That means Y is the string

540
00:53:29,767 --> 00:53:35,241
you get by using indexes A one through A N
and taking the first strings from the

541
00:53:35,241 --> 00:53:40,857
index pairs. And it's also the string you
get by doing the same with the second

542
00:53:40,857 --> 00:53:46,834
pairs. That means A1 through A N is the
solution to the PCP instance. We have now

543
00:53:46,834 --> 00:53:52,742
reduced PCP to the question, is the given
context for language equal to all strings

544
00:53:52,742 --> 00:54:00,553
over its terminal health event. Thus this
problem is undecidable. Here's another

545
00:54:00,553 --> 00:54:06,672
undecidable question about context free
languages. Is the language also a regular

546
00:54:06,672 --> 00:54:12,716
language? We do exactly the same reduction
from PCP and one direction is easy, if

547
00:54:12,716 --> 00:54:18,987
there's no solution then we just showed
that the union of the compliment languages

548
00:54:18,987 --> 00:54:27,262
is Sigma Star and that's surely a regular
language. But we also need to show the

549
00:54:27,262 --> 00:54:33,140
converse, that if L, the union of the
complement languages, is something other

550
00:54:33,140 --> 00:54:39,068
than Sigma star, then it's not a regular
language. Suppose we have a solution to

551
00:54:39,068 --> 00:54:44,628
the PCP instance. And, in particular, let
X be the index symbols in reverse. That

552
00:54:44,628 --> 00:54:50,402
gives you the solution, and let W be the
string you get by using the first string

553
00:54:50,402 --> 00:54:56,461
from each of the index pairs in order. Or
obviously equivalently, the second string,

554
00:54:56,461 --> 00:55:02,306
from the same, pairs. Now, let H by the
homomorphism defined by, H of zero is W,

555
00:55:02,306 --> 00:55:08,809
and H of one is X. We claim that H of the
string zero to the N, one to the N is also

556
00:55:08,809 --> 00:55:14,672
an L. Gotta, remember, remember, L is the
union of the compliments for any N.

557
00:55:14,672 --> 00:55:21,257
Because the repetition of a solution to a
PCP instance is also a solution to that

558
00:55:21,257 --> 00:55:27,816
instance. However, H of Y can't be an L
for any other Y. This point requires a

559
00:55:27,816 --> 00:55:33,672
little thought. First, if Y isn't 0's
followed by 1's, then H of Y isn't to the

560
00:55:33,672 --> 00:55:39,757
right form to be a solution to PCP. That
is, it will not consist the PCP instance

561
00:55:39,757 --> 00:55:46,411
and both followed by index symbols. But if
Y consists of N 1s preceded by a different

562
00:55:46,411 --> 00:55:52,022
number of zeroes, and it can?t possibly
represent a solution either. H applied to

563
00:55:52,022 --> 00:55:57,633
the N 1s gives a particular sequence of
index symbols but we know that the PCP

564
00:55:57,633 --> 00:56:03,457
instant symbols of the sequence of index
symbols corresponds to is H applied to N

565
00:56:03,457 --> 00:56:09,070
0s therefore it couldn't also be H applied
to a different number of 0s. So, if, L,

566
00:56:09,070 --> 00:56:14,915
again, the union of the complements, were
regular, then H inverse of L would also be

567
00:56:14,915 --> 00:56:20,618
regular, Because regular languages are
closed under inverse homomorphism. And the

568
00:56:20,618 --> 00:56:25,821
complement of H inverse L would be
regular, because regular languages are

569
00:56:25,821 --> 00:56:31,809
closed in their complementation. But this
language we just argued is the set of zero

570
00:56:31,809 --> 00:56:38,756
to the N1 to the N, such that N is at
least one. Okay, and of course we know

571
00:56:38,756 --> 00:56:42,780
this language [sound] not to be regular.