We're not changing the subject matter drastically. Until now, I've been talking about simple models, such as finite automata and context-free grammars, that let us model mechanisms like communication protocols, or processes like the parsing of programming languages. Today we begin to study the other side of automata theory: the part that lets us show certain tasks are impossible, or in some cases possible but intractable; that is, they can be solved, but only by very slow algorithms. It might not be obvious, but just because we can state a problem doesn't mean there is any algorithm at all that can solve it. For example, can you devise an algorithm to tell whether the complement of a context-free language is empty? That is, does a grammar generate all strings over its terminal alphabet? It turns out you can't, and we're going to learn the theory that lets us prove facts of this kind.

We begin with the central theme of the study of what can be solved by computer algorithms: the idea that all data can be represented by integers. The idea of countable sets, those whose members can be assigned integers one, two, three, and so on, is essential here. We're going to meet the Turing machine, which is in some sense the ultimate automaton. It is a formal computing model that can compute anything we can compute with a real computer, or with any other realistic model of computation we might think of. Turing machines define the class of recursively enumerable languages, which is thus the largest class of languages about which we can compute anything. There's another, smaller class, called the recursive languages, that can be thought of as modeling algorithms: computer programs that answer a particular question and then finish. I realize these concepts are vague at this point, but we'll get to it all in good time.

If you took a modern course on programming, you were undoubtedly introduced to the importance of data types or classes.
But when you look under the hood at how these types are represented inside the computer, you see that there is really only one type: strings of bits. We could base our whole theory on strings of bits, but it is more convenient to convert binary strings to integers. We can almost see how to do that, since integers can be represented in binary notation, but there is a small glitch we'll get to. And to make the idea that all the world is integers even more pervasive, remember that the programs computers execute are also, under the hood, just strings of 0's and 1's; they therefore can be regarded as integers. That lets us talk about the hundredth program and things like that. As we shall see, the fact that programs and data are at heart the same thing is what lets us build a powerful theory about what is not computable.

As a simple example of a type that is really integers, let's look at character strings. Strings of ASCII characters can be thought of as binary strings, eight bits to a character. If you use Unicode, you need sixteen bits per character, but the idea is the same: strings over any alphabet can be represented by binary strings, and binary strings are really integers. That makes it sensible to talk about "the ith string" for any i. However, we have to be careful how we do the conversion from binary strings to integers. Because binary strings can have leading zeroes, an integer does not have a unique representation as a binary string. More to the point, if you do the obvious conversion, then strings like 101, 0101, and 00101 all seem to be the integer five. But we can't have two or more different strings mapped to the same integer. However, there is a correspondence between binary strings and integers that is one-to-one and almost as simple: put a one in front of the binary string, and then treat the result as a binary integer. This way, no two strings correspond to the same integer. For example, suppose we're given the string 101.
Then put a one in front of it, and treat 1101 as a binary integer; you get thirteen, by the way. If you put a one in front of 0101, the integer is 10101, which is twenty-one, and so on.

For a wilder example, consider images as a data type. There are many representations of images; let's take GIF as a concrete example. A GIF file is just an ASCII string. We know how to convert ASCII to binary, and then we can convert the binary string into an integer in the way we just discussed. And bingo: you have a notion of "the ith image."

Here's another wild example: proofs. A proof can be viewed as a sequence of expressions, or assertions, in mathematical form, each of which is either given or follows from previous statements in the sequence by certain logical rules. We can encode mathematical expressions in Unicode; it has pretty much any symbol you use in mathematics. So we can convert any expression to a binary string, and hence to an integer. There is, again, a small glitch. A proof is a sequence of expressions, not a single expression, so we need to fix on a way to separate the expressions of a sequence. We also need to indicate whether an expression is given or follows from previous expressions. We thus need to introduce two new symbols into our strings. One is a "comma" to separate expressions; we can't just use the Unicode comma, because the expressions themselves may use that symbol. The other is a marker that says "this is a given expression." The following technique is useful not only here, but in general, as a way to introduce new symbols with any meaning into binary strings while still keeping the strings binary. First, given a binary string, expand it by inserting a zero in front of every bit of the string. For example, 101 becomes 010001: 01 for the first one, 00 for the zero in the middle, and then 01 for the last one. As a consequence of this change, the expanded strings have no consecutive 1's.
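The expansion step can be sketched in a few lines of Python (the function name is my choice, not the lecture's):

```python
def expand(bits: str) -> str:
    """Insert a 0 in front of every bit.  In the result, every 1 is
    immediately preceded by a 0, so no two 1's are ever adjacent."""
    return "".join("0" + b for b in bits)

# The lecture's example: 101 becomes 01 00 01.
assert expand("101") == "010001"

# No expanded string contains two consecutive 1's:
assert all("11" not in expand(s) for s in ["", "1", "11", "1111", "0110"])
```

Because runs of two or more 1's can never appear inside an expanded string, such runs are free to serve as new marker symbols.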
Now use strings of two or more consecutive 1's as the new symbols. For example, we'll use 111 to mark a given expression, and 11 as the marker for the end of an expression. Here's an example of what a proof as a binary string could look like. The initial 111 can only mean that the expression to follow is a given expression. Then comes the expression itself; it was originally 101, but we expanded it with a zero in front of each bit. Of course, 101 is too short to really be an expression, since even one Unicode character takes sixteen bits, but let's not worry about that. The next 11 marks the end of the first expression. Notice that the 1's of this marker cannot be part of the expression, since expressions without the special markers have even length. Then comes another given-marker and another expression; this one was originally 0011, with zeros inserted. Then another expression-ender, and so on.

As we said, programs in a language like C or Java are also a data type and can be represented by integers, as if they were data. First, represent the program as an ASCII string, or a string in some other alphabet like Unicode, if the language requires it. Then convert the character string to a binary string, and then to an integer. Now we can talk about "the ith program." And don't let this worry you: there really aren't more programs than there are integers. While that number is infinite, you may be aware that there are different orders of infinity, and the integers are the smallest one. For example, there are more real numbers than there are integers, so there are more real numbers than there are programs. That immediately tells you that some real numbers cannot be computed by programs. We'll get to a more concrete explanation of what "the ith program" really means shortly.

We have an intuitive notion of what it means for a set to be finite or infinite.
A set is finite if there is a particular integer that is the count of the members of the set; the term for the count of the number of members is "cardinality." For example, the set containing a, b, and c is a finite set, and its cardinality is three. The formal definition of an infinite set is one for which there is a one-to-one correspondence between its members and a proper subset of its members; finite sets are then, formally, the sets that are not infinite. For example, the positive integers are an infinite set. We can let the proper subset be the even integers, and the one-to-one correspondence matches one with two, two with four, three with six, and so on: every positive integer i is matched with the unique even integer 2i.

A countable set is one with a one-to-one correspondence with the positive integers. For example, the set of all integers, positive and negative, is countable. A correspondence maps zero to one, each negative integer -i to 2i, and each positive integer +i to 2i+1, for all i greater than or equal to one. As a consequence, the positive and negative integers get listed in the order 0, -1, 1, -2, 2, and so on. The binary strings are also countable: we saw how to assign a unique positive integer to each binary string, by putting a one in front and treating the result as a binary integer. Likewise, the Java programs can be put in one-to-one correspondence with binary strings, and then with integers. Many of these integers represent flawed Java programs, things that don't compile, but that is not important. The important thing is that every working Java program has a unique integer. A rather surprising point is that pairs of integers can also be put in one-to-one correspondence with the integers themselves.
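The correspondence just described for the set of all integers can be sketched as a quick check (the function name is mine):

```python
def integer_to_position(n: int) -> int:
    """One-to-one map from all integers onto the positive integers:
    0 -> 1, -i -> 2i, and +i -> 2i + 1, for i >= 1."""
    if n == 0:
        return 1
    return -2 * n if n < 0 else 2 * n + 1

# The listing order this induces is 0, -1, 1, -2, 2, ...
assert [integer_to_position(n) for n in (0, -1, 1, -2, 2)] == [1, 2, 3, 4, 5]

# Sanity check that the map is one-to-one and onto an initial segment:
assert {integer_to_position(n) for n in range(-50, 51)} == set(range(1, 102))
```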
A simple way to see that the pairs are countable is to exhibit an ordering of the pairs: a first pair, a second pair, and so on. The ordering we want is first by sum, and then, for pairs with the same sum, by first component. Thus (1, 1) comes first; it is the only pair with a sum of two. There are two pairs with a sum of three, and we put (2, 1) ahead of (1, 2) because it has the larger first component. Then come the three pairs with sum four, again ordered by first component. You can, in fact, work out a formula f(i, j) such that the pair (i, j) is the f(i, j)th in that order, but the important point is that some such function exists, and therefore it makes sense to talk about "the kth pair of integers." We call a one-to-one correspondence between a countable set and the positive integers an enumeration of the set. Thus strings, programs, proofs, and pairs of integers, for example, all have enumerations.

Now let's look at the set of all languages over some fixed alphabet, say {0, 1}. Could this set be countable? The answer is no, and we can prove it. Suppose it were possible to enumerate the languages, so that every language over {0, 1} is the ith language for some i. We already know how to enumerate binary strings, so we can talk about the ith binary string. Now I'm going to define a language L: L contains exactly those binary strings w such that, where w is the ith binary string (and that must be true for some i), w is not in the ith language. L is a language over the alphabet {0, 1}. Since we assumed the languages over {0, 1} can be enumerated, for some j, L is the jth language. So let x be the jth string; surely some string is jth in the enumeration of binary strings. Now we can ask: is x in L? It turns out that if it is, then it isn't, and vice versa. To see why, remember the definition of L: L contains w if and only if w is not in the language that corresponds to the same integer i that w corresponds to as a string.
So let's focus on the case where w is x, the jth string. In that case, the value of i is j, because we know x is the jth string. Further, we know that L is the jth language, so both L, the language being defined, and "the ith language" mentioned in the definition of L are the same language, namely the jth. Thus, when we read the definition of L as it applies to x, it says that x is in L (which is the jth language) if and only if x is not in L (again, the jth language). We have a contradiction: x is neither in L nor not in L, an impossible situation. We conclude that our assumption is wrong. The only unproved assumption we made was that we could enumerate the languages over {0, 1}; so that is false, and we can't enumerate such languages.

As a result, we see that there are more languages than programs, since the programs can be enumerated. A particularly bad consequence of this fact is that there are languages for which no program can tell whether or not a given string is in the language. On the bright side, none of these strange languages is context-free (and therefore none is regular), since the CYK algorithm gives us a membership-testing program for any context-free language.

The process of coming up with the language L that can't be in any enumeration is called diagonalization, and the reason should be apparent from this picture. Imagine an infinite matrix in which the rows correspond to languages and the columns to binary strings. A one in the entry for row i and column j means that the jth string is in the ith language; a zero means it is not. If we could enumerate the languages, then we could create such a table. Now look at the entries along the diagonal and complement each one; that is, replace zero by one and vice versa. The complemented diagonal, rotated 45 degrees, looks like a language: it tells which strings are its members. Here, the first and second strings are not in the language, the third and fourth are, and so on.
But this language cannot be any row, because it disagrees with each row in at least one position; in particular, it disagrees with the ith row in the ith position.

We're now ready to look at the theory of Turing machines. One important purpose of this theory is to prove that certain particular languages have no membership algorithm. The first step is to prove that certain languages about Turing machines themselves have no membership algorithms. We're then going to introduce the important notion of a reduction from one problem to another. Reductions are proofs of the form: if there is an algorithm for problem P, then there is an algorithm for problem Q. We'll describe this as "Q reduces to P." If we already know that there's no algorithm for Q, then there can be no algorithm for P, because if there were an algorithm for P, this reduction would give us an algorithm for Q. By using reductions, we don't have to resort to diagonalization or another trick every time we want to prove that some problem has no algorithm.

Here is the picture of a Turing machine that you should keep in mind. There is a state; as with all the other automata we've seen, the machine can be in only one of a finite number of states. There is a tape, infinite in both directions, partitioned into squares, each of which can hold one symbol from a finite tape alphabet. And there is a tape head, which always points to one of the tape squares. The Turing machine makes moves, just like a finite automaton or a pushdown automaton. For the Turing machine, the move is determined by the state and by the tape symbol under the head. In one move, the Turing machine can change its state, write a new tape symbol over the old one in the square it is scanning, and move the head one square left or right.

We might ask why we introduce the Turing machine, rather than representing computation by, say, C programs. While you could develop a theory around programs without using Turing machines, it would be much harder.
The thing that we'll come to appreciate about Turing machines is that they are so simple in their operation, compared to programs in a programming language or even real computers. That wouldn't matter if Turing machines could not mimic real computers, but they can, as we shall argue. In fact, one could argue that, because Turing machines have infinite storage capacity on their tape, they are even more powerful than computers, since computers always have a finite amount of storage, however large that may be. But the difference is not essential, since, in principle, we could always buy more storage for a computer. One good counterargument is that the universe is finite, so where are you going to get the atoms from which to build all those disks? But even if you accept that the universe is finite, the limitation doesn't seem to be affecting what we can compute in practice. So we're never going to argue that a computer is weaker than a Turing machine in any meaningful way.

So here is the formal notation we use for Turing machines. There's a finite set of states; we follow our tradition in using Q for this set. There's a finite input alphabet, for which we use the traditional Σ. There's a tape alphabet, Γ, which always includes the input alphabet. The transition function is, as always, δ; we'll talk about how that works next. There is a start state q0, a member of the set of states Q; again, this is as before. There is a blank symbol, which we usually represent by capital B. This symbol is in the tape alphabet, but is never an input symbol. And there is a set of final states, F, as usual.

There are several conventions about letters we shall use; they are consistent with our earlier conventions for finite and pushdown automata. Lowercase letters at the beginning of the alphabet are still input symbols. Capital letters at the end of the alphabet are tape symbols, which might or might not be input symbols.
Lowercase letters at the end of the alphabet are strings of input symbols; again, that's as usual. And Greek letters at the beginning of the alphabet are strings of tape symbols, which may include input symbols.

Now let's see the transition function δ for a Turing machine. A Turing machine is deterministic, and moreover does not have to have a move in every situation. δ takes two arguments: the state, and the symbol scanned by the tape head. δ(q, Z), for state q and tape symbol Z, can be undefined, in which case the Turing machine can make no more moves when scanning a Z in state q; the Turing machine is said to halt. But if δ(q, Z) is defined, then it is a triple (p, Y, D), where p is the new state, Y is the symbol that replaces Z in the square being scanned by the tape head, and D is the direction for the tape head to move. This direction is either L for left or R for right, and the move is by one square.

Here is an example of a Turing machine. Its input symbols are zero and one. The Turing machine scans right along its input, looking for a one. If it finds one, it changes it to a zero, goes to a final state f, and halts. However, if it sees the blank symbol before seeing a one, it changes the blank to a one and moves one square left, repeating the process just described. There are two states for this Turing machine, q and f; q is the start state, and f is the final state. As we mentioned, the input alphabet is {0, 1}; the tape symbols are only 0, 1, and the blank B. One move of the Turing machine is given here: it says that in state q, scanning a zero, it stays in state q, leaves the zero on the square it was scanning, and moves right. Another rule says that if it is in state q and sees a one, it goes to final state f, replaces the one by a zero, and moves right. It doesn't matter which way the head moves in this situation, since the machine has accepted its input, but it always has to move left or right, so we'll say right.
The last move of this Turing machine says that in state q, while scanning a blank, it replaces the blank by a one and moves left, staying in state q.

Here's the beginning of a "moving picture" of this Turing machine. The three moves are shown in the upper right, with the move that is applicable in a given situation shown in red. Here, the Turing machine has just started; it is in the start state, and the tape head is at the left end of the input, which is 00 in this case. The rule in red says that in state q, scanning a zero, it moves right and leaves the state and symbol unchanged; we'll see that on the next slide. Here it's done that, and it is in the same situation, again scanning a zero, so it again moves right. Now it sees a blank on its tape, so another rule is applicable; this rule says: stay in state q, replace the blank by a one, and move left. It's made those changes, and the first rule is applicable again: the Turing machine moves right while leaving the state and symbol unchanged. Here the second rule applies: in state q it is scanning a one, so it goes to the final state f, replaces the one by a zero, and moves right. Here we see the effect of the last move. The Turing machine has no move when it is in state f scanning a blank, so it halts. It has also accepted, since f is a final state.

Moving pictures of a Turing machine can get a bit cumbersome, so we're going to develop an instantaneous description, or ID, notation for Turing machines, similar to what we used for pushdown automata. In a sense, the Turing machine IDs are even simpler, because the input and tape contents are combined into one string. Moreover, we represent the state and the head position by embedding the state to the left of the symbol being scanned. We'll always assume that state symbols are chosen so they do not conflict with tape symbols, and thus we can always tell which symbol is the state.
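The two-state machine just traced can be sketched as a tiny simulator (the dictionary representation and the names here are my choices, not the lecture's; the step limit stands in for a genuinely infinite run):

```python
BLANK = "B"

# delta maps (state, scanned symbol) -> (new state, written symbol, direction)
delta = {
    ("q", "0"): ("q", "0", "R"),    # keep scanning right over 0's
    ("q", "1"): ("f", "0", "R"),    # found a 1: overwrite with 0, accept
    ("q", BLANK): ("q", "1", "L"),  # hit the blank: write a 1, back up
}

def run(tm_delta, input_string, start="q", finals=("f",), limit=10_000):
    """Simulate a deterministic TM; accept if it halts in a final state.
    The tape is a dict, so blanks in both directions are implicit."""
    tape = {i: c for i, c in enumerate(input_string)}
    state, head = start, 0
    for _ in range(limit):  # limit guards against non-halting runs
        symbol = tape.get(head, BLANK)
        if (state, symbol) not in tm_delta:
            break           # no move is possible: the machine halts
        state, written, direction = tm_delta[(state, symbol)]
        tape[head] = written
        head += 1 if direction == "R" else -1
    return state in finals

assert run(delta, "00")  # the input traced in the lecture is accepted
```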
As we suggested in our little example, the Turing machine gets its input on the tape. The input, which is always a string of input symbols, is placed on the tape, surrounded by an infinity of blanks to the left and to the right. The Turing machine begins in its start state, with the head at the leftmost input symbol. If the input string happens to be epsilon, then the Turing machine's tape is entirely blanks, and the head is scanning one of them.

An ID is a string of the form αqβ. Here, αβ is the contents of the tape between the leftmost and rightmost non-blank symbols; that is, the tape consists of an infinity of blanks, then αβ, then another infinity of blanks. We should observe that after any finite number of moves, the Turing machine has visited only a finite number of tape squares, so there can be only a finite number of non-blanks. Thus we always have a finite representation of the tape, even though it is infinite in extent. The state is q, and it is placed immediately to the left of the symbol the head is scanning. Remember, we assume state symbols are chosen so that none of them is a tape symbol; therefore, we can always identify q. As a special case, q can be at the right end of β; if so, it means the symbol being scanned is a blank. This can occur only if the Turing machine has just moved right, and there are only blanks to the right. We use the turnstile notation ⊢ to represent one move, and the starred turnstile ⊢* to represent any number of moves, including zero moves. These are exactly as for PDAs.

Here's an example, corresponding to the earlier Turing machine example, but now using the ID notation. We start off in the start state, with the state written to the left of the input. In the first move, the Turing machine moves right, and then right again. Now it is scanning a blank, and its action is to replace the blank by a one and move left. Notice that the state is now two symbols to the left of the end, and it's scanning what used to be the rightmost zero. The machine again moves right, and when it sees the one, it enters state f and again moves right, changing that one to a zero.

Now we'll discuss the turnstile relation formally. We have to discuss what happens on rightward moves separately from what happens on leftward moves. So suppose there is a right move, where in state q, scanning Z, the Turing machine goes to state p, writes Y, and moves right. Then in any ID with qZ as a substring, which means that the Turing machine is in state q and the Z is being scanned, we can replace qZ by Yp. That has the effect of moving the head right, as well as making the state and symbol changes called for. There's a special case when Z is the blank. Then it is additionally possible, starting from an ID with q at the right end (which means a blank is being scanned), for the next ID to replace the q by Yp. That is, the blank, which we didn't see in αq, has been replaced by Y, and the state q became p, with p of course positioned to the right of the Y it just wrote.

Now consider a left move where, again, we are in state q scanning Z; the new state will be p, and the symbol Y will be written over the Z. If there is any symbol X to the left of the state in the current ID, then in the next ID the Z is replaced by Y, and the Xq is replaced by pX. That means X is being scanned in the next ID, and of course we have gone from state q to state p. But we also have to take care of the case where the ID has nothing to the left of the state, which means there is an infinity of blanks to the left of the present head position. Then we reflect the move by replacing the Z by Y and introducing a blank to be the symbol immediately to the right of the state.
It's okay if the ID represents a blank explicitly, and this is one of several situations where we have to do so. However, it is important to note that the length of the ID can grow by only one at each move, so it remains finite after any finite number of moves.

There are actually two ways to define the language of a Turing machine. Final state is one of these ways, of course. Formally, L(M), for a Turing machine M, is the set of strings of input symbols w such that, started in the initial ID q0w, with w on the tape, M gets to some ID with a final state. The other way to define a language is by halting: the language M defines by halting is the set of input strings w such that, started in the initial ID q0w, M gets to some ID from which no move is possible. The two methods define the same class of languages, as we can show by simple Turing machine constructions, not too different from those we used to show that PDAs accepting by final state and by empty stack have the same language-defining power.

Given a Turing machine M accepting a language by final state, we are going to modify M into a new Turing machine M′ that accepts the same language by halting. We can easily make the final states of M be halting states, by removing the rule for δ(q, X) whenever q is a final state. But we have to protect against the possibility that M halts without accepting, because M′ would then accidentally accept. So for M′, we introduce a new state s, whose job is to make sure M′ never halts. When M′ is in state s, on any tape symbol X, M′ stays in state s, leaves the X as it is, and moves right. Thus M′ always has a next move in state s; eventually it will reach the infinity of blanks to the right and just keep looking at them, one at a time. And to make sure M′ doesn't halt accidentally, whenever δ(q, X) is undefined for M, and q is not a final state, we'll have M′ enter the state s.
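This construction can be sketched as a transformation on the dictionary representation of δ (the helper name and the sink state's name "S" are my choices, assuming "S" is not already a state of M):

```python
def make_halting(delta, finals, tape_symbols):
    """Convert M (accepting by final state) into M' (accepting by
    halting): final states lose all their moves, and every other dead
    configuration is routed into a sink state that runs right forever."""
    sink = "S"  # assumed fresh, i.e., not a state of M
    states = ({q for (q, _) in delta}
              | {p for (p, _, _) in delta.values()}
              | set(finals))
    # Final states become halting states: drop their rules.
    new_delta = {(q, x): mv for (q, x), mv in delta.items() if q not in finals}
    for q in states - set(finals):
        for x in tape_symbols:
            # M would halt here without accepting; M' must keep moving.
            new_delta.setdefault((q, x), (sink, x, "R"))
    for x in tape_symbols:
        new_delta[(sink, x)] = (sink, x, "R")
    return new_delta

# A tiny machine that accepts on seeing a 1:
delta = {("q", "1"): ("f", "1", "R")}
dprime = make_halting(delta, finals={"f"}, tape_symbols={"0", "1", "B"})
assert ("f", "1") not in dprime               # M' halts in old final states
assert dprime[("q", "0")] == ("S", "0", "R")  # old dead configs now spin
```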
The construction in the other direction is also pretty easy. Now we assume M accepts some language L by halting, and we want to modify it into M″, which accepts L by final state. We introduce a new state f, which is the final state of M″. f has no moves, so in fact M″ will also halt when it accepts, but that doesn't matter: once a Turing machine enters a final state, if acceptance is by final state, then nothing it does in the future can negate the fact that the input was accepted. Wherever M halts, that is, wherever δ(q, X) is undefined, we define it and make the next state be f.

We now have a proof that the class of languages accepted by Turing machines using final state is the same as the class of languages accepted by Turing machines using halting. The name for this class of languages is the recursively enumerable languages. You might rightly wonder where this name came from. It actually predates Alan Turing's paper describing Turing machines, and it refers to an earlier notion of "anything we can compute"; I'm not going to say more about that. But there's another important class related to Turing machines: the recursive languages. These, we shall see, are a proper subset of the recursively enumerable languages. An algorithm, formally, is a Turing machine, accepting by final state, that halts on every input, regardless of whether that input is accepted. This notion of an algorithm is consistent with the informal notions you may have learned; however, it is limited to accepting or rejecting an input. That is, all algorithms in this sense render a yes/no decision but do not compute an output. We can extend the notion of what a Turing machine does to allow it to produce output and then halt, in which case we have exactly the same notion of algorithm that is taught in freshman computing. A language that is accepted by final state by some algorithm is called a recursive language.
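The reverse construction is even shorter to sketch in the same dictionary style (the helper name and the fresh final state "F" are my choices):

```python
def make_final_state(delta, states, tape_symbols):
    """Convert M (accepting by halting) into M'' (accepting by final
    state): every undefined (state, symbol) pair gets a move into a
    fresh final state 'F', which itself has no moves."""
    new_delta = dict(delta)
    for q in states:
        for x in tape_symbols:
            new_delta.setdefault((q, x), ("F", x, "R"))  # halt -> accept
    return new_delta, {"F"}

# A machine that halts (and so accepts) as soon as it leaves the 0's:
delta = {("q", "0"): ("q", "0", "R")}
d2, finals = make_final_state(delta, states={"q"}, tape_symbols={"0", "B"})
assert d2[("q", "B")] == ("F", "B", "R")        # old halt now accepts
assert ("F", "0") not in d2 and ("F", "B") not in d2  # F has no moves
```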
The term "recursive," too, comes from the history of attempts to answer the question of what can be computed, prior to Turing's insights. For example, every context-free language is a recursive language: you can implement the CYK algorithm for any context-free language on a Turing machine. That Turing machine will halt once it has computed the triangle of sets of variables and checked whether the start symbol of the grammar is in the final set. But the recursive languages go way beyond the context-free ones. In fact, it is very hard to come up with a language that is not recursive, except by using tricks like the diagonalization we discussed earlier.
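As a concrete illustration of why CYK gives an algorithm in this formal sense, here is a sketch (the grammar is my toy example, in Chomsky normal form): the algorithm fills a finite table and therefore always halts with a yes/no answer, on every input.

```python
def cyk(word, term_rules, binary_rules, start="S"):
    """CYK membership test for a CNF grammar.  term_rules are pairs
    (A, terminal); binary_rules are pairs (A, (B, C))."""
    n = len(word)
    if n == 0:
        return False  # this sketch ignores the epsilon case
    # table[i][j] holds the variables deriving word[i : i + j + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, c in enumerate(word):
        table[i][0] = {v for v, t in term_rules if t == c}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for v, (b, c) in binary_rules:
                    if b in left and c in right:
                        table[i][length - 1].add(v)
    return start in table[0][n - 1]

# Toy grammar: S -> AB | BA, A -> 0, B -> 1, so L = {01, 10}.
term_rules = [("A", "0"), ("B", "1")]
binary_rules = [("S", ("A", "B")), ("S", ("B", "A"))]
assert cyk("01", term_rules, binary_rules)
assert not cyk("00", term_rules, binary_rules)
```

The point is not the grammar but the control structure: three bounded loops over a finite table, so the membership tester terminates on every input, which is exactly the defining property of an algorithm above.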