There is a pumping lemma for context free languages analogous to the pumping lemma for regular languages. The goal of this lecture is to [inaudible] improve this lemma, which is naturally more complicated than the pumping lemma for regular languages. And then use it to show certain languages are not context free. Let's review the pumping lemma for regular languages. It said that any sufficiently long string W in a regular language has some short non empty piece near the beginning that we could pump. That is, we could repeat it as many times as we liked, including zero. And the resulting stream would also be in the language. We found the strength of pump by looking for the first state to repeat as a finite automaton process that's input W. The pumping level of a context free language says that we can find two pieces in any long string Z and pump them in tandem. That is, we can repeat each of them I times or any I equal to the greater than zero, and the result will be in the same language. We'll also see that these two strings are close together in Z, and one can be empty but not both. Here's the statement of the ? For context free languages. We can see it as the same sort game with an adversary that we talked about in connection with regular languages. For every context free language L, that is you get to pick L presumably the one you want to show isn't really context free, there's an integer N. This is something the adversary gets to pick but once it's picked it's fixed for the rest of the game. Such that for every string Z and L of length at least N And here, you get to pick the Z to focus on. You can break Z into five pieces - U, V, W, X, Y, such that three things are true. The adversary gets to pick how Z is broken, but subject to two constraints we'll see in just a moment. And incidentally, V and X are the substrings that get pumped. The first constraint is that the middle three component, V, W, and X are short, no longer than the length of them put together. Remember that V and X will get pumped, so that's a. Not only are they short but they appear within a bounded distance from each other within Z. The second condition that the adversary has to respect is that V and X cannot both be empty, although one can. And if all the above is satisfied and L really is context free then for all integers I equal to or greater than zero: U, V to the I, W, X to the Y, Y is also an L. We win the game and prove is not context free by allowing the adversary's choices of N and. Breakup of Z, subject to the constraints one and two. And then picking an I, such that UV to the IW, X the IY is not an L. The proof the, the pumping lemma for context free languages starts with a [inaudible] normal form grammar for L. Technically, it is for L minus epsilon, but the empty string will never meet the condition of being of length at least N. So its presence of absence doesn't matter. The [inaudible] grammar has M variables for sub M. So we're going to let M be two to the M. And now lets consider any string, z and l, of length at least n. That's two to the m again. We're going to prove first that any [inaudible] tree in the c and f grammar, for string z of length at least n equals two to the m, must have a path of length m plus two or more from the root to a leaf. Well, actually. Proof the contra-positive. That is, suppose there's a parse tree Z with no path longer than M plus one. Such a path has M nodes labeled by variables at the top, or the beginning, and one node labeled by a terminal at the end. That is, here is a typical path. That is, here is a typical path. It will have variables here, here, and here and then a terminal there. Okay. If we forget about the terminals at the leaves. Part strain at CNF grammar is a binary tree. Let's say at each level the number of notes is almost gonna double. Thus there is only one note at the top level, the root, there can be only two notes at the next level four at the third and eight at the fourth and so on. In general there will be at most two to the power M -one at the end level. But this tree has it most m levels with variables and then along each. Variables. The last variable has one trial with a terminal as its label. The largest number of leaves occur if all the paths on the tree have exactly N variables. If some paths terminate before level N, there will be fewer leaves. Thus there are at most two to the N minus one nodes that are labeled by variables and have a single trial with a terminal label. And thus there are at most two to the N minus one leaves. Finally, we therefore can conclude that the length of the yield is at most. Most of the two to the M minus one power M minus one. [inaudible] Power of m minus one is n over two. So z is of length n that cannot be the yield of any three. That has passed limited to length m plus one or less. Therefore, we conclude that somewhere on the [inaudible] is the path of this m plus two. Now we're ready to prove the pumping level. We just proved that Z's parsed tree has a path of length at least M+2. Only the last [inaudible] on any path can be labeled by a terminal. So there are at least M+1 [inaudible] with variables along this path. Let's focus on one of the longest paths on the. Surely there are at least ten plus one variables along this longest path. Remember that M is the number of variables of the grammar. So, along this path, there are two low nodes labeled by the same variable, call it A. In fact, to make sure we pump short pieces, let's look only at the bottom most M+1 variables along this path, which could be much longer than M+1. And we know that two of them must be the same. On the next slide we'll see what the pause stream must look like. Here's the picture of the [inaudible] tree for z. We've showed the path we've focused on and the lowest repeating variables along that path. The purple tree is rooted at the lower a. And the yellow tree with the purple tree within it is rooted up at upper a. Let W be the yield of the purple tree. V and X the portions of the yield with the yellow tree that precede and follow W respectively. And let U and V be the portions of Z that precede V and follow X respectively. Let's look at the yellow. Since the path shown is as long as any other, and that path has at most N+1 variables, we know by the lemma one that we just proved that the yield of the yellow plus purple is no longer then two to the power M or N. That is, the length of VWX is no more than N. But v and x both can't be empty. Why? That's a useful property of [inaudible] normal form of grammars. Since the two A's shown in the tree are different nodes, the upper A must have a child to the left or right of the path shown. That's a consequence of the fact that we eliminated unit production so all bodies that have variables have at least two of them. Moreover, once we have a variable not on the path, there are no epsilon productions so we must generate from this variable at least one terminal. That is all we need to conclude that either V or X or both have length at least one. Now we can take advantage of the fact that we have two As along the path. We can get rid of V and X by pumping zero times. That is, we know the purple tree can substitute for the yellow because both trees have the same variable A at the root. If the original on the left satisfies the conditions of a parse tree, that is, every interior node is the head of a production whose body is the labels of his children, then the same will be true of the smaller parse tree on the right. We conclude that UWY is also in the language. Well, we could pump twice, that is, replace the purple tree by the yellow which has the purple within it and you get a par [inaudible] whose yield is uvvwxxy. And in the previous tree representing pumping twice, we could again replace the purple tree by the yellow, and get a bigger tree, whose yield is U, three V's, W, three X'es, and then Y. In the same manner, we could get parse trees for all strings [inaudible] form. U, V to the I, W, X, the IVY, for any integer I equal to or greater than zero. These strings are therefore all in the language L. That proves the pumping [inaudible] for context free languages. Let's look at an example of how the pumping [inaudible] can be used to show a language not to be context free. This language which involves matching the counts of two blocks of zeros is context free. A grammar of PDA4 is easy to construct. The ideas are very much like what we saw for the language O to the N, one to the N. But give the language three blocks of zeros, all of which must be the same length and we're suddenly outside what context free grammars can do. We can prove that using the problem for context-free languages. So we'll pick this language L. And the adversary now gets to pick N. We don't know N, but we do know it's fixed for the rest of the game. We get to pick Z, so let's pick zero to the N, one, zero to the N, one, zero to the N. That is, each block of zeroes is of length equal to whatever N the adversary picked. Now the adversary gets to break I, G up into Z equals U, V, W, X, Y. But he must chose these substrings, such that DWX together are no longer than N and D and X can not both be picked to be the empty string. There two cases depending upon whether the advisory picks D and X to have zeros or not. The first case, suppose there are no zeros among D and X and since they can not both be empty. There must me at least one, one among them. But then if we pump zero times to get the string UWY. We know that there is at most one, one in this string. The pumping number claims it as in the language l. But it can't be, because all strings in l have exactly two 1s. In the second and last case, v and x have exactly one zero among them. D W X has length at most N. So these three sub strings cannot extend from the first block of 0s to the last because N plus two positions separate those blocks. So again consider U, W, I which if L is context free it must be an L. Removing V and X must leave at lease one of the three blocks of N 0's intact, so it still has N 0's but V and X have at least one zero so when U, W, Y at least one of the blocks of 0's is fewer than N 0's. We conclude that in this case too U, W, Y cannot be an L and those L cannot be context free.