1 00:00:00,000 --> 00:00:05,573 Good morning, good afternoon, good evening or good whatever or whenever, you are 2 00:00:05,573 --> 00:00:10,616 actually listening to this. Today we're going to start a completely new tack, 3 00:00:10,616 --> 00:00:16,057 where we learn about a second important class of languages other than the regular 4 00:00:16,057 --> 00:00:21,365 languages. This class is the context-free languages. They include all the regular 5 00:00:21,365 --> 00:00:26,076 languages and more. The most common description of these languages is a 6 00:00:26,076 --> 00:00:30,807 context-free grammar. We'll see the definition of these grammars and the way 7 00:00:30,807 --> 00:00:35,722 these grammars define languages, which is through derivations. We're also going to 8 00:00:35,722 --> 00:00:40,993 introduce a related notation called Bacchusnouer four. The two most important 9 00:00:40,993 --> 00:00:46,530 applications of context free grammars are probably a processing natural languages 10 00:00:46,530 --> 00:00:52,277 and computer languages. I am going to focus on computer languages Where almost 11 00:00:52,277 --> 00:00:57,472 every language today has its syntax defined by a context free grammar in the 12 00:00:57,472 --> 00:01:03,005 Bacchusnouer notation. Grammars are also essential when designing a parser for the 13 00:01:03,005 --> 00:01:08,470 language. That part of the compiler that puts together the tokens of the language 14 00:01:08,470 --> 00:01:13,733 into the proper structure. For example, the parser discovers the order in which 15 00:01:13,733 --> 00:01:19,064 arithmetic operators in an arithmetic expression and group statements properly 16 00:01:19,064 --> 00:01:27,896 so their execution sequence is as intended by the programmer. A context free grammar 17 00:01:27,896 --> 00:01:33,231 defines a language by a mechanism we will soon learn. Every regular language has a 18 00:01:33,231 --> 00:01:38,176 context free grammar describing it but there are also languages that can be 19 00:01:38,176 --> 00:01:43,251 described by a grammar but that are not regular. On the other hand, the context 20 00:01:43,251 --> 00:01:48,456 free grammar is still a relatively weak formalism. There are some languages that 21 00:01:48,456 --> 00:01:55,120 are simple to describe yet have to context free grammar. Many of the languages in the 22 00:01:55,120 --> 00:01:59,688 context free class that are outside the regular languages are languages that 23 00:01:59,688 --> 00:02:04,256 involve nested structures such as the patterns of left and right parenthesis 24 00:02:04,256 --> 00:02:09,275 that we called balanced. The central elements of the grammar are variables. 25 00:02:09,275 --> 00:02:14,323 These are symbols that generate particular sets of strings. One of these symbols 26 00:02:14,323 --> 00:02:19,245 called the start symbol will generate the entire language. But we can add many 27 00:02:19,245 --> 00:02:27,330 others to help us in that definition. The variables or rather the sets of strings 28 00:02:27,330 --> 00:02:34,560 they generate, are defined recursively in terms of one another. The rules that 29 00:02:34,560 --> 00:02:41,094 define the languages of the variables are called productions. Each production has a 30 00:02:41,094 --> 00:02:50,872 variable on the left, say A, an arrow And zero or more symbols on the right will 31 00:02:50,872 --> 00:02:57,667 draw say x or y on the right, that serve as the definition. A rule like this says 32 00:02:57,667 --> 00:03:04,631 that the concatenation of the languages represented by x and y is a subset of the 33 00:03:04,631 --> 00:03:10,625 language represented by a. A variable may have several productions and its languages 34 00:03:10,625 --> 00:03:15,649 thus defined to be the union of the languages described by the right side of 35 00:03:15,649 --> 00:03:21,001 each of its productions, but remember, all of this may be recursive so grammars are 36 00:03:21,001 --> 00:03:26,482 in fact far more powerful than the regular expressions we can build from unions and 37 00:03:26,482 --> 00:03:31,703 concatenations. For our first example, let's consider the language that we showed 38 00:03:31,703 --> 00:03:37,054 not to be a regular language. The set of strings of the form of N, zero followed by 39 00:03:37,054 --> 00:03:42,432 the same number of 1s. Here is a grammar for this language, Two productions or 40 00:03:42,432 --> 00:03:49,529 rules and only one variable s. The first production, this one is a basis rule. It 41 00:03:49,529 --> 00:03:55,566 says that the string zero one is in the language of s. The second, this, is an 42 00:03:55,566 --> 00:04:01,762 inductive rule. It says that if w is a string in the language of s, then zero w 43 00:04:01,762 --> 00:04:08,196 one is also in the language. That rule lets us build longer and longer strings at 44 00:04:08,196 --> 00:04:14,868 each step adding one zero to the beginning and one, one to the end. So we always have 45 00:04:14,868 --> 00:04:20,959 the same numbers of zeros and ones with the zeros preceding the ones. Now lets 46 00:04:20,959 --> 00:04:26,581 make precise our informal introduction to what context free grammars look like. The 47 00:04:26,581 --> 00:04:32,000 terminals of the grammar are analogous to the input symbols of an atomaton. They 48 00:04:32,000 --> 00:04:38,018 form the alphabet for the language being defined. The variables or non-terminals 49 00:04:38,018 --> 00:04:45,500 are something like states of automaton but they are more powerful. One variable is 50 00:04:45,500 --> 00:04:50,120 called the start symbol. It is the language of this variable that the grammar 51 00:04:50,120 --> 00:04:55,160 defines. And the other variables are used as auxiliaries but we can think of them as 52 00:04:55,160 --> 00:04:59,508 internal to the grammar and their languages are not visible outside. The 53 00:04:59,508 --> 00:05:05,153 productions of the grammar which are akin to the transition function of an atomaton 54 00:05:05,153 --> 00:05:10,730 have the form of the variable on the left, sometimes called the head, an arrow and a 55 00:05:10,730 --> 00:05:15,703 string of zero or more symbols which can terminals of variables. These are 56 00:05:15,703 --> 00:05:22,747 sometimes called the body of the production. We have the convention about 57 00:05:22,747 --> 00:05:28,166 the letters used for certain symbols and strings. These conventions are more 58 00:05:28,166 --> 00:05:33,372 complex than the convention we use to distinguish input symbols, little a, 59 00:05:33,372 --> 00:05:39,076 little b and so on from strings w, x and so on. But they're really important as a 60 00:05:39,076 --> 00:05:45,137 reminder of the roles different components play and it is something worth committing 61 00:05:45,137 --> 00:05:50,913 to memory early on. First, we use capital letters at the beginning of the alphabet 62 00:05:50,913 --> 00:05:57,467 as variables. However S is also normally a variable. In fact it will be the start 63 00:05:57,467 --> 00:06:04,461 symbol in many grammars. On the other hand, lower case letters at the beginning 64 00:06:04,461 --> 00:06:08,730 of the alphabet are terminals. This convention agrees with an earlier 65 00:06:08,730 --> 00:06:13,742 convention that made these letters stand for input symbols. Since there is a good 66 00:06:13,742 --> 00:06:21,651 analogy between the inputs of an atomaton and the terminals of a grammar. Capital 67 00:06:21,651 --> 00:06:26,577 letters near the end of the alphabet are used for symbols that can be either 68 00:06:26,577 --> 00:06:31,886 terminals or variables, we typically don't know which. Lower case letters at the end 69 00:06:31,886 --> 00:06:36,684 of the alphabet stand for strings of terminals only. Again, this matches our 70 00:06:36,684 --> 00:06:41,714 earlier convention. And we use Greek letters at the beginning of the Greek 71 00:06:41,714 --> 00:06:46,715 alphabet for strings that may consist of both terminals and variables mixed. We'll 72 00:06:46,715 --> 00:06:51,411 design a grammar for the language of strings with the form zero to the n, one 73 00:06:51,411 --> 00:06:59,412 to the n. The terminal alphabet is zero and one of course. We need only one 74 00:06:59,412 --> 00:07:06,983 variable in this case, we'll call it S. S will be the start symbol. There's no other 75 00:07:06,983 --> 00:07:13,562 option since it is the only variable. And here are the productions which we 76 00:07:13,562 --> 00:07:19,004 explained earlier in our informal discussion. The first production generates 77 00:07:19,004 --> 00:07:24,298 only the string zero one and the second production puts a zero and one at the 78 00:07:24,298 --> 00:07:28,981 beginning and end respectively of a shorter string in the language. A 79 00:07:28,981 --> 00:07:34,411 derivation consists of a sequence of strings that typically have both terminals 80 00:07:34,411 --> 00:07:39,909 and variables, although they can have only one kind of symbol or even be empty. We 81 00:07:39,909 --> 00:07:45,406 start with the string consisting of just the start symbol. At each step, we find a 82 00:07:45,406 --> 00:07:50,090 variable to replace, say A. The productions for A, or the A productions, 83 00:07:50,090 --> 00:07:55,424 are those that have A on the left side, that is the head of production. We replace 84 00:07:55,424 --> 00:08:00,561 this A by the right side or the body of the production. We can then repeat the 85 00:08:00,561 --> 00:08:05,829 process as many times as we like until we are left with only terminals at which 86 00:08:05,829 --> 00:08:11,296 point no replacement is possible and we have in fact generated a string that is in 87 00:08:11,296 --> 00:08:16,795 the language of the grammar. Here's what the replacement looks like. You take any 88 00:08:16,795 --> 00:08:22,279 string that has an occurrence of some variable A and anything to the left and 89 00:08:22,279 --> 00:08:28,035 right. Terminals and or variables, which is what the alpha and beta we have here, 90 00:08:28,035 --> 00:08:34,725 suggest you take an A production who's body is gamma and you replace the A by 91 00:08:34,725 --> 00:08:41,930 gamma. Here is an example derivation. We always start with the string that has Just 92 00:08:41,930 --> 00:08:50,680 the start symbol s. We choose the second production so S replaced by zero, S, one. 93 00:08:52,760 --> 00:09:00,796 Now, we replace the s, again using the second production. And again we replace 94 00:09:00,796 --> 00:09:05,449 the s, but this time we use the first production whose body is just zero one. We 95 00:09:05,449 --> 00:09:09,983 are now left with a string that has only terminals and this string cannot be 96 00:09:09,983 --> 00:09:14,636 subjected to further replacements. You should be aware that in more complicated 97 00:09:14,636 --> 00:09:18,817 grammars, variables can be replaced by strings that contain two or more 98 00:09:18,817 --> 00:09:23,293 variables, and when that happens we have lots of choices of what variable to 99 00:09:23,293 --> 00:09:28,853 replace at each step. And derivations can be much more complicated. The double arrow 100 00:09:28,853 --> 00:09:34,030 symbol represents single steps of a derivation and just as we extend the delta 101 00:09:34,030 --> 00:09:39,272 the many steps for atomaton, we need to have a notation that means any number of 102 00:09:39,272 --> 00:09:47,120 steps for a derivation. This is the arrow star symbol, and we define it inductively. 103 00:09:47,520 --> 00:09:56,484 The basis is that in zero steps the string can't change, so any string goes to star 104 00:09:56,484 --> 00:10:02,190 itself. The inductive step lets us get from alpha to beta using any number of 105 00:10:02,190 --> 00:10:08,111 steps using any numbers of steps including zero steps. That's that. And then with one 106 00:10:08,111 --> 00:10:15,369 more step, get from beta to gamma. The conclusion is then that alpha goes on some 107 00:10:15,369 --> 00:10:21,756 number of steps to gamma. Here's an example using the same grammar as before 108 00:10:21,756 --> 00:10:28,647 and the same derivation sequence we just discussed. We can conclude that s goes to 109 00:10:28,647 --> 00:10:35,034 itself in zero steps, that is we just start here, don't go anywhere. So we can 110 00:10:35,034 --> 00:10:46,090 conclude that s goes to star itself. Then inductively, S goes to 0,1, zero, S, one 111 00:10:46,090 --> 00:10:56,497 rather 00, S1 and 000111. Notice also that we don't have to start with the start 112 00:10:56,497 --> 00:11:06,552 symbol. It is also true for example, that 0S1 goes to star Zero, zero, zero, one, 113 00:11:06,552 --> 00:11:17,316 one, one. That is anything along the derivation sequence goes to any later, 114 00:11:17,316 --> 00:11:29,745 goes to itself and any later position along the sequence. A sentential form for 115 00:11:29,745 --> 00:11:38,158 a grammar is any string derivable from the start symbol. That is, s arrow star the 116 00:11:38,158 --> 00:11:47,308 sentential form. The [inaudible] form can consist of any mix of terminals and non 117 00:11:47,308 --> 00:11:55,767 terminals. The language of the grammar G is the set of terminal strings W such as 118 00:11:55,767 --> 00:12:03,809 the start symbol G derives W. Here's an example grammar that's just a little 119 00:12:03,809 --> 00:12:09,966 different from the previous one. Here the basis rule is that S goes to Epsilon 120 00:12:09,966 --> 00:12:16,362 rather than S goes to 01. Note that Epsilon is a perfectly legitimate body for 121 00:12:16,362 --> 00:12:21,382 production and is effective to make the variable on the left disappear. As a 122 00:12:21,382 --> 00:12:26,927 result, this grammar can derive the empty string along with all the other strings 123 00:12:26,927 --> 00:12:32,432 that have some numbers of 0s followed by the same number of 1s. That is, the 124 00:12:32,432 --> 00:12:37,254 language of this grammar is this set of zero to the n, one to the n, such that n 125 00:12:37,254 --> 00:12:42,014 is at least zero while the previous grammar had at least one as the condition 126 00:12:42,014 --> 00:12:50,606 on n. The class of language is called context-free languages consists of all 127 00:12:50,606 --> 00:12:55,206 those languages that are defined by some context-free grammar. We now see that 128 00:12:55,206 --> 00:13:00,159 there are context-free languages that are not regular languages, such as the zero to 129 00:13:00,159 --> 00:13:05,954 the n, one to the n example just given. However there are languages that are not 130 00:13:05,954 --> 00:13:12,263 context free and the intuition is that context free grammars can count two things 131 00:13:12,263 --> 00:13:18,187 but not three. Thus an example of a language that is not context free is zero 132 00:13:18,187 --> 00:13:26,665 to the N, One to the n, two to the n, say, such that n is equal to a greater than 133 00:13:26,665 --> 00:13:35,935 one. That is, you cannot match both the zeroes against the ones and the ones 134 00:13:35,935 --> 00:13:42,562 against the two at the same time. I want to introduce a notation called BNF or 135 00:13:42,562 --> 00:13:48,137 Bacchusnaur Form, which you may have seen in manuals for programming languages, and 136 00:13:48,137 --> 00:13:53,577 which is closely related to context-free grammars. Bnf has a number of extensions 137 00:13:53,577 --> 00:13:59,219 of the grammar notation we use, and these extensions are useful in manuals but don't 138 00:13:59,219 --> 00:14:04,629 add any power. B and F style notations were used for two of the original 139 00:14:04,629 --> 00:14:09,874 programming languages. John Bacchus used it in the original description of Fortran 140 00:14:09,874 --> 00:14:17,838 and Peter Nouer used it in the original description of [inaudible]. In B and F you 141 00:14:17,838 --> 00:14:22,861 usually use a word to describe a variable for example, statement for the intent is 142 00:14:22,861 --> 00:14:27,393 that this variable will generate all the strings that valid statements of 143 00:14:27,393 --> 00:14:32,538 programming language. Conventionally these words are put in triangular brackets that 144 00:14:32,538 --> 00:14:41,058 tell us they are variables rather than terminals. Some terminals for programming 145 00:14:41,058 --> 00:14:46,070 language are single characters just like in our formal context free grammars. For 146 00:14:46,070 --> 00:14:51,020 example, the plus sign or semi-colon are commonly terminals in the grammar for a 147 00:14:51,020 --> 00:14:57,272 programming language. However other terminals are really reserve words like if 148 00:14:57,272 --> 00:15:03,232 or while. And we see these shown either in bold or underline depending on the style 149 00:15:03,232 --> 00:15:08,905 used to remind us that they are single symbols. In B and F we often colon, colon 150 00:15:08,905 --> 00:15:14,721 equals used in productions rather than arrow. We also find a vertical bar used to 151 00:15:14,721 --> 00:15:20,681 list several production bodies that have the same head. That is a useful convention 152 00:15:20,681 --> 00:15:27,458 that we should use in formal conventions as well. For example, our original grammar 153 00:15:27,458 --> 00:15:32,431 can be written with s ones on the left side and the bodies of the two s 154 00:15:32,431 --> 00:15:39,866 productions listed with the bar connecting them. We follow a symbol or bracketed 155 00:15:39,866 --> 00:15:47,237 expression by dot, dot, dot. So here's an example. We have one variable digit with 156 00:15:47,237 --> 00:15:53,664 the obvious ten productions all grouped together with the bars. Then we have one 157 00:15:53,664 --> 00:16:00,563 production for the variable unsigned integer. With the right side digit, ... , 158 00:16:00,563 --> 00:16:07,861 that is one or more digits. In general, we can replace alpha ... By two productions. 159 00:16:07,861 --> 00:16:17,118 Let A be a new variable that generates all sequences of one or more alphas. A has two 160 00:16:17,118 --> 00:16:23,829 productions with bodies a alpha and just alpha. They are. You should be able to see 161 00:16:23,829 --> 00:16:30,540 how a can be replaced of n alphas. Just use the first production n minus one times 162 00:16:30,540 --> 00:16:37,005 and then the second production once. For example, here is a grammar for unsigned 163 00:16:37,005 --> 00:16:42,897 integers, where the BNF d dot, dot, dot has been replaced by the these two 164 00:16:42,897 --> 00:16:52,393 productions. These generate any sequence of one or more D's. Then D generates each 165 00:16:52,393 --> 00:16:58,777 of the ten strings consisting of a single digit. That's, of course, that. We can 166 00:16:58,777 --> 00:17:05,653 make part of a production body be optional if we surround it by square brackets. For 167 00:17:05,653 --> 00:17:12,004 example, many programming languages have both an if then and an if then construct 168 00:17:12,004 --> 00:17:18,040 for statements. We can see this as an if then statement with an optional else 169 00:17:18,040 --> 00:17:24,077 clause. In B and F we put brackets around the else clause to make it optional. 170 00:17:24,077 --> 00:17:29,912 That's essentially this stuff there. We can replace an optional alpha by a new 171 00:17:29,912 --> 00:17:35,291 variable a. This variable has two productions, one with a body alpha and the 172 00:17:35,291 --> 00:17:41,101 other with the empty string for a body. Thus the alpha can be either there or not, 173 00:17:41,101 --> 00:17:48,749 when we expand the new variable a. Here we're using I for IF, T for then E for 174 00:17:48,749 --> 00:17:54,815 else and the semicolon is another terminal standing for itself. S is the start symbol 175 00:17:54,815 --> 00:18:00,240 standing for statement. And C is another variable representing conditions. We 176 00:18:00,240 --> 00:18:05,592 really need to add production for conditions but I haven't done so in this 177 00:18:05,592 --> 00:18:11,586 fragment of a real grammar. Notice the a is a variable standing for an optional 178 00:18:11,586 --> 00:18:17,534 else clause. Okay. It can be replaced by a semi colon and else and another statement 179 00:18:17,534 --> 00:18:23,410 if we want to have the else clause there. Or by epsilon if we just want an if/then 180 00:18:23,410 --> 00:18:29,900 statement with no else clause. Curly braces are used in B and F for grouping 181 00:18:29,900 --> 00:18:35,347 several different elements. You need this for example if you want to have a 182 00:18:35,347 --> 00:18:41,157 repeating group of elements and a ... Or one or more construct. For example, it is 183 00:18:41,157 --> 00:18:46,676 common if programming languages to allow a statement list to be one or more 184 00:18:46,676 --> 00:18:52,196 statements. The statements are separated by semicolons so there is one fewer 185 00:18:52,196 --> 00:18:57,788 semicolon than statements. That is a statement list consists of one statement, 186 00:18:57,788 --> 00:19:05,334 that's this. Followed optionally by one or more groups consisting of a semi-colon and 187 00:19:05,334 --> 00:19:20,762 a statement There. The Brackets form a group and then finally the ... Applying to 188 00:19:20,762 --> 00:19:31,343 the group says one or more of these groups. Finally you have the braces And 189 00:19:31,343 --> 00:19:38,702 those braces say that the whole thing is optional. To translate groups to our 190 00:19:38,702 --> 00:19:44,126 original notation, just create a new variable a for the group. A has one 191 00:19:44,126 --> 00:19:51,370 production whose body is the group. Here's an example of a production that uses all 192 00:19:51,370 --> 00:19:57,025 three BNF features, one or more optional and grouping. It says a list of statements 193 00:19:57,025 --> 00:20:02,474 L is a single statement S optionally followed by one or more groups each group 194 00:20:02,474 --> 00:20:08,703 consisting of a semi-colon and a statement. The first thing we'll do is 195 00:20:08,703 --> 00:20:15,112 replace the group semicolon S by a new variable A. A has one production in which 196 00:20:15,112 --> 00:20:22,401 it is replaced by thing it stands for, that's this guy right here. And next we'll 197 00:20:22,401 --> 00:20:28,590 introduce a new variable B which stands for the optional A ., ., ., . B has two 198 00:20:28,590 --> 00:20:34,778 productions, it is replaced either by the A... , that would be choice, or, it is 199 00:20:34,778 --> 00:20:43,718 replaced by the empty string. Finally, we replace the A... And the B productions by 200 00:20:43,718 --> 00:20:52,333 the new variable C. The productions for C, which are here, allow C to be replaced by 201 00:20:52,333 --> 00:21:00,804 any sequence by one or more As. When a sentential form has a number of variables, 202 00:21:00,804 --> 00:21:05,745 we can replace any one of them at any step. But what string of terminals a 203 00:21:05,745 --> 00:21:10,752 variable ultimately gets replaces by is independent of what else is in the 204 00:21:10,752 --> 00:21:16,426 sentential form. That's actually where the term context free comes from. As a result, 205 00:21:16,426 --> 00:21:22,717 we have many different derivations of the same string of terminals. We can restore 206 00:21:22,717 --> 00:21:27,966 some order to the world by requiring that a particular variable be replaced at each 207 00:21:27,966 --> 00:21:32,840 step. Although we cannot demand that any particular production be used for the 208 00:21:32,840 --> 00:21:38,524 replacement. One reasonable rule is to require that the leftmost variable be the 209 00:21:38,524 --> 00:21:44,112 one replaced at each step. This rule restricts us to what are called leftmost 210 00:21:44,112 --> 00:21:50,063 derivations. Similarly we could require that the rightmost variable be replaced at 211 00:21:50,063 --> 00:21:56,014 each step, and that gives us the rightmost derivations. The double arrow with an LM 212 00:21:56,014 --> 00:22:05,966 subscript That's this, represents one step of a left most derivation. That is, on the 213 00:22:05,966 --> 00:22:14,679 left of the arrow LM we must have a string in the form WA Alpha. Here you see one. 214 00:22:14,958 --> 00:22:22,112 That is since W by our convention has terminals only, A must be the left most 215 00:22:22,112 --> 00:22:29,453 variable in this string. On the right is the same string with the body, say Beta, 216 00:22:29,453 --> 00:22:37,636 is that, of some A production replacing it, A. The symbol consisting of the double 217 00:22:37,636 --> 00:22:43,530 arrow with the subscript LM and the star. That's this, means becomes by a sequence 218 00:22:43,530 --> 00:22:48,659 of zero or more leftmost derivation steps. Let's introduce another very simple 219 00:22:48,659 --> 00:22:54,051 grammar that generates a language that is not a regular language. This grammar has 220 00:22:54,051 --> 00:22:59,180 only one variable but unlike the zero to the n one to the n grammar there are 221 00:22:59,180 --> 00:23:05,616 productions with more than one variable in the body. This grammar generates strings 222 00:23:05,616 --> 00:23:12,060 of balanced parentheses. Those strings that are legal and arithmetic expressions. 223 00:23:12,060 --> 00:23:18,925 The last production, here's it's body. Says that a pair of matching parentheses 224 00:23:18,925 --> 00:23:24,500 is balanced. Of course the left parenthesis must come first. The middle 225 00:23:24,500 --> 00:23:30,473 production, this, says that if we put matching parentheses around a balanced 226 00:23:30,473 --> 00:23:36,207 string it is still balanced. And the first production, this, says that the 227 00:23:36,207 --> 00:23:42,033 concatenation of two balanced strings is balanced. We need to prove that every 228 00:23:42,033 --> 00:23:47,101 string of balanced parentheses can be generated by this grammar. The proof is 229 00:23:47,101 --> 00:23:52,301 not too hard but we're not going to do it here. Here's an example of a leftmost 230 00:23:52,301 --> 00:23:59,993 derivation. We start with just S, so that is the leftmost variable. We replace it by 231 00:23:59,993 --> 00:24:06,408 two S's at the first step. There we go. Next, the first of these S's must be 232 00:24:06,408 --> 00:24:12,823 replaced in a leftmost derivation. We choose the second production for the 233 00:24:12,823 --> 00:24:19,064 replacement. There we are. At the third step, the first of the S's must be 234 00:24:19,064 --> 00:24:25,479 replaced and here we choose the last production in the replacement, that's 235 00:24:25,479 --> 00:24:33,566 giving us that. [sound] At the last step, we have only one S and that naturally is 236 00:24:33,566 --> 00:24:39,457 the one we replace. We've chosen to replace Using the last of the three 237 00:24:39,457 --> 00:24:45,696 productions so now we have a terminal string and are done. The arrow star 238 00:24:45,696 --> 00:24:51,098 leftmost notation can be used to express zero or more leftmost derivation steps. So 239 00:24:51,098 --> 00:24:56,305 for example, S is related to the terminal string by this relationship. It is also 240 00:24:56,305 --> 00:25:01,838 related to itself and all the other steps in the derivation by the same relationship 241 00:25:01,838 --> 00:25:09,401 and in fact each step is related to itself and all the following steps. Here is an 242 00:25:09,401 --> 00:25:16,245 example of a derivation that is not left most. The problem is that if the second 243 00:25:16,245 --> 00:25:24,657 step, the second S, rather than the first, was replaced. Rightmost derivations are 244 00:25:24,657 --> 00:25:31,148 defined quite analogously to leftmost derivations. The arrow with an RM 245 00:25:31,148 --> 00:25:37,806 subscript, this. Means that the rightmost variable is replaced at the step. Notice 246 00:25:37,806 --> 00:25:44,803 that the string on the left which is this, has W which must be a string of only 247 00:25:44,803 --> 00:25:50,864 terminals following the variable A that it's replaced. Thus A is surely the 248 00:25:50,864 --> 00:25:58,625 rightmost variable. And the arrow with the RM and a star means zero or more steps in 249 00:25:58,625 --> 00:26:05,904 the rightmost derivation. Here's a balanced parenthesis grammar again. Now we 250 00:26:05,904 --> 00:26:13,183 have a rightmost derivation of the same string as before. Notice that at the 251 00:26:13,183 --> 00:26:19,887 second step, okay, the second S got replaced rather than the first. S is 252 00:26:19,887 --> 00:26:27,137 related to the terminal string by the arrow star rightmost operator. And as the 253 00:26:27,137 --> 00:26:32,299 leftmost derivations each step in the rightmost derivation is related by the 254 00:26:32,299 --> 00:26:37,864 operator to itself and all the steps that follow. Here's an example of a derivation 255 00:26:37,864 --> 00:26:42,891 that is neither a leftmost nor a rightmost. See how at the third step The 256 00:26:42,891 --> 00:26:51,213 middle S is replaced. Also notice that the second step is correct but the ambiguous 257 00:26:51,213 --> 00:26:59,334 in a concerning way. One of the [sound], here, one of the these got replaced by two 258 00:26:59,334 --> 00:27:02,543 [sound] but we don't know which.