15. <statement> --> if <expression> then <statements> endif
16. <statement> --> if <expression> then <statements> else <statements> endif
In this case it is clear why the conflict arose. The keyword if predicts both rules. That is, if we look in row <statement>, column if of the LL(1) table, we find both rule numbers:
|             | if    |
| <statement> | 15,16 |
The problem is that the right hand sides of both of these rules with <statement> on the left have a common prefix. Both start with
if <expression> then <statements>
The else or endif that would determine which rule to apply is arbitrarily far away, so this grammar is not LL(k) for any k. We eliminate the common prefix by removing the above rules and replacing them with
15. <statement> --> if <expression> then <statements> <optional_else>
16. <optional_else> --> else <statements> endif
17. <optional_else> --> endif
In the new LL(1) table for this changed grammar the old problem has disappeared.
|                 | if | endif | else |
| <statement>     | 15 |       |      |
| <optional_else> |    | 17    | 16   |
When <statement> is to be expanded with lookahead if, only rule 15 applies. When <optional_else> is to be expanded, the token else implies that we apply rule 16, whereas the token endif implies that we apply rule 17.
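The table lookup described above can be sketched as a small table-driven parser. This is only an illustration under stated assumptions: the names `RULES`, `TABLE` and `parse` are hypothetical, and <expression> and <statements> are treated as single placeholder tokens rather than being expanded.

```python
# Rules 15-17 of the factored grammar: rule number -> (left side, right side).
RULES = {
    15: ("<statement>", ["if", "<expression>", "then", "<statements>", "<optional_else>"]),
    16: ("<optional_else>", ["else", "<statements>", "endif"]),
    17: ("<optional_else>", ["endif"]),
}

# LL(1) table: (nonterminal, lookahead token) -> rule number.
TABLE = {
    ("<statement>", "if"): 15,
    ("<optional_else>", "else"): 16,
    ("<optional_else>", "endif"): 17,
}

# Assumption: <expression> and <statements> are NOT expanded here; they are
# matched as if they were single tokens.
NONTERMINALS = {"<statement>", "<optional_else>"}


def parse(tokens):
    """Return the list of rule numbers used to derive tokens from <statement>."""
    stack = ["<statement>"]
    pos = 0
    applied = []
    while stack:
        top = stack.pop()
        lookahead = tokens[pos] if pos < len(tokens) else "$"
        if top in NONTERMINALS:
            rule = TABLE.get((top, lookahead))
            if rule is None:
                raise SyntaxError(f"no rule for {top} with lookahead {lookahead!r}")
            applied.append(rule)
            # Push the right side reversed so its first symbol is on top.
            stack.extend(reversed(RULES[rule][1]))
        elif top == lookahead:
            pos += 1
        else:
            raise SyntaxError(f"expected {top!r}, found {lookahead!r}")
    if pos != len(tokens):
        raise SyntaxError("unexpected input after the statement")
    return applied
```

Because every table entry holds exactly one rule number, the parser never has to guess: one token of lookahead is enough at each expansion.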
The example above illustrates just one case that causes a grammar to be not LL(1). There are many other possibilities. In this section we examine different cases and their causes.
If we want to use LL(1) techniques to parse, it is likely that we will not have an LL(1) grammar to start with. There will be conflicts in the table. We might try to clean up the grammar a bit and then regenerate the table, only to find that some conflicts remain. Two things we can do to clean up a grammar are to remove left recursion and to remove common prefixes.
Statement --> AssignmentStatement
Statement --> ProcedureStatement
AssignmentStatement --> VariableIdentifier ":=" Expression
ProcedureStatement --> ProcedureIdentifier OptionalActualParameterList
OptionalActualParameterList --> "(" ActualParameter ActualParameterTail ")"
                            --> λ
We can always remove left recursion and common prefixes from a grammar to get a new grammar that produces the same language. However, this does not imply that the new grammar is LL(k). It will be a lot closer to LL(k), though, and is the way to start if the grammar is to be used in LL(1) or recursive descent parsing strategies.
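The common-prefix removal applied earlier to the if ... endif rules 15 and 16 can be sketched in Python. This is a minimal sketch, not a general algorithm: the helper names `common_prefix` and `left_factor` are hypothetical, a right-hand side is assumed to be a list of symbol strings, an empty list stands for λ, and only a prefix shared by all alternatives of one nonterminal is factored out.

```python
def common_prefix(alternatives):
    """Return the longest list of symbols that starts every alternative."""
    prefix = []
    for symbols in zip(*alternatives):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix


def left_factor(nonterminal, alternatives, new_name):
    """Factor the prefix shared by all alternatives into a new nonterminal.

    Returns a dict mapping each nonterminal to its list of right-hand sides.
    An empty right-hand side [] stands for lambda.
    """
    prefix = common_prefix(alternatives)
    if len(alternatives) < 2 or not prefix:
        return {nonterminal: alternatives}  # nothing to factor
    tails = [alt[len(prefix):] for alt in alternatives]
    return {nonterminal: [prefix + [new_name]], new_name: tails}


# The two if ... endif rules that share a common prefix.
original = [
    ["if", "<expression>", "then", "<statements>", "endif"],
    ["if", "<expression>", "then", "<statements>", "else", "<statements>", "endif"],
]
factored = left_factor("<statement>", original, "<optional_else>")
```

As the text notes, factoring produces an equivalent grammar but does not by itself guarantee that the result is LL(1); the table still has to be regenerated and checked for conflicts.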
Now consider the if statement defined by the following two rules:
15. <statement> --> if <expression> then <statement>
16. <statement> --> if <expression> then <statement> else <statement>
The only difference here is that there is no endif keyword. That is, there is no special marker that indicates when an if statement is finished. Pascal and C are languages like this. We can still get rid of the common prefix by using the tack taken above:
15. <statement> --> if <expression> then <statement> <optional_else>
16. <optional_else> --> else <statement>
17. <optional_else> --> λ
What does the LL(1) table look like for this?
|                 | if | else  |
| <statement>     | 15 |       |
| <optional_else> |    | 16,17 |
Notice that the token else predicts the use of rule 16, and that it also predicts the use of rule 17. To see why it predicts rule 17, note that rule 17 derives λ, so the tokens that predict it are those in Follow(<optional_else>). By rule 15, <optional_else> ends the right hand side of <statement>, so Follow(<optional_else>) contains Follow(<statement>). And in rule 15, <optional_else> comes right after <statement>, so Follow(<statement>) contains First(<optional_else>), which contains else.
There are many other tokens that will predict the use of rule 17. Tokens that can follow <statement> are tokens that begin other statements, such as id (start of an assignment statement), while (start of a loop), if (start of an if statement), end (no more statements), and so forth.
|                 | if | else  | end |
| <statement>     | 15 |       |     |
| <optional_else> | 17 | 16,17 | 17  |
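The First and Follow reasoning behind this table can be checked mechanically. The sketch below is an illustration only: rules 1, 2, 3 and 18 are hypothetical additions that give <statement> something to be followed by, the token expr stands in for <expression>, and the function names are not taken from any particular tool.

```python
# Rule number -> (left side, right side). An empty right side is lambda.
GRAMMAR = {
    1: ("<program>", ["<statements>", "end"]),             # hypothetical
    2: ("<statements>", ["<statement>", "<statements>"]),  # hypothetical
    3: ("<statements>", []),                               # hypothetical
    15: ("<statement>", ["if", "expr", "then", "<statement>", "<optional_else>"]),
    16: ("<optional_else>", ["else", "<statement>"]),
    17: ("<optional_else>", []),
    18: ("<statement>", ["other"]),                        # hypothetical
}
NONTERMINALS = {lhs for lhs, _ in GRAMMAR.values()}


def first_of(symbols, first, nullable):
    """Return (First set of the symbol string, True if it can derive lambda)."""
    result = set()
    for sym in symbols:
        if sym not in NONTERMINALS:
            result.add(sym)
            return result, False
        result |= first[sym]
        if sym not in nullable:
            return result, False
    return result, True


def build_table():
    """Compute First/Follow by fixed-point iteration and build the LL(1) table."""
    first = {n: set() for n in NONTERMINALS}
    follow = {n: set() for n in NONTERMINALS}
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR.values():
            tokens, derives_lambda = first_of(rhs, first, nullable)
            if not tokens <= first[lhs]:
                first[lhs] |= tokens
                changed = True
            if derives_lambda and lhs not in nullable:
                nullable.add(lhs)
                changed = True
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                # Whatever can start the rest of the rule can follow sym.
                tokens, rest_lambda = first_of(rhs[i + 1:], first, nullable)
                new = tokens | (follow[lhs] if rest_lambda else set())
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    table = {}
    for number, (lhs, rhs) in GRAMMAR.items():
        tokens, derives_lambda = first_of(rhs, first, nullable)
        predict = tokens | (follow[lhs] if derives_lambda else set())
        for token in predict:
            table.setdefault((lhs, token), set()).add(number)
    return table


table = build_table()
conflicts = {cell: rules for cell, rules in table.items() if len(rules) > 1}
```

Under these assumptions the only cell holding more than one rule is row <optional_else>, column else, which agrees with the table above.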
The problem is that this modified grammar, in which we have removed the common prefix of the original rules 15 and 16, is now ambiguous. Consider the following input stream from the scanner:
if id < intlit then if id = intlit then write ( id ) else write ( id )
Consider the following (abbreviated) parse:
               <statement>
                    |
_________________________________________
|       |        |        |             |
if <expression> then <statement> <optional_else>
                          |
          _________________________________________
          |       |        |        |             |
          if <expression> then <statement> <optional_else>
Suppose that everything has been expanded now except the two
<optional_else> nonterminals. Further, assume that the lookahead is else.
Then we could choose to expand either <optional_else> to λ
and the other to else <statement> and get two different
valid parse trees.
               <statement>
                    |
_________________________________________
|       |        |        |             |
if <expression> then <statement> <optional_else>
                          |             |
                          |         _________
                          |         |       |
                          |        else <statement>
          _________________________________________
          |       |        |        |             |
          if <expression> then <statement> <optional_else>
                                                  |
                                                  λ
and
               <statement>
                    |
_________________________________________
|       |        |        |             |
if <expression> then <statement> <optional_else>
                          |             |
                          |             λ
          _________________________________________
          |       |        |        |             |
          if <expression> then <statement> <optional_else>
                                                  |
                                              _________
                                              |       |
                                             else <statement>
In the first tree, the else associates with the first then. In the second
tree, the else associates with the second then. So, in the LL(1) table for
this grammar, in row <optional_else>, column else, either rule 16 or rule 17
applies.
There is no way to fix the grammar to make it LL(1). The language just doesn't
allow it: the language is inherently non-LL(k), that is, there is no LL(k)
grammar for it.
We can find an unambiguous grammar for the language, but we cannot find an LL(1) grammar for it. The following grammar is unambiguous:
<statement> --> <if_then_statement>
<statement> --> <if_then_else_statement>
<if_then_statement> --> if <expression> then <statement>
<if_then_statement> --> if <expression> then <if_then_else_statement> else <if_then_statement>
<if_then_else_statement> --> if <expression> then <if_then_else_statement> else <if_then_else_statement>
<if_then_else_statement> --> <other_statement>
However, this grammar is not LL(k). It is, however, LR(1), so it is possible to parse strings according to this grammar bottom up. The reason is that we find entire right hand sides before reducing them to the left hand side. So, if we were parsing an if statement and we had found if <expression> then <other_statement> and the lookahead is else, we know that we still don't have the whole right hand side. Otherwise, we do.
But we don't need to go through all of these contortions for either LL(1) or LR(1) parsing. Go ahead and use the ambiguous grammar with rules 15 to 17. If its only ambiguity is in the if statement, the LL(1) table will have a single conflict, in row <optional_else>, column else, where both rules 16 and 17 show up. In such a case, the language definition will include, in English, a paragraph stating that any else must be matched with the most recent unmatched then. This just means that we remove rule 17 from row <optional_else>, column else, leaving
|                 | if | else | end |
| <statement>     | 15 |      |     |
| <optional_else> | 17 | 16   | 17  |
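In a recursive descent parser the same resolution amounts to always taking the else branch when the lookahead is else. The following is a minimal sketch under stated assumptions: `parse_statement` is a hypothetical name, the token expr stands in for <expression>, and the token other stands in for any non-if statement.

```python
def parse_statement(tokens, pos=0):
    """Parse one <statement> starting at tokens[pos].

    Returns (tree, next_pos), where an if statement is represented as the
    tuple ("if", then_part, else_part) and else_part is None when
    <optional_else> was expanded to lambda.
    """
    if pos < len(tokens) and tokens[pos] == "if":
        # Rule 15: if <expression> then <statement> <optional_else>
        if tokens[pos + 1:pos + 3] != ["expr", "then"]:
            raise SyntaxError("expected 'expr then' after 'if'")
        then_part, pos = parse_statement(tokens, pos + 3)
        else_part = None
        # Row <optional_else>, column else: always choose rule 16, so the
        # else attaches to the most recent unmatched then.
        if pos < len(tokens) and tokens[pos] == "else":
            else_part, pos = parse_statement(tokens, pos + 1)
        # Any other lookahead: rule 17, <optional_else> --> lambda.
        return ("if", then_part, else_part), pos
    if pos < len(tokens) and tokens[pos] == "other":
        return "other", pos + 1
    raise SyntaxError(f"unexpected input at position {pos}")


tokens = "if expr then if expr then other else other".split()
tree, end = parse_statement(tokens)
```

For this input the else ends up inside the inner if, which corresponds to the second of the two parse trees shown earlier.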