March 3

Three Ways to Parse Top Down

and

Resolving Conflicts in LL(1) Tables

Three Different Top Down Parsing Strategies

1. "Pure" CFG approach

This is the approach you are required to use.

1.  <start>           --> <expression> eof
2.  <expression>      --> <term> <expression_tail>
3.  <expression_tail> --> + <term> <expression_tail>
4.  <expression_tail> --> - <term> <expression_tail>
5.  <expression_tail> --> ε
6.  <term>            --> <factor> <term_tail>
7.  <term_tail>       --> * <factor> <term_tail>
8.  <term_tail>       --> / <factor> <term_tail>
9.  <term_tail>       --> ε
10. <factor>          --> <primary> <factor_tail>
11. <factor_tail>     --> ^ <primary> <factor_tail>
12. <factor_tail>     --> ε
13. <primary>         --> identifier
14. <primary>         --> integer_literal
15. <primary>         --> ( <expression> )

procedure Term_Tail is
  begin
    case Lookahead is -- Lookahead is a global variable
      when ?? =>     -- 7. <term_tail> --> * <factor> <term_tail>
        Match('*');
        Factor;
        Term_Tail;
      when ?? =>     -- 8. <term_tail> --> / <factor> <term_tail>
        Match('/');
        Factor;
        Term_Tail;
      when ?? =>     -- 9. <term_tail> --> (empty)
        null;
      when others =>
        Error;
    end case;
  end Term_Tail;

Here, each ?? is to be replaced by the appropriate tokens, as determined by the LL(1) table. For example, the first when clause expands <term_tail> by rule number 7. So we look in the row of the LL(1) table labeled <term_tail>, find every occurrence of rule number 7 in that row, and put the tokens that head those columns in place of the ??. In this case, "*" is the only token that predicts rule 7. The same lookup fills in the clause for the empty rule 9: its ?? becomes every token under which rule 9 appears in that row, that is, every token that can legally follow a <term_tail>.
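
To make the ?? lookup concrete, here is a minimal sketch of approach 1 in Python (not the Ada-style code used in the project). The class name ExprParser, the token spellings, and the applied list that records rule numbers are all assumptions made for illustration. The token sets in the empty-rule branches were worked out by hand from the grammar above and should be checked against your own LL(1) table.

```python
class ParseError(Exception):
    """Raised when the lookahead token predicts no rule."""


class ExprParser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["eof"]  # token kinds from an assumed scanner
        self.pos = 0
        self.applied = []                     # rule numbers, in the order used

    @property
    def lookahead(self):                      # plays the role of the global Lookahead
        return self.tokens[self.pos]

    def match(self, expected):
        if self.lookahead != expected:
            raise ParseError(f"expected {expected!r}, found {self.lookahead!r}")
        self.pos += 1

    def error(self):
        raise ParseError(f"unexpected token {self.lookahead!r}")

    def start(self):
        self.applied.append(1)
        self.expression()
        self.match("eof")

    def expression(self):
        self.applied.append(2)
        self.term()
        self.expression_tail()

    def expression_tail(self):
        if self.lookahead in ("+", "-"):          # rules 3 and 4
            self.applied.append(3 if self.lookahead == "+" else 4)
            self.match(self.lookahead)
            self.term()
            self.expression_tail()
        elif self.lookahead in (")", "eof"):      # rule 5: empty right-hand side
            self.applied.append(5)
        else:
            self.error()

    def term(self):
        self.applied.append(6)
        self.factor()
        self.term_tail()

    def term_tail(self):
        if self.lookahead == "*":                 # rule 7
            self.applied.append(7)
            self.match("*")
            self.factor()
            self.term_tail()
        elif self.lookahead == "/":               # rule 8
            self.applied.append(8)
            self.match("/")
            self.factor()
            self.term_tail()
        elif self.lookahead in ("+", "-", ")", "eof"):   # rule 9: empty
            self.applied.append(9)
        else:
            self.error()

    def factor(self):
        self.applied.append(10)
        self.primary()
        self.factor_tail()

    def factor_tail(self):
        if self.lookahead == "^":                 # rule 11
            self.applied.append(11)
            self.match("^")
            self.primary()
            self.factor_tail()
        elif self.lookahead in ("*", "/", "+", "-", ")", "eof"):  # rule 12: empty
            self.applied.append(12)
        else:
            self.error()

    def primary(self):
        if self.lookahead == "identifier":        # rule 13
            self.applied.append(13)
            self.match("identifier")
        elif self.lookahead == "integer_literal": # rule 14
            self.applied.append(14)
            self.match("integer_literal")
        elif self.lookahead == "(":               # rule 15
            self.applied.append(15)
            self.match("(")
            self.expression()
            self.match(")")
        else:
            self.error()


parser = ExprParser(["identifier", "*", "integer_literal"])
parser.start()
print(parser.applied)  # [1, 2, 6, 10, 13, 12, 7, 10, 14, 12, 9, 5]
```

Each method corresponds to one nonterminal, and each branch to one when clause; the final else plays the role of when others.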

2. EBNF approach

This is a common approach among professional compiler writers who are building a compiler by hand. (For consistency's sake, you are to use approach 1, not this one, in your compiler project.)

Consider the following four rules in the above grammar:

6.  <term>            --> <factor> <term_tail>
7.  <term_tail>       --> * <factor> <term_tail>
8.  <term_tail>       --> / <factor> <term_tail>
9.  <term_tail>       --> ε

The EBNF for this might look like

term :=  factor { mulop factor }

A procedure built around this particular rule might look like

procedure Term is

   begin
     Factor;
     while Lookahead = '*' or Lookahead = '/' loop
        Match(Lookahead);  -- consume the mulop just seen
        Factor;
     end loop;
   end Term;

This approach still requires that one understand how lookaheads drive the parse, and thus still requires constructing the LL(1) table. It isn't always clear, however, just how to build the parser directly from the EBNF and the LL(1) table.
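
As a rough illustration of the EBNF style, the loop above might be sketched in Python as follows. The function names parse_term and parse_factor are hypothetical, and parse_factor is a deliberately simplified stand-in that accepts a single identifier or integer_literal token rather than the full <factor> rule.

```python
def parse_factor(tokens, pos):
    # Simplified stand-in (an assumption): a factor is one operand token.
    if pos < len(tokens) and tokens[pos] in ("identifier", "integer_literal"):
        return pos + 1
    raise SyntaxError(f"factor expected at position {pos}")


def parse_term(tokens, pos=0):
    """Recognize  term := factor { mulop factor }  starting at tokens[pos].

    Returns the index of the first token after the term.
    """
    pos = parse_factor(tokens, pos)
    # The braces in the EBNF become a loop driven by the lookahead token.
    while pos < len(tokens) and tokens[pos] in ("*", "/"):
        pos += 1                       # match the mulop
        pos = parse_factor(tokens, pos)
    return pos


tokens = ["identifier", "*", "integer_literal", "/", "identifier", "+"]
print(parse_term(tokens))  # 5: the term ends just before the "+"
```

Note that the loop simply stops on any token that is not a mulop; whether that token is legal after a term is left for the caller to decide, which is one reason the LL(1) table is still worth consulting.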

3. A Table-Driven (Stack-Based) Approach

Table Driven LL(1) Parsing

A table-driven parser works very much like a pushdown automaton. In fact, a pushdown automaton can be constructed quite easily to parse an LL(1) grammar. (Remember that for every context-free grammar there is a pushdown automaton that recognizes the language of that grammar.) To construct a table-driven LL(1) parser we need:

An LL(1) table
A driver program (similar to the finite control of a pushdown automaton)
A stack abstract data type (the pushdown store of a pushdown automaton)

An Example

Let's look at the LL(1) grammar for Micro, the toy programming language in the book (λ denotes the empty string):

1.  <program>         --> begin <statement_list> end
2.  <statement_list>  --> <statement> <statement_tail>
3.  <statement_tail>  --> <statement> <statement_tail>
4.  <statement_tail>  --> λ
5.  <statement>       --> id := <expression> ;
6.  <statement>       --> read ( <id_list> ) ;
7.  <statement>       --> write ( <expr_list> ) ;
8.  <id_list>         --> id <id_tail>
9.  <id_tail>         --> , id <id_tail>
10. <id_tail>         --> λ
11. <expr_list>       --> <expression> <expr_tail>
12. <expr_tail>       --> , <expression> <expr_tail>
13. <expr_tail>       --> λ
14. <expression>      --> <primary> <primary_tail>
15. <primary_tail>    --> <add_op> <primary> <primary_tail>
16. <primary_tail>    --> λ
17. <primary>         --> ( <expression> )
18. <primary>         --> id
19. <primary>         --> intlit
20. <add_op>          --> +
21. <add_op>          --> -
22. <system_goal>     --> <program> $

The LL(1) table for this grammar is

                 id  intlit  :=  ,   ;   +   -   (   )   begin  end  read  write  $
<program>                                                1
<statement_list> 2                                                   2     2
<statement>      5                                                   6     7
<statement_tail> 3                                              4    3     3
<expression>     14  14                          14
<id_list>        8
<expr_list>      11  11                          11
<id_tail>                        9                   10
<expr_tail>                      12                  13
<primary>        18  19                          17
<primary_tail>                   16  16  15  15      16
<add_op>                                 20  21
<system_goal>                                            22

Blank entries are syntax errors.

An example

Use the above grammar and LL(1) table to parse

begin
  A := Fred - 314 + A;
end $

Do both a parse tree and a parse with a table driven method.

......

Having gone through an example of parsing using a stack and the LL(1) table we can now write a stack-based table driven top-down parser by hand. You will notice that the algorithm essentially implements a pushdown automaton.

push <system_goal> onto the stack
get lookahead_token

while the stack is not empty loop
  if the stack top is a nonterminal then
    rule_to_apply := LL_Table(stack_top, lookahead_token)
    if rule_to_apply /= null then
      pop stack
      push right hand side of rule_to_apply in reverse order, so that
        its leftmost symbol ends up on top (push nothing if it is empty)
    else
      error
    end if
  else -- the stack top is a token
    match(stack_top) -- pop the stack and get the next lookahead_token; error on a mismatch
  end if
end loop

if end of file then
  accept -- the input string parses
else
  reject -- the input string does not parse (and therefore is not in the language)
end if
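
For comparison, the driver above might be sketched in Python as follows. The names RHS, TABLE, and parse are assumptions made for this sketch, and the input is assumed to be a list of token kinds already produced by a scanner. The table entries follow the grammar's rule numbers, so write predicts rule 7 and - predicts rule 21. Instead of testing for end of file, the sketch checks that no tokens remain once the stack is empty.

```python
# Right-hand sides of the 22 Micro rules; an empty list is an empty rule.
RHS = {
    1: ["begin", "<statement_list>", "end"],
    2: ["<statement>", "<statement_tail>"],
    3: ["<statement>", "<statement_tail>"],
    4: [],
    5: ["id", ":=", "<expression>", ";"],
    6: ["read", "(", "<id_list>", ")", ";"],
    7: ["write", "(", "<expr_list>", ")", ";"],
    8: ["id", "<id_tail>"],
    9: [",", "id", "<id_tail>"],
    10: [],
    11: ["<expression>", "<expr_tail>"],
    12: [",", "<expression>", "<expr_tail>"],
    13: [],
    14: ["<primary>", "<primary_tail>"],
    15: ["<add_op>", "<primary>", "<primary_tail>"],
    16: [],
    17: ["(", "<expression>", ")"],
    18: ["id"],
    19: ["intlit"],
    20: ["+"],
    21: ["-"],
    22: ["<program>", "$"],
}

# LL(1) table: nonterminal -> {lookahead token: rule number}.
TABLE = {
    "<program>":        {"begin": 1},
    "<statement_list>": {"id": 2, "read": 2, "write": 2},
    "<statement>":      {"id": 5, "read": 6, "write": 7},
    "<statement_tail>": {"id": 3, "read": 3, "write": 3, "end": 4},
    "<expression>":     {"id": 14, "intlit": 14, "(": 14},
    "<id_list>":        {"id": 8},
    "<expr_list>":      {"id": 11, "intlit": 11, "(": 11},
    "<id_tail>":        {",": 9, ")": 10},
    "<expr_tail>":      {",": 12, ")": 13},
    "<primary>":        {"id": 18, "intlit": 19, "(": 17},
    "<primary_tail>":   {"+": 15, "-": 15, ",": 16, ";": 16, ")": 16},
    "<add_op>":         {"+": 20, "-": 21},
    "<system_goal>":    {"begin": 22},
}


def parse(tokens):
    """Return the rule numbers applied, in order, or raise SyntaxError."""
    stack = ["<system_goal>"]
    pos = 0
    applied = []
    while stack:
        top = stack.pop()
        lookahead = tokens[pos] if pos < len(tokens) else None
        if top in TABLE:                       # nonterminal: expand it
            rule = TABLE[top].get(lookahead)
            if rule is None:
                raise SyntaxError(f"no rule for {top} on {lookahead!r}")
            applied.append(rule)
            stack.extend(reversed(RHS[rule]))  # leftmost symbol ends up on top
        elif top == lookahead:                 # token: match and advance
            pos += 1
        else:
            raise SyntaxError(f"expected {top!r}, found {lookahead!r}")
    if pos != len(tokens):
        raise SyntaxError("input remains after the stack emptied")
    return applied


# Token kinds for:  begin A := Fred - 314 + A; end $
tokens = ["begin", "id", ":=", "id", "-", "intlit", "+", "id", ";", "end", "$"]
print(parse(tokens))
# [22, 1, 2, 5, 14, 18, 15, 21, 19, 15, 20, 18, 16, 4]
```

The list of rule numbers is a leftmost derivation of the input, so it can be checked line by line against a hand trace of the stack.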