if statements

If Statements: Boolean Expression
and
Control Structure Semantics

Read and Write Statements

Consider the following rules.

44:<ReadStatement> -> MP_READ MP_LPAREN <ReadParameter> <ReadParameterTail> MP_RPAREN 
45:<ReadParameterTail> -> MP_COMMA <ReadParameter> <ReadParameterTail> 
46:<ReadParameterTail> -> lambda 
47:<ReadParameter> -> <VariableIdentifier> 
48:<WriteStatement> -> MP_WRITE MP_LPAREN <WriteParameter> <WriteParameterTail> MP_RPAREN 
49:<WriteParameterTail> -> MP_COMMA <WriteParameter> <WriteParameterTail> 
50:<WriteParameterTail> -> lambda 
51:<WriteParameter> -> <OrdinalExpression>

In parsing rule 44, it is known after the expansion of <ReadParameter> that the identifier matched is in a Read statement, so we can at that point generate code to read a value into that identifier. The same holds true for rule 45.

Similarly, in rules 48 and 49, after parsing <WriteParameter> we know that code has been generated that will leave the value of <OrdinalExpression> on top of the stack, so we can generate code to pop and print the top of stack value (in a stack based architecture) or that code has been generated that will leave the value of <OrdinalExpression> in a temporary location whose attributes are in the WriteParameter semantic record, so code can be generated at that point to print the value of the temporary.

Control Structure Semantics -- the if statement

In translating an if statement, we need to be able to generate code that will evaluate Boolean expressions and we need to be able to generate code that will effectively perform branches to the correct parts of the if statement based on the value computed at run time for the Boolean expression.

Expression Types

There are three types of expressions that can occur in most modern programming languages:

Arithmetic Expressions: These are expressions incorporating the standard arithmetic operations, such as

a * (b + c) / Fred

Operands for arithmetic expressions must be values of a compatible numeric type in most programming languages.

Relational Expressions: These are expressions that use the relational operators, such as < , <=, and so forth. The operands of relational operators must be compatible as defined by the language, but may be numeric values, characters, strings, or even enumerated type values in most programming languages. For non-numeric operands, their ASCII values are used for comparison in the case of characters and strings, and underlying numeric values for enumerated types. The result of a relational expression is Boolean. An example is

a < Fred * Mary

Logical Expressions: These are expression whose operators are AND, OR, and NOT. The operands of a logical expression must be Boolean, and the result type is Boolean. An example is

0 <= score AND score <= 100

Notice that in choosing precedence for operators, it might be good to put the arithmetic operators at the highest level, the relational operators below that, and then the logical operators lower than the relational operators in the order NOT, AND, and then OR. There seems to be a wide variation in operator precedence in programming languages which respect to the relational and logical operators, though, so one needs to check the language carefully when writing a compiler for a new language. In general, programmers should be sure to use parentheses to indicate the order in which they want the operators to be performed.

Boolean Expression Evaluation

There are some special aspects to the translation of expressions that result in Boolean values (True or False) that arise because most computer architectures have no specific values designated as True and False. Thus, the compiler writer must represent these values in some way, say with the integers 0 (false) and 1 (true), and then treat these values consistently throughout the compiled program.

Real architectures do usually have such logical operations as AND, OR, NOT, and XOR (exclusive or). The ways these are implemented normally is that the operands must have the same size (number of bits) for binary operations, and the logical operations are done bitwise (one bit at a time, with 1 representing true and 0 false). So, for example,

0101 AND 1110 = 0100

0101 OR 1100 = 1101

0101 XOR 1100 = 1001

NOT 0101 = 1010

These machine language instructions can be used to perform the desired NOT, AND, and OR logical operations of high level programming languages if the compiler writer represents, say, true as the byte 11111111 and false as 00000000 throughout your program (one bit would be enough, but it might not be feasible or efficient to deal with individual bits in a real computer).

For IR that is not machine specific, we can simply represent the fact that we are doing various relational and logical operations in the code and leave the messy details of implementation to the code generation phase (where IR is translated into actual target machine code).

For your projects, you have aspects of real machine code to deal with. So remember the following:

you must decide how to represent True and False (e.g., use 0 for false and 1 for true)
you must use existing instructions in the virtual machine when generating code to perform relational and logical operations in a consistent manner
microPascal is not defined like C; you cannot perform arithmetic on Boolean values, nor can you assign a Boolean value to an integer variable..

Translating If Statements

A Hand-Translation Example

Suppose we are parsing using the rules

<statement> --> if <expression> then <statements> <opt_else> endif
<opt_else>  --> else <statements>
            --> l

A sample statement being parsed might be

if a < b
  then
    read(a);
    c := a + 1
  else
    write(b)
endif

The most important thing to note about translating any control structure is that it is the statements for performing the control that are the result of this translation. That is, branch and label instructions are the primary instructons produced by the translation of a control structure, such as an if or a loop. The call to <expression> in the parse produces the code for evaluating the Boolean expression of the if statement. The two calls to <statements> produces the code in the then and else parts. The only code left to be produced by semantic calls in the if statement is the code for branching to to the proper parts of the if statement.

This is demonstrated below, where the black parts are generated by calls to <expression> and <statments>, and the blue parts must be generated by semantic calls in the if statement. Here we are supposing that a is located at (0)D0, b at (4)D0, and c at (8)D0.

;if
  ;generated by call to <expression>
  push 0(D0) ;push a
  push 4(D0) ;push b
  cmplts     ;pop the top two stack values, compare for less than,
             ;push the result (True or False) onto the stack
  branchf L1 ;branch to the optional else if the top of stack value is false
;then
  ;generated by call to <statements> in then clause
  read 0(D0) ;read the next value in the input into D0(0)
  push 0(D0) ;push a
  push '1'   ;push 1
  adds       ;add the top two stack values, leaving the sum on the stack top
  pop 8(D0)  ;pop the top of stack value into c
  branch L2  ;branch around the optional else part
;else
  label L1   ;optional else part label
  ;generated by call to <statements> in else part
  push 4(D0) ;push b
  writes     ;write the top of stack
;endif
  label L2   ;endif label

As described in the first part of this lecture, you will need to do more work in most target machine languages in <expression> to generate the proper code for handling Boolean values.

Modifying the Compiler for Translation of if Statements

What are some of the issues involved in upgrading a working parser to generate code for an if statement? Remember that the only code we need to generate while compiling an if statement directly are the branch and label instructions, as illustrated in the above example. Indirectly, of course, the code for evaluating the Boolean expression, the statements in the then clause and the statements in the else clause, but this code is generated during calls related to <expression> and <statements>. So, let's see what we must do to generate labels and insert branch and label instructions into the translated code.

A New Label Object

One thing we need is some way to generate labels. The best way to do this is to have a new class in our semantic analyzer with a getNewLabel method that always generates a unique label each time it is called. This object will be private to the semantic analyzer, because only the semantic analyzer needs it. That is, when it is time to generate code, the parser calls an action method in the semantic analyzer; these actions may then need to call the label generator to obtain new labels for the code about to be generated. The new label class will keep a numeric count of the number of labels generated so far. For example, each time the new label object is called, it could add one to an internal variable count that has been initialized to zero, and create and return the string L || To_String(count), which is just L concatenated with the string version of the value in count.

Then, the value of the labels generated must be used in jump and label statements. Since we must insert the code to jump to a label before the label instruction is actually inserted into the code, we must keep labels in proper semantic records and pass them to actions symbols as necessary.

Recursive Descent Parser Example

After thinking about the if statement for a while, we migh come up with the following augmentation of the grammar rules for the if statement:.

<statement> --> if #process_if(if_rec) <expression> #if_test(if_rec, exp_rec)
                   then <statements> <opt_else> endif #finish_if(if_rec)
<opt_else>  --> else #process_else(if_rec) <statements>
            --> l #process_no_else(if_rec)

Thus, the code in a recursive descent parser for this if statement should look something like

procedure Statement;

  if_rec : semantic record -- (has room for the else and endif labels)
  expression_rec : semantic record -- (contains the type of expression)

begin -- Statement
  case Scanner.lookahead is
    when if ==>
      Match(if);

      -- The next call constructs an if_rec that contains the two labels
      -- needed to construct an if statement.
      Analyzer.process_if(if_rec);
      Expression(expression_rec);

      -- The next call to test_if first checks to see if the type in
      -- expression_rec is proper for a conditional expression and then, if so,
      -- generates code to branch on false to the second label in the if rec
      Analyzer.if_test(if_rec, expression_rec);
      Match(then);
      Statements;
      Opt_Else(if_rec);
      Match(endif)

      -- The call to finish_if inserts the second label from the if_rec
      -- into the code.  This is the label used to branch around the
      -- statements in the else clause for execution paths that go through
      -- the then clause.
      Analyzer.finish_if(if_rec);
    when id ==>
      . . .
  end case;
end Statement;

The call to Analyzer.process_if must generate the labels for the optional else part and for the endif and put them in the if_rec semantic record.

The call to Analyzer.process_if_exp gets the if_rec semantic record with its two labels, and the expression_rec. This routine must

check the type listed in expression_rec. If it is not Boolean, there is something wrong with the expression, and translation cannot continue (if expressions are allowed to be only Boolean, except in some dufus languages, like C).
If the expression type is incorrect, an error must be reported.
If the expression is of the proper type, code must be generated to branch on false around the then part to the optional else part.

The call to Analyzer.finish_if drops the label for the endif (the second label in parameter if_rec).

The code that manages control for the optional else part is done in the call to <opt_else>:

procedure Opt_else(opt_else_rec: in semantic_record);
  -- opet_else_rec is a copy of the if_rec generated in the if
  -- statement

begin -- Opt_else
  case lookahead is
    when else ==>
      Match(else);

      -- The next analyzer call generates the unconditional branch
      -- around the statements in the else part for execution code
      -- that comes through the then part, and then generates the
      -- else label for execution that branches to the else clause
      Analyzer.process_else(opt_else_rec);
      Statements;
    when endif ==>
      -- The next analyzer call inserts the first label in the if_rec
      -- into the code to be sure that this label exists, because
      -- code has already been generated to branch to it.  However,
      -- since there is no else part, no branch statement is inserted
      -- jump around the else part if execution comes through the
      -- then part.
      Analyzer.process_no_else(opt_else_rec);
      null;
    when others ==>
      Syntax_error;
  end case;
end Opt_else;

Procedure Analyzer.process_else must insert an unconditional branch to the endif label in opt_else_rec followed by a label statement for the else label in opt_else_rec.

Procedure Analyzer.process_no_else must generate a label statement for the else label in opt_else_rec, but it does not need to first generate an unconditional branch to the endif label, because there is no else code to branch around.

Note that in this example, the branch at the end of the then clause that would cause a jump around the else code is actually generated by the Opt_else routine rather than the if statement. This just saves one instruction in the case that there is no else clause.

If there is no else part, the code is not generated for jumping around the else part, but code must be generated for the else label, because a jump on false to this label would have been produced right after checking the expression type.

Notice, too, that even if the then or else clauses contain other if statements (i.e., there are nested ifs), this still all works correctly!

Finally, remember that there are variations on how the desired code can be generated. We have just given you one way.

The Analyzer Routines for if Statements

For the approach we have given above, there will be four methods needed in the semantic analyzer plus a label generating class. The label generating class will also be used for generating labels needed in other control structures, such as loops and case statements.

Let's look first at method process_if in the analyzer, which just needs to ensure that two labels are ready for use for generating code for the if control structure.

procedure process_if(if_rec: out semantic record);

  begin -- process_if
    if_rec.label_1 := labelGenerator.getNewLabel;
    if_rec.label_2 := labelGenerator.getNewLabel;
  end. -- process_if

We can look next at the call to if_test

procedure if_test(if_rec, conditional_rec: in semantic record)

  begin if_test

    -- Code goes here to check whether the type of the value is
    -- proper for a conditional expression.  If it isn't, an error
    -- must be generated here, akin to "Expecting type Bookean for 
    -- the conditional expression, found type character."

    -- If the type of the conditional expression is compatible, the
    -- following line is executed.
    output(IR_File, "jumpf " || if_rec.label_1)

  end if_test

The other needed analyzer methods will be similar.

If Statements: Boolean Expression and Control Structure Semantics