if and case statements

Control Structure Semantics Continued

Short Circuit Evaluation and Case Statements

The General Form of Control Structure Translation

As seen last time, the primary aspects of translating control structures are:

the generation of unique labels on demand
the insertion of labels (label statements) at the proper control points in the code
the generation of conditional and unconditional jumps in the proper places to effect the control structure

Short Circuit Evaluation

For if statements the question arises as to whether the compiler should generate code to do short circuit evaluation. For example, in evaluating the logical expression

a < b AND b < c

It would appear that there is no reason to evaluate b < c if a < b is false, since the entire expression will be false in any case. Again, this is not up to the compiler writer to decide: the language specification states whether short circuit evaluation is to be performed. Ada, for example, has special syntax (and then, or else) for the programmer to use when short circuit evaluation is desired. Java likewise distinguishes the two: && and || short circuit, while & and | applied to Boolean operands always evaluate both sides. Side effects are allowed in some languages, so, for example, evaluating

a < b AND get(b) < c

without short circuit evaluation requires that the get operation be performed. The programmer may want this to happen regardless of whether a is less than b. (Side effects are in general not good things, but some languages allow them.)
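The difference is easy to observe in a language that offers both forms. The sketch below uses Python purely for illustration: its `and` short circuits, while `&` applied to two Booleans evaluates both operands. The function get is a hypothetical stand-in for an input routine whose side effect (here, counting its calls) the programmer may depend on.

```python
calls = 0


def get(x):
    """Hypothetical input routine with a side effect: it counts its calls."""
    global calls
    calls += 1
    return x


a, b, c = 5, 3, 10                # a < b is False

calls = 0
short = a < b and get(b) < c      # `and` short circuits: get() is never called
short_calls = calls

calls = 0
full = (a < b) & (get(b) < c)     # `&` evaluates both operands: get() runs
full_calls = calls

print(short, short_calls)   # False 0
print(full, full_calls)     # False 1
```

Both forms give the same truth value; only the side effect differs, which is exactly why the language, not the compiler writer, must fix the rule.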

Short circuit evaluation is often desirable from the programmer's point of view. For example, one might have a loop in which an array is being examined, and the loop should terminate if either one of two conditions holds:

the end of the array is reached
the current array element is greater than or equal to the array element just beyond it.

This might be written in a loop in a fashion similar to

repeat

   . . .

until i = n OR a[i] >= a[i+1]

Notice that if i = n, the loop should terminate regardless of whether a[i] >= a[i+1] evaluates to TRUE or FALSE. A compiler can be written that generates the code for i = n and then, since it encounters an OR, also generates code that tests the result of evaluating i = n, including a branch around the code generated for a[i] >= a[i+1] if i = n yields TRUE. This is called short circuit evaluation. It is actually more work for the compiler writer to generate code correctly for short circuit evaluation than to generate code that evaluates the entire expression in every case.
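A minimal sketch of such a code generator for OR follows. It assumes that bf pops the Boolean it tests and that `push TRUE` pushes the Boolean constant true (hypothetical syntax). It uses only the bf, b, push, and label instructions that appear in these notes, and new_label is a hypothetical label generator. The operand code is passed in as placeholder instruction lists.

```python
from itertools import count

_label_ids = count(1)


def new_label():
    """Return a fresh, unique label name (hypothetical naming scheme)."""
    return f"L{next(_label_ids)}"


def gen_short_circuit_or(left_code, right_code):
    """Emit stack-machine code for `left OR right` with short circuit evaluation.

    left_code and right_code are lists of instructions that each leave one
    Boolean on the stack. Assumes bf pops the Boolean it tests, and that
    `push TRUE` pushes the Boolean constant true (hypothetical syntax).
    """
    eval_right = new_label()
    done = new_label()
    return (
        left_code
        + [f"bf {eval_right}",      # left is false: the right operand decides
           "push TRUE",             # left is true: the OR is true, skip the right
           f"b {done}",
           f"label {eval_right}"]
        + right_code                # the OR's value is the right operand's value
        + [f"label {done}"]
    )


code = gen_short_circuit_or(["<code for i = n>"], ["<code for a[i] >= a[i+1]>"])
print("\n".join(code))
```

Either way exactly one Boolean is left on the stack at the final label, so the code that follows (for example, the loop's exit test) does not need to know which path was taken.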

Programmers like short circuit evaluation in some instances. In the program fragment above, suppose the array's index range is 1 to n. If the loop gets as far as i = n, then without short circuit evaluation the part a[i] >= a[i+1] is also evaluated, and a language with run-time index checking (Ada, for example) raises a constraint error, because i+1 is larger than n. What the programmer wants is for i = n to be checked first, and for a[i] >= a[i+1] not to be checked if i = n is true. Not all languages allow for this, though. You need to check the language specification to determine what to do.
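The same situation can be reproduced in Python, used here only as an illustration, where an out-of-range list index raises IndexError (playing the role of the constraint error). The repeat/until condition is written as a while loop, and Python lists are indexed from 0, so the last index is len(a) - 1. The function names are hypothetical.

```python
def find_stop(a):
    """Scan until i is the last index or a[i] >= a[i+1]; return that i.
    `or` short circuits, so a[i+1] is never read once i is the last index."""
    n = len(a) - 1          # last valid index
    i = 0
    while not (i == n or a[i] >= a[i + 1]):
        i += 1
    return i


def find_stop_full(a):
    """Same loop, but `|` evaluates both operands, as in a language without
    short circuit evaluation: a[i+1] is out of range when i == n."""
    n = len(a) - 1
    i = 0
    while not ((i == n) | (a[i] >= a[i + 1])):
        i += 1
    return i


print(find_stop([1, 2, 5, 3]))   # 2: a[2] = 5 >= a[3] = 3
print(find_stop([1, 2, 3, 4]))   # 3: reached the end of the array
try:
    find_stop_full([1, 2, 3, 4])
except IndexError:
    print("IndexError")          # the analogue of the constraint error
```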

If Statement with Elsif Statements

Suppose that we had a programming language with If statements that had the elsif option. The grammar rules might look like

<statement> --> if <expression> then <statements> <if_tail> endif
<if_tail>   --> elsif <expression> then <statements> <if_tail>
            --> else <statements>
            --> λ

The way to decide how to translate such a control structure is to determine by hand where the jumps and labels need to go. So, an example might be:

if 90 <= grade and grade <= 100
  then 
    write('A');
  elsif 80 <= grade and grade <= 89 then
    write('B');
  elsif 70 <= grade and grade <= 79 then
    write('C');
  elsif 60 <= grade and grade <= 69 then
    write('D');
  else
    write('F')
endif;

We next give a possible translation. The code generated by processing the if statement itself consists of the bf (branch on false) and b (unconditional branch) instructions and the label lines; the remaining code is what is produced by the calls to <expression> and <statements>. Rather than using the display with offset notation, we have just put the names of the variables directly into the code (this, of course, would not normally be the case). We have also used descriptive label names, like "else1", which an automatic label generator would not produce.

push 90
push grade
leqs
push grade
push 100
leqs
ands
bf else1
push 'A'
writes
b endif

label else1
push 80
push grade
leqs
push grade
push 89
leqs
ands
bf else2
push 'B'
writes
b endif

label else2
push 70
push grade
leqs
push grade
push 79
leqs
ands
bf else3
push 'C'
writes
b endif

label else3
push 60
push grade
leqs
push grade
push 69
leqs
ands
bf else4
push 'D'
writes
b endif

label else4
push 'F'
writes
label endif

Looking at this tells us that we have a few decisions to make. At what points do we make calls to the semantic analyzer to generate the labels? At what point do we make calls to generate the proper code? Let's look at our rules again. We can certainly generate the first else label (else1 in the example) and the endif label as soon as we match the if token, which tells us we are in an if statement. We can keep these labels in the if_rec semantic record.

At each elsif and the else, we need to insert the proper label statement for the earlier branch to that point. This is also the best place to insert the branch to endif for the preceding then or elsif clause, since it is at this point that we know that there is code to branch around. For example, if there is no elsif or else, no jump around will be inserted. This means we need the following semantic actions.

<statement> --> if #start_if <expression> #if_test
                then <statements> <if_tail> endif #finish_if
<if_tail>   --> elsif #start_elsif <expression> #elsif_test
                then <statements> <if_tail>
            --> else #start_else <statements>
            --> λ

Method Start_If will have the form:

procedure Start_If(if_rec : out semantic_record);

and will be responsible for obtaining two new labels, one for the first elsif or else clause (else1 in the example) and one for the endif, and putting these into the if_rec semantic record.

Method If_Test will have the prototype

procedure If_Test(if_rec : in semantic_record);

It must

check whether expression_rec (returned from the call to expression) has type Boolean, and generate an error if it does not
generate the line "Branch_on_false Ln" (bf in the virtual machine code above), where Ln is the else label in the if_rec semantic record

In some grammars, such as the one you have for mPascal, the rule specifies <boolean-expression> rather than <expression> in the if statement. In these cases, the check for whether the type of <boolean-expression> really is Boolean can be done as a call to the semantic analyzer in the procedure corresponding to <boolean-expression> rather than in the call corresponding to #if_test, since this check will actually need to be made in numerous places where Boolean expressions can appear.

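Method Finish_If is responsible for dropping the endif label in the if_rec semantic record. If the if statement has no else clause, the last else label has not been dropped yet, even though the final branch on false targets it, so Finish_If must drop that label as well (at the same point as the endif label).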

Method Start_Elsif will need to insert an unconditional branch to the endif label, drop the current else label, and generate a new else label (for the next else clause).

Method Elsif_Test will need to ensure that expression_rec has type Boolean and insert a "Branch_on_false" line to the next else label if so (and generate an error if not).

Method Start_Else will need to insert an unconditional branch to the endif label and drop the current else label. However, it will not need to generate a new else label, because there can be no new else clauses following this one.
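The semantic actions above can be sketched as follows. This is a minimal illustration, not the project's required design: the if_rec semantic record is assumed to be a Python dict, the type of the expression is passed as a string, and the body code from <expression> and <statements> is stood in for by emit calls. One detail goes slightly beyond the method descriptions: finish_if also drops the last else label when there was no else clause, because the final branch on false still needs a target.

```python
from itertools import count


class IfCodeGen:
    """Sketch of the if/elsif/else semantic actions. Method names follow the
    notes; the Python representation (dict for if_rec, list of code lines,
    type given as a string) is an assumption made for illustration."""

    def __init__(self):
        self.code = []
        self._ids = count(1)

    def new_label(self):
        return f"L{next(self._ids)}"

    def emit(self, line):
        self.code.append(line)

    def start_if(self):
        # #start_if: labels for the first elsif/else clause and for endif
        return {"else": self.new_label(), "endif": self.new_label()}

    def if_test(self, if_rec, expr_type):
        # #if_test: type check, then branch on false to the current else label
        if expr_type != "Boolean":
            raise TypeError("condition must have type Boolean")
        self.emit(f"bf {if_rec['else']}")

    elsif_test = if_test  # #elsif_test performs the same actions

    def start_elsif(self, if_rec):
        # #start_elsif: jump around, drop the current else label, make a new one
        self.emit(f"b {if_rec['endif']}")
        self.emit(f"label {if_rec['else']}")
        if_rec["else"] = self.new_label()

    def start_else(self, if_rec):
        # #start_else: jump around and drop the current else label; no new one
        self.emit(f"b {if_rec['endif']}")
        self.emit(f"label {if_rec['else']}")
        if_rec["else"] = None

    def finish_if(self, if_rec):
        # #finish_if: drop a still-pending else label (no else clause),
        # then drop the endif label
        if if_rec["else"] is not None:
            self.emit(f"label {if_rec['else']}")
        self.emit(f"label {if_rec['endif']}")


# if c1 then write('A') elsif c2 then write('B') else write('F') endif
g = IfCodeGen()
rec = g.start_if()
g.emit("push c1")               # stands in for the code from <expression>
g.if_test(rec, "Boolean")
g.emit("push 'A'")
g.emit("writes")
g.start_elsif(rec)
g.emit("push c2")
g.elsif_test(rec, "Boolean")
g.emit("push 'B'")
g.emit("writes")
g.start_else(rec)
g.emit("push 'F'")
g.emit("writes")
g.finish_if(rec)
print("\n".join(g.code))
```

The output has the same shape as the hand translation of the grade example: each clause ends with a branch to the endif label, followed by the label that the previous branch on false targets.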

Compiling Case Statements

Case statements could be compiled the same way as an if statement with elsif clauses. That is unsatisfactory, however, because we would like to avoid having the compiled program test each case in turn; instead, we would like the compiled program to be able to jump directly to the correct case.

In fact, there is a way to do this: build something called a "jump table." The idea is to evaluate the case expression and then use its value to look up, in a table, which label to jump to. If the range of all possible case values is relatively small, this method is efficient; otherwise it can be quite wasteful of space.
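The run-time behavior of a jump table can be sketched in Python, used here only as an illustration: the case value is turned into an index with a single subtraction, and the table entry is used directly, with no comparison against each case value. The clause functions and the case values 10..13 are hypothetical. The bounds test corresponds to the range check discussed in Note 4 below; the translation in these notes instead relies on the type of the case expression restricting its value to the table's range.

```python
def case_dispatch(value, table, low, others):
    """Jump-table dispatch: one subtraction and one index, with no
    comparison against each case value in turn.

    table[k] is the clause for case value low + k; values outside the
    table go to others.
    """
    k = value - low
    if 0 <= k < len(table):
        return table[k]()
    return others()


# Hypothetical when clauses for the case values 10..13; 12 has no clause
# of its own, so its table entry is the others clause.
def when_10():
    return "ten"


def when_11():
    return "eleven"


def when_13():
    return "thirteen"


def when_others():
    return "other"


table = [when_10, when_11, when_others, when_13]   # index 0 is value 10
print(case_dispatch(11, table, 10, when_others))   # eleven
print(case_dispatch(12, table, 10, when_others))   # other
print(case_dispatch(99, table, 10, when_others))   # other (outside the table)
```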

Suppose we are compiling the following case statement:

case choice is
  when 1 ==>
   write(a);
  when 4, 7 ==>
   write(b);
  when 3 ==>
   write(c);
  when others ==>
   write(d);
end case;

-- assume that choice has type integer with range 1..8

This could be translated as

;case
push 0(D0)    ;push choice onto the stack (done by expression evaluator)
pop r0        ;load register r0 with the top stack value (the result of the
              ;case expression, in this case choice)
sub r0,'1',r0 ;subtract the lowest value (1) so choice 1 selects the
              ;first table entry (sub assumed to work like mul)
mul r0,'4',r0 ;multiply r0 by 4 (4 bytes), the jump instruction length
jump L1+r0    ;jump to location L1 (start of jump table) + value in r0,
              ;the computed offset of the jump that takes us to the
              ;proper when clause in the case

;when 1 ==>
label L3
push 4(D0)  ;push a
writes
jump L2     ;jump to end case

;when 4, 7 ==>
label L4
push 8(D0)  ;push b
writes
jump L2     ;jump to end case

;when 3 ==>
label L5
push 12(D0) ;push c
writes
jump L2     ;jump to end case

;when others ==>
label L6
push 16(D0) ;push d
writes
jump L2     ;jump to end case

;jump table
label L1
jump L3  ;choice 1 is processed in the case labeled L3
jump L6  ;choice 2 is not represented, so is done in the others clause (at label L6)
jump L5  ;choice 3 is processed in the case labeled L5
jump L4  ;choice 4 is processed in the case labeled L4
jump L6  ;choice 5 is not represented, so is done in the others clause (at label L6)
jump L6  ;choice 6 is not represented, so is done in the others clause (at label L6)
jump L4  ;choice 7 is processed in the case labeled L4
jump L6  ;choice 8 is not represented, so is done in the others clause (at label L6)

;end case
label L2
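The jump table and the offset computation in this translation can be sketched as follows. The function names are hypothetical, and the 4-byte jump length is the one assumed in the translation. The offset subtracts the lowest value before multiplying, so that the smallest choice selects the first jump in the table rather than the second.

```python
def emit_jump_table(table_label, whens, others, low, high):
    """Return the VM lines for a jump table covering low..high.

    whens maps case values to when-clause labels; values with no when
    clause get the others label. Follows the layout of the translation above.
    """
    lines = [f"label {table_label}"]
    for value in range(low, high + 1):
        lines.append(f"jump {whens.get(value, others)}")
    return lines


def entry_offset(value, low, jump_len=4):
    """Byte offset of value's entry from the table start: subtract low
    first so that the smallest value maps to offset 0."""
    return (value - low) * jump_len


whens = {1: "L3", 4: "L4", 7: "L4", 3: "L5"}   # from the example case statement
lines = emit_jump_table("L1", whens, "L6", 1, 8)
print("\n".join(lines))
print(entry_offset(1, 1))   # 0: choice 1 uses the first jump
print(entry_offset(8, 1))   # 28: choice 8 uses the eighth jump
```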

Note 1: In the virtual machine language for our project, we always put in text labels. Of course, this cannot really be done in machine code. Instead there must be actual (virtual) addresses generated. That is, there really are no labels in machine code.

Note 2: Notice how big the jump table would be if the range of possible choices were large. If it can be determined that the table would be "too large," the case statement can instead be compiled the same way as an if statement with elsif clauses.
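What counts as "too large" is a design decision that the notes leave open. One possible heuristic is sketched below, as an assumption rather than a standard rule: use a jump table only if the number of entries from the least to the greatest case value is below a size limit and enough of those entries belong to real when clauses. Both thresholds are illustrative values chosen for this sketch.

```python
def choose_case_strategy(values, max_table_entries=256, min_density=0.5):
    """Decide between a jump table and an if/elsif-style chain of tests.

    values: the case values that appear in when clauses.
    The two thresholds are illustrative assumptions, not standard numbers.
    """
    low, high = min(values), max(values)
    size = high - low + 1                  # table entries needed for low..high
    density = len(set(values)) / size      # fraction of entries not "others"
    if size <= max_table_entries and density >= min_density:
        return "jump table"
    return "compare chain"


print(choose_case_strategy([1, 3, 4, 7]))       # jump table: 4 of 7 entries used
print(choose_case_strategy([1, 1000, 50000]))   # compare chain: table too large
```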

Note 3: The jump table goes at the end, because it is at that point during compilation when it is known how many cases there are and which labels should be jumped to in the table.

Note 4: There are other approaches that can be taken for building a jump table; the one above is a simple example of the general approach. For example, the code at the top can branch directly to the "others" clause if the value of the case expression is smaller than the least value or larger than the greatest value found in any of the when clauses. However, at the start of compiling the case statement, the compiler does not yet know the values that will be found in the when clauses. One way to handle this is to put incomplete instructions into the code and then return later to finish them when enough information has been accumulated. This is called "backpatching." For example, the instruction

push X

can be placed temporarily in the code to stand for pushing the minimum value found in any of the when clauses onto the stack, where X is just a dummy argument. When compilation reaches end case, the compiler knows all of the values that appeared in the when clauses, including the minimum value (e.g., 1), so the jump table can be constructed to start with the minimum value found and end with the maximum value found. The unfinished line of code in the IR file can then be changed to push the now-known minimum value, as in

push '1'
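Backpatching only requires remembering where the placeholder was emitted. The sketch below assumes the IR is held in a Python list of lines (rather than written straight to a file) so an earlier line can be overwritten; the class and method names are hypothetical.

```python
class CaseBackpatcher:
    """Sketch of backpatching the minimum case value (Note 4).

    Code lines live in a list so an earlier line can be rewritten once the
    minimum is known. The method names are hypothetical.
    """

    def __init__(self):
        self.code = []
        self.values = []
        self.patch_index = None

    def emit(self, line):
        self.code.append(line)

    def start_case(self):
        # The minimum case value is unknown here: emit a placeholder
        # and remember its position in the code.
        self.patch_index = len(self.code)
        self.emit("push X")

    def when_value(self, value):
        # Record each value as its when clause is compiled.
        self.values.append(value)

    def end_case(self):
        # All values are known now: patch the placeholder with the minimum.
        low = min(self.values)
        self.code[self.patch_index] = f"push '{low}'"
        return low, max(self.values)   # bounds for building the jump table


cb = CaseBackpatcher()
cb.emit("push 0(D0)")          # push choice
cb.start_case()
for v in (1, 4, 7, 3):         # values from the when clauses in the example
    cb.when_value(v)
bounds = cb.end_case()
print(cb.code)    # ['push 0(D0)', "push '1'"]
print(bounds)     # (1, 7)
```

If the IR is written to a file as it is generated, the same idea applies, but the compiler must record the file position of the placeholder and rewrite it in place, which is easier when placeholders have a fixed width.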

Compiling Loops

Having seen the examples for compiling if statements and case statements, you should be able to figure out how to compile loops on your own.