case, for, and classes

Control Structure Semantics Continued

Case Statements (continued) and For Loops

The General Form of Control Structure Translation

As seen last time, the primary aspects of the translation of control structures are

the generation of unique labels on demand
insertion of labels (label statements) into proper control points in the code
the generation of conditional and unconditional jumps in the code in the proper places to effect the control structure
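
The first of these aspects, generating unique labels on demand, needs very little machinery. Here is a minimal sketch in Java, assuming a hypothetical `LabelGenerator` helper (the class and method names are illustrative, not taken from any particular compiler):

```java
// Hypothetical helper: hands out a fresh label name each time it is asked.
final class LabelGenerator {
    private int next = 1; // number of the next label to be issued

    // Returns "L1", "L2", ... and never returns the same name twice.
    String newLabel() {
        return "L" + next++;
    }

    public static void main(String[] args) {
        LabelGenerator labels = new LabelGenerator();
        String top = labels.newLabel(); // e.g. the top of a loop
        String out = labels.newLabel(); // e.g. the exit point of the loop
        System.out.println("Label " + top);
        System.out.println("Branch " + top);
        System.out.println("Label " + out);
    }
}
```

The semantic analyzer would call `newLabel()` whenever a control point needs a name, then emit the matching label statement and jumps.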

The Approach

Always do some hand translations first. If you can see how to translate the structure by hand, you will get some great insight into how to translate it with a compiler.
Carefully check the language definition. Especially with the For and Case statements, there is a wide diversity of implementation among the different programming languages.

With respect to this final point, consider the following examples.

The Case statement:

The original Pascal had a case statement, but no others clause. This made the case statement pretty worthless, because case statements seldom handle every case individually, and therefore need an others clause. To get around this problem, Case statements in Pascal had to be embedded in an If statement which checked the case expression to see if it was one of the handled cases. The case statement was then put in the Then clause of the If statement, while the Others part was put in the else part, a very awkward construct.
In C-like languages, each case in a switch statement normally has to be ended with a break statement; without one, execution falls through into the next case. This annoying aspect of C is actually very dangerous, because it is easy to forget the break statement, leading to incorrect programs.
In most languages, if no case handles the computed case value, execution just continues after the end of the case.
In Ada, all possible case values must be handled by one of the case clauses (whens) or the others clause.
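
The fall-through hazard is easy to reproduce. The sketch below uses Java, whose colon-style switch follows the C rule; the class and method names are made up for illustration:

```java
final class FallThroughDemo {
    // Colon-style switch with the break after case 1 forgotten.
    static String withoutBreak(int choice) {
        StringBuilder trace = new StringBuilder();
        switch (choice) {
            case 1:
                trace.append("one ");
                // no break: execution falls through into case 2
            case 2:
                trace.append("two ");
                break;
            default:
                trace.append("other ");
        }
        return trace.toString().trim();
    }

    // The same switch with the break in place.
    static String withBreak(int choice) {
        StringBuilder trace = new StringBuilder();
        switch (choice) {
            case 1:
                trace.append("one ");
                break;
            case 2:
                trace.append("two ");
                break;
            default:
                trace.append("other ");
        }
        return trace.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(withoutBreak(1)); // prints "one two"
        System.out.println(withBreak(1));    // prints "one"
    }
}
```

A compiler for such a language simply omits the Branch Out after a clause that lacks a break, which is why the language definition has to be checked before deciding where the branches go.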

The important thing to remember is that although we can study some of the basic aspects of translating For and Case statements, many details will be different based on the specifications of the programming language.

Translating Case Statements

Case statements require more thought to translate well. The first approach that comes to mind is to translate the Case statement the same way an If with Elsif clauses is translated. That would mean that every case clause (e.g. each When) is evaluated, and if the case value doesn't match one of the current when values, a jump to the next When clause is inserted. It would be better if it were possible to jump straight to the right case without being required to check each case clause individually. This can be accomplished with a jump table.

Consider the following case statement:

Case Choice is
  when 3 =>
    Statements 1
  when 6 =>
    Statements 2
  when 4,7 => 
    Statements 3
  when others =>
    Statements 4
end Case

Suppose that Choice is of type Integer with range 1..10 (that is, the only values Choice can have in it are 1 through 10). The translated case body might look like:

Code to evaluate case expression (Choice)
Code to jump to the proper when clause

Label L1
Code for Statements 1
Branch Out

Label L2
Code for Statements 2
Branch Out

Label L3
Code for Statements 3
Branch Out

Label L4
Code for Statements 4
Branch Out

Label Out

This looks fairly straightforward except for one thing. How can we generate code that will jump to the proper label based on the value in the case expression? For instance, in our example, if Choice evaluates to 7, we would need to have code that would jump to label L3. The trick is to build into the code something we call a jump table. The jump table could have entries for every possible case, as in the following:

Jump L4 ;case 1
Jump L4 ;case 2
Jump L1 ;case 3
Jump L3 ;case 4
Jump L4 ;case 5
Jump L2 ;case 6
Jump L3 ;case 7
Jump L4 ;case 8
Jump L4 ;case 9
Jump L4 ;case 10

Notice that for all 10 possible values for Choice (1 through 10) there is an unconditional Jump command that jumps to the proper label for handling that particular case. So, for example, the 4th entry in this jump table corresponds to the case that Choice = 4, and that entry says to Jump to label L3. That's the place in the code where the case that Choice = 4 is handled. Where do we put this jump table? Well, during parsing, we won't know the names of the labels or the values that trigger each When clause until we are done parsing the entire Case statement, so it looks like it must go at the end of the case statement translation. This gives

Code to evaluate case expression (Choice)
Jump L5 + (Choice - 1) ;Code to jump to the proper when clause (the first table entry is case 1)

Label L1
Code for Statements 1
Branch Out

Label L2
Code for Statements 2
Branch Out

Label L3
Code for Statements 3
Branch Out

Label L4
Code for Statements 4
Branch Out

Label L5
Jump L4 ;case 1
Jump L4 ;case 2
Jump L1 ;case 3
Jump L3 ;case 4
Jump L4 ;case 5
Jump L2 ;case 6
Jump L3 ;case 7
Jump L4 ;case 8
Jump L4 ;case 9
Jump L4 ;case 10

Label Out

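Notice the changes: the computed jump that follows the evaluation of the case expression, and the jump table that begins at Label L5. After Choice is evaluated, the result is used as an offset from the address of L5 to reach the proper position in the jump table. Because the first table entry handles case 1, that offset is really Choice - 1 entries, and on a real machine it must also be scaled by the size of a jump instruction. Once there, that entry causes a second jump to the proper label for handling that case. Notice that Label Out comes after the jump table, so that each Branch Out skips over the table instead of falling into it.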

It looks like the semantic analyzer must do a few different things to build the jump table properly. First, each time a When clause is encountered, a label must be generated, and then this label must be kept in a list of labels associated with the values triggering that When clause. When the end case statement is encountered, the semantic analyzer must use this list to generate the table as shown.
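
The bookkeeping just described can be sketched in Java. This is only an illustration under assumptions: `JumpTableBuilder`, `addWhen`, and `emitTable` are hypothetical names, and the output is the textual pseudo-assembly used in these notes rather than real machine code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the list the semantic analyzer keeps per Case statement.
final class JumpTableBuilder {
    // Maps each value named in a When clause to the label of that clause.
    private final Map<Integer, String> labelForValue = new HashMap<>();

    // Called once per When clause, after a label has been generated for it.
    void addWhen(String label, int... values) {
        for (int v : values) {
            labelForValue.put(v, label);
        }
    }

    // Called at "end Case": one Jump per value in low..high,
    // using the others label for any value no When clause names.
    List<String> emitTable(int low, int high, String othersLabel) {
        List<String> code = new ArrayList<>();
        for (int v = low; v <= high; v++) {
            String target = labelForValue.getOrDefault(v, othersLabel);
            code.add("Jump " + target + " ;case " + v);
        }
        return code;
    }

    public static void main(String[] args) {
        JumpTableBuilder table = new JumpTableBuilder();
        table.addWhen("L1", 3);
        table.addWhen("L2", 6);
        table.addWhen("L3", 4, 7);
        // Choice has range 1..10 and L4 is the others clause.
        table.emitTable(1, 10, "L4").forEach(System.out::println);
    }
}
```

Running `main` reproduces the ten-entry table shown earlier. A real analyzer would also have to reject a value that appears in two When clauses; that check is omitted here.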

The table, of course, could get quite large, especially if Choice could take on any integer value in our example. In that case, the table is just trimmed down to hold only those values between the minimum and the maximum values that actually show up in one of the When clauses. If Choice evaluates to a value outside of this range, the others clause is selected automatically. This would give:

Code to evaluate case expression (Choice)
Code to Compare Choice with MinWhenValue
Jump L4 if Less
Code to Compare Choice with MaxWhenValue
Jump L4 if Greater
;Code to jump to the proper when clause
Jump L5 + size_of_jump_instruction * (Choice - MinWhenValue)
Label L1
Code for Statements 1
Branch Out
Label L2
Code for Statements 2
Branch Out
Label L3
Code for Statements 3
Branch Out
Label L4
Code for Statements 4
Branch Out
Label L5
Jump L1 ;case 3
Jump L3 ;case 4
Jump L4 ;case 5
Jump L2 ;case 6
Jump L3 ;case 7
Label Out
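
The run-time effect of the trimmed table above can be simulated in Java. In this sketch the labels L1 through L4 are represented by the integers 1 through 4 and the table is an ordinary array, so the scaling by size_of_jump_instruction is hidden inside array indexing; all names are assumptions made for the example.

```java
final class TrimmedJumpTable {
    static final int OTHERS = 4;          // stands for label L4
    static final int MIN_WHEN_VALUE = 3;  // smallest value in any When clause
    static final int MAX_WHEN_VALUE = 7;  // largest value in any When clause

    // Entry i handles the value MIN_WHEN_VALUE + i (cases 3, 4, 5, 6, 7).
    static final int[] TABLE = {1, 3, 4, 2, 3};

    // Returns the number of the label that control reaches for this Choice.
    static int targetLabel(int choice) {
        if (choice < MIN_WHEN_VALUE) return OTHERS; // Jump L4 if Less
        if (choice > MAX_WHEN_VALUE) return OTHERS; // Jump L4 if Greater
        return TABLE[choice - MIN_WHEN_VALUE];      // indexed jump into the table
    }

    public static void main(String[] args) {
        for (int choice = 1; choice <= 10; choice++) {
            System.out.println("Choice = " + choice + " -> L" + targetLabel(choice));
        }
    }
}
```

Values 5, 1, and 10 all reach L4: the first through a table entry, the other two through the range checks.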

Some issues

We might not know MinWhenValue and MaxWhenValue until the entire Case statement has been parsed. One way to handle this is shown below. Another way is to perform what is called "backpatching." Backpatching refers to "going back in the translated code and patching up or patching in some statements that could not be completed until more information about the program is uncovered as the compilation progresses."

We have inserted enough extra checks that for small case statements, it might be better to translate them as we do if-then-elsif-else statements.

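In some cases the table could still be very large, for example when the When values are few but far apart. In that case we might want to follow the if model of translation.

For the first issue, the translation below moves the range checks to Label L0, after the jump table, so that they are generated only once MinWhenValue and MaxWhenValue are known: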
Code to evaluate case expression (Choice)
Jump L0 
Label L1
Code for Statements 1
Branch Out
Label L2
Code for Statements 2
Branch Out
Label L3
Code for Statements 3
Branch Out
Label L4
Code for Statements 4
Branch Out
Label L5
Jump L1 ;case 3
Jump L3 ;case 4
Jump L4 ;case 5
Jump L2 ;case 6
Jump L3 ;case 7
Label L0
Code to Compare Choice with MinWhenValue
Jump L4 if Less
Code to Compare Choice with MaxWhenValue
Jump L4 if Greater
;Code to jump to the proper when clause
Jump L5 + size_of_jump_instruction * (Choice - MinWhenValue)
Label Out
 

Notice the range checks, which in this last version sit at Label L0 after the jump table. We have inserted code to compare the value of Choice with the minimum and maximum values that appear in any of the When clauses. If the value is outside this range, we just jump to the others clause directly. Otherwise, we jump into the jump table, which has now been trimmed to have rows only for the values between the minimum and maximum represented in the When clauses.
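
The backpatching alternative mentioned under "Some issues" keeps the range checks at the top and fills in the bounds later. Here is a minimal sketch in Java, assuming the generated code is held in an in-memory list of instruction strings; `BackpatchDemo`, `emit`, and `patch` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class BackpatchDemo {
    private final List<String> code = new ArrayList<>();

    // Appends an instruction and returns its position in the buffer.
    int emit(String instruction) {
        code.add(instruction);
        return code.size() - 1;
    }

    // Overwrites a previously emitted placeholder.
    void patch(int position, String instruction) {
        code.set(position, instruction);
    }

    static List<String> translateExample() {
        BackpatchDemo out = new BackpatchDemo();
        out.emit("Code to evaluate case expression (Choice)");
        int minCheck = out.emit("Compare Choice with ?"); // bound not known yet
        out.emit("Jump L4 if Less");
        int maxCheck = out.emit("Compare Choice with ?"); // bound not known yet
        out.emit("Jump L4 if Greater");

        // The When clauses are parsed here; suppose these values were seen.
        int[] whenValues = {3, 6, 4, 7};
        int min = whenValues[0];
        int max = whenValues[0];
        for (int v : whenValues) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }

        // At "end Case" the bounds are known, so go back and patch them in.
        out.patch(minCheck, "Compare Choice with " + min);
        out.patch(maxCheck, "Compare Choice with " + max);
        return out.code;
    }

    public static void main(String[] args) {
        translateExample().forEach(System.out::println);
    }
}
```

Backpatching of this kind requires the translated code to stay in memory (or in a rewritable file) until the enclosing statement has been fully parsed.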

Translating For Loops

The For statement:

Some languages allow For statements to have floating point index values.
Some languages allow For statements to have increments other than 1.
Some languages require that the upper bound expression for the loop be recomputed each time through the loop, whereas others require that this upper bound expression be computed only once, at the start of the loop.
In some languages, the loop index variable must be declared as a regular variable, and its value is known outside the loop. In other languages, the loop index variable is automatically declared upon entrance to the loop, with the loop being a new scope for that variable; the variable ceases to exist outside the loop, so its value is not known there.

Suppose we were translating the loop:

for I in Start..Start*10 loop

  statements

end loop;

Suppose that this programming language specifies that the loop termination condition (Start*10) be evaluated only once. Then we might have code similar to

code to compute starting expression (Start)
code to compute ending expression (Start*10), save in end_value
code to compare starting value > end_value
Branch on true to Out
code to set index to starting value expression (I <-- Start)
Label Top_Of_Loop

  code for statements

Check Index for equality with end_value
Branch on True to Out
Code to increment the index
Branch Top_Of_Loop
Label Out

Besides generating this code, the compiler would need to generate a new symbol table for this loop with the loop index (I) and its type as the only entry, if this language specifies that the loop index variable is declared at the beginning of the loop and only exists during the loop. At the end of the loop, this symbol table would have to be destroyed. The semantic analyzer would also have to check that the type of I is proper for a loop.
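
The create-and-destroy behaviour of the loop's symbol table can be sketched as a stack of scopes. This is an illustration only: `ScopeStack` and its methods are hypothetical, and types are stored as plain strings.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

final class ScopeStack {
    // Innermost scope is at the head of the deque.
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    void enterScope() { scopes.push(new HashMap<>()); } // e.g. on "for ... loop"
    void exitScope()  { scopes.pop(); }                 // e.g. on "end loop"

    // Declares a name in the innermost scope.
    void declare(String name, String type) {
        scopes.peek().put(name, type);
    }

    // Searches from the innermost scope outward; returns null if undeclared.
    String lookup(String name) {
        for (Map<String, String> scope : scopes) {
            String type = scope.get(name);
            if (type != null) return type;
        }
        return null;
    }

    public static void main(String[] args) {
        ScopeStack symbols = new ScopeStack();
        symbols.enterScope();
        symbols.declare("Start", "Integer");
        symbols.enterScope();                  // entering the loop
        symbols.declare("I", "Integer");       // the loop index is the only entry
        System.out.println(symbols.lookup("I"));
        symbols.exitScope();                   // leaving the loop
        System.out.println(symbols.lookup("I"));
    }
}
```

After `exitScope()` the lookup of `I` fails, which is how a reference to the loop index outside the loop would be reported as an error.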

You can see what changes would need to be made for different language specifications. For example, if the ending value for the loop had to be recomputed each time through the loop, the code for recomputing the expression would need to be included inside the loop rather than outside the loop.
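
The difference between the two rules only shows up when the loop body changes something the bound depends on. The Java sketch below simulates both translations for the bound Start*10, with a body that assigns 1 to Start; the class and method names are invented for the example.

```java
final class LoopBoundDemo {
    // Bound evaluated once, before the loop, and saved in endValue.
    static int iterationsBoundOnce(int start) {
        int endValue = start * 10;
        int iterations = 0;
        for (int i = start; i <= endValue; i++) {
            start = 1;     // the body changes Start; endValue is unaffected
            iterations++;
        }
        return iterations;
    }

    // Bound recomputed on every trip through the loop.
    static int iterationsBoundEachTime(int start) {
        int iterations = 0;
        for (int i = start; i <= start * 10; i++) {
            start = 1;     // the body changes Start, so the bound drops to 10
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println(iterationsBoundOnce(2));     // prints 19
        System.out.println(iterationsBoundEachTime(2)); // prints 9
    }
}
```

With Start initially 2, the first version runs for I = 2..20 and the second for I = 2..10, so the same source loop executes a different number of times depending on the language rule.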

Classes and Objects

Consider a typical class:

public class CounterClass
{
  private int counter = 0;

  public static final char BOY  = 'M';
  public static final char GIRL = 'F';

  public CounterClass()
  {
    // This constructor does nothing.  Use it when you want to start
    // the counter at 0.
  }

  public CounterClass(int startingValue)
  {
     counter = startingValue;
  }

  public void plusOne()
  {
    counter++;
  }

  public void minusOne()
  {
    counter--;
  }

  public int getCounter()
  {
    return counter;
  }
}

Now consider a use of this class in declaring objects.

public class Countem
{
  public static void main (String[] args) throws Exception
  {
    CounterClass numberOfBoys  = new CounterClass();
    CounterClass numberOfGirls = new CounterClass(10);
    char gender;

    while (true)
    {
      System.out.print("Please enter M for a boy or F for a girl > ");
      gender = BasicIo.readCharacter();
      if (gender == 'M')
      {
        numberOfBoys.plusOne();
      }
      else
      {
        numberOfGirls.plusOne();
      }
      System.out.println("Boys  = " + numberOfBoys.getCounter());
      System.out.println("Girls = " + numberOfGirls.getCounter());
      System.out.println("Total = " + (numberOfBoys.getCounter() + numberOfGirls.getCounter()));
    }
  }
}

How do you suppose classes and objects might be compiled?