Procedures and Functions

The Semantics of Procedures

Points of Interest

Procedure bodies are translated just like program bodies. So, the new things to learn about procedures have to do with the following points reached during compiling:

Point 1 -- Branch Around Procedures: Since code is generated in sequential order, there must be a branch inserted to get around the code for procedures and functions when the main program first starts executing. It is assumed that program execution will start from the top when the translated program is actually run, so a jump instruction must be executed to force program execution to continue with the code that is the translation of the begin block for the main program.

Point 2 -- Procedure Declaration: The point at which a procedure definition is first encountered requires symbol table calls to put the name of the procedure and its attributes into the symbol table and then to create a new symbol table on the top of the stack for the new procedure. All of the new procedure's parameters and variables (along with their attributes) must be placed into this new table.

Point 3 -- Activation Record Initialization Time. Consider the begin block of a procedure. This is the entry point to the procedure, the code that is to start executing when this procedure is called. At this point the compiler must generate special IR related to starting a procedure running, such as dropping the label for the procedure and completing the setup of the activation record on the stack for this procedure (the initial part of the activation record setup for this procedure is done at the point of call).

Point 4 -- Procedure End: When the end of a procedure is reached, actions must be taken to begin the process of removing the activation record from the stack for this procedure (the rest is done at the point of call) and returning to the point of call.

Point 5 -- Main Program Code Start: The label generated in point 1 must be dropped here. This is to ensure that a jump can be made from the start of the translated code around any intervening procedure and function code to this part of the program, which is where execution is to start. The activation record for the main program must also be constructed at this point.

Point 6 -- Procedure Call: A procedure call requires semantic actions that generate code to set up the calling sequence properly, to start the construction of the activation record for the called procedure, to put the actual parameters into the activation record for the called procedure, and then to make the actual jump to the procedure.

Point 7 -- Procedure Return: Since code to place the actual parameters onto the run time stack is placed just before a call to a procedure, it makes sense that this part of the run time stack be handled by code that is inserted at the point of return from the call. In some languages this code will take care of copying the values in the formal parameters back into the actual parameter locations as necessary (this won't be necessary in your project, because mPascal has only "copy" (non VAR) and "reference" (VAR) parameters). The "copy" parameters don't need to be copied back into the actual parameters, because the actual parameters are not intended to change. The "reference" parameters don't need copying either: the address of the actual parameter, rather than its value, was supplied at call time, so the actual parameter was changed each time the corresponding formal parameter was changed.

Perhaps more than any other point in the development of a compiler you need to be very disciplined in your mind with respect to the compiling process when considering procedures. The reason why is that in translating procedures we need to be considering three issues simultaneously:

symbol table issues (parse time)
how to generate code to correctly deal with procedure call mechanisms (semantic analysis time)
how the activation records will appear on the run time stack when the translated code actually runs (run time)

It is easy, for example, to get the symbol table and the activation records of the run time stack confused as if they are the same thing (they are not), so just keep a clear head.

Sample Program

Consider the following program for the rest of this discussion.

program Fred;
  -- Point 1: Output code to jump around possible intervening procedures and/or functions

  var
    a, b :  Integer;
    x    :  Real;

 -- Point 2:  Procedure definition: symbol table issues
 procedure Mary(a, m: in Integer; y: in out Real);

    var
     c: character;

    -- Point 3: Procedure code begin; drop label, generate code for completing AR setup
    begin -- Mary
     <statements>
    --Point 4: Procedure end; generate code for starting AR cleanup and return
    end;

  -- Point 5:  Drop label for Fred's code start
  begin -- Fred
    <statements>
    -- Point 6:  Procedure call -- generate code to start AR setup, to put actual
    --    parameters on the stack in the AR, and a jump to subroutine instruction
    Mary(3, a*b, x);
    -- Point 7:  Procedure return -- generate code to complete parameter transfers
    --    if necessary and to remove the rest of the AR from the run time stack
   <statements>
  end.

An Overview

Before going into detail about each of these points, let's give an overview to help map out where we are going. We will discuss the points with respect to the sample program above.

Point 1: Since code is generated on the fly as the compiler moves through the program sequentially, the code for procedure Mary will be produced and inserted into the translation file (IR_file) before the code for Fred. When our translated program begins running, though, we want it to begin with Fred's translated code. So, we must generate a label and output a jump instruction to that label to force execution to go around the code for Mary. The label needs to be dropped at the beginning of Fred.

Point 2: During parsing, the name Mary must be inserted into Fred's symbol table along with the fact that Mary is a procedure at nesting level 1. The order, type, and mode (but not the name) of each parameter must be included as attributes of the name Mary. Finally, there must be a label generated and inserted into the table that will be the branch point for calls to Mary. A new symbol table must then be created for Mary. In Mary's table, the names, types, modes, and offsets of the parameters must be included, too. Within Mary, the parameters must be accessible in a manner similar to variables, which is why they have offsets.

Point 3. At the beginning of the procedure for Mary, a number of things must be done. The label in the symbol table entry for Mary (in Fred) must be inserted at this point, so that the code for Mary can be jumped to properly when a call to Mary is executed. Code must be output during semantic analysis that, when executed at run time, sets up an activation record for Mary on the stack. The actual parameters must be associated properly with Mary's formal parameters.

Point 4. At the end of procedure Mary, code must be generated that, when executed, will remove Mary's activation record from the stack and reset the run time stack properly. Depending on how parameters are handled, some code might need to be inserted to ensure that the actual parameters have the proper values. Finally, the last instruction of Mary's code must be a jump that returns control to the point in the code from which Mary was called (Mary can be called from numerous different places with different actual parameters). See Point 6.

Point 5. At the start of Fred's code, the label generated in point 1 must be inserted so that the jump described in Point 1 will work as intended.

Point 6. At the point of the procedure call, the parser must determine that the call to Mary is proper by looking in the symbol table to see whether Mary is there and whether the number, types, and modes of the actual parameters in the call match the symbol table information about Mary's formal parameters. If this check is ok, code must be generated by the semantic analyzer that provides the actual parameters of the call in a place where they can be accessed by the translated procedure Mary. Then, code must be generated that jumps to Mary. Usually there is a special instruction that can be inserted at this point (e.g., jsr L3, or "jump to subroutine located at label L3") that pushes the value of the PC onto the stack and then jumps to label L3. The saved PC value is the return address that Mary uses, once its code has finished executing, to return to the place from which it was called this time.

Now, let's begin looking at these points in order.

Point 1 -- Inserting a Jump Around Intervening Procedures and Functions

At this point in the compiler, the semantic analyzer must be called to do three things:

generate a new label
save this label in a semantic record, or in Fred's symbol table
output the line "jump Ln" to the IR file, where Ln is the label just generated.

The label needs to be saved, because when the begin block for Fred is encountered, this label will need to be dropped into the IR file so that the jump is executed properly at run time.
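
The three steps above can be sketched in a few lines of Python. This is only an illustration: the names LabelGenerator and begin_program, the use of a list to stand in for the IR file, and the dictionary used as the saved record are all assumptions made for this sketch, not part of the project's required design.

```python
import itertools


class LabelGenerator:
    """Hands out fresh labels L1, L2, ... (numbering scheme assumed)."""

    def __init__(self):
        self._numbers = itertools.count(1)

    def new_label(self):
        return f"L{next(self._numbers)}"


def begin_program(name, labels, ir_lines):
    """Hypothetical semantic action for Point 1."""
    label = labels.new_label()                      # 1. generate a new label
    header = {"name": name, "nesting_level": 0,     # 2. save it where Point 5
              "label": label}                       #    can find it later
    ir_lines.append(f"jump {label}")                # 3. output "jump Ln"
    return header


labels = LabelGenerator()
ir_lines = []                # stands in for the IR file
fred = begin_program("Fred", labels, ir_lines)
print(ir_lines)
print(fred)
```

Because the label is kept in the returned record, the action at Fred's begin block can later look it up and drop it into the IR.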

Point 2 -- Procedure Definition -- Constructing the Symbol Table

There are three distinct semantic actions to take at this point:

Insert information about Mary into both Fred's symbol table and Mary's symbol table.

Consider the symbol table. What needs to go into Fred's symbol table (at Point 2) is enough information to allow calls to Mary (at Point 6) to be made properly. Notice that calls can be made from many different locations, including from within Mary.

Fred's symbol table after variables a, b, and x have been compiled:
 ________________
| Fred | 0 | L1  |        
|______|___|_____|___ ______
| a    | Integer | 0 | VAR  |
|______|_________|___|______|
| b    | Integer | 4 | VAR  |
|______|_________|___|______|
| x    | Real    | 8 | VAR  |
|______|_________|___|______|
The first line in the symbol table has entries corresponding to the name of this symbol table (Fred), its nesting level (0, for use as the display register number), and the label for the branch to the begin block for Fred (L1).

Fred's symbol table after the procedure definition for Mary has been compiled:
 ____________________ ______
| a    | Integer | 0 | VAR  |
|______|_________|___|______|
| b    | Integer | 4 | VAR  |
|______|_________|___|______|
| x    | Real    | 8 | VAR  |
|______|_________|___|______|____ ___    __ _______    __ _______    ___ ____
| Mary |         |   | PROC | L4 |  -|->|In|Integer|  |In|Integer|  |in |Real|
|______|_________|___|______|____|___|  |__|______-|->|__|______-|->|out|___-|-> //
The line for Mary in the symbol table gives the name of the identifier (Mary); no entry for the type or offset, because Mary is not a variable with a type or offset; the kind of identifier Mary is (PROC); the label for Mary (L4), so that branches to Mary can be made; and a linked list of each parameter's mode and type, in the order in which the parameters appear.
We need at least this much information in Fred's symbol table to process calls to Mary. On the other hand, we also need just about the same information in Mary's symbol table to translate Mary. In Mary's case, we will also need the names of the parameters and where these parameters will be stored (their offset in Mary's activation record).

Mary's symbol table after the procedure statement is compiled:
 ________________
| Mary | 1 | L6  |        
|______|___|_____|___ ___________ ________
| a    | Integer | 0 | Parameter | In     | 
|______|_________|___|___________| _______|
| m    | Integer | 4 | Parameter | In     |
|______|_________|___|___________| _______|
| y    | Real    | 8 | Parameter | In Out | 
|______|_________|___|___________|________|
As with program Fred, the first line in the symbol table for Mary includes the name of this symbol table (Mary), its nesting level (1, for use as the display register number), and a label (L6) for branching to the begin block for Mary. Remember that there may be intervening procedures and/or functions declared inside of Mary at nesting level 2.

Mary's symbol table after the variable declaration for variable c has been compiled:
 ________________
| Mary | 1 | L6  |        
|______|___|_____|___ ___________ ________
| a    | Integer | 0 | Parameter | In     | 
|______|_________|___|___________| _______|
| m    | Integer | 4 | Parameter | In     |
|______|_________|___|___________| _______|
| y    | Real    | 8 | Parameter | In Out | 
|______|_________|___|___________|________|
| c    | Char    | 12| VAR       |        |
|______|_________|___|___________|________| 

Notice that as far as the code in Mary is concerned, the parameters look just like variables. One difference is that in generating code referring to identifier y in Mary, the semantic analyzer must check the symbol table to discover that y is really an in out parameter, which means that its location, D1(8), contains an address, not a value. (There are other ways to translate in out parameters, for example by copying the value of the actual parameter into the formal parameter at call time, and then copying the formal parameter value back into the actual parameter upon return.) Thus, generated code referring to y must use indirect addressing. Semantic action calls must therefore be placed appropriately in a procedure definition to carry out the above actions, including:

Creating a new symbol table for Mary
Inserting the name Mary into the symbol table for Fred
Inserting all of the attributes of Mary's parameters in both the symbol table for Fred (in the linked list of parameter attributes), and in the symbol table for Mary as shown, more like regular variables.
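
A minimal Python sketch of these actions follows, reproducing the tables shown above for Fred and Mary. Everything about the representation is an assumption made for illustration: the class and function names, the uniform 4-byte slot size (inferred from the offsets 0, 4, 8, 12 in the tables), and the "@" prefix used here to mark an indirect operand.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ParamSig:
    """One node of the parameter list attached to a PROC entry (no name)."""
    mode: str            # "in" or "in out"
    type_: str


@dataclass
class Entry:
    name: str
    kind: str                        # "VAR", "PARAMETER", or "PROC"
    type_: Optional[str] = None
    offset: Optional[int] = None
    mode: Optional[str] = None
    label: Optional[str] = None      # call label, PROC entries only
    params: list = field(default_factory=list)


@dataclass
class SymbolTable:
    name: str
    nesting_level: int
    label: str
    entries: dict = field(default_factory=dict)
    next_offset: int = 0

    def add_storage(self, name, kind, type_, mode=None, size=4):
        """Insert a variable or parameter at the next free offset."""
        entry = Entry(name, kind, type_, self.next_offset, mode)
        self.entries[name] = entry
        self.next_offset += size     # assumed: every slot is 4 bytes
        return entry


def declare_procedure(outer, name, call_label, body_label, params):
    """params is a list of (name, mode, type) in declaration order."""
    signature = [ParamSig(mode, type_) for _, mode, type_ in params]
    outer.entries[name] = Entry(name, "PROC", label=call_label,
                                params=signature)
    inner = SymbolTable(name, outer.nesting_level + 1, body_label)
    for pname, mode, type_ in params:
        inner.add_storage(pname, "PARAMETER", type_, mode)
    return inner


def operand(table, name):
    """IR operand for an identifier; in out parameters are indirect."""
    entry = table.entries[name]
    location = f"D{table.nesting_level}({entry.offset})"
    if entry.kind == "PARAMETER" and entry.mode == "in out":
        return "@" + location        # slot holds an address, not a value
    return location


fred = SymbolTable("Fred", 0, "L1")
for var_name, var_type in [("a", "Integer"), ("b", "Integer"), ("x", "Real")]:
    fred.add_storage(var_name, "VAR", var_type)
mary = declare_procedure(fred, "Mary", "L4", "L6",
                         [("a", "in", "Integer"), ("m", "in", "Integer"),
                          ("y", "in out", "Real")])
mary.add_storage("c", "VAR", "Char")
print(operand(mary, "a"), operand(mary, "y"))
```

Note that Fred's entry for Mary carries only modes and types, while Mary's own table carries names and offsets as well; the two are built from the same parameter list in one pass.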

Point 3 -- Procedure Begin

Code must be generated here, including

dropping the label that indicates where Mary starts, so that jumps to Mary can be made wherever a call to Mary occurs in the source code (this label should be in the symbol table for Mary)
instructions for saving the old D1 display register
setting D1 to point to the correct place on the stack
modifying the stack pointer properly for the new activation record for Mary
sometimes, a special IR instruction is inserted here: STARTSUBPROG. This IR instruction is an indication that when actual machine code is generated from this IR, any special instructions that apply especially to procedure and function invocation, such as an instruction that saves all registers (so that they can be used afresh in Mary), will be used.
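
As a rough sketch, the semantic action for this point might emit IR along the following lines. The instruction spellings (push, sub, add), the operand syntax, and the helper name emit_procedure_begin are all hypothetical; only STARTSUBPROG and the display register naming come from the discussion above. The value placed in the display register depends on the activation record layout, which is not fixed here, so the sketch takes the distance from the stack pointer back to the start of the record as a parameter.

```python
def emit_procedure_begin(label, nesting_level, ar_base_offset, locals_size,
                         ir_lines, start_marker=True):
    """Hypothetical Point 3 action: emit the entry code for a procedure."""
    display = f"D{nesting_level}"
    ir_lines.append(f"{label}:")                     # drop the procedure label
    if start_marker:
        ir_lines.append("STARTSUBPROG")              # optional marker
    ir_lines.append(f"push {display}")               # save old display register
    ir_lines.append(                                 # point it at this AR
        f"sub SP, #{ar_base_offset}, {display}")
    ir_lines.append(f"add SP, #{locals_size}, SP")   # make room for locals


ir_lines = []
# The sizes below are illustrative only; real values come from the chosen
# activation record layout and from the procedure's symbol table.
emit_procedure_begin("L4", 1, 20, 4, ir_lines)
print("\n".join(ir_lines))
```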

Point 4 -- Procedure End

When the end of a procedure is encountered during a parse, semantic actions must be included to generate IR that

Restores the display register from its saved value
Moves the stack pointer down (popping the current activation record)
Returns to the place in the program where this procedure was called from. This address changes from call to call, so the return address must somehow be passed to the called routine at the time of the call.
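
A matching sketch for the exit code is shown below. The mnemonics (sub, pop, ret) and the helper name are again hypothetical. The sketch assumes a layout in which the old display register value was pushed just before the space for local variables was reserved, so the locals are popped first and the saved value is then on top of the stack; with a different layout the first two steps could appear in the other order. The ret instruction is assumed to pop the return address that the call placed on the stack.

```python
def emit_procedure_end(nesting_level, locals_size, ir_lines):
    """Hypothetical Point 4 action: emit the exit code for a procedure."""
    display = f"D{nesting_level}"
    ir_lines.append(f"sub SP, #{locals_size}, SP")   # move SP down past locals
    ir_lines.append(f"pop {display}")                # restore saved display reg
    ir_lines.append("ret")                           # return to point of call


ir_lines = []
emit_procedure_end(1, 4, ir_lines)   # sizes are illustrative only
print("\n".join(ir_lines))
```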

Point 5 -- Begin Block

Here,

the label for program Fred must be dropped into the IR File
code to set up the activation record for Fred must be inserted

Point 6 -- Procedure Call

When a procedure call is made, as is the case at Point 6 in the example above, where Mary is called, semantic actions must be included that

ensure that the call is proper by making sure that the identifier is a procedure in the symbol table and that the number, types, and modes of the parameters all agree between the call and the procedure definition (in the symbol table).
generate IR to transmit the parameters (e.g., by putting them or their addresses, depending on the parameter modes, on the run time stack)
pass the return address to the called procedure (e.g., by putting it on the run time stack)
branch to the start of the procedure (most machine languages have a single instruction that performs both of these last two steps, such as JSR L5, which pushes the current PC value onto the stack and then replaces the PC value with the address of label L5)
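
The checks and code generation listed above can be sketched as follows for the call Mary(3, a*b, x). The representation is assumed for illustration only: the symbol table is a plain dictionary, each actual parameter is described by a small dictionary, T1 is a hypothetical temporary holding the value of a*b, and push and pusha ("push address") are made-up IR spellings. Types are compared for exact equality, which ignores any Integer to Real conversion a real compiler might allow.

```python
class SemanticError(Exception):
    """Raised when a call does not match the procedure's declaration."""


def emit_call(name, symbol_table, actuals, ir_lines):
    """Hypothetical Point 6 action: check the call, then emit its IR."""
    entry = symbol_table.get(name)
    if entry is None or entry["kind"] != "PROC":
        raise SemanticError(f"{name} is not a declared procedure")
    formals = entry["params"]
    if len(actuals) != len(formals):
        raise SemanticError(
            f"{name} expects {len(formals)} parameters, got {len(actuals)}")
    pending = []                       # emit nothing unless every check passes
    for position, ((mode, formal_type), actual) in enumerate(
            zip(formals, actuals), start=1):
        if actual["type"] != formal_type:
            raise SemanticError(f"parameter {position}: type mismatch")
        if mode == "in out":
            if "address" not in actual:
                raise SemanticError(
                    f"parameter {position}: in out needs a variable")
            pending.append(f"pusha {actual['address']}")   # pass the address
        else:
            pending.append(f"push {actual['value']}")      # pass the value
    ir_lines.extend(pending)
    ir_lines.append(f"jsr {entry['label']}")   # save return address and jump


fred_table = {"Mary": {"kind": "PROC", "label": "L4",
                       "params": [("in", "Integer"), ("in", "Integer"),
                                  ("in out", "Real")]}}
actuals = [{"type": "Integer", "value": "#3"},
           {"type": "Integer", "value": "T1"},      # T1 holds a*b
           {"type": "Real", "address": "D0(8)"}]    # x lives at D0(8)
ir_lines = []
emit_call("Mary", fred_table, actuals, ir_lines)
print("\n".join(ir_lines))
```

Collecting the parameter instructions in a separate list before appending them means that a call that fails a check leaves no partial code in the IR.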