Monday, March 18

Scope and Symbol Table Construction

We give here an example of symbol table construction. The approach is to build a new symbol table for each scope encountered, stacking new symbol tables on top of the old ones, and popping (destroying) symbol tables when a scope is left.
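The stacking discipline can be sketched in a few lines of Python. This is only an illustrative sketch, not the course's implementation: the class name ScopeStack and its methods enter_scope, leave_scope, and depth are hypothetical names chosen here, and each symbol table is assumed to be a plain dictionary.

```python
class ScopeStack:
    """A stack of symbol tables, one per open scope (innermost scope last)."""

    def __init__(self):
        self.tables = []  # list of (scope_name, table) pairs

    def enter_scope(self, name):
        """Push a fresh, empty symbol table for a newly entered scope."""
        self.tables.append((name, {}))

    def leave_scope(self):
        """Pop (destroy) the innermost symbol table; return its scope name."""
        name, _table = self.tables.pop()
        return name

    def depth(self):
        """Number of scopes currently open."""
        return len(self.tables)


stack = ScopeStack()
stack.enter_scope("Fred")
stack.enter_scope("A1")
stack.enter_scope("Snort")
print(stack.depth())        # 3
print(stack.leave_scope())  # Snort
print(stack.leave_scope())  # A1
stack.enter_scope("Sort")
print([name for name, _ in stack.tables])  # ['Fred', 'Sort']
```

The scope names used here come from the program skeleton in the next section; the order of pushes and pops mirrors the order in which the parser meets each scope.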

An Intuitive Example

An example is the easiest way to see what goes into symbol table construction.

Consider the following program skeleton.

program Fred;
  var A, B, C : Integer;

  procedure A1(A : in Integer);
     var Q, X : Real;
         Ch   : Character;

     function Snort(W : in Integer) return Integer;
       var X : Integer;
     begin -- Snort
       X := A + C;
     end Snort;

  begin -- A1
    Write (Snort(C));
  end A1;

  procedure Sort(A : in out List; N : in Integer);
  begin
    X := 4.5;
    Sort(A, N/2);
  end;

  begin --Fred
    Write(A, X);
  end.

Constructing the Symbol Table -- An Example

Look at the above program. As we start to parse, we encounter the three variables, A, B, and C in the declaration:

    var A, B, C : Integer;

They should each be placed into Fred's symbol table, as shown.

At this point the parser encounters the declaration of procedure A1 in the line

    procedure A1(A : in Integer);

The name of this procedure belongs in Fred's symbol table, along with its attributes. The attributes include the number, types, and modes of the parameters.
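One possible shape for these entries is sketched below. The record layouts are assumptions made for illustration: the class names Param, VarEntry, and ProcEntry and their fields are hypothetical, and a real compiler would store more (for example, storage locations).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Param:
    name: str
    mode: str        # "in", "out", or "in out"
    type_name: str


@dataclass
class VarEntry:
    name: str
    type_name: str


@dataclass
class ProcEntry:
    name: str
    params: list = field(default_factory=list)  # the number, types, and modes
    return_type: Optional[str] = None           # None for a procedure


# Fred's symbol table after the var declaration and the heading of A1.
fred = {}
for var_name in ("A", "B", "C"):
    fred[var_name] = VarEntry(var_name, "Integer")
fred["A1"] = ProcEntry("A1", [Param("A", "in", "Integer")])

print(sorted(fred))            # ['A', 'A1', 'B', 'C']
print(fred["A1"].params[0])    # Param(name='A', mode='in', type_name='Integer')
```

Note that the entry for A1 lives in Fred's table, while the parameter A will also need an entry of its own in A1's table once that table is created.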

At this point, the parser drops into procedure A1 and begins parsing there. This procedure represents a new scope, and one good way to handle a new scope is to create a new symbol table for it. Following the example given in constructing Fred's symbol table, this yields the following.

Notice that we have linked A1's symbol table to Fred's symbol table. This link captures the scope rule: inside A1, all of A1's identifiers are in scope, and so are all of Fred's, unless the same name is declared in both scopes. In that case, the first match found when searching from the innermost scope outward (i.e., from the top of the stack down) is used. Here, the parameter A of A1 hides Fred's variable A.

Now parsing begins inside Snort. Following the same procedure, we create a new symbol table for Snort, make that symbol table the top of stack symbol table, and link it with the containing scopes, as shown.

Inside Snort, we encounter the statement:

       X := A + C;

Notice how each identifier is resolved. We look for X in Snort's symbol table, and we find it. We look for A in Snort's symbol table, and we do not find it, so we follow the link to A1's symbol table and find A there (this is A1's parameter, not Fred's variable). We then look for C in Snort's symbol table. We don't find it, so we follow the link to A1's symbol table, and we don't find it there either. So, we follow the link from A1's symbol table to Fred's symbol table, and we find C there. To successfully translate

       X := A + C;

our compiler would need to be sure that all of the types are compatible with the operations to be performed (a topic to be covered later). If an identifier is not found in this search of scopes, it hasn't been declared and an error must be generated.
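The search just described can be sketched as a chain of linked tables. This is a minimal sketch under stated assumptions: the class SymbolTable, its parent link, and the methods declare and resolve are hypothetical names, and each entry stores only a type name.

```python
class SymbolTable:
    def __init__(self, scope_name, parent=None):
        self.scope_name = scope_name
        self.parent = parent      # link to the enclosing scope's table
        self.entries = {}

    def declare(self, name, type_name):
        self.entries[name] = type_name

    def resolve(self, name):
        """Search this table, then each enclosing table, innermost first."""
        table = self
        while table is not None:
            if name in table.entries:
                return table.scope_name, table.entries[name]
            table = table.parent
        raise NameError(f"undeclared identifier: {name}")


fred = SymbolTable("Fred")
for var_name in ("A", "B", "C"):
    fred.declare(var_name, "Integer")
fred.declare("A1", "procedure")

a1 = SymbolTable("A1", parent=fred)
a1.declare("A", "Integer")        # parameter of A1, hides Fred's A
a1.declare("Q", "Real")
a1.declare("X", "Real")
a1.declare("Ch", "Character")
a1.declare("Snort", "function")

snort = SymbolTable("Snort", parent=a1)
snort.declare("W", "Integer")     # parameter of Snort
snort.declare("X", "Integer")     # hides A1's X

for ident in ("X", "A", "C"):     # the identifiers in X := A + C
    print(ident, snort.resolve(ident))
# X ('Snort', 'Integer')
# A ('A1', 'Integer')
# C ('Fred', 'Integer')
```

Asking snort.resolve for a name that appears in none of the three tables raises NameError, which corresponds to the "undeclared identifier" error mentioned above.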

At this point we are done parsing function Snort. The action taken is to remove Snort's symbol table (i.e., pop it off the stack), because it is no longer needed. None of its identifiers will ever be in scope for any other part of the program. This leaves the symbol table structure looking like:

At this point, we begin parsing the body of procedure A1, where we encounter

    Write (Snort(C));

Notice that identifier Snort is in A1's scope, so we can generate code to make the call to Snort. We would have to ensure that the actual parameter (C) is declared and of the right type for the type and mode of the formal parameter. We find C in scope in Fred's symbol table and see that it indeed is a proper match for the formal parameter of Snort, whose description we find in A1's symbol table.
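A check of this kind might look like the sketch below. It makes several simplifying assumptions: the helper names lookup and check_call are hypothetical, each symbol table is a dictionary, type compatibility is reduced to "the type names are equal", and parameter modes are stored but not checked.

```python
def lookup(name, scopes):
    """Search innermost scope first; scopes is ordered outermost to innermost."""
    for table in reversed(scopes):
        if name in table:
            return table[name]
    raise NameError(f"undeclared identifier: {name}")


def check_call(callee, actual_args, scopes):
    """Check the actual parameters of a call; return the callee's return type."""
    entry = lookup(callee, scopes)
    if "params" not in entry:
        raise TypeError(f"{callee} is not a procedure or function")
    formals = entry["params"]
    if len(actual_args) != len(formals):
        raise TypeError(f"{callee} expects {len(formals)} argument(s)")
    for (_mode, formal_type), arg in zip(formals, actual_args):
        actual_type = lookup(arg, scopes)["type"]
        if actual_type != formal_type:  # simplification: names must match exactly
            raise TypeError(
                f"{arg} is {actual_type}, but {callee} expects {formal_type}"
            )
    return entry["returns"]


fred = {
    "A": {"type": "Integer"},
    "B": {"type": "Integer"},
    "C": {"type": "Integer"},
    "A1": {"params": [("in", "Integer")], "returns": None},
}
a1 = {
    "A": {"type": "Integer"},
    "Q": {"type": "Real"},
    "X": {"type": "Real"},
    "Ch": {"type": "Character"},
    "Snort": {"params": [("in", "Integer")], "returns": "Integer"},
}
scopes = [fred, a1]   # the state of the stack while parsing A1's body

print(check_call("Snort", ["C"], scopes))   # Integer
try:
    check_call("Snort", ["Q"], scopes)
except TypeError as err:
    print(err)   # Q is Real, but Snort expects Integer
```

The description of Snort is found in A1's table and the description of C in Fred's table, exactly as in the walkthrough.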

Once we are done parsing the body of procedure A1, we pop its symbol table, leaving the symbol table structure looking like:

Another curious thing happens at this point. We encounter a new procedure declaration (procedure Sort) in Fred's scope. This means that we have to add a new entry for Sort in Fred's symbol table as well as a new symbol table for Sort, as shown below:

This should help us understand why we need prototypes. Even though by the end of the program, Sort is in Fred's scope, we don't encounter it until we reach Sort in our parse. This means that although procedure A1 should technically be able to call Sort, it can't, because when we are parsing A1, Sort is not in Fred's symbol table.

What if we had a situation in which A1 needed to call Sort and Sort needed to call A1? A solution to this problem would be to include a prototype of Sort early on, if the programming language allows it. That is, a statement similar to

prototype procedure Sort(A : in out List; N : in Integer);

could be included in the same location where Fred's variables are declared. The compiler could put all of the necessary information into the symbol table for Sort at that point. Later, when the complete procedure for Sort is encountered, its body can be translated at that time.
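One way a compiler might record a prototype is sketched below. Everything here is an assumption for illustration: the class ProcTable, the "defined" flag, and the rule that the later heading must match the prototype exactly are choices made for this sketch, not features of any particular language.

```python
class ProcTable:
    """Procedure entries of one scope, with support for prototypes."""

    def __init__(self):
        self.procs = {}   # name -> {"params": [...], "defined": bool}

    def declare_prototype(self, name, params):
        """Record the heading only; the body will come later."""
        if name in self.procs:
            raise ValueError(f"{name} is already declared")
        self.procs[name] = {"params": list(params), "defined": False}

    def define(self, name, params):
        """Record the full procedure, checking it against any prototype."""
        entry = self.procs.get(name)
        if entry is None:
            self.procs[name] = {"params": list(params), "defined": True}
        elif entry["defined"]:
            raise ValueError(f"{name} is already defined")
        elif entry["params"] != list(params):
            raise TypeError(f"{name} does not match its prototype")
        else:
            entry["defined"] = True

    def can_call(self, name):
        """A call can be checked as soon as the heading is known."""
        return name in self.procs


sort_params = [("A", "in out", "List"), ("N", "in", "Integer")]

fred = ProcTable()
print(fred.can_call("Sort"))            # False
fred.declare_prototype("Sort", sort_params)
print(fred.can_call("Sort"))            # True
fred.define("Sort", sort_params)
print(fred.procs["Sort"]["defined"])    # True
```

With the prototype entered before A1 is parsed, a call to Sort inside A1 could be checked against the recorded parameter list even though Sort's body has not been seen yet.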

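Returning to our example: while compiling the body of Sort, the statement X := 4.5 would be flagged as an error, because X appears in neither Sort's symbol table nor Fred's. Once the compilation of procedure Sort is completed, we can discard Sort's symbol table, leaving: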

Now we can compile the body of Fred.

  begin --Fred
    Write(A, X);
  end.

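Here the compiler would uncover an error. The only declarations of X were in A1 and Snort, and both of those symbol tables have already been popped, so the search ends in Fred's symbol table without a match. The compiler would print a message similar to "Variable X on line 34, column 5 is undefined."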

Summary

We have just completed a long walkthrough of how the symbol table structure would be constructed as an example program was parsed. The main things to note are:

A new symbol table was constructed for each scope. In this program, this meant that as each procedure or function was encountered, a new symbol table was constructed and stacked on top of the current symbol table.
As variables are entered into the symbol table, their attributes must also be entered. These can be quite complex for array, record, and other advanced types.
If the language allows user-defined types, then the type names and attributes must be entered into the symbol table.
If the language is object-oriented, then the class names and attributes must go into the symbol table (because objects can be declared to be of the class type).
The name of the function or procedure that starts a new scope, and hence a new symbol table, goes in the current symbol table, not the new symbol table. All of the attributes of the procedure or function must also go in the current symbol table: parameter modes and types, the location for calling the procedure or function, and the return type for functions. After this is done, the new symbol table for the new scope becomes the current symbol table.
The new current symbol table must also have the names and attributes of the parameters of the function or procedure for which this symbol table is being constructed, but not the name of the procedure or function. The name of the procedure or function is in the outer scope, so that it can be called from that scope.
When a scope is left (e.g., at the bottom of a procedure or function) the symbol table is eliminated.
When checking for (finding) a name in the symbol table, the current table is searched, then each succeeding symbol table on the stack until a match is found, at which point those attributes are used for the name. If no match is found, there is an error (undeclared identifier, for example).
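The whole walkthrough can be replayed with a small driver. This is a rough sketch, not a parser: the function walk and the event names "enter", "declare", "use", and "leave" are hypothetical, the event list is written out by hand from the program skeleton, the built-in Write is left out, and attributes are reduced to a single word.

```python
def walk(events):
    """Replay scope events; return messages for uses of undeclared names."""
    stack = []    # list of (scope_name, table) pairs; innermost scope is last
    errors = []
    for kind, name in events:
        if kind == "enter":
            if stack:
                # The subprogram's name goes in the enclosing (current) table.
                stack[-1][1][name] = "subprogram"
            stack.append((name, {}))
        elif kind == "declare":
            stack[-1][1][name] = "variable"
        elif kind == "use":
            # Search from the innermost table outward.
            if not any(name in table for _, table in reversed(stack)):
                errors.append(f"{name} is undeclared in {stack[-1][0]}")
        elif kind == "leave":
            stack.pop()
    return errors


events = [
    ("enter", "Fred"),
    ("declare", "A"), ("declare", "B"), ("declare", "C"),
    ("enter", "A1"),
    ("declare", "A"), ("declare", "Q"), ("declare", "X"), ("declare", "Ch"),
    ("enter", "Snort"),
    ("declare", "W"), ("declare", "X"),
    ("use", "X"), ("use", "A"), ("use", "C"),       # X := A + C;
    ("leave", "Snort"),
    ("use", "Snort"), ("use", "C"),                 # Write (Snort(C));
    ("leave", "A1"),
    ("enter", "Sort"),
    ("declare", "A"), ("declare", "N"),
    ("use", "X"),                                   # X := 4.5;
    ("use", "Sort"), ("use", "A"), ("use", "N"),    # Sort(A, N/2);
    ("leave", "Sort"),
    ("use", "A"), ("use", "X"),                     # Write(A, X);
    ("leave", "Fred"),
]

for message in walk(events):
    print(message)
# X is undeclared in Sort
# X is undeclared in Fred
```

Because the name of Sort is entered in Fred's table before Sort's own table is pushed, the recursive call inside Sort resolves without any prototype.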