Once we have completed work on the parser, it is time to begin including semantics in our compiler. Remember, "semantics" means "meaning." What we are supposed to do in the semantic analyzer is to determine the meaning of the source program based on the parse tree produced by the parser and then translate it into a program in a different form (e.g., IR) with the same meaning. Eventually, the translated executable program we generate must run exactly as the original source program was intended to run. There are two parts to determining the meaning of a program and then translating that into a different form: construction of the symbol table (which contains the "meanings" of all the identifiers in the program), and the generation of IR.
We have discussed some of the issues surrounding the construction of a symbol table. Action symbols are inserted into grammar rules at points where we believe we should know enough information to insert an id into the table. These action symbols then become calls to public methods in the symbol table module. These action routines check whether an identifier to be inserted is already in the table; if it is not, they insert it and its attributes, and if it is, they issue an error message.
The second responsibility of the semantic analyzer is to generate code. This is done in similar fashion. We find places in our grammar where we should know enough information to actually generate a section of code, and then we modify our parser with corresponding calls to the Semantic Analyzer. These public methods in the Semantic Analyzer must then be written and included for this to work.
Semantic Records and Semantic Actions
In order to construct a symbol table and to generate code we will need some help. We use semantic records to hold semantic information about the section of the program we are currently parsing (e.g., the lexeme and type of a variable) so that we can update the symbol table (e.g., insert the lexeme and type of a variable into the symbol table) or generate code (e.g., check that variables in an arithmetic expression are type compatible for the operation being performed on them and then generate code to perform that operation).
In our recursive descent parser, the way we will determine the semantics (meaning) of various program pieces and carry out appropriate actions (e.g., perform a translation into the target code or insert information into the symbol table) is as follows:
First, we will focus on construction of the symbol table using semantic records and semantic actions, which in this case will be calls to the symbol table ADT.
Let's look at the same example. Suppose we had the following rules in our grammar.
<vardeclpart> --> var <vardecl> ; <vardecltail>
<vardeclpart> --> λ
<vardecltail> --> <vardecl> ; <vardecltail>
<vardecltail> --> λ
<vardecl> --> <idlist> : <type>
<idlist> --> id <idlistail>
<idlistail> --> , id <idlistail>
<idlistail> --> λ
<type> --> integer
<type> --> fixed
<type> --> float
Recall from last time that the rules for <idlist> can actually be reached from places other than the <vardecl> rules. This means that without other information, the parser procedure for <idlist> doesn't know whether the id's being found are part of a variable declaration. We discussed three solutions to this problem last time. We could:
- Change the rules of the grammar so that there are different id_list nonterminals for each different situation. For example, <var_id_list> could be a new nonterminal leading to rules that expand a list of identifiers that can only be encountered in a variable declaration section.
- Pass a semantic record to id_list whenever that procedure is called indicating from where the call was made. The information in this semantic record is called an "inherited attribute" of id_list, since it is inherited from rules farther up in the tree (e.g., the variable declaration rules).
- Just build a record that is a linked list of identifier lexemes, and pass this parameter back up to the procedure that called id_list. This record is the semantic information computed for nonterminal id_list. When the calling procedures get this information, they can then decide what to do with it (e.g., the variable declaration procedures would insert each lexeme in this linked list into the symbol table). The information in this semantic record is called a "synthesized attribute." This just means that the attribute was not inherited from above, but was constructed (i.e., synthesized) by this procedure or one below that was called by this procedure.
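The difference between solutions 2 and 3 can be sketched in a few lines. This is only an illustration in Python rather than in the pseudocode used in these notes; the token format, the function names id_list_inherited and id_list_synthesized, and the "read_statement" context are all hypothetical and chosen just for this example.

```python
# Hypothetical token stream: (kind, lexeme) pairs for the text "a , b , c".
TOKENS = [("id", "a"), ("comma", ","), ("id", "b"), ("comma", ","), ("id", "c")]


def id_list_inherited(tokens, context, declared):
    """Solution 2: the caller passes down an inherited attribute (context)."""
    for kind, lexeme in tokens:
        # id_list itself decides what to do, based on where it was called from.
        if kind == "id" and context == "var_declaration":
            declared.append(lexeme)


def id_list_synthesized(tokens):
    """Solution 3: build the list of lexemes and hand it back to the caller."""
    return [lexeme for kind, lexeme in tokens if kind == "id"]


# Inherited: the same procedure behaves differently depending on its caller.
declared = []
id_list_inherited(TOKENS, "var_declaration", declared)
id_list_inherited(TOKENS, "read_statement", declared)  # declares nothing

# Synthesized: the caller receives the list and decides what to do with it.
lexemes = id_list_synthesized(TOKENS)
```

In the synthesized version, id_list needs no knowledge of its callers, which is one reason solution 3 is attractive.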
Let's explore solution 3.
We first remember that every nonterminal and token has an associated semantic record that we can use to pass information gathered in the parser on to the symbol table or semantic analyzer. We will name these records the same as the nonterminal or token with a _rec suffix. So, for example, the name of the semantic record for token id is id_rec, and the semantic record for nonterminal <idtail> is idtail_rec.
An analysis of the grammar leads to the insertion of the following semantic actions.
<vardeclpart> --> var <vardecl> ; <vardecltail>
<vardeclpart> --> λ
<vardecltail> --> <vardecl> ; <vardecltail>
<vardecltail> --> λ
<vardecl> --> <idlist> : <type> #symbol_table.insert(idlist_rec, type_rec)
<idlist> --> id #analyzer.process_id(id_rec, idlist_rec) <idlistail>
<idlistail> --> , id #analyzer.process_id(id_rec, idlist_rec) <idlistail>
<idlistail> --> λ
<type> --> integer
<type> --> fixed
<type> --> float
To explain, look at the rule
<idlist> --> id #process_id(id_rec, idlist_rec) <idtail>
After matching the id, we know enough to attach the lexeme of that id to the list of id's being constructed at this point. So, we make a new method in the Semantic_Analyzer module named process_id whose task is to take incoming id's and attach them to the list of id's. To accomplish this task, method process_id must be given the semantic record of the current id, which contains the lexeme matched, and the current list, in the semantic record for id_list.
The procedure for idlist would thus look like:
procedure idlist(out idlist_rec);
var id_rec : semantic_record;
begin -- idlist
  case lookahead is
    when id =>
      Match(id, id_rec);
      analyzer.process_id(id_rec, idlist_rec);
      idtail(idlist_rec); -- must pass idlist_rec to idtail for further processing
    when others =>
      Syntax_Error;
  end case;
end idlist;
The procedure for idtail would be
procedure idtail(in out idlist_rec);
var id_rec : semantic_record;
begin -- idtail
  case lookahead is
    when comma =>
      Match(comma);
      Match(id, id_rec);
      analyzer.process_id(id_rec, idlist_rec);
      idtail(idlist_rec); -- must pass idlist_rec to idtail for further processing
    when ... (follow set) =>
      null;
    when others =>
      Syntax_Error;
  end case;
end idtail;
The procedure for
<vardecl> --> <idlist> : <type> #symbol_table_insert(idlist_rec, type_rec)
would look like
procedure vardecl;
var idlist_rec : semantic_record initialized to null;
    type_rec : semantic_record initialized to "variable" for kind and null for type;
begin -- vardecl
  case lookahead is
    when id =>
      idlist(idlist_rec);
      Match(colon);
      type(type_rec);
      analyzer.symbol_table_insert(idlist_rec, type_rec);
    when others =>
      Syntax_Error;
  end case;
end vardecl;
Notice that each time we need to gather information to pass to the semantic analyzer or to other procedures in the parser (for eventual submission to the semantic analyzer), we need semantic records associated with the appropriate tokens or nonterminals that can be filled with the proper values. Every time we need an action carried out by the semantic analyzer (e.g., to build a list of identifiers or to insert values into the symbol table), we need a new public method in the analyzer to perform that task given the appropriately filled semantic records. Notice that the analyzer method for inserting values into the symbol table must deconstruct the list of identifiers and call the symbol table once per identifier to insert it with its type. Of course, the insert operation must also determine whether the identifier already exists in this scope.
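The two analyzer methods discussed above can be sketched as follows. This is a minimal illustration in Python, not the project's required design: the SemanticRecord class and its field names are assumptions, the symbol table is assumed to be a plain dictionary for a single scope, and duplicates are collected in a list rather than reported through a real error routine.

```python
class SemanticRecord:
    """Hypothetical semantic record: attributes filled in by the parser."""

    def __init__(self, lexeme=None, kind=None, type_name=None):
        self.lexeme = lexeme        # lexeme of a matched id token
        self.kind = kind            # e.g. "variable"
        self.type_name = type_name  # e.g. "integer"
        self.lexemes = []           # used when the record describes an id list


def process_id(id_rec, idlist_rec):
    """Attach the lexeme of the id just matched to the list being built."""
    idlist_rec.lexemes.append(id_rec.lexeme)


def symbol_table_insert(table, idlist_rec, type_rec):
    """Insert each identifier in the list; return the lexemes already declared."""
    duplicates = []
    for lexeme in idlist_rec.lexemes:
        if lexeme in table:
            duplicates.append(lexeme)  # a real compiler would issue an error here
        else:
            table[lexeme] = (type_rec.kind, type_rec.type_name)
    return duplicates


# Simulate parsing "x, y, x : integer" in a variable declaration.
table = {}
idlist_rec = SemanticRecord()
for name in ["x", "y", "x"]:
    process_id(SemanticRecord(lexeme=name), idlist_rec)
type_rec = SemanticRecord(kind="variable", type_name="integer")
duplicates = symbol_table_insert(table, idlist_rec, type_rec)
```

Here the second x is rejected because x is already present in the scope's table, while x and y are each stored with kind "variable" and type "integer".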
The preceding sections described how semantic records and semantic actions can be included in the parser to cause variables to be inserted into a symbol table. Of course there are many different issues regarding symbol tables. Some of them are:
Let's consider these issues in more detail.
First, we assume that we all have decided how we are going to implement a symbol table. Each group has chosen a data structure, such as a hash table, a sequential list, or a binary search tree for implementing the symbol table.
Second, we assume that we all recognize that for scoping issues, we will need one symbol table per scope and that each new scope is stacked on the most recent one as a new scope (procedure or function declaration) is encountered, and that the topmost symbol table is popped (discarded) from the stack when the end of a scope (procedure or function end) is encountered during parsing.
Finally, we assume that we will all be implementing a common set of operations on symbol tables:
There may be one more operation on your symbol table depending on how you implement it:
Question: When should a new symbol table be created while parsing?
Answer: Whenever a new scope is encountered (e.g., at the beginning of the program, at the beginning of each procedure and function, and for some languages in other places, such as a for loop in which the loop index variable is considered to be in scope only for the duration of the loop, or, as in C, at an opening curly brace).
Question: When will the parser be able to determine during parsing that a new scope is encountered?
Answer: The first scope, of course, will be for the program as a whole. Therefore, when it is clear that a program has been found, it is time to create the first symbol table. Other symbol tables will need to be created as the structures that indicate that a new scope is necessary are encountered, for example when the procedure, function, for, and other indicating tokens are found.
Instance 1
For microPascal, a reasonable place to include an action to create the first symbol table would be as soon as the program reserved word is encountered.
So, if we have a rule similar to
program_heading --> "program" program_identifier
then at the point in the parse when we have matched the reserved word program, we know that it is time to create a symbol table. We can therefore modify this rule to include an action symbol to reflect this fact, as in
program_heading --> "program" #create_symbol_table program_identifier
If we want to name this newly created symbol table with the program name, we will need to wait to create the symbol table until the program identifier has been parsed. This would give the following modification instead
program_heading --> "program" program_identifier #create_symbol_table(program_identifier_rec)
In this case we can create a symbol table and give it a name at the same time so that if we print the stack of symbol tables in existence at any point in a parse we can see (by also printing their names) which program, procedure, or function they belong to (this is the way you are to implement it).
Corresponding to this rule modification is a modification to procedure program_heading in the parser. It will look something like the following.
procedure ProgramHeading;
variables program_identifier_rec: semantic_record;
begin -- ProgramHeading
  case lookahead is
    when reserved_program =>
      match(reserved_program);
      -- procedure ProgramIdentifier, called next, must return the name of the program
      -- in program_identifier_rec
      ProgramIdentifier(program_identifier_rec);
      -- the next call causes a new symbol table to be created with the name given in
      -- program_identifier_rec. This is not an identifier to be inserted into the
      -- symbol table, but rather just used to give a name to the symbol table.
      SymbolTable.create(program_identifier_rec);
    when others =>
      raise error exception
  end case;
end ProgramHeading
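As a rough illustration of creating a named symbol table, here is a small Python sketch. The class names SymbolTable and SymbolTableStack, the names method, and the (kind, lexeme) token format are assumptions made for this example only; your own symbol table module may look quite different.

```python
class SymbolTable:
    """One scope's table, labeled with the name of its program or procedure."""

    def __init__(self, name):
        self.name = name
        self.entries = {}


class SymbolTableStack:
    """Stack of symbol tables; the last element is the current scope."""

    def __init__(self):
        self.tables = []

    def create(self, name):
        self.tables.append(SymbolTable(name))

    def names(self):
        # Listed from the current (topmost) scope down to the outermost one.
        return [table.name for table in reversed(self.tables)]


def program_heading(tokens, stack):
    """Parse 'program <id>' and create the first symbol table, named <id>."""
    if len(tokens) < 2 or tokens[0][0] != "reserved_program":
        raise SyntaxError("expected the reserved word program")
    kind, lexeme = tokens[1]
    if kind != "id":
        raise SyntaxError("expected a program identifier")
    # The program name labels the table; it is not inserted as an entry.
    stack.create(lexeme)
    return tokens[2:]


stack = SymbolTableStack()
rest = program_heading(
    [("reserved_program", "program"), ("id", "demo"), ("semicolon", ";")], stack
)
```

Printing stack.names() at any point in the parse shows which program, procedure, or function each table on the stack belongs to.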
Instance 2
There are other places where new symbol tables should be created, namely when a procedure or function is declared. Thus, a rule similar to:
ProcedureHeading --> "procedure" Identifier OptionalFormalParameterList
would be marked up as
ProcedureHeading --> "procedure" Identifier #create_symbol_table(identifier_rec)
OptionalFormalParameterList
and the procedure in the parser that handles nonterminal ProcedureHeading will have to be modified in the same way as ProgramHeading above. The rule given below will also need to be modified.
FunctionHeading --> "function" Identifier OptionalFormalParameterList
Look at the first part of this lecture again to see how to modify the grammar rules and the parsing procedures to insert values into the current symbol table. This is done in three places:
Variable Declaration Time: When variables are declared is one point at which identifiers and their attributes are placed into the current symbol table. The lecture from last time clearly shows how and where to do this.
Procedure Declaration Time: When a procedure declaration is encountered, the procedure name and attributes must be inserted into the symbol table. At this point we will consider parameterless procedures and functions only, so the only attribute associated with a procedure name is the fact that it is a procedure. We can handle this by further modifying the ProcedureHeading rule given above:
ProcedureHeading --> "procedure" Identifier #insert(identifier_rec, kind_rec) #create_symbol_table(identifier_rec)
OptionalFormalParameterList
Notice that there are now two semantic actions in this rule, corresponding to two different method calls in our compiler. The first, #insert, will cause the procedure name, found in identifier_rec, to be inserted into the current symbol table. The second, #create_symbol_table, will create a new symbol table for the new scope of this procedure.
Notice carefully here that in the expansion of OptionalFormalParameterList we need to
Of course, the issues that apply to procedures also apply to functions. In addition, the return type of the function must be associated with the function name in the scope in which the function was declared.
When should a symbol table be destroyed? When the end of the scope is reached. That is, when the end of a program, procedure, or function is encountered during parsing, it is time to destroy the current symbol table. This would occur in rules like
ProcedureDeclaration --> ProcedureHeading ";" Block ";"
which would be modified as follows:
ProcedureDeclaration --> ProcedureHeading ";" Block #destroy ";"
The #destroy semantic action will be a call to pop the current symbol table from the stack, leaving the one below it on the stack as the new current symbol table.
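The order of the #insert, #create_symbol_table, and #destroy actions for a procedure can be traced with a small Python sketch. The representation of the stack as a list of (name, entries) pairs and the function names are assumptions made for illustration; the scope names "main" and "p" are made up.

```python
scopes = []  # stack of (name, entries) pairs; the last element is the current scope


def create_symbol_table(name):
    """#create_symbol_table: push a new, empty table for a new scope."""
    scopes.append((name, {}))


def insert(lexeme, kind):
    """#insert: add an identifier to the current (topmost) table."""
    _, entries = scopes[-1]
    if lexeme in entries:
        raise KeyError("identifier already declared in this scope: " + lexeme)
    entries[lexeme] = kind


def destroy():
    """#destroy: pop the current table; the one below becomes current."""
    return scopes.pop()


create_symbol_table("main")    # program heading
insert("p", "procedure")       # procedure name goes into the enclosing scope
create_symbol_table("p")       # then the procedure's own scope is opened
insert("local", "variable")    # declarations inside the procedure
inner = destroy()              # end of the procedure's Block
current_name = scopes[-1][0]   # "main" is the current scope again
```

Note that the procedure name is inserted before the new table is created, so that it lands in the enclosing scope rather than in the procedure's own scope.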
This same thing will need to be done with respect to functions. For programs (as far as your project is concerned), it is not technically necessary to destroy the symbol table, because for microPascal, the end of the program is the end of the compilation process. This may not be true for programs that consist, say, of many different classes that are not embedded within each other.
Finally, in programming languages that allow for different scoping boundaries (for loops, braces (blocks), etc.) the end of the respective scoping construct is where the symbol table should be destroyed.
To determine whether an identifier found elsewhere in a program has indeed been declared, we simply need to find every place where an identifier is matched in a parse and include an action, say #lookup, that returns "true" if the identifier is found in the symbol table and "false" if it is not. Of course, in a real program much more will need to be done than this, but for now, we are considering just how to construct a symbol table and then to look up in it whether an identifier has been defined. We don't consider at this time whether identifiers are used properly for their type.
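A #lookup action of this kind might be sketched as below, in Python. The sketch assumes that each scope's table is a dictionary and that the search proceeds from the current scope outward through the enclosing scopes on the stack; that search order is an assumption about how nested scopes are handled, not something fixed by the discussion above.

```python
def lookup(scope_stack, lexeme):
    """Return True if lexeme is declared in the current or an enclosing scope."""
    # reversed() visits the topmost (current) table first, then outer ones.
    return any(lexeme in table for table in reversed(scope_stack))


# Hypothetical stack: the program scope at the bottom, a procedure scope on top.
scope_stack = [
    {"x": "integer", "p": "procedure"},  # outer (program) scope
    {"y": "float"},                      # current (procedure) scope
]

found_local = lookup(scope_stack, "y")    # declared in the current scope
found_outer = lookup(scope_stack, "x")    # declared in an enclosing scope
found_missing = lookup(scope_stack, "z")  # not declared anywhere
```

A fuller version would return the entry that was found, so that later phases can check the identifier's kind and type.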
This process of looking at the grammar rules to see where actions go, in this case to create, insert values into the symbol table, look up values in the symbol table, and destroy a symbol table, is repeated for all semantic actions.