Scanner II, Partial Implementation
Posted January 29
Due at the start of your lab period the following week
Objectives
The objectives of this assignment are
- the implementation of a working scanner (with some advanced features omitted)
- the testing of the scanner against some prepared files
To Do
You are to implement many features of the scanner this week. The driver must also
be upgraded to write a file of the tokens returned by the scanner. In
particular, you need to do the following.
- Implement a driver called mp (for microPascal) that takes a command
line parameter that is the name of the source text to be scanned. For
example,
mp program1.mp
would be typed to scan a file named program1.mp.
- Implement your scanner in your chosen implementation language. The scanner should
be written so that the driver can access the following methods:
- open_file
- get_token
- get_lexeme
- get_line
- get_column
- The open_file method or procedure, when called by the driver, is to take one
parameter: the name of the file to be scanned, as entered by the user on the
command line (see the first bullet). The other interface components are
self-explanatory. Note only that get_token essentially does the job of the
dispatcher (as described in the lectures).
- Upgrade the driver to continuously retrieve tokens
from the scanner and to print a token file. The token file is to be
a standard text file with one line per scanned token containing the
following information in this order, spaced nicely so that the output is
easy to read:
token 1     line number 1     column number 1     lexeme 1
token 2     line number 2     column number 2     lexeme 2
.
.
.
where token 1 is the first token scanned, line number 1 is the number of the line on
which the token was scanned, column number 1 is the column on that line where the token
begins, and lexeme 1 is the lexeme corresponding to the token, and so on for each line.
(Notice that scanner errors are not handled at this time.) For
example, the first two lines of the output file might read
mp_begin    1    3    begin
id          1    9    Number_Of_Students
- You must implement the scanner according to the design given in the lecture. The
scanner should have a standard dispatcher that first skips white space and then
examines (but does not consume) the first non-white-space character to select the
proper finite state automaton for scanning the token.
- Implement each finite state automaton (augmented for practicality) as a separate method.
- Each finite state automaton is to be written using the case form given in
class.
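As one possible starting point, the interface, dispatcher, and automata described in the list above might be sketched in Python as follows. This is only an illustration under stated assumptions, not the design given in lecture. The helper load, the function token_lines, and the token names mp_eof and mp_integer_lit are hypothetical (only mp_begin and id appear in this handout), the identifier rule is simplified, the if/elif chain on a state variable merely stands in for the case form given in class, and the column widths are arbitrary.

```python
import string

LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)
# Only mp_begin and id come from the handout; other token names are hypothetical.
RESERVED = {"begin": "mp_begin"}


class Scanner:
    def open_file(self, name):
        """Read the whole source file named on the command line."""
        with open(name) as source:
            self.load(source.read())

    def load(self, text):
        """Hypothetical helper: scan from a string (handy for testing)."""
        self.text, self.pos = text, 0
        self.line, self.col = 1, 1  # position of the next unread character
        self.tok_line, self.tok_col, self.lexeme = 1, 1, ""

    def get_lexeme(self):
        return self.lexeme

    def get_line(self):
        return self.tok_line

    def get_column(self):
        return self.tok_col

    def _peek(self):
        """Examine, but do not consume, the next character ('' at end of file)."""
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def _advance(self):
        """Consume one character, keeping the line and column counts current."""
        ch = self.text[self.pos]
        self.pos += 1
        if ch == "\n":
            self.line, self.col = self.line + 1, 1
        else:
            self.col += 1
        return ch

    def get_token(self):
        """Dispatcher: skip white space, peek, then select an automaton."""
        while self._peek().isspace():
            self._advance()
        self.tok_line, self.tok_col = self.line, self.col
        ch = self._peek()
        if ch == "":
            self.lexeme = ""
            return "mp_eof"  # hypothetical end-of-file token
        if ch in LETTERS or ch == "_":  # simplified identifier rule (assumption)
            return self._scan_word()
        if ch in DIGITS:
            return self._scan_integer()
        raise NotImplementedError(f"no automaton sketched for {ch!r}")

    def _scan_word(self):
        """Automaton for identifiers and reserved words."""
        state, chars = 0, []
        while True:
            ch = self._peek()
            if state == 0:  # start state: dispatcher already vetted ch
                chars.append(self._advance())
                state = 1
            elif state == 1:  # accepting state
                if ch in LETTERS or ch in DIGITS or ch == "_":
                    chars.append(self._advance())
                else:
                    break  # leave ch unconsumed for the next token
        self.lexeme = "".join(chars)
        # Case-insensitive reserved-word lookup is an assumption.
        return RESERVED.get(self.lexeme.lower(), "id")

    def _scan_integer(self):
        """Automaton for unsigned integer literals."""
        state, chars = 0, []
        while True:
            ch = self._peek()
            if state == 0:  # start state: dispatcher already vetted ch
                chars.append(self._advance())
                state = 1
            elif state == 1:  # accepting state
                if ch in DIGITS:
                    chars.append(self._advance())
                else:
                    break
        self.lexeme = "".join(chars)
        return "mp_integer_lit"  # hypothetical token name


def token_lines(scanner):
    """Driver loop: one formatted line per token until end of file."""
    lines = []
    token = scanner.get_token()
    while token != "mp_eof":
        lines.append(f"{token:<16}{scanner.get_line():>6}"
                     f"{scanner.get_column():>6}  {scanner.get_lexeme()}")
        token = scanner.get_token()
    return lines
```

A driver along these lines would call open_file with the command line parameter (for example, sys.argv[1]) and write the lines returned by token_lines to the token file. Characters that belong to automata not sketched here simply raise NotImplementedError, since scanner errors are not handled this week.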
This is just the first part of scanner implementation. You will be testing your
program against files that only contain valid tokens at this point. In particular,
you are not required to do the following this week.
- handle scanner errors
- deal with comments
We will be talking about these issues shortly, and if you are running ahead
of schedule, you can implement these as we discuss them.
Special Requirements
The following things should be noted:
- lexeme length is unbounded
- you are to use the standard list of tokens posted on the class resource page to
complete the assignment, plus the new ones given in the lab for this week.
- you are to use the standard names given for the tokens when printing token names
(these names should also be used as enumerated type values in your program if that is how
you are doing your implementation)
To Turn In
- Your source and executable files must be available on esus, ready for demonstration. In
other words, the instructors should be able to retrieve your executable file mp and
run it against test files.
- You must be ready to run your program against a test file during the lab
in which this milestone is due.