Laboratory 4, February 5
Objectives
During one of our recent lectures, we discussed how the generation of the finite
state automata for each regular expression (for each token) is so repetitive,
mechanical, and, well, boring. The theory of finite state
automata and regular expressions tells us that there must be a way to design an
automatic scanner generator, because we saw the following theorems while
studying the theory
- There is an algorithm that turns any given regular expression into an
equivalent nondeterministic finite state automaton.
- There is an algorithm that turns any given nondeterministic finite state
automaton into an equivalent deterministic finite state automaton
- There is an algorithm that turns any deterministic finite state automaton
into an equivalent deterministic finite state automaton with the fewest
possible number of states
In addition, we have seen as we noted above that the process of turning a
finite state automaton into a procedure that does the work of that finite state
automaton is mechanical, so we could give our own new theorem:
- There is an algorithm that turns any deterministic finite state automaton
into a procedure in Ada (name your language here) that does the work of that
finite state automaton.
What does this all mean? It means that it should be possible to write a
program that takes a set of regular expressions as input and produces a program
as output that will scan an input stream for the tokens represented by the
regular expression:
A list of Scanner Scanner
-----------------------> Generator ------------>
regular expressions Program Program
One widely used scanner generator is the program lex (short for lexical
analyzer generator, which is another term for scanner generator). It
comes with most Unix distributions. We also have the gnu equivalent
available, called flex. Flex accepts a list of regular expressions
which it processes, producing a scanner in C. In order to use the scanner
generated by flex, it, of course, must itself be compiled by a C compiler.
The objective today is to learn how to use a scanner generator.
Turning Your Project In
Turn your project in according to the TA's instructions.
Preparing for Lab Exercises
Follow the TA's directions for preparing your workstation for running flex.
To Do
Do a "man" on flex to learn about flex. As seniors you should now be
able to read, understand, and apply information that you read about tools such
as flex.
Remember that this is a working laboratory. You should be actively collaborating with
other team members. You will need to follow your TA's
instructions about how to prepare a file for input to flex.
Do this in increments. First get flex to generate a scanner for something
simple, such as a file of identifiers. Try it out on some short files of
identifiers. Then, as you get more confidence, add more tokens.
Work with flex
- Use the flex-token.h file, the flex-driver.c file, and the tokens.html
file (just called tokens)
available on the web Resources page to finish
this lab exercise.
- Create input for flex for mPascal as described
by your TA.
- Test your flex-generated scanner with the flex-driver.c provided on the Resources
page. Do this in stages. For example, just include enough to scan simple tokens to start
with, such as identifier, and then test whether your flex-generated scanner can
handle input files consisting just of identifiers. Then incrementally add other tokens and
test the generated scanner with them.
You should study the flex driver to see
how a parser could be written that calls the scanner generated by flex.
To Turn In
With 10 or 15 minutes to go in lab, build a small text file that has
representatives of all the tokens you are now able to scan with your flex
generated scanner. Turn a listing of this text file along with the
corresponding list output from a run of your flex generated scanner. Turn these
two lists in. As usual, include the
following information on the standard cover page (see the previous lab for a
reminder).
- Your group number
- Lab 4
- Today's date
- Your team names and esus e-mail addresses