Friday, January 16
Reasons for compilers and high level languages
Computers today are similar in one respect to all but the very earliest
computers. They all have a machine language that is a set of instructions
that are expressed in binary (usually written in hexadecimal for brevity).
An instruction might look something like
7C0300000010
where 7C is the operation code for the instruction (perhaps meaning LOAD), 03
is the register, and 00000010 is the address. In this case the instruction
might mean that the hardware is to load register 3 with the contents of memory
location 00000010.
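To make the field layout concrete, here is a minimal sketch in Python that splits the example instruction into its three parts. The 2/2/8 hexadecimal digit layout is taken from the example above; the function name decode is only an illustration and does not describe any particular real machine.

```python
def decode(instruction_hex):
    """Split a 12-hex-digit instruction into (opcode, register, address)."""
    if len(instruction_hex) != 12:
        raise ValueError("expected 12 hexadecimal digits")
    opcode = int(instruction_hex[0:2], 16)    # first 2 digits: operation code
    register = int(instruction_hex[2:4], 16)  # next 2 digits: register number
    address = int(instruction_hex[4:12], 16)  # last 8 digits: memory address
    return opcode, register, address


opcode, register, address = decode("7C0300000010")
print(f"opcode=0x{opcode:02X} register={register} address=0x{address:08X}")
# prints: opcode=0x7C register=3 address=0x00000010
```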
The very earliest computers were essentially "rewired" when they
were to execute a different program. Patch panels were used for this
purpose. Once the idea of the stored program was formulated (usually
attributed to John von Neumann, although it is said that Ada Augusta, Countess of
Lovelace, daughter of the English poet Lord Byron, actually proposed the idea to
Charles Babbage a century earlier), computers no longer had to be rewired to be
reprogrammed. Instructions were stored in memory just like data. To execute
them, the processor brought them in from memory one at a time, then decoded
and executed each one.
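The stored-program idea can be sketched as a fetch, decode, execute loop. The following Python sketch is only an illustration: it reuses the 7C (LOAD) instruction format from the example above, while the HALT opcode, the sixteen registers, and the use of a dictionary as memory are assumptions made for the example.

```python
LOAD = 0x7C   # opcode from the example in these notes
HALT = 0x00   # hypothetical opcode, added so the loop can stop


def run(memory):
    """Run a program stored in the same memory as its data; return the registers."""
    registers = [0] * 16
    pc = 0                                  # program counter: address of next instruction
    while True:
        word = memory[pc]                   # fetch
        pc += 1
        opcode = word >> 40                 # decode: top 8 bits of a 48-bit word
        reg = (word >> 32) & 0xFF           # next 8 bits
        addr = word & 0xFFFFFFFF            # low 32 bits
        if opcode == LOAD:                  # execute
            registers[reg] = memory[addr]
        elif opcode == HALT:
            return registers
        else:
            raise ValueError(f"unknown opcode {opcode:#x}")


# Instructions at addresses 0 and 1, data at address 0x10, all in one memory.
memory = {0: 0x7C0300000010, 1: 0x000000000000, 0x10: 42}
print(run(memory)[3])   # prints: 42
```

Changing the program here means changing the contents of memory, not the wiring.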
The problem was writing a program using numeric codes. It was
difficult and error prone. It wasn't long before assembly languages were
invented as a way to circumvent this problem. An assembly language
instruction for the above might be
LOAD R3, X
where X is a symbolic name for the address 00000010. Note that, compared with
numeric machine code, assembly language
- is much easier to write
- is much easier to understand
- has a 1-1 correspondence with the numeric machine instructions
In order for assembly language to work, though, a program called an assembler
is needed. The assembler translates the mnemonic assembly instructions
into the binary number form of the actual hardware instructions.
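A minimal sketch of what an assembler does for the single instruction above, written in Python. Only the LOAD = 7C encoding and the address of X come from these notes; the table names and the assemble function are hypothetical, and a real assembler would also handle labels, more instructions, and errors.

```python
OPCODES = {"LOAD": 0x7C}        # mnemonic -> operation code (only LOAD is from the notes)
SYMBOLS = {"X": 0x00000010}     # symbol table: symbolic name -> memory address


def assemble(line):
    """Translate one line such as 'LOAD R3, X' into 12 hexadecimal digits."""
    mnemonic, operands = line.split(None, 1)
    reg_text, symbol = [part.strip() for part in operands.split(",")]
    register = int(reg_text.lstrip("Rr"))          # "R3" -> 3
    opcode = OPCODES[mnemonic.upper()]
    address = SYMBOLS[symbol]
    return f"{opcode:02X}{register:02X}{address:08X}"


print(assemble("LOAD R3, X"))   # prints: 7C0300000010
```

Because each assembly line maps to exactly one machine instruction, the translation is little more than table lookup.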
Still there are serious drawbacks to assembly language.
- It is still difficult to write large programs in assembly language
- It is still difficult to read assembly language programs
- There is no structure to assembly language that enforces good program style
- It is platform dependent
High level languages were thus developed. This was quite a leap, as it
was widely believed that the translators for high level languages would produce
such inefficient code as to be nearly useless. The first widely used high level
language was Fortran (for FORmula TRANslator), which proved that the worries
about inefficiency were largely unfounded. New high level languages
proliferated, and with them was born the study of compilers, needed to effect
the translations of the high level languages into the languages of the computers that were targeted.
Notice that high level programming languages really are abstract
languages. No computer "understands" them. To get a
concrete program that a computer can execute, a compiler must be written to make
this translation from the abstract to the concrete.
The objective of a compiler
Recall the objective of a compiler. Given a program in a source programming language, translate the program into
the machine language of some target computer and operating system.
Source Program                        Target Machine
---------------------> Compiler ----------------------->
File                                  Code File
There are a number of modules to the compiler.
Source              Token               Parse                      Intermediate
----------> Scanner ----------> Parser -------> Semantic Analyzer --------------->
Program             File                Tree                       Code (IR)      |
+--------------------------------------------------------------------------------+
|
|                      Optimized                     Target Machine
+----------> Optimizer -------------> Code Generator ------------------>
                       IR                            Code
The scanner, parser, and semantic analyzer are usually referred to as the front
end of the compiler, whereas the optimizer and code generator are referred to as
the back end.
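The modules can be sketched as a chain of small functions, each consuming the output of the previous one. The Python sketch below handles only arithmetic expressions, skips the optimizer, and does no error handling. The tuple-based tree, the three-address IR, and the LOAD/ADD/STORE target instructions are all assumptions chosen for the illustration, not the format of any real compiler.

```python
import re


def scan(source):
    """Scanner: source characters -> list of tokens."""
    return re.findall(r"\d+|[A-Za-z_]\w*|[-+*/()]", source)


def parse(tokens):
    """Parser: tokens -> parse tree as nested (operator, left, right) tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        tok = advance()
        if tok == "(":
            node = expr()
            advance()               # consume the closing ")"
            return node
        return tok                  # a number or an identifier

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    return expr()


def to_ir(tree):
    """Semantic analyzer (much simplified): tree -> three-address IR."""
    code = []

    def walk(node):
        if isinstance(node, str):
            return node
        op, left, right = node
        l, r = walk(left), walk(right)
        temp = f"t{len(code) + 1}"
        code.append((temp, l, op, r))       # temp = l op r
        return temp

    walk(tree)
    return code


def generate(ir):
    """Code generator: IR -> instructions for a hypothetical one-register machine."""
    names = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}
    lines = []
    for dest, left, op, right in ir:
        lines += [f"LOAD R1, {left}", f"{names[op]} R1, {right}", f"STORE R1, {dest}"]
    return lines


for line in generate(to_ir(parse(scan("a + b * 2")))):
    print(line)
```

Only generate knows anything about the target machine; scan, parse, and to_ir would be unchanged for a different target.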
Reasons for a front end and a back end
As discussed in the last lecture, the front end of a compiler serves a number
of purposes:
- The front end is target platform independent. That is, the process
of finding tokens in a generic ASCII file (scanning), determining whether the
source program token stream represents a syntactically correct program in the
source language (parsing), and conversion of the source language program into
some intermediate form that is closer to the machine language of most extant
computers (semantic analysis) can all be done without concern about which real
platform is the eventual target.
- The same front end can thus be used for a particular high level
programming language regardless of how many real computer platforms might be
targeted. This is an efficient way to design compilers.
- The intermediate language program that is produced by the front end
represents a program for a virtual (i.e., nonexistent) computer. Thus,
the potential exists for constructing a virtual machine (a program that
interprets the intermediate program) that can be compiled and run on many
different computers. This is how the Java Virtual Machine (JVM) works.
The JVM is a program that must be compiled on each different real computer
platform in order to make it possible to run "byte code" programs on those
computers.
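As an illustration of the virtual machine idea, here is a minimal sketch of an interpreter for a made-up stack machine, written in Python. The four opcodes and their numbering are hypothetical and are not real JVM byte codes; the point is only that the same intermediate program runs wherever the interpreter itself runs.

```python
PUSH, ADD, MUL, PRINT = 1, 2, 3, 4   # hypothetical opcodes, not real JVM byte codes


def run(bytecode):
    """Interpret a list of integer codes on a stack machine; return the printed values."""
    stack, output, pc = [], [], 0
    while pc < len(bytecode):
        op = bytecode[pc]
        pc += 1
        if op == PUSH:
            stack.append(bytecode[pc])      # the operand follows the opcode
            pc += 1
        elif op == ADD:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == MUL:
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif op == PRINT:
            output.append(stack.pop())
        else:
            raise ValueError(f"unknown opcode {op}")
    return output


# Intermediate program for (2 + 3) * 4
program = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, PRINT]
print(run(program))   # prints: [20]
```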
Similarly, the back end serves a number of purposes.
- Since the back end translates an intermediate form program into an
executable program on some particular hardware/operating system platform, it
needs no knowledge of the original source program. Thus, the same back end can
complete the compiler for any front end, for any source language, that produces
that intermediate form.
- Having an existing back end for a particular intermediate code makes the
development of compilers for this target machine much quicker, as only a new
front end for the new language needs to be developed.
An Example of a Complete Translation
An example of a Pascal program that has been translated into the machine code of the
DEC Alpha (esus) is given here.
- The Pascal source program
- The intermediate form output by the semantic analyzer is often in a form
that is not easily printable, such as an "abstract syntax
tree." For debugging purposes the pc compiler does allow you to
see an assembly language form of the program (which is technically not an
intermediate form, but is not yet in executable form).
- The hexadecimal version
of the executable (translated) program can be viewed to see what it looks
like. Note that the numbers you see here do not represent ASCII
characters. This type of file is called a "binary file" (even though
all files are binary) because the information is stored as binary numbers,
not as binary codes for characters. That is, you really cannot make
sense of this file unless you know what the binary numbers for the
various op codes, registers, and so forth are in this machine language.
- An ASCII character set table for making
conversions between ASCII characters and hexadecimal codes and vice versa.
You can use it to see the contents of the hexadecimal version of the source
program.
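The difference between a number stored as ASCII characters and the same number stored as a binary value can be seen in a few lines of Python. This is only an illustrative sketch: the 4-byte big-endian layout is an assumption for the example, and the helper name hex_dump is hypothetical.

```python
import struct


def hex_dump(data):
    """Return the bytes of data as space-separated two-digit hexadecimal codes."""
    return " ".join(f"{byte:02x}" for byte in data)


value = 16
as_text = str(value).encode("ascii")     # the characters '1' and '6'
as_binary = struct.pack(">I", value)     # one 4-byte big-endian unsigned integer

print(hex_dump(as_text))     # prints: 31 36
print(hex_dump(as_binary))   # prints: 00 00 00 10
```

The first dump can be read with an ASCII table (31 is '1', 36 is '6'); the second cannot, because its bytes are the number itself rather than character codes.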