Suppose that we are compiling a program called Main (for simplicity's sake we will consider a program that has no functions or procedures) in which the following variables have been declared:
variables
N, I, J, X : integer;
A : array[-5..100,0..1000] of integer;
Z : float;
Then the symbol table for this program would look like this:

Name    Nest  Size  Link
Main    0           //

lexeme  kind   type  size (bytes)  offset  indices
N       var    int   4             4
I       var    int   4             8
J       var    int   4             12
X       var    int   4             16
A       array  int   424424        20      --> (-5,100) --> (0,1000) --> //
Z       var    real  8             424444
As the translated code for Main begins executing, the run time stack will look like:
(free)               <-- SP
space for Z
space for A
space for X
space for J
space for I
space for N
space for old D0     <-- D0
program code
Remember that at this point during run time D0 points to the start of the activation record (AR) for Main (i.e., it points to the location that now holds the old D0) and that SP points to the first free location at the top of the run time stack (just above the space for Z).
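The offsets in the symbol table, and the place where SP starts, follow mechanically from the declarations. The Python sketch below reproduces them. The element sizes (4-byte integers, 8-byte reals), the 4-byte slot for the old D0 at offset 0, and the helper names array_size and layout are assumptions chosen to match the table above, not part of any particular compiler.

```python
# Sketch: assign activation-record offsets to Main's variables, as in the symbol table.
# Assumptions (illustrative only): 4-byte integers, 8-byte reals, the old D0 occupies
# offsets 0..3, and variables are laid out in declaration order.
INT_SIZE = 4
REAL_SIZE = 8
SAVED_D0_SIZE = 4


def array_size(bounds, elem_size):
    """Bytes needed for an array whose dimensions have the given (low, high) bounds."""
    count = 1
    for low, high in bounds:
        count *= high - low + 1
    return count * elem_size


def layout(decls):
    """Return ({name: (size, offset)}, offset of the first free byte after the last variable)."""
    table = {}
    offset = SAVED_D0_SIZE  # first slot after the saved (old) D0
    for name, size in decls:
        table[name] = (size, offset)
        offset += size
    return table, offset


decls = [
    ("N", INT_SIZE), ("I", INT_SIZE), ("J", INT_SIZE), ("X", INT_SIZE),
    ("A", array_size([(-5, 100), (0, 1000)], INT_SIZE)),  # 106 rows x 1001 columns
    ("Z", REAL_SIZE),
]
table, first_free = layout(decls)
for name, (size, off) in table.items():
    print(f"{name}: size={size} offset={off}")
print("SP starts at D0 +", first_free)
```

Under these assumptions the sketch reports A at offset 20 with size 424424 and Z at offset 424444, matching the table, and places SP at D0 + 424452, just above the space for Z.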
Consider the following code inside Main:
Read(X);
Read(N);
for I in 1..N loop
for J in 1..N loop
A(I,J) := A(I+3,J) * X / 5 * I;
end loop;
end loop;
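As discussed before, one might be tempted to make this program faster by moving calculations outside the loops in which they are invariant. Strictly speaking, * and / associate left to right, so the statement as written computes ((A(I+3,J) * X) / 5) * I, and with integer division that is not always the same as A(I+3,J) * (X / 5) * I. If the programmer intends the second grouping, however, the expression X / 5 does not change in either loop, so it could be moved to just outside the outer loop: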
Read(X);
Read(N);
Temp1 := X / 5;
for I in 1..N loop
for J in 1..N loop
A(I,J) := A(I+3,J) * Temp1 * I;
end loop;
end loop;
Now there are two expressions that are invariant with respect to the J loop: I+3 and, after regrouping A(I+3,J) * Temp1 * I as A(I+3,J) * (Temp1 * I), the product Temp1*I. We could move these to just outside the J loop:
Read(X);
Read(N);
Temp1 := X / 5;
for I in 1..N loop
Temp2 := Temp1 * I;
Temp3 := I + 3;
for J in 1..N loop
A(I,J) := A(Temp3,J) * Temp2;
end loop;
end loop;
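One way to convince yourself that these hand transformations preserve meaning is simply to run every version and compare the results. The Python sketch below does that on a small array. The sample contents, the choice N = 6, and the helper ada_div (integer division truncating toward zero, which is how Ada's / behaves on integers) are illustrative assumptions. It also runs the left-to-right reading ((A(I+3,J) * X) / 5) * I, which can disagree with the hoisted versions when X is not a multiple of 5.

```python
def ada_div(a, b):
    """Integer division truncating toward zero, like Ada's "/" on integers."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def make_array(n):
    # Hypothetical sample contents in -5..5; only rows 1..n+3 and columns 1..n are used.
    return {(i, j): (7 * i - 3 * j) % 11 - 5 for i in range(1, n + 4) for j in range(1, n + 1)}


def original(A, X, N):
    # A(I,J) := A(I+3,J) * (X / 5) * I  (the grouping the rewrites assume)
    for I in range(1, N + 1):
        for J in range(1, N + 1):
            A[I, J] = A[I + 3, J] * ada_div(X, 5) * I
    return A


def hoisted_once(A, X, N):
    Temp1 = ada_div(X, 5)  # invariant in both loops
    for I in range(1, N + 1):
        for J in range(1, N + 1):
            A[I, J] = A[I + 3, J] * Temp1 * I
    return A


def hoisted_twice(A, X, N):
    Temp1 = ada_div(X, 5)
    for I in range(1, N + 1):
        Temp2 = Temp1 * I  # invariant in the J loop (after regrouping)
        Temp3 = I + 3      # invariant in the J loop
        for J in range(1, N + 1):
            A[I, J] = A[Temp3, J] * Temp2
    return A


def left_to_right(A, X, N):
    # What the statement means as written: ((A(I+3,J) * X) / 5) * I
    for I in range(1, N + 1):
        for J in range(1, N + 1):
            A[I, J] = ada_div(A[I + 3, J] * X, 5) * I
    return A


print(original(make_array(6), 7, 6) == hoisted_twice(make_array(6), 7, 6))  # True
print(original(make_array(6), 7, 6) == left_to_right(make_array(6), 7, 6))  # False
```

With X = 7, for example, the element A(4,2) = -5 gives -5 * (7 / 5) = -5 under the hoisted grouping but (-5 * 7) / 5 = -7 under the left-to-right reading.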
The problem with hand-optimizing code this way is that the code becomes practically unreadable and is therefore prone to error. It defeats the purpose of high-level language programming to force the programmer through such contortions just to make a program run more efficiently. In fact, a programmer who uses a good compiler with excellent optimization capabilities doesn't need to do this, because:
A good optimizer can locate and move invariant code outside of loops automatically.
What this means is:
A programmer should always program in the clearest manner and leave code improvement up to the optimizer.
Of course, if the programmer can find a new algorithm for accomplishing the same task, an algorithm with a better time complexity, then the programmer should implement the better algorithm.
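The claim that an optimizer can find invariant code on its own can be made concrete. The Python sketch below is a deliberately simplified form of loop-invariant detection on three-address code for the inner J loop, assuming the statement is grouped as A(I+3,J) * (X / 5) * I: an instruction is invariant if each operand is a literal, is defined outside the loop, or is produced by another invariant instruction. The Instr class, the temporaries t1..t5, and the rule that loads and stores are never hoisted are assumptions for illustration; a real optimizer must also check dominance, side effects, and that each name has a single definition in the loop.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    dest: Optional[str]    # None for a store, which defines no temporary
    op: str                # "add", "mul", "div", "load", "store"
    args: Tuple[str, ...]  # variable/temporary names or integer literals


def is_literal(name: str) -> bool:
    return name.lstrip("-").isdigit()


def hoist_invariants(body, loop_var):
    """Split a loop body into (preheader, remaining_body).

    Simplifying assumptions (hypothetical): every temporary is assigned exactly
    once, loads and stores are never hoisted, and no reassociation is attempted.
    """
    defined_in_loop = {ins.dest for ins in body if ins.dest is not None}
    defined_in_loop.add(loop_var)
    invariant_ids, invariant_dests = set(), set()
    changed = True
    while changed:  # iterate until no new invariant instruction is found
        changed = False
        for k, ins in enumerate(body):
            if k in invariant_ids or ins.op in ("load", "store"):
                continue
            if all(is_literal(a) or a not in defined_in_loop or a in invariant_dests
                   for a in ins.args):
                invariant_ids.add(k)
                invariant_dests.add(ins.dest)
                changed = True
    preheader = [ins for k, ins in enumerate(body) if k in invariant_ids]
    remaining = [ins for k, ins in enumerate(body) if k not in invariant_ids]
    return preheader, remaining


# J-loop body for A(I,J) := A(I+3,J) * (X / 5) * I
j_body = [
    Instr("t1", "add", ("I", "3")),
    Instr("t2", "load", ("A", "t1", "J")),
    Instr("t3", "div", ("X", "5")),
    Instr("t4", "mul", ("t2", "t3")),
    Instr("t5", "mul", ("t4", "I")),
    Instr(None, "store", ("A", "I", "J", "t5")),
]
pre, rest = hoist_invariants(j_body, "J")
print([ins.dest for ins in pre])   # ['t1', 't3']
print([ins.op for ins in rest])    # ['load', 'mul', 'mul', 'store']

# Treat the whole (flattened) body as the I loop and look again.
outer_pre, _ = hoist_invariants(pre + rest, "I")
print([ins.dest for ins in outer_pre])  # ['t3']
```

The detector finds I + 3 and X / 5 invariant in the J loop, and only X / 5 invariant in the I loop. It does not discover Temp1 * I, because that requires regrouping the multiplications (reassociation), which this sketch intentionally does not attempt.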
Good optimizers can do even more than one might expect from reading the high-level source. For example, consider the original statement:
A(I,J) := A(I+3,J) * X / 5 * I;
The first section of code that we need to generate computes the address (location) on the run time stack of A(I+3,J) so that its value can be pushed. Notice that the symbol table places I at 8(D0) and J at 12(D0). Notice also that the first index of A has lower bound -5 and upper bound 100, and that the second index has lower bound 0 and upper bound 1000. We might then generate code similar to the following to compute this location:
-- Check that the first subscript value is at least as large as its lower bound
Push 8(D0)      -- push I
Push 3
Adds            -- I+3 on top of stack
Push -5
CompareGE
BranchFalse BoundsError
-- Check that the first subscript is no larger than its upper bound
Push 8(D0)
Push 3
Adds
Push 100
CompareLE
BranchFalse BoundsError
-- Check that the second subscript is at least as large as its lower bound
Push 12(D0)
Push 0
CompareGE
BranchFalse BoundsError
-- Check that the second subscript is no larger than its upper bound
Push 12(D0)
Push 1000
CompareLE
BranchFalse BoundsError
-- Compute where the (I+3)rd row is on the run time stack
Push 8(D0)      -- push I
Push 3          -- push 3
Adds            -- I+3 on top of stack
Push -5         -- push lower bound of first index
Subs            -- stack top is now the row of A normalized to 0
Push 1001       -- push the number of elements in each row of A
Muls            -- stack top now contains the current row offset into A
Push 4          -- 4 bytes per integer location of A
Muls            -- compute byte offset of current row of A
-- The stack top now contains the offset in bytes from the start of A
-- to the (I+3)rd row of A.
-- Now compute the offset to the Jth column of A in this row.
Push 12(D0)     -- push J
Push 0          -- push the lower bound for J
Subs            -- normalize this value to zero
Push 4          -- push number of bytes per element of A
Muls            -- compute the offset in bytes of J in a row
-- The second stack element now contains the byte offset to the (I+3)rd row
-- normalized to zero, and the stack top contains the byte offset to the
-- Jth element (in any row) normalized to zero.
-- Now add these two offsets to get the offset into A of element A(I+3,J).
Adds
-- At this point, the top of the stack contains the offset on
-- the run time stack to A(I+3,J) from the start of A.
-- To get to the actual location of A(I+3,J) in the activation
-- record for this procedure on the run time stack, the starting
-- offset of array A must also be added to this value.
Push 20         -- A starts at offset 20 from D0
Adds            -- stack top contains offset from AR start to A(I+3,J)
Pop T1          -- pop the offset to A(I+3,J) into register T1
Push T1(D0)     -- push the value at A(I+3,J) onto the stack
-- Whew! At this point the value in A(I+3,J) has finally been pushed
-- onto the stack!
Push 16(D0)     -- push X
Muls            -- A(I+3,J)*X on top of stack
Push 5
Divs            -- A(I+3,J)*X/5 on top of stack
Push 8(D0)      -- push I
Muls            -- A(I+3,J)*X/5*I on top of stack
-- At this point, we need to generate code to
-- pop the top of the stack (the result of the expression
-- evaluation) into A(I,J). This in turn will require
-- that we generate code that will check whether I and J
-- are within range and code to calculate the location
-- of A(I,J) on the run time stack.
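The arithmetic that this stack code performs is the usual row-major address formula, offset = 20 + ((i - (-5)) * 1001 + (j - 0)) * 4, preceded by four bounds checks. The Python sketch below computes the same offset so the numbers can be checked by hand; the function name offset_of and the use of IndexError in place of the branch to BoundsError are hypothetical choices for illustration.

```python
# Sketch mirroring the stack code: bounds checks first, then the row-major
# byte offset from D0. Assumes 4-byte elements and A at offset 20, as in the symbol table.
ELEM_SIZE = 4
A_OFFSET = 20
A_BOUNDS = ((-5, 100), (0, 1000))


def offset_of(i, j, bounds=A_BOUNDS, base=A_OFFSET, elem_size=ELEM_SIZE):
    (lo1, hi1), (lo2, hi2) = bounds
    # Corresponds to the four CompareGE/CompareLE + BranchFalse BoundsError checks
    if not (lo1 <= i <= hi1) or not (lo2 <= j <= hi2):
        raise IndexError(f"A({i},{j}) is out of bounds")
    row_len = hi2 - lo2 + 1                      # 1001 elements per row
    row_bytes = (i - lo1) * row_len * elem_size  # Subs, Push 1001 / Muls, Push 4 / Muls
    col_bytes = (j - lo2) * elem_size            # Subs, Push 4 / Muls
    return base + row_bytes + col_bytes          # Adds, then Push 20 / Adds


I, J = 1, 2
print(offset_of(I + 3, J))  # 36064, i.e. A(I+3,J) is at 36064(D0)
```

For I = 1 and J = 2 the row contributes (4 + 5) * 1001 * 4 = 36036 bytes and the column 2 * 4 = 8 bytes, giving 36064. The last element, A(100,1000), lands at offset 424440, so it ends exactly where Z begins at 424444.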
Notice that all of this code is buried inside the inner nested J loop in the translation. Whew!
A good optimizer will notice that the code that checks whether the first subscript (I+3, or I for the store) is within bounds is invariant in the inner J loop, and will move it out of the J loop. It will also notice that the calculation of the byte offset to the row selected by the first subscript is invariant in the J loop and will move that code out as well. Since A is accessed twice in the loop body, about 20 lines of code are moved out of the J loop in this case. If we suppose that the outer loop and the inner loop each run 1000 times (a purely illustrative figure, since the bounds of A would actually raise a bounds error once I+3 exceeded 100), those 20 instructions drop from 1000 × 1000 executions to 1000 executions, a savings of 20 × (1,000,000 - 1,000) = 19,980,000 instruction executions for just the calculation of the addresses of A(I+3,J) and A(I,J) alone!
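The savings estimate is plain bookkeeping. The short Python sketch below redoes it, with a brute-force loop count as a cross-check on small trip counts. The 20-instruction figure and the 1000 × 1000 trip counts are the estimate's own assumptions; the function names are hypothetical.

```python
def executions(lines, outer, inner, hoisted):
    """Times `lines` instructions run: once per outer iteration if hoisted, else every inner iteration."""
    return lines * outer if hoisted else lines * outer * inner


def brute_force(lines, outer, inner, hoisted):
    """Count the same thing by actually looping (only practical for small trip counts)."""
    count = 0
    for _ in range(outer):
        if hoisted:
            count += lines      # executed in the outer loop, before the J loop
        else:
            for _ in range(inner):
                count += lines  # executed on every J iteration
    return count


saved = executions(20, 1000, 1000, False) - executions(20, 1000, 1000, True)
print(saved)  # 19980000
```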
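The optimizing portion of a compiler will also move X / 5 out of both loops and the multiplication by I out of the J loop, provided X / 5 really is a subexpression (that is, the statement is written as A(I+3,J) * (X / 5) * I, as in the rewritten loops above) and the multiplications may be regrouped. This yields further substantial savings.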
You can see why some compilers don't do range testing on array indices: it's expensive. But it is also unsafe not to do it. So the Ada approach, which lets the programmer turn the generation of range-checking code on or off (for example, with pragma Suppress), is a nice compromise.