J. Neal Richter
Montana State University
Doctoral Student, Computer Science
Copyright 2003. All rights reserved. Contact: richter@cs.montana.edu

Compiler Optimizations for Modern Hardware Architectures - Part 1

Neal Richter

CS 550 (Fall 2003) Class Presentation

The presentation will be held on Wednesday November 26, 2003 in ???? room.
(I am co-presenting with Bob Wall - see part 2 of the presentation here).


The topic of optimization in modern compilers is a broad one. Google handily offers up the following Web definition:
compiler optimization - (n.) Rearranging or eliminating sections of a program during compilation to achieve higher performance. Compiler optimization is usually applied only within basic blocks and must account for the possible dependence of one section of a program on another.

There are many ways in which this can be done (and many of them can be combined to further increase overall performance); we found it particularly interesting to focus on optimization techniques aimed at improving the performance of modern computing systems. Certain architectural advancements in systems, and in CPUs in particular, yield higher performance gains when the code they execute is arranged to take advantage of hardware features.

These hardware features include the following:

  • cache organization - most modern systems have a multi-level organization of caches to help mitigate the difference in speed between the processor and main storage (disk or memory). Anything the compiler can do to preserve locality of reference between successive memory accesses will help to maximize use of the cache. This is especially significant for the instruction cache.

  • large register files - some modern processors have a very large set of registers; many RISC architectures include 32 registers, and the new Itanium architecture has 128 integer and 128 floating-point registers. While this simplifies some register allocation problems, the sheer number of registers may complicate the allocation process.

  • pipelining - the logic required to execute even extremely complicated instructions can often be generated such that it executes in a single clock cycle. For example, you could construct a block of logic that adds 16 16-bit integers simultaneously. Unfortunately, the complexity of the logic would severely limit the maximum clock speed. Luckily, it is possible to break up the logic for many operations into smaller pieces that can be executed sequentially; since each of these blocks is smaller and simpler, it can operate much faster. The trick is to feed a stream of instructions into this pipeline so that in each clock cycle, each stage is executing a sub-step of a different instruction.

  • multi-instruction (or fine-grain) parallelism - many modern processors are capable of issuing or executing multiple instructions in parallel (in addition to the partial instructions being executed simultaneously in the pipeline). There are different techniques for doing this, including superscalar, superpipelined, and very large instruction word (VLIW) architectures.

We will present an overview of these hardware features and why they impose (or at least suggest) demands on the compiler's optimizer, and look at some ways in which these optimizations can be performed.

Featured Materials

The Presentation (800x600)

The Presentation (1024x768)

Required Reading (please peruse) and Other Interesting Links:


For the papers, we have tried to provide links to the URLs from which they can be downloaded. Note that many of them go through the MSU Library's Electronic Journal Finder to the ACM Digital Library - this means that in order to follow the link, you will be prompted to enter your ID number and password.

Jong-Jiann Shieh and Christos A. Papachristou, "On reordering instruction streams for pipelined computers," in Proceedings of the 22nd Annual International Workshop on Microprogramming and Microarchitecture, Dublin, Ireland, pp. 199--206, August 1989. SIGMICRO Newsletter, 20(3), September 1989.

Phillip B. Gibbons and Steven S. Muchnick, "Efficient instruction scheduling for a pipelined architecture," SIGPLAN Notices, 21(7):11--16, July 1986. Proceedings of the ACM SIGPLAN '86 Symposium on Compiler Construction.

John L. Hennessy and Thomas Gross, "Postpass Code Optimization of Pipeline Constraints," ACM Transactions on Programming Languages and Systems, 5(3):422--448, July 1983.

Allen, Randy and Ken Kennedy. Optimizing Compilers for Modern Architectures, Morgan Kaufmann Publishers, San Francisco, CA, 2002.

A whole book on this very topic - half the book surveys modern architectures and describes the challenges they pose for compilers. The authors focus on using the analysis of data dependence to direct the optimization, then look at dependence-based methods applied to superscalar and VLIW architectures.

Muchnick, Steven S. Advanced Compiler Design and Implementation, Morgan Kaufmann Publishers, San Francisco, CA, 1997.

This seems to be a very good presentation of topics related to the "back end" of the compiler - intermediate representation, run-time support, and in particular optimization. I primarily used material from the chapters on register allocation and code scheduling.

Hennessy, John L., David A. Patterson, and David Goldberg, Computer Architecture: A Quantitative Approach, Morgan Kaufmann Publishers, San Francisco, CA, 1990.

The seminal book on advanced computer architectures - a must-have for the hard-core bit head. (The third edition was published in 2002.)

Patterson, David A. and John L. Hennessy. Computer Organization & Design: The Hardware / Software Interface, Morgan Kaufmann Publishers, San Francisco, CA, 1994.

A companion to the Hennessy / Patterson book - actually, more of a precursor or introduction to topics that are covered in more detail in the other book. This is an excellent starting point for learning about computer architectures. (The second edition was published in 1997.)

Here's an amusing little site that spells out the differences between the Hennessy/Patterson book and the Patterson/Hennessy book.

Fox, Armando, Michael Hsiao, James Reed, and Brent Whitlock. "A Survey of General and Architecture-Specific Compiler Optimization Techniques."

The featured paper for the presentation - a good overview of various optimization techniques.

Schneck, Paul B. "A Survey of Compiler Optimization Techniques," ACM/CSC-ER Proceedings of the Annual Conference, 1973, pp. 106-113.

An older paper on the topic - it discusses optimizing FORTRAN, which gives you an idea of how old it is! The reference list contains much of the pioneering work.

Bacon, David F., Susan L. Graham, and Oliver J. Sharp. "Compiler Transformations for High-Performance Computing," ACM Computing Surveys, Vol. 26, No. 4, Dec. 1994, pp. 345-420.

A much more extensive survey of compiler optimization, with more attention to hardware-specific optimizations. The reference list is enormous.

Jouppi, Norman P. and David W. Wall. "Available Instruction-Level Parallelism for Superscalar and Superpipelined Machines," in Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems, Boston, Mass., 1989, pp. 272-282.

An early paper exploring the impact of emerging superscalar architectures on compiler optimization. (Both Jouppi and Wall were very involved with the design of the original superscalar processors.)