May 12, 2004
This article was contributed by Steven Bosscher and Diego Novillo.
Earlier this week, the first bits of a major compiler internals overhaul
have been merged into the development mainline of the
GNU Compiler Collection
(GCC) for inclusion in the next release.
GCC is used as the system compiler for GNU/Linux and many other
operating systems.
The system compiler is one of the components of an operating system
that has a massive impact on the performance of the system as a whole.
From the kernel to productivity applications, from the C library to
even the compiler itself, almost all executable binaries are compiled
with the system compiler, so it has to be stable and produce good code.
It is therefore not surprising that major changes to the internals of
a stable compiler almost never happen.
But computer architectures change, so at some point an aging compiler
will have to undergo big surgery or risk becoming irrelevant.
And GCC is aging.
While GCC produces reasonably good code for a large number of
architectures, even its most recent version essentially builds on the
compiler framework started by Richard Stallman in the early 1980s.
In this framework, code-improving transformations are performed on an
intermediate representation called Register Transfer Language (RTL),
an architecture-independent, Lisp-like assembly language.
Older versions of GCC used this framework mostly for local
optimizations,
but such limited optimizations are insufficient for modern architectures
with RISC-like properties and a significant difference between the speed
of the chip and of memory access.
So, with the release of GCC 3.0, a number of global optimizations acting
on RTL were introduced.
Unfortunately, for many code transformations, RTL is not a suitable and
effective representation because it is too close to the actual machine
language.
This hinders several of the high-level analyses performed by modern
compilers.
It has become more and more obvious that a new, high-level intermediate
representation needs to be added to GCC.
The
Tree SSA project has been started to address this need.
The goal of the Tree SSA project is to build a completely
machine-independent optimization framework based on the Static
Single Assignment (SSA) form.
SSA is an intermediate representation (IR) that is becoming increasingly
popular because it allows efficient implementations of data flow analysis
and optimizing transformations.
In SSA form, every variable is assigned a value only once.
Actual programs are seldom in SSA form initially, because variables tend
to be assigned multiple times, not just once.
An SSA-based compiler modifies the program representation so that every
time a variable is assigned in the original program, a new version of the
variable is created.
Different versions of the same variable are distinguished by subscripting
the variable name with its version number.
Variables used in the right-hand side of expressions are renamed so that
their version number matches that of the most recent assignment.
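As a sketch of this renaming rule for straight-line code (where no PHI
functions are needed yet), the following toy renamer versions every
definition and rewrites every use to refer to the most recent version.
The tuple representation and naming scheme are illustrative, not GCC's
actual data structures:

```python
# Sketch of SSA renaming for straight-line code (no branches, so no
# PHI functions are needed).  Statements are (dest, lhs, op, rhs)
# tuples; operands with no known definition (constants, incoming
# parameters) are left unversioned.

def rename_to_ssa(stmts):
    """Rewrite straight-line three-address statements into SSA form."""
    version = {}          # variable -> latest version number
    current = {}          # variable -> current SSA name, e.g. 'x_2'

    def use(operand):
        # Rename a use to the SSA name of its most recent definition.
        return current.get(operand, operand)

    result = []
    for dest, lhs, op, rhs in stmts:
        lhs, rhs = use(lhs), use(rhs)           # rename uses first
        version[dest] = version.get(dest, 0) + 1
        current[dest] = f"{dest}_{version[dest]}"
        result.append((current[dest], lhs, op, rhs))
    return result

# 'x' is assigned twice, so its second definition becomes x_2, and the
# later use of 'x' is renamed to refer to x_2.
ssa = rename_to_ssa([
    ("x", "a", "+", "b"),
    ("y", "x", "*", "2"),
    ("x", "y", "-", "a"),
    ("z", "x", "+", "y"),
])
```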
It is not always possible to determine statically which assignment is
the most recent one for a given use.
These ambiguities are the result of branches and loops in the program's
flow of control.
To solve them, the SSA form introduces a new type of operation called
a PHI function. PHI functions merge multiple incoming assignments to
generate a new definition; they are placed at points in the program
where the flow of control makes more than one assignment available.
| Figure 1 | Figure 2 |
For example, consider the code fragment in Figure 1,
where it may not be known at compile time
which of the branches will execute.
The USE-DEF chains for 'x' are drawn in the figure.
In the second 'switch', the compiler has to assume that any of the
assignments to 'x' in the first switch may have been executed.
In this case, the SSA conversion process will introduce a PHI function
for 'x' to create the needed unique definition,
as shown in Figure 2.
Notice that PHI functions are an artifact used internally by the SSA
form and are never emitted in the final code.
The PHI function that defines 'x_4' in the previous example simply means
that 'x_4' can take the value of 'x_1', 'x_2', or 'x_3' at run time.
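The figures themselves are not reproduced here, but a hypothetical
fragment of the kind they describe would look roughly like this before
and after SSA conversion (the PHI notation follows the style of GCC's
tree dumps; all names besides 'x_1' through 'x_4' are illustrative):

```
/* before SSA */                 /* after SSA */
switch (a)                       switch (a)
  case 0:  x = 3;                  case 0:  x_1 = 3;
  case 1:  x = b;                  case 1:  x_2 = b_1;
  default: x = c;                  default: x_3 = c_1;

                                 x_4 = PHI <x_1, x_2, x_3>;
switch (x)                       switch (x_4)
  ...                              ...
```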
Once the program is in SSA form, flow of control and USE-DEF chains
are explicitly represented in the intermediate representation,
giving almost instantaneous information to passes like constant
propagation and folding.
The properties of the SSA form greatly simplify data flow analysis,
and indeed many traditional compiler optimizations, such as constant
and copy propagation and also some forms of common subexpression
elimination, are relatively straightforward and fast on functions
and even whole programs represented in SSA form.
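To see why SSA makes such passes straightforward, consider that each
SSA name has exactly one definition, so a use can be resolved by simply
following its definition. The toy propagator below, a sketch for the
branch-free case only, folds constants and copies without any iterative
data flow analysis:

```python
# Sketch of constant and copy propagation on SSA form.  Because every
# SSA name has exactly one definition, resolving a use is a simple
# lookup along def-use chains -- no iterative analysis over the whole
# function is needed in this simple, branch-free case.

def propagate(defs):
    """defs maps each SSA name to its defining expression: an int
       constant, another SSA name (a copy), or a ('+', lhs, rhs)
       tuple.  Returns each name's fully folded value."""
    def value(name):
        expr = defs[name]
        if isinstance(expr, int):          # constant definition
            return expr
        if isinstance(expr, str):          # copy: follow the chain
            return value(expr)
        op, lhs, rhs = expr                # fold once operands are known
        return value(lhs) + value(rhs)
    return {name: value(name) for name in defs}

# x_1 = 4; y_1 = x_1 (a copy); z_1 = x_1 + y_1 folds to 8.
values = propagate({"x_1": 4, "y_1": "x_1", "z_1": ("+", "x_1", "y_1")})
```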
Before work on these optimizations could start, a whole new optimization
framework had to be implemented:
- A new intermediate representation.
GCC already constructed each function as an abstract syntax tree (AST),
but there was no single AST representation in GCC.
Instead, each language defined its own trees which were translated
piecewise to RTL and then optimized in the old framework.
With Tree SSA, two new language-independent representations have been
added to resolve this issue.
All the language front-ends now emit a very high-level IR called GENERIC.
Each function is handed over to the language-independent parts of the
compiler as a tree in GENERIC form.
Next, this tree is lowered to GIMPLE form, another new IR derived from
the SIMPLE representation proposed by the McCAT project out of McGill
University.
The GIMPLE representation looks like three-address code. All side
effects are explicit so that a function in GIMPLE form is ready for
analysis.
Most of the existing front-ends have been modified to emit GENERIC so
that they can be optimized using Tree SSA.
The next release will also include a Fortran 95 front-end, which is
the first front-end built directly to emit GENERIC.
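The flavor of this lowering can be sketched as follows: a nested
expression is flattened into a sequence of simple statements, each with
at most one operation, by introducing explicit temporaries. The
temporary naming here is illustrative, not GCC's actual scheme:

```python
# Sketch of GIMPLE-style lowering: flatten an expression tree into
# three-address statements, giving every intermediate result its own
# named temporary so that all evaluation steps are explicit.
from itertools import count

def lower(expr):
    """expr is a leaf (string) or an (op, lhs, rhs) tuple.  Returns
       the statement list and the operand holding the final value."""
    stmts, tmp_ids = [], count(1)

    def emit(e):
        if not isinstance(e, tuple):
            return e                        # leaf: variable or constant
        op, lhs, rhs = e
        lhs, rhs = emit(lhs), emit(rhs)     # lower operands first
        tmp = f"T{next(tmp_ids)}"
        stmts.append(f"{tmp} = {lhs} {op} {rhs}")
        return tmp

    return stmts, emit(expr)

# a = (b + c) * (d - e) becomes a sequence of simple statements.
stmts, result = lower(("*", ("+", "b", "c"), ("-", "d", "e")))
stmts.append(f"a = {result}")
```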
- Analyses for rewriting the GIMPLE representation in SSA form.
In the old framework, no optimizations were performed on the AST.
This meant that there was no need for a control flow graph, or for
data flow analysis to be performed.
All of this is now necessary before a representation can be rewritten
into SSA form.
To avoid unnecessary code duplication, a lot of effort was spent on
rewriting the old framework so that it was possible to share many of
the basic control flow graph manipulations between the old and the new
framework.
Data flow analyses had to be implemented from scratch.
One particularly interesting analysis is alias analysis.
GCC now implements several types of alias analysis:
type-based flow-insensitive analysis,
flow-insensitive points-to analysis,
and flow-sensitive points-to analysis.
Most analyses are currently intra-procedural, although some
inter-procedural analyses are partially implemented or planned.
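The idea behind the type-based variant can be sketched very simply: two
memory references may alias only if their types are compatible, with
'char' accesses allowed to alias anything under C's rules. This toy
oracle only compares base type names; a real analysis (including GCC's)
also handles qualifiers, structure fields, and unions:

```python
# Sketch of type-based, flow-insensitive alias analysis: decide from
# the types alone whether two memory accesses might touch the same
# location.  'char' may alias anything; otherwise require identical
# base types.  (Toy version; real analyses are far more precise.)

def may_alias(type_a, type_b):
    if "char" in (type_a, type_b):
        return True
    return type_a == type_b

# A store through 'double *' cannot clobber a load through 'int *',
# so the compiler may reorder the accesses or reuse the loaded value.
```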
- Passes for performing the actual code optimizations.
Passes that have already been implemented include sparse conditional
constant propagation, partial redundancy elimination, dead code and
dead store elimination, and scalar replacement of aggregates. Also, a
lot of dominator tree based optimizations and some conditional
execution conversions have been implemented.
Many of these passes replace equivalent passes that work on RTL.
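One of these passes, dead store elimination, is easy to sketch on SSA
form: a definition whose SSA name is never used, and which has no side
effects, can simply be deleted. The representation below (statements as
destination/uses pairs, liveness seeded from the return value) is
illustrative:

```python
# Sketch of dead store elimination on SSA form: keep only statements
# whose destination is transitively used, starting from a seed set of
# live names (e.g. the function's return value).

def eliminate_dead(stmts, live):
    """stmts is a list of (dest, uses) pairs.  Iterate to a fixed
       point, marking the operands of live statements live too."""
    live = set(live)
    changed = True
    while changed:
        changed = False
        for dest, uses in stmts:
            if dest in live and not set(uses) <= live:
                live |= set(uses)
                changed = True
    return [(d, u) for d, u in stmts if d in live]

# y_1 is never used, so its definition (a dead store) is removed;
# x_1 is kept because the live z_1 depends on it.
kept = eliminate_dead(
    [("x_1", []), ("y_1", ["x_1"]), ("z_1", ["x_1"])],
    live=["z_1"],
)
```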
All the new parts together account for about 100,000 lines of new code,
not including the many changes to existing parts of the compiler.
The framework implemented as part of the Tree SSA project adds a whole
new path to the compilation process, while no RTL passes have been
disabled yet.
Still, a compiler with the Tree SSA passes enabled is not significantly
slower than the recently released GCC 3.4.0, and a number of very
expensive passes in the RTL framework have already been subsumed by
Tree SSA passes.
Once these RTL passes have been disabled and removed, the resulting
compiler will be a lot faster than GCC 3.4.0, while the generated code
will be at least as good, and often better.