By Jonathan Corbet
April 21, 2012
Concurrency tends to make programming hard. Kernel development obviously
involves dealing with a lot of concurrency, but there is also a lot of
multi-threaded user-space development that suffers from the same issues.
It would be nice if the computer could help developers avoid race
conditions and other problems that arise in concurrent environments. Some
developers at Google have been working on just such a project for some
time, but they have just relocated the project from GCC to the LLVM Clang
compiler, saying that GCC is not suited to the work they want to do. The
result has been a sort of wake-up call for GCC developers. Is the GCC
compiler suite not well suited to the creation of static analysis tools?
Like kernel programming, multi-threaded user-space programming involves
creating and using locks to prevent concurrent access to shared data. In a
properly designed and implemented locking scheme, code will never see
inconsistent views of shared data, and that data will not change when the
code is not expecting changes. Getting to that point is hard, though, and the
bugs that result from locking mistakes can be hard to reproduce and hard to
diagnose. There are few things more frustrating than a highly intermittent
bug that seemingly makes no sense and defies efforts to track it down.
In 2008, Google developer Le-Chun Wu announced a
project to add support for "thread safety annotations" in C and C++
programs to the
GCC compiler. Since then, work has progressed to the point that the
developers have a useful system that is employed by a number of internal
projects. The ideas are relatively
straightforward. Shared data requiring a lock is annotated with something
like:
Mutex data_mutex;
struct shared_data Data GUARDED_BY(data_mutex);
Functions that manipulate locks are annotated with:
class some_class {
Mutex mutex;
void get_lock() EXCLUSIVE_LOCK_FUNCTION(mutex) { ... }
void put_lock() UNLOCK_FUNCTION(mutex) { ... }
/* ... */
};
There are also annotations for shared locks and "trylock" functions that
may not succeed. If a function expects a given lock to be held when it is
called, it can be annotated with EXCLUSIVE_LOCKS_REQUIRED(); there
is also a LOCKS_EXCLUDED() annotation for functions that will
acquire non-nesting locks themselves. Finally, this construction can be
used for lock ordering constraints:
class some_class {
Mutex m1;
Mutex m2 ACQUIRED_AFTER(m1);
/* ... */
};
Some other annotations exist; they can be seen in this slide
deck [PDF], which is where your editor stole the above examples from.
The GCC implementation sets itself up as an early optimization pass. It
builds a representation of the code from the GIMPLE internal
representation, tracking which locks are held at each point through the
function. When problems or inconsistencies are found, the compiler raises the
alarm, hopefully causing the problem to be fixed before it gets into
production code and bites somebody.
This code is available in a branch in the GCC repository and appears to be
useful, so it came as a bit of a surprise when, on April 19,
Google developer Diego Novillo announced
that the project had been terminated, and that the group was now working to
implement the same functionality in the LLVM Clang compiler instead. When
asked why the group was making this change, developer Delesley Hutchins responded:
The gcc version has been difficult to support and maintain, due
mainly to the fact that the GIMPLE intermediate language was never
designed for static analysis. The abstract syntax tree provided by
Clang is an easier data structure to work with for front-end
analyses of this kind.
The response added that the GCC implementation has "some
issues" that would make it hard to merge into the GCC mainline,
while the Clang implementation has been in the trunk all along.
The GCC developers, naturally, would like to understand what it is about
their compiler that made this project hard. The Google team was hesitant
to respond, seemingly unwilling to criticize GCC or to cause a compiler
flame war. Eventually, though, Delesley posted a more detailed description of the kinds of
difficulties they had run into. There is a lot of information there, but
it seems to come down to two crucial points:
- The GIMPLE representation loses information about the structure
of the original program that the static analysis pass really
needs to
have. LLVM, instead, builds an abstract syntax tree that much more
closely matches the original C/C++ syntax; additional structures are
then based on that tree. LLVM's tree is evidently well suited to the
task of static analysis.
- The GIMPLE representation changes significantly from one compiler
release to the next, causing a lot of things to break. That adds a
maintenance cost that the Google developers are unwilling to pay -
especially if basing on LLVM instead makes that cost go away.
There were other things that were better for their purposes in GCC, but, in
the balance, the developers concluded that LLVM is the better platform for
their work. They seem determined to make the shift and feel that attempts
to fix the problems they encountered in GCC would create difficulties
elsewhere in the compiler. So this move appears to be a done deal.
In one sense, this decision lacks serious implications for the free
software community. LLVM, too, is free software and the static analysis
code will be part of it. That said, it is worrisome if GCC's internal
structure truly turns out to be poorly suited to static analysis tasks.
There is a lot of interesting work being done in the static analysis area;
it offers the prospect of finding large numbers of bugs early in the
development process. Static analysis tools will almost certainly be an
increasingly important part of many developers' tool boxes. If those tools
cannot easily be implemented with GCC, that compiler's future may not be as
bright as it otherwise should be.
That said, it is dangerous to extrapolate from one project to the whole
field of static analysis tools. The GCC plugin mechanism is just beginning
to mature; it really has not had the time to turn into a platform that
complex tools can be built on top of. So, while this episode is a warning
that the GCC developers should hear (and evidently have heard quite well),
it should not be seen as evidence of fatal design flaws. Likely as not, it
is just an indication of a problem in need of solution - something the GCC
community is good at.
(
Log in to post comments)