By Jonathan Corbet
June 22, 2011
It has been clear for some years now that static analysis tools have the
potential to greatly increase the quality of the software we write.
Computers are well placed to analyze source code and look for patterns
which could indicate bugs. The "Stanford Checker" (later commercialized by
Coverity) found a great many defects in a number of free software code
bases. But within the free software community itself, the tools we have
written are relatively scarce and relatively primitive. That situation may
be coming to an end, though; we are beginning to see the development of
frameworks which could become the basis for a new set of static analysis
tools.
The key enabling changes have been happening in the compiler suites.
Compilers must already perform a detailed analysis of the code in order to
produce optimized binaries; it makes sense to make use of that analysis for
other purposes as well. Some of that already happens in the compiler
itself; GCC and LLVM can produce a much wider set of warnings than was once
possible. These warnings are a good start, but much more can be done.
That is especially true if projects can create their own analysis tools for
project-specific checks; projects of any size tend to have invariants and
rules of their own which go beyond the rules for the source language in
general.
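Here is a rough illustration of a project-specific check built on a compiler's own analysis. For brevity it uses Python's standard `ast` module rather than GCC or LLVM; that module exposes the parse tree that Python's compiler builds. The rule is hypothetical: a ban on calling functions named `legacy_alloc` and `unchecked_copy`. A real project would substitute its own invariants. Treat this as a sketch of the idea, not a model of any GCC plugin API.

```python
import ast

# Hypothetical project rule: these functions must never be called directly.
BANNED_CALLS = {"legacy_alloc", "unchecked_copy"}


class BannedCallChecker(ast.NodeVisitor):
    """Walks a parsed module and records calls to banned functions."""

    def __init__(self) -> None:
        self.findings: list[tuple[int, str]] = []

    def visit_Call(self, node: ast.Call) -> None:
        # Only plain-name calls such as foo(...) are checked; method calls
        # such as obj.foo(...) are deliberately ignored in this sketch.
        if isinstance(node.func, ast.Name) and node.func.id in BANNED_CALLS:
            self.findings.append((node.lineno, node.func.id))
        # Keep walking so that calls nested in arguments are also seen.
        self.generic_visit(node)


def check_source(source: str) -> list[tuple[int, str]]:
    """Parse source text and return (line number, function name) findings."""
    checker = BannedCallChecker()
    checker.visit(ast.parse(source))
    return checker.findings


if __name__ == "__main__":
    sample = """
def f(n):
    buf = legacy_alloc(n)
    return len(buf)
"""
    print(check_source(sample))  # prints [(3, 'legacy_alloc')]
```

The point of the design is that the checker never deals with raw text. The compiler has already resolved what is a call and what is not, and the project-specific logic shrinks to a few lines.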
The FSF was, for years, hostile to the idea of making it easy to plug
analysis modules into GCC, fearing that a plugin mechanism would enable the
creation of proprietary modules. After some years of deliberation, the FSF
rewrote the license exception for its runtime libraries in a way that
addressed the proprietary module worries; since then, GCC has had plugin
module support. The use of that feature has been relatively low, so far,
but there are signs that the situation may be beginning to change.
An early user of the plugin mechanism was the Mozilla project, which
created two modules (Dehydra and Treehydra)
to enable the writing of analysis code in JavaScript. These tools have
seen some use within Mozilla, but development there seems to have slowed to
a halt. The mailing list is moribund and the software does not appear to
have seen an update in some time.
An alternative is GCC MELT. This
project provides a fairly comprehensive plugin which allows the writing of
analysis code in a Lisp-like language. This code is translated to C and
turned into a plugin which can be invoked by the compiler. MELT is
extensively documented; there are also slides from a couple of tutorials on
its use.
MELT seems to be a capable system, but there do not appear to be a lot of
modules written for it in the wild. One does not need to look at the
documentation for long to understand why; the "basic hints" start with:
"You first need to grasp GCC main internal representations (notably
Tree & Gimple & Gimple/SSA)." MELT author Basile
Starynkevitch's 130-slide
presentation on MELT does not get past the introductory GCC material
until slide 85. MELT, in other words, requires a fairly deep
understanding of GCC; it's not something that an outsider can pick up
quickly. The lack of easy examples to work from is not helpful either.
More recently, David Malcolm has announced
the release of a new framework which enables the creation of plugins as
Python scripts which run within the compiler. His immediate purpose is to
create tools for the development of the Python system itself; the most
significant checker in his code tries to ensure that object reference
counts are managed properly. But he sees the tool as potentially being
useful for a number of other projects and even for prototyping new features
for GCC itself.
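To make the reference-count idea concrete, here is a toy model. It is not how gcc-python-plugin works; the real checker operates on GCC's internal representation of the code. The sketch assumes a hand-built control-flow graph in a hypothetical dict format, where each block lists reference-count operations (also hypothetical names) and its successor blocks. It walks every path from entry to exit and reports two kinds of paths:

* paths on which the function still owns a reference when it exits (a leak);
* paths on which it drops more references than it holds.

```python
from typing import Dict, List, Tuple

# Hypothetical CFG format: block name -> (operations, successor block names).
CFG = Dict[str, Tuple[List[str], List[str]]]

# Effect of each (hypothetical) operation on the references the function owns.
# "return_ref" means the function hands its reference over to the caller.
DELTA = {"new_ref": +1, "incref": +1, "decref": -1, "return_ref": -1}


def check_refcounts(cfg: CFG, entry: str) -> List[str]:
    """Enumerate every path from entry to an exit block (assumes no loops)
    and report leaks and over-releases found along each path."""
    problems: List[str] = []

    def walk(block: str, owned: int, path: List[str]) -> None:
        ops, succs = cfg[block]
        here = path + [block]
        for op in ops:
            owned += DELTA[op]
            if owned < 0:
                problems.append(f"{' -> '.join(here)}: over-release")
                return
        if not succs:  # exit block: every owned reference must be gone
            if owned > 0:
                problems.append(f"{' -> '.join(here)}: leaks {owned} reference(s)")
            return
        for succ in succs:
            walk(succ, owned, here)

    walk(entry, 0, [])
    return problems


if __name__ == "__main__":
    # An object is created; the success path returns it to the caller,
    # but the error path exits without a decref.
    example: CFG = {
        "entry": (["new_ref"], ["ok", "error"]),
        "ok": (["return_ref"], []),
        "error": ([], []),
    }
    print(check_refcounts(example, "entry"))
    # prints ['entry -> error: leaks 1 reference(s)']
```

Error-handling paths like the one above are exactly where hand-written reference counting tends to go wrong. That is why a path-by-path check is useful even in this simplified form.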
At first glance, David's gcc-python-plugin mechanism suffers from the
same difficulty as MELT: the initial learning curve is steep. It is also
a very young and incomplete project; David has, by his own admission, only
brought out the functionality he had immediate need for. The analysis code
seems more approachable, though, and the mechanism for running scripts
directly in the compiler seems more natural than MELT's compile-from-Lisp
approach. It may be that this plugin will attract more users and
developers than MELT as a result.
Or it may just be that your editor, being rather more proficient in Python
than in Lisp, naturally likes the Python-based solution better.
In any case, one conclusion is clear: writing static analysis plugins for
GCC is currently too hard; even capable developers who approach the problem
will need to dedicate a significant chunk of time to understanding the
compiler before they can begin to achieve anything in this area. The
efforts described above are a big step in the right direction, but it seems
clear that they are the foundations upon which more support code must be
built. It's hard to say when that work will reach the tipping point that inspires
a flood of new analysis code, but it's easy to say that we are not there
yet.
GCC is not where all the action is, though; there is also an interesting static analysis
tool which has been built with the LLVM clang compiler. Documentation
of this tool is scarce, but it appears to be capable of detecting some
kinds of memory leaks, null pointer dereferences, the computation of unused
values, and more. Some patches have been posted to add a plugin feature to
this tool, but they do not seem to have proceeded very far yet.
Back in May, John Smith ran the checker on
several open source projects to see what kind of results would emerge.
Those results have been posted on the net;
they show the kind of potential problems that can be found and the nice
HTML output that the checker can create. Some of the warnings are clearly
spurious (always a problem with static analysis tools), but others seem
worth looking into. In general, the clang static analyzer seems, like the
other tools mentioned here, to be in a relatively early state of
development. Things are moving fast, though; this tool is worth keeping an
eye on.
Actually, that is true of the static analysis area in general. The lack of
good analysis tools has been a bit of a mystery: given the number of
developers we have, one would think that a few would scratch that
particular itch. Your editor would not have minded living in a world with
one less version control system but with better analysis tools. But the
nature of free software development is that people work on problems that
interest them. As the foundations of our static analysis tools get better,
one can hope that more developers will find those foundations interesting
to build on. The entire development community will benefit from the
results.