Tracing is a hot issue in the Linux community, mostly as the result of the
actions of an allegedly friendly company: Sun Microsystems has been putting
a lot of marketing energy into telling customers that DTrace makes Solaris
a better system. The fact of the matter is that Linux does not lack
tracing tools, but it
seem to lack tools which are usable to
the wider community. The second day of the 2008 kernel summit started with
a pair of sessions dedicated to determining where the gaps are and trying
to figure out what to do about them.
James Bottomley started with a description of his experiences with
SystemTap, the utility which is most often cited as our answer to DTrace.
He had a lot of trouble getting it to work with his system. In his mind,
the root cause for all this trouble is the simple fact that nobody from the
development community is actually using SystemTap. A quick query of the
room suggested that about half of the developers present had tried using
SystemTap at one point or other; maybe 20% actually succeeded. So there is
a roadblock of sorts here; SystemTap needs attention from kernel developers
to progress, but those developers find it unsuited to their needs and
difficult to use, so they tend to ignore it.
But kernel developers are not the targeted user base for a tool like
SystemTap; it is aimed at end users and deployed systems. To help clarify
what those users need, Vinod Kutty from the Chicago Mercantile Exchange
took some time to talk about his needs for tracing tools. In general,
these users need a higher level of visibility into running, production
systems. They need to be able to track down slowdowns, look at the
environment in which processes are running, and, in general, to be able to
look in corners of the system which nobody will have anticipated in
advance. All of this has to happen while the system is running in
production; it is, he says, somewhat like needing to look under the hood of
a car while driving at 100mph.
Also useful is the ability to run tracing tools in a "flight recorder"
mode, where an administrator can look at historical data after something
goes wrong. And it is necessary to be able to look at user-space events as
well as those from the kernel. Events generated in user space are often
more meaningful to the people running the system. All of this is needed to
be able to communicate with distributors about where the problems come up,
so that the distributor can work toward a fix. Current tracing tools for
Linux are insufficient.
Linus asked: is tracing needed primarily to track down bugs or to find
performance issues? It turns out that performance problems are the big
issue. James asked what parameters were the most important; the answer
mentioned individual process I/O events, user-space events, and the ability
to map user events to kernel-level events.
Moving on, Vinod also noted that low impact is important; the tracing tool
cannot place a heavy load on the system. These tools need to start
quickly. Current tools are far too big. There is also a need for good
filtering; tracing tools can generate a lot of data. Administrators
and developers need a way to boil down all that data to an amount they can
deal with. Even better are tools which can spot problems in the trace
stream and raise red flags when they happen.
And, of course, tracing tools really cannot crash the system while they are
running. SystemTap still falls a little short in this area; it's not hard
to bring down a system while trying to trace it. Adding a DTrace-style
virtual machine was discussed; in theory, a VM can make the tracing tool
demonstrably safer. Vinod responded that it could be useful, but the proof
of maturity is in watching the software run for a while.
This is where Linus came in to proclaim that he hates every tracing tool he
has seen. SystemTap is far too complicated; these tools need to be
simpler. Adding a virtual machine to SystemTap would just make things more
complicated; that's not the way to fix its problems. According to Linus,
most of the problems solved by tracing come down to figuring out scheduling
issues, and we have the tools to do that now. We should be making better
use of the simple tools which are currently in the kernel before trying to
put more complicated
stuff in. We should, for example, make latencytop work better
and push to get it into the enterprise distributions. This "use the tools
we already have" suggestion came back many times during the session.
Christoph Hellwig brought up another recurring theme: while dynamic tracing
is nice, there is a lot of value to be had from well-placed static trace
points which are managed by the maintainer of the code. Matthew Wilcox
added that the user-space trace points (for DTrace) added to PostgreSQL have
proved to be highly useful for database administrators; people running
PostgreSQL now have a strong motivation to do so on Solaris systems. We
would do well to match that functionality on Linux.
A key component of SystemTap is the collection of "tapsets," scripts which
allow a user to look into the kernel for specific information. These
tapsets are a problem, though; they are tightly tied to a specific kernel,
but the kernel is constantly being changed. So tapsets go stale quickly.
Moving these tapsets into the kernel might help, but they will still be a
separate body of code which is prone to breaking. Static trace points,
which can be maintained directly with the code they monitor, are much more
likely to continue to work in the long term.
Martin Bligh noted that Google maintains a set of 20-30 static trace points
for use with the LTTng trace tool. This very small set of trace
points is sufficient to solve most problems that Google encounters. Martin
will, hopefully, be posting those trace points for inclusion into the
mainline, though Google's associated tools might not be available.
Vinod finished this portion of the session by stating the he likes the
tapset concept. It allows him (or somebody in his group) to write a script
aimed at a specific situation, and others can make use of it immediately.
There's no need to wait for the release of a more specialized tool.
Mathieu Desnoyers spent a few moments introducing the LTTng tracing package. LTTng is a static
tracing tool, depending on markers placed in the kernel itself. It has
been designed for high performance and simplicity; that, in turn, should
help to make it safe to use on production systems. All LTTng trace points
have to be in the kernel code itself and be maintained by the appropriate
The core kernel code includes a module for precise time stamping (needed to
preserve the ordering of events which go through different per-CPU relay
buffers), the relaying code, and a netlink-driven module to control
tracing. There is a user-space library and, of course, a set of analysis
tools. LTTng can support "flight recorder" mode which can initiate tracing
when a specific trigger situation comes about. There is also a mechanism
for putting markers into user space.
Frank Eigler spent some time talking about SystemTap; he used much of that
defend the design decisions which had been made. When the SystemTap
project started, the kernel had almost no tracing features at all, so
they had to pick a path that worked. Since there is a lot of hostility to
putting a virtual machine into the kernel, they had to go with code
generation instead. They used kprobes because that was the mechanism that
was available. And so on. In general, SystemTap has a lot of the same
objectives as LTTng, plus, of course, the dynamic tracing feature. There
are "some demos" showing working user-space tracing.
James stated that there was a real need for the users of these tools -
kernel developers, in this case - to provide input into how they work.
Frank responded that the SystemTap team has been crying out for people to
help. It's clear, though, that this particular user base is not
sufficiently engaged in the development process. It was said that the real
users of SystemTap are Red Hat consultants, who find that it works well
with the standard RHEL kernel. But people trying to use SystemTap with a
current mainline kernel have to download "a shaky weekly tarball" to try to
make it work. Until SystemTap is easier to use with the mainline kernel,
it will be a hard sell in the development community.
The problem there, of course, is that keeping SystemTap current while it is
out of the mainline tree is always going to be a struggle. Resolving that
problem will require getting more of that code merged. It seems that the
core SystemTap code is about 15,000 lines - small, according to Frank.
This could maybe go in, but Linus is resistant, saying that we need to get
the current, simple, in-kernel tracing tools into a usable state before we
try to add more of them.
Ted Ts'o remarked that there is a real difference with SystemTap: it is the
only Linux-based tracing package which, like DTrace, allows users to run code at the
trace points. Thus it is able to do more complicated triggering,
filtering, and analysis. Thomas Gleixner responded that this is all good,
but what is really needed is a simple trace package which does not require
the installation of a whole set of new tools. He does tracing (using
ftrace) on a number of platforms, including embedded systems, and he isn't
willing to deal with the hassle involved in adding another complicated set
After that the conversation wandered into various, relatively obscure
technical topics like the details of how buffering mechanisms should work,
who should really be managing trace points, who manages instrumentation as
a whole, and so on. But there was a general sense that the summit wasn't
the venue for that kind of low-level detail, which isn't where the real
problems are anyway. The tracing topic will be revisited at the Linux
Plumbers Conference, so it was decided to defer much of the discussion of
the details until then.
to post comments)