When Sun looks to highlight the strongest features of the Solaris operating
system, DTrace always appears near the top of the list. Your editor
recently had a conversation with an employee of a prominent analyst firm
who was interested, above all else, in when Linux would have some sort of
answer to DTrace. There is a common notion out there that tracing tools are
one area where Linux is significantly behind the state of the art. So
your editor decided to take a closer look.
The Linux tool which is most similar to DTrace is SystemTap. This development is
supported by a number of high-profile companies, including Red Hat, Intel,
IBM, and Hitachi. Most distributions carry SystemTap packages somewhere in
their repositories, making it readily available to Linux users.
DTrace supporters have been known to say that SystemTap is merely a knock-off of
DTrace, and a badly-done one at that. SystemTap proponents will
counter that it is an independent development which can hold its own.
Both tools are based on the insertion of probe points in the system
kernel. Whenever a thread of execution hits one of those probe points,
some sort of action - as described in the tool's C-like language - is run.
That action can be as simple as printing a message, or it can be
significantly more complicated than that.
DTrace comes with a large set of pre-defined probe points wired into the
Solaris kernel - seemingly tens of thousands of them. These points are well documented and
cover most of the kernel. Some simple wildcarding is implemented for the
selection of multiple probe points. It is claimed that the run-time
overhead of unused probe points is negligible. [Update: see the
comments for some useful clarification on the use of dynamic probe points
with DTrace.]
SystemTap, instead, does not depend on static probe points within the
kernel; that capability exists, but nobody has much interest in maintaining
all of those points. Instead, SystemTap uses dynamic probes (implemented
with kprobes) which
are inserted into the kernel at run time. A flexible language makes it
easy to insert probes anywhere in the kernel, with fairly complete
wildcard support which allows, for example, all functions within a source
file or subsystem to be instrumented with a single statement. Unused probe
points do not exist at all, and so cannot affect system performance.
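For the curious, the kprobes interface which SystemTap builds on can also
be used directly from a kernel module. Here is a minimal sketch; probing
do_fork is an arbitrary choice, and error handling is abbreviated:

    /* A sketch of direct kprobes use - roughly the mechanism that
     * SystemTap generates code for.  Probing do_fork is just an example. */
    #include <linux/module.h>
    #include <linux/kprobes.h>

    static int probe_hit(struct kprobe *p, struct pt_regs *regs)
    {
            printk(KERN_INFO "do_fork entered\n");
            return 0;
    }

    static struct kprobe kp = {
            .symbol_name = "do_fork",
            .pre_handler = probe_hit,
    };

    static int __init probe_init(void)
    {
            return register_kprobe(&kp);
    }

    static void __exit probe_exit(void)
    {
            unregister_kprobe(&kp);
    }

    module_init(probe_init);
    module_exit(probe_exit);
    MODULE_LICENSE("GPL");

SystemTap hides all of this boilerplate behind a single probe statement,
which is a large part of its appeal.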
There are a couple of advantages to the DTrace approach. The probe points
exist and can be easily found in the manuals; a SystemTap user, instead, is
required to have a certain amount of familiarity with the kernel source
code. DTrace probe points are fixed at locations where it is known to be
safe to interrupt the execution of the kernel. The SystemTap
documentation, instead, comes with warnings that placing probes in the
wrong places can cause system crashes and mutterings about the possibility of
implementing blacklists in the future. The number of "wrong places"
appears to be quite small, but that is of limited comfort for an
administrator trying to observe the operation of a production system -
something which is supposed to be possible with either system. There is a
set of predefined points provided in the "tapsets" packaged with SystemTap,
but it is small.
The "D" language provided with DTrace is more restricted than the SystemTap
language, though it does have a few features - like the ability to print
stack traces - which appear to be missing in SystemTap. The D language has
no flow control or looping constructs. Instead, the code associated with a
probe has a predicate expression determining whether that code is executed
when the probe is hit. Thus each selected probe point can be thought of
as having a single, controlling "if" statement around it, with no
further flow control possible afterward.
SystemTap's language, instead, has conditionals, loops, and the ability to
define functions. It also has, for those who like to live dangerously, the
ability to embed C code. There are clear advantages to a more powerful
scripting language, but hazards as well: SystemTap must, for example, carry
extra code to keep infinite loops in scripts from bringing down the system.
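The details of those guards live in SystemTap's generated code, but the
general idea - bound the amount of work a single probe hit may perform -
can be sketched in plain C. The MAX_ACTIONS limit below is invented for
illustration and is not SystemTap's actual mechanism:

    /* A hypothetical sketch of a runaway-script guard; not SystemTap's
     * real generated code. */
    #include <stdio.h>

    #define MAX_ACTIONS 1000        /* invented per-probe work limit */

    /* Stands in for one statement of a probe script's loop. */
    static int run_one_action(void)
    {
            return 1;               /* this "script" never terminates */
    }

    static int probe_handler(void)
    {
            int actions = 0;

            while (run_one_action()) {
                    if (++actions > MAX_ACTIONS) {
                            fprintf(stderr, "action limit exceeded\n");
                            return -1;  /* cut the script off instead
                                         * of hanging the system */
                    }
            }
            return 0;
    }

    int main(void)
    {
            return probe_handler() ? 1 : 0;
    }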
D is, like Java, compiled into code for a special virtual machine and interpreted at
run time. SystemTap, instead, compiles directly to C. So SystemTap code
may execute more quickly, but D may benefit from the additional safety
checks which a virtual machine allows.
DTrace has the ability to work with user-space probes. As with the kernel,
developers are required to insert the probe points before DTrace can use
them; it is not clear that large amounts of user-space code have been so
instrumented. There is clear elegance to the idea, though, and this
capability may prove genuinely useful in the future as more applications
are equipped with probe points. SystemTap does not currently have this
capability.
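For illustration, inserting a user-space probe point on Solaris comes down
to a macro call at the spot of interest. The "myapp" provider and probe
name below are invented, and the provider description file and extra build
step that DTrace also requires are omitted:

    /* A sketch of a DTrace user-space static probe; the "myapp"
     * provider and "request-start" probe name are invented examples. */
    #include <sys/sdt.h>

    void handle_request(const char *url)
    {
            /* Costs almost nothing until a DTrace consumer enables
             * the probe; the double underscore becomes a dash in the
             * probe name (request-start). */
            DTRACE_PROBE1(myapp, request__start, url);
            /* ... actual request handling ... */
    }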
In practice, simply getting SystemTap to work can be a challenge - even
when a distributor-supported package is available. SystemTap is clearly
its own development which must be (somewhat painfully) integrated with a
specific kernel. DTrace can be expected to simply work out of the box.
And that is perhaps the biggest difference between the two tracing
systems. SystemTap would appear to have all of the capabilities it really
needs to be a powerful system tracing tool - at least on the kernel side.
The DTrace features which SystemTap lacks - speculative
tracing, for example - could certainly be added if there were demand
for them. Evidently user-space tracing is in the works.
But what SystemTap really needs is more basic than that. What's missing is
the degree of maturity exhibited by DTrace.
SystemTap needs to simply work on most systems - and be usable by the
system administrators. To a great extent, the "simply work" part is
something that the distributors must address. Current SystemTap packages
as tested by your editor have the look of an edge-of-the-repository
afterthought. They do not have the dependencies to bring in the needed
kernel information, requiring a fair amount of manual "what does it need
now?" administrative work.
Even then, performance is spotty at best; the SystemTap utilities just do
not have access to the sort of information (uncompressed kernel images, for
example) that they need to operate correctly. Until an administrator can
simply tell the package management system to install SystemTap and expect
to have it work thereafter, it will be hard to convince anybody that we
have a mature tracing tool.
On the development side, there should be an extensive set of
well-documented trace points which can be used without having to go into
the kernel source. Digging deeply into the system in a flexible way is
always going to require a certain amount of skill, but SystemTap all but
requires its users to be kernel hackers. The hard work of making a tool
which can match - and, in places, exceed - DTrace has been done. What
remains is a large (but relatively straightforward) job: making this tool
usable by a much wider set of system administrators. Until that is done,
DTrace envy will remain with us.
Comments (54 posted)
"Get 'em while they're young" should be the motto of Red Hat High (RHH), a summer
camp program, funded by Red Hat, to introduce junior high school students
to free software tools. Now in its second year, RHH has a curriculum
designed to get students using creative tools to produce tangible works
during the week-long camp. In addition to teaching 50 eighth and ninth
grade students about free software, the project seeks to expand its reach,
not by increasing its enrollment, but by exporting the concept to other
communities.
The students all came from schools in the area around Red Hat's North Carolina
headquarters, and each had to be nominated by one of their teachers. RHH was
looking for participants "that show great creative potential and an
interest in technology, but perhaps lack the resources to pursue it outside
of school." In addition to the technology focus, the camp also provided
other social events in the evenings, all free of charge to the
campers. The camp was held at North Carolina State University, allowing
the students to experience dormitory life a few years early.
The students could choose among five different tracks, each focused on a
particular creative tool.
The curriculum for each track had a specific goal, "create a Google Gadget"
or "create ten seconds of animation" for example. During the program,
the students would learn the tool from scratch, then, singly or
collaboratively, use it to create something.
Two student projects are highlighted in a Red Hat Magazine article
about RHH 2007. One is a three-minute audio clip, the other a
fifteen-second animation - both are quite impressive for eighth and ninth grade
students. The organizers failed to get permission from all of the students
to share their work, so these are the only examples available - something
they hope to fix for next year's camp. By all
accounts, RHH was a success with the students, their parents, and the
organizers alike. But, just as important, the course content for each of the
tracks will be made available to other
projects with similar goals.
Camp field trips included the Red Hat campus to "experience life in a
technology company" as well as a visit to a college level 3D animation
class, where the "free beer" part of free software really hit home.
Project coordinator Greg DeKoenigsberg
describes the scene:
When the kids reached the 3D Animation classroom, they were very impressed
by Maya — until one of them asked for a free copy. 'A full license
of Maya costs $7000,' the instructor said, which elicited an outraged
reaction from the kids. 'But Blender is free!' they cried in unison.
Then the teacher started to show them some of the things Maya could do, and
he was clearly surprised at the kids' clueful responses. 'These are
vertices,' he'd say, and then they'd say 'yeah, we've done that.' 'Okay,
this is texturing.' 'Yeah, we've done that too.'
In many ways, RHH is a testbed for free software outreach to young people.
In the two years of the program, the organizers have learned what works,
now they are ready to export that knowledge to others. The first step is
to focus on tutorials for the various tools, by creating new versions
specifically packaged into curricula that teachers can immediately use.
DeKoenigsberg puts it this way:
A strong community of teachers and free software enthusiasts should be able
to develop, validate, and license simple lesson plans, with the explicit
goal of teaching kids to do stuff that is both cool and immediately
useful. It's my hope that Red Hat High can serve as a model for that.
Once the curricula exist, training teachers to use them in their
classrooms is the next step. The main barrier is teachers' time, but the
way around that is through the professional development programs that many
school districts have. Because professional development courses are often
tied to teachers' earnings, formal training sessions that fulfill those
requirements will be quite attractive to teachers who have an interest in
free software but lack the time. In many districts, funding is available for
these kinds of training programs as well.
The project is a worthy one, even if it never escapes beyond the
Raleigh-Durham area. Even at 50 students at a time, getting the word out
about free software is a good thing. If the project's larger goals can be
realized – spreading this knowledge far and wide – it can make
a huge difference.
Getting young folks hooked on expensive proprietary
software may be good for the bottom line at Adobe or Microsoft, but it is
not so good for the wallets of schools and parents. Free software is able
to replace an awful lot of proprietary packages, with no licensing hassles,
so that students can run it anywhere they can find an open computer. That
message has not yet been widely heard, but RHH hopes to change that.
Comments (2 posted)
O'Reilly's annual OSCON in Portland, Ore., is perhaps the only major
conference in North America that spans the entire spectrum of open-source
communities. This makes it a great opportunity to learn from people who may
be encountering the same sorts of problems in a vastly different
environment. Other events such as FOSDEM or LCA already provide this kind
of environment, but
for those of us who are US-based, it's helpful to have one with a lower
travel budget. If you're going, I highly recommend giving a talk so that
you get in free, since registration costs hover around US$1000 and up. It's
clearly not a nonprofit conference.
Numerous groups met before the main part of the conference began, one of
them a gathering of people involved with running a variety of free/open-source
projects. At the foundations
summit, most of the discussion centered around dealing with the issues
facing nonprofits, such as trademarks, fundraising and bookkeeping. But in
the same way as a full conference, the "hallway track" here was the most
useful. As the number of people grows, the discussion gets slower and
slower, but meeting the people involved with other foundations is
invaluable. The summit ended Tuesday; the next day, the exhibit hall and
regular sessions began.
In his session, Arjan van de Ven talked about efforts to reduce power use,
focusing on a few main problems to avoid in your code. The first, not
surprisingly, was polling. There is no excuse for polling, with the advent
of things like inotify. He said, "Frequent polling causes spattergroit."
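For example, a program that watches a file with inotify sleeps until the
kernel has something to report. Here is a minimal sketch, with /etc/passwd
as an arbitrary target and error handling omitted:

    /* Watch a file with inotify instead of polling it. */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/inotify.h>

    int main(void)
    {
            char buf[4096];
            int fd = inotify_init();

            /* Ask to be told only when the file is actually modified. */
            inotify_add_watch(fd, "/etc/passwd", IN_MODIFY);

            /* read() blocks: no periodic wakeups while nothing happens. */
            while (read(fd, buf, sizeof(buf)) > 0)
                    printf("file modified\n");

            close(fd);
            return 0;
    }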
His second enemy was timers. It costs power to keep moving your CPU in and
out of idle states, so you want to group timer events together rather than
having them randomly spread throughout time by a number of programs. On the
kernel side, you can use round_jiffies() or
round_jiffies_relative(), and in
userland, you can use glib's g_timeout_add_seconds() —
not g_timeout_add(). Some work is underway to add this
functionality to glibc as well. You don't want the entire Internet doing
this at the same time, however, so each computer must group its events at a
slightly different time.
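In glib code the fix is a one-line change; a minimal sketch, with an
arbitrary 30-second interval:

    /* Group timer wakeups with g_timeout_add_seconds() rather than
     * demanding a precise deadline with g_timeout_add(). */
    #include <glib.h>

    static gboolean do_periodic_work(gpointer data)
    {
            g_print("periodic wakeup\n");
            return TRUE;            /* keep the timer running */
    }

    int main(void)
    {
            GMainLoop *loop = g_main_loop_new(NULL, FALSE);

            /* Fires about every 30 seconds, rounded so that glib can
             * batch this wakeup with others. */
            g_timeout_add_seconds(30, do_periodic_work, NULL);

            g_main_loop_run(loop);
            return 0;
    }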
Arjan's final enemy was disk I/O. Since disks have moving parts, they consume
a lot of power (at least until solid-state disks grow more
common). High-speed links such as SATA and SCSI also eat power when not in
power-saving mode. Gotchas here include opening files, even cached ones,
because of the access time update (use the O_NOATIME flag to open() when
possible), and looking for files or directories that don't exist (even when
using inotify, this always goes to disk).
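Skipping the atime update is a one-flag change at open time. A sketch,
with an arbitrary file name; note that the kernel refuses O_NOATIME with
EPERM unless you own the file, so a fallback is needed:

    /* Open a file for reading without updating its access time. */
    #define _GNU_SOURCE
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
            const char *path = "data.log";  /* arbitrary example */
            int fd = open(path, O_RDONLY | O_NOATIME);

            if (fd < 0 && errno == EPERM)   /* not our file: accept the
                                             * atime update instead */
                    fd = open(path, O_RDONLY);
            if (fd < 0) {
                    perror("open");
                    return 1;
            }
            /* ... read from fd ... */
            close(fd);
            return 0;
    }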
A special case of this is media playback. The key is avoiding constant
spinups of DVDs as well as hard drives by using large buffers — Arjan
suggested 20 minutes of video or a minute of audio. Also, decode in large
batches so you can be idle longer.
Tools such as powertop and strace are key in tracking down the
culprits. Powertop can tell you where to look, and strace can tell you more
about what any programs are doing. Near the end, Arjan showed a graph of how
tuning and recent fixes dropped a Fedora 7 default installation from a
power consumption of 21W down to about 15.5W. That just a few fixes dropped
it by so much shows how broken things were, but we're now on the right
track. A good goal is to aim for 50 or fewer wakeups per second, because
getting below that level generally doesn't gain you much more.
A man with the job title "Disruptive Innovator" gave a talk with about 550
slides in 45 minutes. Rolf Skyberg of eBay applied Maslow's hierarchy of
needs to technology to try to explain how users behave. The first level is
survival, the second is security, and the third is belonging. Computer
programs apparently haven't managed to get any higher up on the scale
yet. In terms of programs, survival means the program runs without
segfaults; security means the program is useful; and belonging means the
program is pretty. The more energy users spend finding the basics (help,
logging in, etc.), the less they have to spend doing something useful. But
one thing worth remembering is that people using a program may have higher needs
than you expected. For example, the iPod isn't just useful, it's pretty. And
people really care about that prettiness despite the lack of features like
an FM transmitter, a recorder, etc. that many other, less popular MP3
players offer.
Luke Kanies talked about Puppet, a server automation tool he wrote in
Ruby. It's a replacement for earlier popular tools such as cfengine. He
really promoted the architecture, because any component in the entire system
can be replaced and reused separately. Puppet's made of three main layers:
server, networking and client. The server layer contains a compiler, a
file server, a certificate authority and a report handler. The networking is
XMLRPC over HTTPS. The client layer includes a resource abstraction layer,
transactions and a resource server. Each of these individual components can
be ripped out and replaced if you don't like it. You could change the
configuration language, use a different method of communication, or whatever
else your heart desires.
The resource abstraction layer contrasts the most with other tools such as
cfengine. It abstracts all the concepts like "install a package," "add a
user," "add a group" and so forth so you can run Puppet on any Linux or
other Unix-like OS and retain a simple configuration file without
OS-specific details. The layer supports about 10 different distributions and
other operating systems, and it's not difficult to add more.
Work is underway to create a library of Puppet config files (or recipes) to
reduce all the duplication, and that should greatly ease adoption of
Puppet. Puppet seems like a well-thought-out and extensible tool, so it will
be interesting to watch where it goes.
Clinton Nixon talked about dealing with legacy PHP code, but many of the
points are generally applicable to refactoring any code. His three primary
suggestions were to separate the controller and the view, even if you don't
have a solid MVC architecture; to call methods instead of including files
whose code runs at include time; and to get rid of global variables.
His rules for view code were that control structures, printing, and
display-specific, unnested functions were allowed, but assignment and other
function calls were prohibited. He suggested beginning by drawing a line at
the top of the code and adding a comment that says "view code below here,"
then gradually migrating controller code above the line until you can move
it to a separate file. For loops, encapsulate the variables in an
object. Once you've gotten to this point, you may find duplicated views that
you can factor out.
Untangling a web of included files is a process of figuring out the inputs
and outputs, wrapping the entire file in a method, then refactoring. The
nice part about this style of refactoring is that the code always
works. There's never a point where you check in the code and it's broken.
Finally, he recommended two books: Working Effectively with Legacy Code, by
Michael Feathers, and Refactoring, by Martin Fowler. Although the Fowler
book is a classic, he recommended the newer book by Feathers because it's
aimed squarely at the legacy-code problem.
At the close of the sessions Thursday, Dave Jones gave his now-infamous
"User Space Sucks" talk. Since most people have gotten the basic idea of
this talk, I'm only going to mention the new information. Dave re-ran his
tests a week ago on Fedora 7 to look at disk I/O during the
bootstrap process, and he
found that it had actually gotten even worse since FC6. Counts of stat(),
open() and exec() calls had either increased or stayed the same. But the
problem has grown harder, because the offenders no longer stand out in the
same way as the originals.
OSCON always provides some entertaining and educational talks, provided
you've got a way to get into them. But its free content isn't too shabby
either. The exhibit hall, all of the BOFs and parties (of which there are
many), and the accompanying OSCAMP (like FooCamp, BarCamp, etc.) and FOSCON
(mostly about Ruby) are all gratis. It stands nearly alone in the U.S. as a
conference that spans all of the open-source world, although a niche
certainly exists for a lower-margin meeting like FOSDEM or LCA on this side
of the ocean.
Comments (35 posted)
Page editor: Jonathan Corbet