Leading items

On DTrace envy

By Jonathan Corbet
August 7, 2007
When Sun looks to highlight the strongest features of the Solaris operating system, DTrace always appears near the top of the list. Your editor recently had a conversation with an employee of a prominent analyst firm who was interested, above all else, in when Linux would have some sort of answer to DTrace. There is a common notion that tracing tools are one area where Linux lags significantly behind the state of the art. So your editor decided to take a closer look.

The Linux tool which is most similar to DTrace is SystemTap. This development is supported by a number of high-profile companies, including Red Hat, Intel, IBM, and Hitachi. Most distributions have SystemTap packages somewhere in their systems of repositories, making it readily available to Linux users. DTrace supporters have been known to say that SystemTap is merely a knock-off of DTrace, and a badly-done one at that. SystemTap proponents will counter that it is an independent development which can hold its own.

Both tools are based on the insertion of probe points in the system kernel. Whenever a thread of execution hits one of those probe points, some sort of action - as described in the tool's C-like language - is run. That action can be as simple as printing a message, or it can be significantly more complicated than that.

DTrace comes with a large set of pre-defined probe points wired into the Solaris kernel - seemingly tens of thousands of them. These points are well documented and cover most of the kernel. Some simple wildcarding is implemented for the selection of multiple probe points. It is claimed that the run-time overhead of unused probe points is negligible. [Update: see the comments for some useful clarification on the use of dynamic probe points in DTrace.]

SystemTap, instead, does not depend on static probe points within the kernel; that capability exists, but nobody has much interest in maintaining all of those points. Instead, SystemTap uses dynamic probes (implemented with kprobes) which are inserted into the kernel at run time. A flexible language can enable probes to be easily inserted anywhere in the kernel, with fairly complete wildcard support which allows, for example, all functions within a source file or subsystem to be instrumented with a single statement. Unused probe points do not exist at all, and so cannot affect system performance.

There are a couple of advantages to the DTrace approach. The probe points exist and can be easily found in the manuals; a SystemTap user, instead, is required to have a certain amount of familiarity with the kernel source code. DTrace probe points are fixed at locations where it is known to be safe to interrupt the execution of the kernel. The SystemTap documentation, instead, comes with warnings that placing probes in the wrong places can cause system crashes and mutterings about the possibility of implementing blacklists in the future. The number of "wrong places" appears to be quite small, but that is of limited comfort for an administrator trying to observe the operation of a production system - something which is supposed to be possible with either system. There is a set of predefined points provided in the "tapsets" packaged with SystemTap, but it is small.

The "D" language provided with DTrace is more restricted than the SystemTap language, though it does have a few features - like the ability to print stack traces - which appear to be missing in SystemTap. The D language has no flow control or looping constructs. Instead, the code associated with a probe has a predicate expression determining whether that code is executed when the probe is hit. Thus each selected probe point can be thought of as having a single, controlling "if" statement around it, with no further flow control possible afterward.
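The predicate model described above can be sketched in a few lines. This is a Python illustration of the idea only, not D syntax and not how DTrace is implemented; the probe name, the event fields, and the `Clause` and `fire` names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


@dataclass
class Clause:
    """One probe clause: a predicate guarding a straight-line action."""
    probe: str
    predicate: Callable[[Event], bool]
    action: Callable[[Event], None]


def fire(clauses: List[Clause], probe: str, event: Event) -> None:
    """Run every clause attached to `probe` whose predicate holds.

    The loop here belongs to the tracing framework; the script itself
    only supplies the predicate and the action.
    """
    for clause in clauses:
        if clause.probe == probe and clause.predicate(event):
            clause.action(event)


log: List[str] = []
clauses = [
    Clause(
        probe="syscall:read:entry",  # hypothetical probe name
        predicate=lambda ev: ev["execname"] == "bash",
        action=lambda ev: log.append(f"bash read {ev['nbytes']} bytes"),
    )
]

# Two hits on the same probe; only the first satisfies the predicate.
fire(clauses, "syscall:read:entry", {"execname": "bash", "nbytes": 128})
fire(clauses, "syscall:read:entry", {"execname": "sshd", "nbytes": 64})
print(log)
```

The action has no branching of its own: the predicate is the single "if" wrapped around it.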

SystemTap's language, instead, has conditionals, loops, and the ability to define functions. It also has, for those who like to live dangerously, the ability to embed C code. There are clear advantages to a more powerful scripting language, but hazards as well: SystemTap must, for example, carry extra code to keep infinite loops in scripts from bringing down the system.

D is, like Java, compiled to bytecode for a special virtual machine and interpreted at run time. SystemTap, instead, translates its scripts to C, which is then compiled into a loadable kernel module. So SystemTap code may execute more quickly, but D may benefit from the additional safety checks which a virtual machine allows.

DTrace has the ability to work with user-space probes. As with the kernel, developers are required to insert the probe points before DTrace can use them; it is not clear that large amounts of user-space code have been so instrumented. There is clear elegance to the idea, though, and this capability may prove genuinely useful in the future as more applications are equipped with probe points. SystemTap does not currently have this capability.

In practice, simply getting SystemTap to work can be a challenge - even when a distributor-supported package is available. SystemTap is clearly its own development which must be (somewhat painfully) integrated with a specific kernel. DTrace can be expected to simply work out of the box.

And that is perhaps the biggest difference between the two tracing systems. SystemTap would appear to have all of the capabilities it really needs to be a powerful system tracing tool - at least on the kernel side. DTrace features which are missing - speculative tracing, for example - could certainly be added if there were demand for it. Evidently user-space tracing is in the works. But what SystemTap really needs is more basic than that. What's missing is the degree of maturity exhibited by DTrace.

SystemTap needs to simply work on most systems - and be usable by the system administrators. To a great extent, the "simply work" part is something that the distributors must address. Current SystemTap packages as tested by your editor have the look of an edge-of-the-repository afterthought. They do not have the dependencies to bring in the needed kernel information, requiring a fair amount of manual "what does it need now?" administrative work. Even then, performance is spotty at best; the SystemTap utilities just do not have access to the sort of information (uncompressed kernel images, for example) that they need to operate correctly. Until an administrator can simply tell the package management system to install SystemTap and expect to have it work thereafter, it will be hard to convince anybody that we have a mature tracing tool.

On the development side, there should be an extensive set of well-documented trace points which can be used without having to go into the kernel source. Digging deeply into the system in a flexible way is always going to require a certain amount of skill, but SystemTap all but requires its users to be kernel hackers. The hard work of making a tool which can match - and, in places, exceed - DTrace has been done. What remains is a large (but relatively straightforward) job: making this tool usable by a much wider set of system administrators. Until that is done, DTrace envy will remain with us.


Red Hat High teaches free software

By Jake Edge
August 8, 2007

"Get 'em while they're young" should be the motto of Red Hat High (RHH), a summer camp program, funded by Red Hat, to introduce junior high school students to free software tools. Now in its second year, RHH has a curriculum designed to get students using creative tools to produce tangible works during the week-long camp. In addition to teaching 50 eighth and ninth grade students about free software, the project seeks to expand its reach, not by increasing its enrollment, but by exporting the concept to other venues.

The students all came from schools in the area around Red Hat's North Carolina headquarters, and each had to be nominated by one of their teachers. RHH was looking for participants "that show great creative potential and an interest in technology, but perhaps lack the resources to pursue it outside of school." In addition to the technology focus, the camp also provided other social events in the evenings, all free of charge to the campers. The camp was held at North Carolina State University, allowing the students to experience dormitory life a few years early.

The students could choose amongst five different tracks, each focused on a particular tool:

The curriculum for each track had a specific goal, "create a Google Gadget" or "create ten seconds of animation" for example. During the program, the students would learn the tool from scratch, then, singly or collaboratively, use it to create something.

Two student projects are highlighted in a Red Hat Magazine article about RHH 2007. One is a three-minute audio clip, the other a fifteen-second animation - both are quite impressive for 8th and 9th grade students. The organizers failed to get permission from all of the students to share their work, so these are the only examples available - something they hope to fix for next year's camp. By all accounts, RHH was a success with the students, their parents, and the organizers alike. But, just as important, the course content for each of the tracks will be made available to other projects with similar goals.

Camp field trips included the Red Hat campus to "experience life in a technology company" as well as a visit to a college level 3D animation class, where the "free beer" part of free software really hit home. Project coordinator Greg DeKoenigsberg describes the scene:

When the kids reached the 3D Animation classroom, they were very impressed by Maya — until one of them asked for a free copy. 'A full license of Maya costs $7000,' the instructor said, which elicited an outraged reaction from the kids. 'But Blender is free!' they cried in unison.

Then the teacher started to show them some of the things Maya could do, and he was clearly surprised at the kids' clueful responses. 'These are vertices,' he'd say, and then they'd say 'yeah, we've done that.' 'Okay, this is texturing.' 'Yeah, we've done that too.'

In many ways, RHH is a testbed for free software outreach to young people. In the two years of the program, the organizers have learned what works; now they are ready to export that knowledge to others. The first step is to focus on tutorials for the various tools, by creating new versions specifically packaged into curricula that teachers can immediately use. DeKoenigsberg puts it this way:

A strong community of teachers and free software enthusiasts should be able to develop, validate, and license simple lesson plans, with the explicit goal of teaching kids to do stuff that is both cool and immediately useful. It's my hope that Red Hat High can serve as a model for that development.

Once the curricula exist, training teachers to use them in their classrooms is the next step. The main barrier is teachers' time, but the way around that is through the professional development programs that many school districts have. Because professional development courses are often tied to teachers' earnings, formal training sessions that fulfill those requirements will be quite attractive to teachers who have an interest in free software but lack the time. In many districts, funding is available for these kinds of training programs as well.

The project is a worthy one, even if it never escapes beyond the Raleigh-Durham area. Even 50 students at a time, getting the word out about free software is a good thing. If the project's larger goals can be realized – spreading this knowledge far and wide – it can make a huge difference.

Getting young folks hooked on expensive proprietary software may be good for the bottom line at Adobe or Microsoft, but it is not so good for the wallets of schools and parents. Free software is able to replace an awful lot of proprietary packages, with no licensing hassles, so that students can run it anywhere they can find an open computer. That message has not, yet, been widely heard, but RHH hopes to change that.


A report from OSCON 2007

August 2, 2007

This article was contributed by Donnie Berkholz

O'Reilly's annual OSCON in Portland, Ore., is perhaps the only major conference in North America that spans the entire spectrum of open-source communities. This makes it a great opportunity to learn from people who may be encountering the same sorts of problems in a vastly different environment. Other events such as FOSDEM or LCA already provide this kind of environment, but for those of us who are US-based, it's helpful to have one with a lower travel budget. I highly recommend giving a talk if you're going so you get in free, though, since registration costs hover around US$1000 and up. It's clearly not a nonprofit conference.

Numerous groups met before the main part of the conference, one of them a group of people involved with running a variety of free/open-source projects. At the foundations summit, most of the discussion centered around the issues facing nonprofits, such as trademarks, fundraising and bookkeeping. As at a full conference, though, the "hallway track" was the most useful part. As the number of people grows, the discussion gets slower and slower, but meeting the people involved with other foundations is invaluable. The summit ended Tuesday, and the next day the exhibit hall and regular sessions began.

In his session, Arjan van de Ven talked about efforts to reduce power use, focusing on a few main problems to avoid in your code. The first, not surprisingly, was polling. There is no excuse for polling, with the advent of things like inotify. He said, "Frequent polling causes spattergroit."

His second enemy was timers. It costs power to keep moving your CPU in and out of idle states, so you want to group timer events together rather than having them randomly spread throughout time by a number of programs. On the kernel side, you can use round_jiffies() or round_jiffies_relative(), and in userland, you can use glib's g_timeout_add_seconds() rather than g_timeout_add(). Some work is underway to add this functionality to glibc as well. You don't want the entire Internet doing this at the same time, however, so each computer must group its events at a slightly different time.
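The grouping idea can be illustrated with a little arithmetic. The following is a minimal Python sketch, not a reimplementation of round_jiffies() or g_timeout_add_seconds(): it always rounds a deadline up to the next slot so that a timer never fires early, and it derives the per-machine offset from a checksum of the hostname. The `host_skew` and `round_deadline` names, and the choice of checksum, are assumptions made for illustration.

```python
import math
import zlib


def host_skew(hostname: str, period: float = 1.0) -> float:
    """Map a hostname to a stable offset in [0, period)."""
    return (zlib.crc32(hostname.encode("utf-8")) % 1000) / 1000.0 * period


def round_deadline(deadline: float, skew: float, period: float = 1.0) -> float:
    """Move `deadline` to the next slot of the form k * period + skew.

    The result is never earlier than the requested deadline.
    """
    slots = math.ceil((deadline - skew) / period)
    return slots * period + skew


# Three timers that would have caused three separate wakeups...
requested = [10.3, 10.6, 10.9]
# ...all land on the same slot once rounded with this machine's skew.
grouped = [round_deadline(t, skew=0.25) for t in requested]
print(grouped)
```

With a one-second period, the three deadlines collapse into a single wakeup, and a machine with a different skew would wake at a different point within the second.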

Arjan's final enemy was disk I/O. Since disks have moving parts, they consume a lot of power (at least until solid-state disks grow more common). High-speed links such as SATA and SCSI also eat power when not in power-saving mode. Gotchas here include opening files, even when in cache, because of the access time update (use the O_NOATIME flag to open() when possible), and looking for files or directories that don't exist (even when using inotify, this always goes to disk).
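As a rough illustration of the O_NOATIME suggestion, here is a small Python sketch. `os.O_NOATIME` is only defined on Linux, and the kernel refuses the flag with EPERM when the caller neither owns the file nor is privileged, so the hypothetical `open_noatime()` helper below falls back to a plain open in both cases.

```python
import os
import tempfile


def open_noatime(path: str) -> int:
    """Open `path` read-only, asking the kernel not to update its access time.

    Falls back to a plain open when O_NOATIME is unavailable (non-Linux)
    or refused (the caller neither owns the file nor is privileged).
    """
    flags = os.O_RDONLY
    noatime = getattr(os, "O_NOATIME", 0)
    if noatime:
        try:
            return os.open(path, flags | noatime)
        except PermissionError:
            pass  # not the owner: retry without the flag
    return os.open(path, flags)


# Demonstration on a scratch file that we own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("hello")
    demo_path = tmp.name

fd = open_noatime(demo_path)
try:
    data = os.read(fd, 16)
finally:
    os.close(fd)
    os.unlink(demo_path)
print(data)
```

The fallback keeps the program working everywhere, at the cost of silently losing the power saving on files it does not own.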

A special case of this is media playback. The key is avoiding constant spinups of DVDs as well as hard drives by using large buffers — Arjan suggested 20 minutes of video or a minute of audio. Also, decode in large batches so you can be idle longer.

Tools such as powertop and strace are key in tracking down the culprits. Powertop can tell you where to look, and strace can tell you more about what any programs are doing. Near the end, Arjan showed a graph of how tuning and recent fixes dropped a Fedora 7 default installation from a power consumption of 21W down to about 15.5W. That just a few fixes dropped it by so much shows how broken things were, but we're now on the right track. A good goal is to aim for 50 or fewer wakeups per second, because getting below that level generally doesn't gain you much more.

A man with the job title "Disruptive Innovator" gave a talk with about 550 slides in 45 minutes. Rolf Skyberg of eBay applied Maslow's hierarchy of needs to technology to try to explain how users behave. The first level is survival, the second is security, and the third is belonging. Computer programs apparently haven't managed to get any higher up on the scale yet. In terms of programs, survival means the program runs without segfaults; security means the program is useful; and belonging means the program is pretty. The more energy users spend finding the basics (help, logging in, etc.), the less they have to spend doing something useful. But one thing worth remembering is that people using a program may have higher needs than you expected. For example, the iPod isn't just useful, it's pretty. And people really care about that prettiness despite the lack of features like an FM transmitter, a recorder, etc. that many other, less popular MP3 players have.

Luke Kanies talked about Puppet, a server automation tool he wrote in Ruby. It's a replacement for earlier popular tools such as cfengine. He really promoted the architecture, because any component in the entire system can be replaced and reused separately. Puppet's made of three main layers: server, networking and client. The server layer contains a compiler, a file server, a certificate authority and a report handler. The networking is XMLRPC over HTTPS. The client layer includes a resource abstraction layer, transactions and a resource server. Each of these individual components can be ripped out and replaced if you don't like it. You could change the configuration language, use a different method of communication, or whatever else your heart desires.

The resource abstraction layer contrasts the most with other tools such as cfengine. It abstracts all the concepts like "install a package," "add a user," "add a group" and so forth so you can run Puppet on any Linux or other Unix-like OS and retain a simple configuration file without OS-specific details. The layer supports about 10 different distributions and other operating systems, and it's not difficult to add more.
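To make the idea of a resource abstraction layer concrete, here is a minimal Python sketch. It is not Puppet's code or configuration language; the platform names, the `INSTALL_COMMANDS` table, and the `install_package_command()` helper are hypothetical, and the function only builds a command line rather than running it.

```python
from typing import Dict, List

# Per-platform command prefixes; a hypothetical table for illustration only.
INSTALL_COMMANDS: Dict[str, List[str]] = {
    "debian": ["apt-get", "install", "-y"],
    "redhat": ["yum", "install", "-y"],
    "gentoo": ["emerge"],
}


def install_package_command(platform: str, package: str) -> List[str]:
    """Translate the abstract request "install a package" into an argv list."""
    try:
        return INSTALL_COMMANDS[platform] + [package]
    except KeyError:
        raise ValueError(f"no package provider for platform {platform!r}") from None


# The caller states what it wants once; the layer supplies the OS details.
for platform in ("debian", "redhat", "gentoo"):
    print(platform, install_package_command(platform, "openssh"))
```

Supporting another operating system in this sketch means adding one table entry, which mirrors the claim that extending the layer is not difficult.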

Work is underway to create a library of Puppet config files (or recipes) to reduce all the duplication, and that should greatly ease adoption of Puppet. Puppet seems like a well-thought-out and extensible tool, so it will be interesting to watch where it goes.

Clinton Nixon talked about dealing with legacy PHP code, but many of the points are generally applicable to refactoring any code. His three primary suggestions were to separate the controller and the view, even if you don't have a solid MVC architecture; to call methods instead of including code that runs from the include file; and to get rid of global variables.

His rules for view code were that control structures, printing, and display-specific, unnested functions were allowed, but assignment and other function calls were prohibited. He suggested beginning by drawing a line at the top of the code and adding a comment that says "view code below here," then gradually migrating controller code above the line until you can move it to a separate file. For loops, encapsulate the variables in an object. Once you've gotten to this point, you may find duplicated views that you can factor out.

Untangling a web of included files is a process of figuring out the inputs and outputs, wrapping the entire file in a method, then refactoring. The nice part about this style of refactoring is that the code always works. There's never a point where you check in the code and it's broken.
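The "wrap the file in a method" step might look something like the following. The talk was about PHP; this sketch uses Python purely for illustration, and the `render_user_list()` name and its data are hypothetical. The body of a former include file, which read global variables and printed as a side effect, becomes a function with explicit inputs and a returned string.

```python
from typing import Dict, List


def render_user_list(users: List[Dict[str, str]], title: str) -> str:
    """Body of a former include file, now with explicit inputs and output.

    Only loops and output formatting remain here, in line with the rules
    for view code. HTML escaping is omitted to keep the sketch short.
    """
    lines = [f"<h1>{title}</h1>", "<ul>"]
    for user in users:
        lines.append(f"  <li>{user['name']}</li>")
    lines.append("</ul>")
    return "\n".join(lines)


# Controller code: gather the data, then call the view.
users = [{"name": "alice"}, {"name": "bob"}]
page = render_user_list(users, "Users")
print(page)
```

Because the function can be called from the old include site with the same variables, the page keeps working at every intermediate step.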

Finally, he recommended two books: Working Effectively with Legacy Code, by Michael Feathers, and Refactoring, by Martin Fowler. Although the Fowler book is a classic, he recommended the newer book by Feathers because it's more approachable.

At the close of the sessions Thursday, Dave Jones gave his now-infamous "User Space Sucks" talk. Since most people have gotten the basic idea of this talk, I'm only going to mention the new information. Dave re-ran his tests a week ago on Fedora 7 to look at disk I/O during the bootstrap process, and he found that it had actually gotten even worse since FC6. Counts of stat(), open() and exec() calls had either increased or stayed the same. But the problem has grown harder, because the offenders no longer stand out in the same way as the originals.

OSCON always provides some entertaining and educational talks, provided you've got a way to get into them. But its free content isn't too shabby either. The exhibit hall, all of the BOFs and parties (of which there are many), and the accompanying OSCAMP (like FooCamp, BarCamp, etc.) and FOSCON (mostly about Ruby) are all gratis. It stands nearly alone in the U.S. as a conference that spans across all of the open-source world, although a niche certainly exists for a lower-margin meeting like FOSDEM or LCA on this side of the ocean.


Page editor: Jonathan Corbet

Copyright © 2007, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds